If I were designing Python's import from scratch
Talk to any developer who has inherited a large, old code base whose semantics have accreted over time, and they will always have something they wish they could change about it. After inheriting import
in Python, I too have a list of things I would love to see changed in how it works to make it a bit more sane and easier to work with. This blog post is basically a brain dump/wishlist of what I would love to see changed in import some day.
No global state
As import
currently stands, all of its state is stored in the sys
module. This makes growing the API rather painful as it means expanding a module's API surface rather than adding another attribute on an object. For me, I would rather have import
be a fully self-contained object that stored all of its own state.
This has been proposed before in PEP 406 under the name "import engine". It unfortunately has not gone anywhere because it would take time to design the API for a fully encapsulated import
class and it doesn't buy people anything today. Now in the future it could open up some unique possibilities for import
itself -- which will be discussed later -- as well as simply be cleaner to maintain as it would allow for cleaner separation between interpreters in a single process.
Making this actually happen would occur over stages. A new ImportEngine
class would be created which would define the API we wished import
would have from scratch. That API would then delegate under the hood to the sys
module so that semantics stayed the same, including making instances of the class callable and assigning such an instance to builtins.__import__
. At some point the direction of delegation would flip: the objects stored on the instance assigned to builtins.__import__
would be set in the sys
module, instead of the instance delegating to the sys
module itself. After a proper amount of time, once everyone had moved over to using the object's API instead of the sys
module then we could consider cutting out the import-related parts from the sys
module.
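To make the first stage concrete, here is a minimal sketch of what such a class might look like. The attribute names (`path`, `modules`, `meta_path`, and so on) and the `import_module` method are assumptions made up for this example, not the API from PEP 406; everything simply delegates to `sys` and `importlib`, so semantics stay the same.

```python
import importlib
import sys


class ImportEngine:
    """Hypothetical first-stage engine: a thin facade over the sys module."""

    # Each property reads sys on every access, so the engine always
    # sees the current state, even if someone rebinds sys.path.
    @property
    def path(self):
        return sys.path

    @property
    def modules(self):
        return sys.modules

    @property
    def meta_path(self):
        return sys.meta_path

    @property
    def path_hooks(self):
        return sys.path_hooks

    @property
    def path_importer_cache(self):
        return sys.path_importer_cache

    def import_module(self, name, package=None):
        # Assumed convenience method mirroring importlib.import_module().
        return importlib.import_module(name, package)

    def __call__(self, name, globals=None, locals=None, fromlist=(), level=0):
        # Being callable with this signature is what would let an
        # instance be assigned to builtins.__import__ (not done here).
        return importlib.__import__(name, globals, locals, fromlist, level)


engine = ImportEngine()
```

In the later stages the properties would stop reading from `sys` and the instance would own the state itself.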
Make __import__
more sane
In case you didn't know, the signature for builtins.__import__()
is a bit nuts:
def __import__(name, globals=None, locals=None, fromlist=(), level=0): pass
The locals
argument isn't used. The globals
argument is only used for calculating relative imports and thus only needs __package__
(technically __name__
and __path__
are also used, but only when __package__
isn't defined and that only happens if you do something bad). The fromlist
parameter has to do with how the bytecode operates -- which I will talk about later -- and level
is just the number of leading dots in a relative import.
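A short demonstration of how those arguments behave today, using the standard library's `json` package purely as a convenient example:

```python
import json.decoder

# Without a fromlist, __import__ returns the top-level package,
# which is what `import json.decoder` binds to the name `json`.
top = __import__("json.decoder")

# With a non-empty fromlist, it returns the named submodule itself,
# which is what `from json.decoder import JSONDecoder` needs.
leaf = __import__("json.decoder", fromlist=["JSONDecoder"])

# level=1 means one leading dot. Passing the globals of json.decoder
# makes this the equivalent of `from .encoder import JSONEncoder`
# executed inside that module.
sibling = __import__("encoder", vars(json.decoder), None, ["JSONEncoder"], 1)
```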
If I had my way, the function would be defined as:
def __import__(name, spec=None): pass
This is almost the same signature as importlib.import_module()
, but with passing in the spec of the calling module instead of just its __package__
; nice, simple, and easy to comprehend. The only thing I might consider changing is keeping the level
argument since that is a bit of string parsing that can be done ahead of time and baked into the bytecode, but I don't know if it really would make that much of a performance difference.
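As a rough approximation of that signature, the sketch below builds it on top of `importlib.import_module()`. The name `import_with_spec` is made up for this example, and using `spec.parent` as the package for resolving relative names is an assumption about how the spec would be consumed.

```python
import importlib
import importlib.util


def import_with_spec(name, spec=None):
    """Hypothetical stand-in for the simplified __import__(name, spec=None)."""
    # spec.parent is the package that relative names resolve against.
    package = spec.parent if spec is not None else None
    return importlib.import_module(name, package)


# Absolute import: no spec required.
json_module = import_with_spec("json")

# Relative import as if written inside json.decoder: `.encoder`.
caller_spec = importlib.util.find_spec("json.decoder")
encoder_module = import_with_spec(".encoder", caller_spec)
```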
You can only import modules
Having the ability to import attributes off of a module really sucks from an implementation perspective. The bytecode itself doesn't handle that bit of detail and instead foists it upon import
. It also leads to people getting into circular import problems. Finally, it lets people separate an object from the namespace it belongs to, whereas keeping an object and its containing module together can make code easier to read. Plus you can easily replace from module import attr
with import module; attr = module.attr
; TOOWTDI.
So if I had my way, when you said from foo import bar
, it would mean Python did import foo.bar; bar = foo.bar
and nothing else. No more from ... import *
, no more __all__
for modules, etc.; you wouldn't be allowed to import anything that didn't end up in sys.modules
(and I'm sure some teacher is saying "but import *
makes things easier", but in my opinion the cost of that little shortcut is too high to keep it around). It makes things cleaner to implement, which helps eliminate edge cases. It makes code easier to analyze as you would be able to tell what modules you were after (mostly) statically. It just seems better to me, both from my end in terms of implementing import and in terms of simplifying the semantics for everyone to comprehend.
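Those modules-only semantics can be approximated with `importlib` today. The helper name `from_import` is made up for this sketch; it only ever returns something that ended up in `sys.modules`, and raises `ImportError` for plain attributes.

```python
import importlib
import sys


def from_import(package, name):
    """Hypothetical modules-only version of `from package import name`."""
    # Equivalent to `import package.name; name = package.name`:
    # the result is always the object stored in sys.modules.
    return importlib.import_module(f"{package}.{name}")


decoder = from_import("json", "decoder")  # a submodule: allowed

try:
    from_import("json", "dumps")          # a function: not a module
except ImportError:
    attribute_import_rejected = True
else:
    attribute_import_rejected = False
```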
Looking up __import__
like anything else
Like a lot of syntax in Python, the import
statement is really just syntactic sugar for calling the builtins.__import__
function. But if we changed the semantics to follow normal name lookup instead of short-circuiting directly to the builtins
namespace, some opportunities open up.
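The current behaviour is easy to observe. In the sketch below (the `tracing_import` function and the `calls` list are just scaffolding for the demonstration), a module-level global named `__import__` is ignored by the import statement, while a replacement installed on `builtins` is picked up.

```python
import builtins

calls = []
original_import = builtins.__import__


def tracing_import(name, globals=None, locals=None, fromlist=(), level=0):
    # Record the requested name, then defer to the real implementation.
    calls.append(name)
    return original_import(name, globals, locals, fromlist, level)


# 1. A global called __import__ in the module namespace is never consulted.
namespace = {"__import__": tracing_import}
exec("import json", namespace)
calls_via_global = list(calls)

# 2. Replacing builtins.__import__ does affect the import statement.
builtins.__import__ = tracing_import
try:
    exec("import json", {})
finally:
    builtins.__import__ = original_import
calls_via_builtins = list(calls)
```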
For instance, would you like to have dependencies unique to your package, i.e. completely separate copies of your dependencies so you no longer have to share the same dependency version with all other installed packages? Well, if you changed Python's semantics to look up __import__
like any other object then along with the import engine idea mentioned earlier you can have a custom sys.path
and sys.modules
for your package by having a package-specific __import__
. Basically you would need a loader that injected into the module's __dict__
its own instance of __import__
that knew how to look up dependencies unique to the package. So you could have a .dependencies
directory directly in your package's top-level directory and have __import__
put that at the front of its own sys.path
for handling top-level imports. That way if you needed version 1.3 of a package but other code needed 2.0 you could then put the project's 1.3 version in the .dependencies
directory and have that on your private sys.path
before site-packages
, with everything else falling through to the usual locations. It does away with the explicit vendoring some projects do to lock down their dependencies.
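A very rough sketch of such a package-private importer follows. The class name `PrivateImporter`, the module name `hypothetical_dep`, and the temporary directory standing in for `.dependencies` are all made up for illustration. It only handles top-level names, and it falls back to the normal import system for anything not found on the private path instead of keeping a full private copy of `sys.path`.

```python
import importlib
import importlib.machinery
import importlib.util
import os
import sys
import tempfile


class PrivateImporter:
    """Hypothetical importer with its own search path and module cache."""

    def __init__(self, private_path):
        self.path = list(private_path)  # stands in for a private sys.path
        self.modules = {}               # stands in for a private sys.modules

    def __call__(self, name):
        if name in self.modules:
            return self.modules[name]
        spec = importlib.machinery.PathFinder.find_spec(name, self.path)
        if spec is None or spec.loader is None:
            # Not a private dependency: fall through to the shared import.
            return importlib.import_module(name)
        module = importlib.util.module_from_spec(spec)
        self.modules[name] = module     # cache privately, not in sys.modules
        try:
            spec.loader.exec_module(module)
        except BaseException:
            del self.modules[name]
            raise
        return module


# Demonstration with a throwaway directory playing the role of .dependencies.
with tempfile.TemporaryDirectory() as dependencies_dir:
    with open(os.path.join(dependencies_dir, "hypothetical_dep.py"), "w") as f:
        f.write('VERSION = "1.3"\n')
    private_import = PrivateImporter([dependencies_dir])
    dep = private_import("hypothetical_dep")
    json_module = private_import("json")
```

The private copy never appears in `sys.modules`, which is exactly what makes the object-identity caveats discussed next possible.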
Now I don't know how truly useful this would be. Vendoring is not hard thanks to relative imports and most projects don't seem to need it. It also complicates things as it means modules wouldn't be shared across packages and so anything that relied on object identity like an except
clause for matching caught exceptions could go south really fast (as the requests project learned the hard way). And then thanks to the venv
module and the concept of virtual environments the whole clashing dependency problem is further minimized. But since I realized this could be made possible I at least wanted to write it down. :)
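The object identity problem is easy to reproduce in miniature. In this sketch the same source is loaded twice to mimic a shared copy and a private copy of one dependency; the module name `dep` and the class `DepError` are made up for the example.

```python
import types

SOURCE = "class DepError(Exception):\n    pass\n"


def load_copy(name):
    # Execute the same source in a fresh module object each time.
    module = types.ModuleType(name)
    exec(SOURCE, module.__dict__)
    return module


shared = load_copy("dep")   # the copy other code sees
private = load_copy("dep")  # the package-private copy


def caught_by_shared(exc):
    """Report whether `except shared.DepError` matches the exception."""
    try:
        raise exc
    except shared.DepError:
        return True
    except Exception:
        return False


same_copy_caught = caught_by_shared(shared.DepError())
other_copy_caught = caught_by_shared(private.DepError())
```

The two classes have the same name and source but are distinct objects, so an `except` clause written against one copy does not match exceptions raised from the other.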
I doubt any of this will ever change
While I may be able to create an object for __import__
that people use, getting people to use that instead of the sys
module would be tough, especially thanks to not being able to detect when someone replaced the objects on sys
entirely instead of simply mutating them. Changing the signature of __import__
would also be somewhat tough, although if an object for __import__
was used then the bytecode could call a method on that object and then __import__.__call__
would just be a shim for backwards-compatibility (and honestly people should not be calling or mucking with __import__
directly anyway; use importlib.import_module()
or all of the other various hooks that importlib
provides instead). Only importing modules is basically a dead-end due to backwards-compatibility, but I may be able to make the bytecode do more of the work rather than doing it in __import__
itself. And getting Python to follow normal lookup instead of going straight to builtins
when looking for __import__
probably isn't worth the hassle and potential compatibility issues.
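The mutate-versus-replace problem mentioned above looks like this in practice. This is only an illustration: `captured_path` stands in for a reference a hypothetical engine object might hold, and the two marker paths are made up.

```python
import sys

captured_path = sys.path  # what an engine object might hold on to
original_path = sys.path
marker_mutated = "/hypothetical/mutated"
marker_replaced = "/hypothetical/replaced"

try:
    # Mutation in place: the captured reference sees the change.
    sys.path.append(marker_mutated)
    sees_mutation = marker_mutated in captured_path

    # Replacement: sys.path now names a different list, and nothing
    # notifies whoever is still holding the old one.
    sys.path = sys.path + [marker_replaced]
    sees_replacement = marker_replaced in captured_path
finally:
    # Restore the original list and undo the mutation.
    sys.path = original_path
    original_path.remove(marker_mutated)
```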