Talk to any developer who inherits a large, old code base whose semantics have evolved over time, and they will always have something they wish they could change about the code they inherited. After inheriting
import in Python, I too have a list of things I would love to see changed in how it works to make it a bit more sane and easier to work with. This blog post is basically a brain dump/wishlist of what I would love to see changed in import some day.
No global state
As import currently stands, all of its state is stored in the
sys module. This makes growing the API rather painful as it means expanding a module's API surface rather than adding another attribute on an object. For me, I would rather have
import be a fully self-contained object that stored all of its own state.
This has been proposed before in PEP 406 under the name "import engine". It unfortunately has not gone anywhere, simply because it would take time to design the API for a fully encapsulated
import class and it doesn't buy people anything today. Now in the future it could open up some unique possibilities for
import itself -- which I will discuss later -- and it would be easier to maintain, allowing for cleaner separation between interpreters in a single process.
Making this happen would occur in stages. A new ImportEngine class would be created which would define, from scratch, the API we wished import had. That API would then delegate under the hood to the
sys module so that semantics stayed the same, including making instances of the class callable and assigning such an instance to
builtins.__import__. At some point the objects that were stored in the sys module would instead be set on the instance of builtins.__import__, rather than the object delegating to the sys module itself. After a proper amount of time, once everyone had moved over to using the object's API instead of the sys module, we could consider cutting the import-related parts out of the sys module entirely.
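As a rough illustration of that first stage, here is a hypothetical sketch (the ImportEngine name and its API are assumptions for illustration, not anything that exists) where the object exposes the desired API but delegates all of its state to sys:

```python
import builtins
import sys


class ImportEngine:
    """Hypothetical self-contained import object (the name is made up).

    Stage one: expose the desired API while delegating all state to the
    sys module, so existing semantics stay exactly the same.
    """

    @property
    def modules(self):
        # A later stage would store this on the instance instead of sys.
        return sys.modules

    @property
    def path(self):
        return sys.path

    def __call__(self, name, globals=None, locals=None, fromlist=(), level=0):
        # Instances stay callable so one could be bound to builtins.__import__.
        return builtins.__import__(name, globals, locals, fromlist, level)


engine = ImportEngine()
math = engine("math")  # behaves just like a normal import
```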
Making __import__ more sane
In case you didn't know, the signature for
builtins.__import__() is a bit nuts:
def __import__(name, globals=None, locals=None, fromlist=(), level=0): pass
The locals argument isn't used. The
globals argument is only used for calculating relative imports and thus only needs __package__ (__name__ and __path__ are also used, but only when __package__ isn't defined, and that only happens if you do something bad). The
fromlist parameter has to do with how the bytecode operates -- which I will talk about later -- and
level is just the number of leading dots in a relative import.
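You can see the fromlist quirk for yourself by calling __import__ directly:

```python
# With an empty fromlist, importing a dotted name returns the *top-level*
# package, leaving the bytecode to walk down to the submodule:
mod = __import__("collections.abc")
print(mod.__name__)    # -> collections

# A non-empty fromlist makes __import__ return the submodule itself, which
# is how "from collections import abc" ends up binding the right object:
sub = __import__("collections.abc", fromlist=["abc"])
print(sub.__name__)    # -> collections.abc

# level simply counts leading dots: inside pkg.sub, "from .. import thing"
# compiles to a call like __import__("thing", globals(), None, ("thing",), 2).
```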
If I had my way, the function would be defined as:
def __import__(name, spec=None): pass
This is almost the same signature as importlib.import_module(), but passing in the spec of the calling module instead of just its
__package__; nice, simple, and easy to comprehend. The only thing I might consider changing is keeping the
level argument since that is a bit of string parsing that can be done ahead of time and baked into the bytecode, but I don't know if it really would make that much of a performance difference.
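For comparison, here is how importlib.import_module() handles the same job today, resolving relative names against a package argument:

```python
import importlib

# Absolute form, equivalent to "import collections.abc":
abc_mod = importlib.import_module("collections.abc")

# Relative form: the leading dot is resolved against the package argument,
# much like "from . import abc" written inside the collections package:
also_abc = importlib.import_module(".abc", package="collections")
print(also_abc is abc_mod)    # -> True
```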
You can only import modules
Having the ability to import attributes off of a module really sucks from an implementation perspective. The bytecode itself doesn't handle that bit of detail and instead hoists it upon
import. It also leads people into circular import problems. Finally, it detaches objects from the namespace they belong to (keeping the association between an object and its containing module can make code easier to read). Plus you can easily replace
from module import attr with
import module; attr = module.attr; TOOWTDI.
So if I had my way, when you said
from foo import bar, it would mean Python did
import foo.bar; bar = foo.bar and nothing else. No more
from ... import *, no more
__all__ for modules, etc.; you wouldn't be allowed to import anything that didn't end up in
sys.modules (and I'm sure some teacher is saying "but
import * makes things easier", but in my opinion the cost of that little shortcut is too high to keep it around). It makes things cleaner to implement, which helps eliminate edge cases. It makes code easier to analyze, as you would be able to tell (mostly) statically which modules you were after. It just seems better to me, both in terms of implementing import and in simplifying the semantics for everyone to comprehend.
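Under those proposed semantics, today's attribute imports would be spelled out explicitly; using the stdlib as an example:

```python
# Today "from os import path" relies on fromlist handling inside __import__.
# Under the proposal it would mean exactly this and nothing more:
import os.path
path = os.path

# Importing a non-module attribute would become an explicit assignment:
import math
pi = math.pi
```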
Looking up __import__ like anything else
Like a lot of syntax in Python, the
import statement is really just syntactic sugar for calling the
builtins.__import__ function. But if we changed the semantics to follow normal name lookup instead of short-circuiting directly the
builtins namespace, some opportunities open up.
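The current short-circuiting is easy to demonstrate: a module-global named __import__ is silently ignored by the import statement, which goes straight to the builtins:

```python
def broken_import(name, *args, **kwargs):
    raise AssertionError("the import statement never sees this")

# Bind a global named __import__ (note: NOT builtins.__import__)...
__import__ = broken_import

# ...and the import statement still works, because the bytecode
# short-circuits directly to builtins.__import__:
import math
print(math.pi)
```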
For instance, would you like to have dependencies unique to your package, e.g. have completely separate copies of your dependencies so you eliminate having to share the same dependency version with all other installed packages? Well, if you changed Python's semantics to look up
__import__ like any other object then along with the import engine idea mentioned earlier you can have a custom
sys.modules for your package by having a package-specific
__import__. Basically you would need a loader that injected into the module's
__dict__ its own instance of
__import__ that knew how to look up dependencies unique to the package. So you could have a
.dependencies directory directly in your package's top-level directory and have
__import__ put that at the front of its own
sys.path for handling top-level imports. That way, if you needed version 1.3 of a project while other code needed 2.0, you could put version 1.3 in the .dependencies directory and have it act as your private site-packages, with everything else falling through to the normal import system. It does away with the explicit vendoring some projects do to lock down their dependencies.
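Here is a rough sketch of what such a package-local __import__ might look like (PackageImporter and the single-file .dependencies layout are purely hypothetical simplifications):

```python
import importlib.util
import os
import sys


class PackageImporter:
    """Hypothetical per-package __import__ with its own private module cache."""

    def __init__(self, dependencies_dir):
        self.dependencies_dir = dependencies_dir
        self.modules = {}  # private stand-in for sys.modules

    def __call__(self, name, *args, **kwargs):
        if name in self.modules:
            return self.modules[name]
        location = os.path.join(self.dependencies_dir, name + ".py")
        if os.path.exists(location):
            spec = importlib.util.spec_from_file_location(name, location)
            module = importlib.util.module_from_spec(spec)
            # Cache before executing, mirroring how real import handles
            # circular imports; crucially this never touches sys.modules.
            self.modules[name] = module
            spec.loader.exec_module(module)
            return module
        # Fall through to the global import machinery for everything else.
        return __import__(name, *args, **kwargs)
```

A loader would then stick an instance of this into each module's __dict__ as __import__ -- which only helps, of course, if the import statement actually did a normal name lookup instead of going straight to builtins.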
Now I don't know how truly useful this would be. Vendoring is not hard thanks to relative imports and most projects don't seem to need it. It also complicates things as it means modules wouldn't be shared across packages and so anything that relied on object identity like an
except clause for matching caught exceptions could go south really fast (as the requests project learned the hard way). And then thanks to the
venv module and the concept of virtual environments the whole clashing dependency problem is further minimized. But since I realized this could be made possible I at least wanted to write it down. :)
I doubt any of this will ever change
While I may be able to create an object for
__import__ that people use, getting people to use that instead of the
sys module would be tough, especially thanks to not being able to detect when someone replaced the objects on
sys entirely instead of simply mutating them. Changing the signature of
__import__ would also be somewhat tough, although if an object for
__import__ was used then the bytecode could call a method on that object and then
__import__.__call__ would just be a shim for backwards-compatibility (and honestly people should not be calling or mucking with
__import__ directly anyway; use
importlib.import_module() or all of the other various hooks that
importlib provides instead). Only importing modules is basically a dead-end due to backwards-compatibility, but I may be able to make the bytecode do more of the work rather than doing it in
__import__ itself. And getting Python to follow normal lookup instead of going straight to
builtins when looking for
__import__ probably isn't worth the hassle and potential compatibility issues.