An approach to lazy importing in Python 3.7
[Please note that the code in this blog post is now up on PyPI as part of the modutil library]
One of the new features in Python 3.7 is PEP 562, which adds support for __getattr__() and __dir__() on modules. Both open up some interesting possibilities. For instance, with __dir__() you can now have dir() only show what __all__ defines.
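For example, a module can make dir() report only its public API with a couple of lines (a minimal sketch; the module and function names are hypothetical):

# spam.py (hypothetical): dir(spam) will only report the names in __all__.
__all__ = ['public_function']

def public_function():
    return 42

def _private_helper():
    return 'not advertised'

def __dir__():
    return sorted(__all__)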
But being so immersed in Python's import system, my interest lies with __getattr__() and how it can be used to do lazy importing. Now I'm going to start this post off by stating that most people do not need lazy importing. Only when start-up costs are paramount should this come into play, e.g. CLI apps that have a short running time. For most people, the negatives of lazy loading are not worth it, e.g. finding out much later that an import fails instead of at application launch.
The old ways
Traditionally there have been two ways to do lazy/delayed importing. The oldest one is doing a local import instead of a global one (i.e. importing within your function instead of at the top of your module). While this does work to postpone importing until you run code that actually needs the module you're importing, it has the drawback of forcing you to write the same import statement over and over again. And if you only make some imports local, it becomes rather easy to forget which ones you were trying to avoid and then accidentally import the module globally. So this approach works, it just isn't ideal.
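As a sketch of the local-import approach (the function names here are hypothetical; json just stands in for an expensive-to-import module):

# Instead of `import json` at the top of the module, the import is repeated
# inside every function that needs it, postponing the cost until first call.
def dump_report(data):
    import json  # local import; only paid for when dump_report() runs
    return json.dumps(data, indent=2)

def load_report(text):
    import json  # the same import has to be written again here
    return json.loads(text)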
The other approach is using the lazy loader provided in importlib. Now various people like Google, Facebook, and Mercurial have successfully used this lazy loader. The first two love it to minimize overhead when running tests, while the last one wants a fast start-up. One perk of the lazy loader over the local import is that you can trigger a ModuleNotFoundError early, since finding a module is still done eagerly; it's just the loading that is postponed.
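Setting that up looks roughly like the example in the importlib documentation (a sketch; the helper name lazy_load is mine):

import importlib.util
import sys

def lazy_load(name):
    """Find `name` eagerly but postpone executing it until first attribute access."""
    spec = importlib.util.find_spec(name)  # the module is *found* here, eagerly
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # LazyLoader defers execution until an attribute is accessed
    return module

lazy_typing = lazy_load('typing')  # module object exists, but hasn't been executed yet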
Most people also set it up so that everything is lazily loaded. Now that's a good and bad thing. It's good in that you have to do very little to implicitly make everything lazily load. It's bad in that when you make something implicit you can end up breaking expectations that code has (there is a reason that "explicit is better than implicit"). If a module expects to be loaded eagerly then it can break badly when loaded lazily. Mercurial actually developed a blacklist of modules to not load lazily to work around this, but they have to make sure to keep it updated so it isn't a perfect solution either.
The new way
In Python 3.7, modules can now have __getattr__() defined on them, allowing one to write a function which will import a module when it isn't available as an attribute on the module. This does have the drawback of making it a lazy import instead of just a lazy load, and thus finding out very late whether a ModuleNotFoundError will be raised. But it is explicit and still globally defined for your module, so it's easier to control.
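To make the mechanism concrete, here is a hand-rolled sketch for a single attribute (the module name lazyjson.py and the choice of json are hypothetical):

# lazyjson.py (hypothetical): accessing `lazyjson.json` imports json on first use.
import importlib

def __getattr__(name):
    if name == 'json':  # the one attribute we want to resolve lazily
        module = importlib.import_module('json')
        globals()['json'] = module  # cache it so __getattr__ isn't called again
        return module
    raise AttributeError(f'module {__name__!r} has no attribute {name!r}')

Generalizing that pattern into a reusable helper is straightforward.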
The code itself is actually not that complicated:
import importlib


def lazy_import(importer_name, to_import):
    """Return the importing module and a callable for lazy importing.

    The module named by importer_name represents the module performing the
    import to help facilitate resolving relative imports.

    to_import is an iterable of the modules to be potentially imported (absolute
    or relative). The `as` form of importing is also supported,
    e.g. `pkg.mod as spam`.

    This function returns a tuple of two items. The first is the importer
    module for easy reference within itself. The second item is a callable to be
    set to `__getattr__`.
    """
    module = importlib.import_module(importer_name)
    import_mapping = {}
    for name in to_import:
        importing, _, binding = name.partition(' as ')
        if not binding:
            _, _, binding = importing.rpartition('.')
        import_mapping[binding] = importing

    def __getattr__(name):
        if name not in import_mapping:
            message = f'module {importer_name!r} has no attribute {name!r}'
            raise AttributeError(message)
        importing = import_mapping[name]
        # importlib.import_module() implicitly sets submodules on this module as
        # appropriate for direct imports.
        imported = importlib.import_module(importing,
                                           module.__spec__.parent)
        setattr(module, name, imported)
        return imported

    return module, __getattr__
To use it, you can do the following:
# In pkg/__init__.py with a pkg/sub.py.
mod, __getattr__ = lazy_import(__name__, {'sys', '.sub as thingy'})


def test1():
    return mod.sys


def test2():
    return mod.thingy.answer
In designing this, the trickiest bit was how to simulate the import ... as ... syntax to avoid name clashes. I ended up accepting a string which closely resembles the import statement you would have written had you done a global import. I could have broken it out into a third argument which took a mapping, but I thought that was unnecessary and I preferred to have a more unified API.
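Just to spell out what that string parsing does, here is how the two forms from the example above end up in import_mapping (a small worked sketch):

# 'sys'            -> binding 'sys',    importing 'sys'
# '.sub as thingy' -> binding 'thingy', importing '.sub'
name = '.sub as thingy'
importing, _, binding = name.partition(' as ')   # ('.sub', ' as ', 'thingy')

name = 'sys'
importing, _, binding = name.partition(' as ')   # ('sys', '', '')
# No ` as ` clause, so the binding falls back to the last dotted component:
_, _, binding = importing.rpartition('.')        # binding == 'sys'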
Anyway, I'm always pleased when I can do something like this in only 20 lines of Python code and feel like it isn't a total hack. 😉