An approach to lazy importing in Python 3.7

[Please note that the code in this blog post is now up on PyPI as part of the modutil library]

One of the new features in Python 3.7 is PEP 562, which adds support for __getattr__() and __dir__() on modules. Both open up some interesting possibilities. For instance, with __dir__() you can now have dir() only show what __all__ defines.
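
To give a rough idea of that last point, here is a minimal sketch (mine, not from PEP 562 itself) of a module that uses __dir__() to restrict dir() to its public API:

__all__ = ['greet']


def greet(name):
    return f'Hello, {name}!'


def _internal_helper():
    return 'not part of the public API'


def __dir__():
    # dir() on this module now reports only the names listed in __all__.
    return sorted(__all__)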

But being so immersed in Python's import system, my interest lies with __getattr__() and how it can be used to do lazy importing. Now I'm going to start this post off by stating that most people do not need lazy importing. Only when start-up costs are paramount should it come into play, e.g. CLI apps with short running times. For most people the negatives of lazy loading are not worth it, e.g. finding out that an import fails much later instead of at application launch.

The old ways

Traditionally there have been two ways to do lazy/delayed importing. The oldest one is doing a local import instead of a global one (i.e. importing within your function instead of at the top of your module). While this does work to postpone importing until you run code that actually needs the module you're importing, it has the drawback of forcing you to write the same import statement over and over again. And if you only make some imports local, it becomes rather easy to forget which ones you were trying to avoid and accidentally import the module globally. So this approach works, it just isn't ideal.
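
As a quick illustration (a hypothetical snippet, not from the original post), the local-import approach looks like this:

def parse_config(text):
    # json is only imported the first time parse_config() is called,
    # so the cost isn't paid at application start-up.
    import json
    return json.loads(text)


def dump_config(data):
    # ...and every other function that needs json has to repeat the import.
    import json
    return json.dumps(data, indent=2)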

The other approach is using the lazy loader provided in importlib. Various users like Google, Facebook, and Mercurial have successfully used this lazy loader. The first two love it for minimizing overhead when running tests, while the last one wants a fast start-up. One perk of the lazy loader over local imports is that you can trigger a ModuleNotFoundError early: finding a module is done eagerly, it's just the loading that is postponed.
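
Setting that up looks roughly like the recipe in the importlib documentation; the lazy_module() helper and the numpy example here are just mine for illustration:

import importlib.util
import sys


def lazy_module(name):
    """Return a module that is found now but only loaded on first attribute access."""
    spec = importlib.util.find_spec(name)  # Finding happens eagerly here.
    if spec is None:
        raise ModuleNotFoundError(f'No module named {name!r}')
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)
    return module


np = lazy_module('numpy')  # Nothing is executed yet.
# np.array([1, 2, 3])      # The first attribute access triggers the actual load.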

Most people also set it up so that everything is lazily loaded. Now that's both a good and a bad thing. It's good in that you have to do very little to implicitly make everything load lazily. It's bad in that when you make something implicit you can end up breaking expectations that code has (there is a reason that "explicit is better than implicit"). If a module expects to be loaded eagerly then it can break badly when loaded lazily. Mercurial actually developed a blacklist of modules not to load lazily to work around this, but they have to make sure to keep it updated, so it isn't a perfect solution either.

The new way

In Python 3.7, modules can now have __getattr__() defined on them, allowing one to write a function which will import a module when it isn't already available as an attribute on the module. This does have the drawback of making it a fully lazy import rather than just a lazy load, and thus finding out very late whether a ModuleNotFoundError will be raised. But it is explicit and still globally defined for your module, so it's easier to control.

The code itself is actually not that complicated:

import importlib


def lazy_import(importer_name, to_import):
    """Return the importing module and a callable for lazy importing.

    The module named by importer_name represents the module performing the
    import to help facilitate resolving relative imports.

    to_import is an iterable of the modules to be potentially imported (absolute
    or relative). The `as` form of importing is also supported,
    e.g. `pkg.mod as spam`.

    This function returns a tuple of two items. The first is the importer
    module for easy reference within itself. The second item is a callable to be
    set to `__getattr__`.
    """
    module = importlib.import_module(importer_name)
    import_mapping = {}
    for name in to_import:
        importing, _, binding = name.partition(' as ')
        if not binding:
            _, _, binding = importing.rpartition('.')
        import_mapping[binding] = importing

    def __getattr__(name):
        if name not in import_mapping:
            message = f'module {importer_name!r} has no attribute {name!r}'
            raise AttributeError(message)
        importing = import_mapping[name]
        # importlib.import_module() implicitly sets submodules on this module as
        # appropriate for direct imports.
        imported = importlib.import_module(importing,
                                           module.__spec__.parent)
        setattr(module, name, imported)
        return imported

    return module, __getattr__

To use it, you can do the following:

# In pkg/__init__.py with a pkg/sub.py.
mod, __getattr__ = lazy_import(__name__, {'sys', '.sub as thingy'})

def test1():
    return mod.sys

def test2():
    return mod.thingy.answer

In designing this, the trickiest bit was how to simulate the import ... as ... syntax to avoid name clashes. I ended up accepting a string which closely resembles the import statement you would have written had you done a global import. I could have broken it out into a third argument which took a mapping, but I thought that was unnecessary and I preferred to have a more unified API.
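
To make the parsing concrete, here is a standalone sketch of what the loop in lazy_import() builds (the sample strings are just illustrative):

for name in ('sys', 'pkg.mod as spam', '.sub as thingy'):
    importing, _, binding = name.partition(' as ')
    if not binding:
        # No `as` clause, so bind to the last dotted segment, e.g. 'sys'.
        _, _, binding = importing.rpartition('.')
    print(f'{binding!r} -> {importing!r}')

# Output:
# 'sys' -> 'sys'
# 'spam' -> 'pkg.mod'
# 'thingy' -> '.sub'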

Anyway, I'm always pleased when I can do something like this in only 20 lines of Python code and feel like it isn't a total hack. 😉