How virtual environments work
After needing to do a deep dive on the venv module (I will explain why later in this blog post), I thought I would explain how virtual environments work to help demystify them.
Why do virtual environments exist?
Back in my day, there was no concept of environments in Python: all you had was your Python installation and the current directory. That meant when you installed something, you either installed it globally into your Python interpreter or you just dumped it into the current directory. Both of these approaches had their drawbacks.
Installing globally meant you didn't have any isolation between your projects. This led to issues like version conflicts between what one of your projects might need compared to another one. It also meant you had no idea what requirements your project actually had, since you had no way of testing your assumptions about what you needed. This was an issue if you needed to share your code with someone else, as you didn't have a way to verify that you weren't accidentally wrong about what your dependencies were.
Installing into your local directory didn't isolate your installs based on Python version or interpreter version (or even interpreter build type, back when you had to compile your extension modules differently for debug and release builds of Python). So while you could install everything into the same directory as your own code (which you did, and thus didn't use src directory layouts, for simplicity), there wasn't a way to install different wheels for each Python interpreter you had on your machine so you could have multiple environments per project (I'm glossing over the fact that back in my day you also didn't have wheels or editable installs).
Enter virtual environments. Suddenly you had a way to install projects as a group that was tied to a specific Python interpreter. That got us the isolation/separation of only installing things you depend on (and being able to verify that through your testing), as well as having as many environments as you want to go with your projects (e.g. an environment for each version of Python that you support). So all sorts of wins! It's an important feature to have while doing development (which is why it can be rather frustrating for users when Python distributors leave venv out).
How do virtual environments work?
(Note that conda environments are a different beast from virtual environments: among other things, some conda packages ship shell scripts that must be run when setting up the environment, e.g. via activation or conda run). This is why you are always expected to activate a conda environment, as some conda packages require those shell scripts to be run. I won't be covering conda environments in this post.

Their structure
There are two parts to virtual environments: their directories and their configuration file. As a running example, I'm going to assume you ran the command py -m venv --without-pip .venv in some directory on a Unix-based OS (you can substitute py with whatever Python interpreter you want, including the Python Launcher for Unix).
A virtual environment has 3 directories and potentially a symlink within the virtual environment directory (i.e. within .venv):
- bin (Scripts on Windows)
- include (Include on Windows)
- lib/pythonX.Y/site-packages, where X.Y is the Python version (Lib/site-packages on Windows)
- lib64 symlinked to lib if you're using a 64-bit build of Python on a POSIX-based OS that's not macOS
The Python executable for the virtual environment ends up in bin as various symlinks back to the original interpreter (e.g. .venv/bin/python is a symlink; Windows has a different story). The site-packages directory is where projects get installed into the virtual environment (including pip, if you choose to have it installed into the virtual environment). The include directory is for any header files that might get installed for some reason from a project. The lib64 symlink is for consistency on those Unix OSs that have such directories.
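To make that layout concrete, here is a small sketch that creates a bare environment with the venv module and lists what ends up on disk (it assumes a Unix-like OS with the stdlib venv module available; the function name is mine):

```python
import os
import subprocess
import sys

def create_bare_venv(env_dir):
    """Create a pip-less virtual environment and return its top-level entries."""
    subprocess.run(
        [sys.executable, "-m", "venv", "--without-pip", env_dir],
        check=True,
    )
    return sorted(os.listdir(env_dir))
```

On a typical Linux machine this returns something like ['bin', 'include', 'lib', 'lib64', 'pyvenv.cfg']; on macOS there is no lib64, and on Windows you would see Scripts, Include, and Lib instead.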
The configuration file is pyvenv.cfg and it lives at the top of your virtual environment directory (e.g. .venv/pyvenv.cfg). As of Python 3.11, it contains a few entries:
- home (the directory where the executable used to create the virtual environment lives; os.path.dirname(sys._base_executable))
- include-system-site-packages (should the global site-packages be included, effectively turning off isolation?)
- version (the Python version down to the micro version, but without the release level, e.g. 3.12.0, but not 3.12.0a6)
- executable (the executable used to create the virtual environment; os.path.realpath(sys._base_executable))
- command (the CLI command that could have recreated the virtual environment)
On my machine, the pyvenv.cfg
contents are:
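Something along these lines (the values here are illustrative; your paths and version will differ):

```
home = /usr/bin
include-system-site-packages = false
version = 3.11.2
executable = /usr/bin/python3.11
command = /usr/bin/python3.11 -m venv --without-pip /home/user/project/.venv
```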
One interesting thing to note is pyvenv.cfg
is not a valid INI file according to the configparser
module due to lacking any sections. To read fields in the file you are expected to use line.partition("=")
and to strip the resulting key and value.
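As a sketch of what that parsing looks like (the function name is mine, not a stdlib API):

```python
def read_pyvenv_cfg(text):
    """Parse pyvenv.cfg-style "key = value" lines into a dict."""
    config = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # Ignore lines that lack an "=".
            config[key.strip()] = value.strip()
    return config
```

For example, read_pyvenv_cfg("home = /usr/bin\nversion = 3.12.0") returns {'home': '/usr/bin', 'version': '3.12.0'}.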
And that's all there is to a virtual environment! When you don't install pip, they are extremely fast to create: 3 directories, a symlink, and a single file. And they are simple enough that you can probably create one manually.
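In fact, here is a rough Unix-only sketch of doing it by hand, mirroring what venv does in the simplest case (no activation scripts, no pip, and no edge-case handling):

```python
import os
import sys

def manual_venv(env_dir):
    """Hand-roll a minimal virtual environment (Unix-only sketch)."""
    major, minor, micro = sys.version_info[:3]
    os.makedirs(os.path.join(env_dir, "bin"))
    os.makedirs(os.path.join(env_dir, "include"))
    os.makedirs(
        os.path.join(env_dir, "lib", f"python{major}.{minor}", "site-packages")
    )
    # Symlink the interpreter into bin/.
    os.symlink(sys.executable, os.path.join(env_dir, "bin", "python"))
    # The configuration file the site module looks for at start-up.
    with open(os.path.join(env_dir, "pyvenv.cfg"), "w", encoding="utf-8") as file:
        file.write(f"home = {os.path.dirname(sys.executable)}\n")
        file.write("include-system-site-packages = false\n")
        file.write(f"version = {major}.{minor}.{micro}\n")
```

Running the resulting .venv-style bin/python behaves like any other virtual environment's interpreter.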
One point I would like to make is how virtual environments are designed to be disposable and not relocatable. Because of their simplicity, virtual environments are viewed as something you can throw away and recreate quickly (if it takes your OS a long time to create 3 directories, a symlink, and a file consisting of 292 bytes like on my machine, you have bigger problems to worry about than virtual environment relocation 😉). Unfortunately, people tend to conflate environment creation with package installation, when they are in fact two separate things. What projects you choose to install with which installer is actually separate from environment creation and probably influences your "getting started" time the most.
How Python uses a virtual environment
During start-up, Python automatically calls the site.main()
function (unless you specify the -S
flag). That function calls site.venv()
which handles setting up your Python executable to use the virtual environment appropriately. Specifically, the site
module:
- Looks for pyvenv.cfg in either the same directory as the running executable or its parent directory (the executable's path is not resolved, so the location of the symlink is what matters)
- Looks for include-system-site-packages in pyvenv.cfg to decide whether the system site-packages ends up on sys.path
- Sets sys._home if home is found in pyvenv.cfg (sys._home is used by sysconfig)
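One visible side effect of this setup is that, inside a virtual environment, sys.prefix points at the environment directory while sys.base_prefix still points at the base installation, which gives the common idiom for detecting whether you're running in one:

```python
import sys

def in_virtual_env():
    # In a virtual environment the two prefixes differ; in a normal
    # installation they are the same directory.
    return sys.prefix != sys.base_prefix
```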
That's it! It's a surprisingly simple mechanism for what it accomplishes.
One thing to notice here about how all of this works is that virtual environment activation is optional. Because the site module works off of the symlink to the executable in the virtual environment to resolve everything, activation is just a convenience. Honestly, all the activation scripts do is:
- Put the bin/ (or Scripts/) directory at the front of your PATH environment variable
- Set VIRTUAL_ENV to the directory containing your virtual environment
- Tweak your shell prompt to let you know your PATH has been changed
- Register a deactivate shell function which undoes the other steps
In the end, whether you type python
after activation or .venv/bin/python
makes no difference to Python. Some tooling like the Python extension for VS Code or the Python Launcher for Unix may check for VIRTUAL_ENV
to pick up on your intent to use a virtual environment, but it doesn't influence Python itself.
Introducing microvenv
In the Python extension for VS Code, we have an issue where Python beginners end up on Debian or a Debian-based distro like Ubuntu and want to create a virtual environment. Due to Debian removing venv
from the default Python install and beginners not realizing there was more to install than python3
, they often end up failing at creating a virtual environment (at least initially as you can install python3-venv
separately; in the next version of Debian there will be a python3-full
package you can install which will include venv
and pip
, but it will probably take a while for all the instructions online to be updated to suggest that over python3
). We believe the lack of venv is a problem, as beginners should be using environments, but asking them to install yet more software can be a barrier to getting started (I'm also ignoring the fact that pip isn't installed by default on Debian either, which further complicates the getting-started experience for beginners).
But venv
is not shipped as a separate part of Python's stdlib, so we can't simply install it from PyPI somehow or easily ship it as part of the Python extension to work around this. Since venv
is in the stdlib, it's developed along with the version of Python it ships with, so there's no single copy which is fully compatible with all maintained versions of Python (e.g. Python 3.11 added support to use sysconfig to get the directories to create for a virtual environment, various fields in pyvenv.cfg have been added over time, new language features may be used, etc.). While we could ship a copy of venv for every maintained version of Python, we would potentially have to ship one for every micro release to guarantee we always had a working copy, and that's a lot of upstream tracking to do. And even if we only shipped copies from each minor release of Python, we would still have to track every micro release in case a bug in venv was fixed.
Hence I have created microvenv. It is a project which provides a single .py
file which you use to create a minimal virtual environment. You can either execute it as a script or call its create()
function that is analogous to venv.create()
. It's also compatible with all maintained versions of Python. As I (hopefully) showed above, creating a virtual environment is actually straightforward, so I was able to replicate the necessary bits in less than 100 lines of Python code (specifically 87 lines in the 2023.1.1 release). That actually makes it small enough to pass in via python -c
, which means it could be embedded in a binary as a string constant and passed as an argument when executing a Python executable as a subprocess if you wanted to (directly executing microvenv.py
works). Hopefully that means a tool could guarantee it can always construct a virtual environment somehow.
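The -c trick is just the standard "run source passed as a string" mechanism. As a sketch of how a tool might use it (the embedded script in the example is a trivial stand-in, not microvenv itself, and the function name is mine):

```python
import subprocess
import sys

def run_embedded_script(script_source, env_dir):
    # Pass the entire script via -c, with the target directory as argv[1],
    # the way a tool could embed a single-file venv creator as a string
    # constant instead of shipping it as a file on disk.
    subprocess.run([sys.executable, "-c", script_source, env_dir], check=True)
```

A tool shipping microvenv's source as a string constant could hand that constant to run_embedded_script along with the target directory, never needing the file on disk.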
To keep microvenv
simple, small, and maintainable, it does not contain any activation scripts. I personally don't want to be a shell script expert for multiple shells, nor do I want to track the upstream activation scripts (and they do change, in case you were thinking "it shouldn't be that hard to track"). Also, in VS Code we are actually working towards implicitly activating virtual environments by updating your environment variables directly instead of executing any activation shell scripts, so the shell scripts aren't needed for our use case (we are actively moving away from using any activation scripts where we can, as we have run into race condition problems with them when sending the command to the shell; thank goodness for conda run, but we also know people still want an activated terminal).
I'm also skipping Windows support because we have found the lack of venv
to be a unique problem for Linux in general, and Debian-based distros specifically.
I honestly don't expect anyone except tool providers to use microvenv
, but since it could be useful to others beyond VS Code, I decided it was worth releasing on its own. I also expect anyone using the project to only use it as a fallback when venv
is not available (which you can deduce by running py -c "from importlib.util import find_spec; print(find_spec('venv') is not None)"
). And before anyone asks why we don't just use virtualenv
, its wheel is 8.7MB compared to microvenv
at 3.9KB; 0.05% the size, or 2175x smaller. Granted, a good chunk of what makes up virtualenv
's wheel is probably from shipping pip
and setuptools
in the wheel for fast installation of those projects after virtual environment creation, but we also acknowledge our need for a small, portable, single-file virtual environment creator is rather niche and something virtualenv
currently doesn't support (for good reason).
Our plan for the Python extension for VS Code is to use microvenv
as a fallback mechanism for our Python: Create Environment command (FYI we also plan to bootstrap pip
via its pip.pyz
file from bootstrap.pypa.io by downloading it on-demand, which is luckily less than 2MB). That way we can start suggesting to users in various UX flows to create and use an environment when one isn't already being used (as appropriate, of course). We want beginners to learn about environments if they don't already know about them and also remind experienced users when they may have accidentally forgotten to create an environment for their workspace. That way people get the benefit of (virtual) environments with as little friction as possible.