Classifying Python virtual environment workflows

I have been spending some time as of late thinking, and asking the community via the fediverse, about how people deal with virtual environments in Python. I have ended up with various ways of classifying people's virtual environment management and I wanted to write it all down to both not forget and to explain to all the nice people answering my various polls on the topic why I was asking those questions.

When I talk about virtual environments, I am not talking about conda environments. To me, the key difference between the two is that conda environments must be activated to work consistently. Thomas Caswell of matplotlib gave a good explanation as to why this is (even if some people occasionally "cheat" by skipping activation of a conda environment and still have it work).
This entire post is assuming that using virtual environments is something everyone should be doing, and so there will be no time spent justifying their use.

Who manages the virtual environment(s)?

The first dimension in classifying how people work with virtual environments is who is in charge of managing them. I think there are three possible options.

The first option is a tool that fully manages the virtual environment life cycle. Examples of this type are tools like Hatch, Poetry, and pipenv. With these tools you get a default environment that they create and manage. You might get to control the Python version, but otherwise environment management is meant to be transparent to you. Often these tools will create virtual environments as a side-effect of some other command you use with the tool (e.g. a shell or run command). It effectively makes virtual environments a behind-the-scenes implementation detail of the tool.

The next option is what I consider virtual environment helpers. Examples of tools that fit this definition are virtualenvwrapper and pyenv-virtualenv. Unlike the management tools mentioned above, these tools require a bit more participation from you as they are not trying to hide virtual environments from you. These tools want you to tell them which environment to create and use by naming your virtual environments.

Lastly, there is manual virtual environment management. Examples of this are virtualenv and venv itself. With these tools you manage everything, including where the virtual environment is stored.
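To make "you manage everything" concrete, here is a minimal sketch of manual management using the standard library's venv module. The function name create_environment and its default location are my own choices for illustration; the point is only that you pick the directory yourself.

```python
import venv
from pathlib import Path


def create_environment(workspace: Path, name: str = ".venv") -> Path:
    """Create a virtual environment inside ``workspace`` and return its path.

    ``create_environment`` is a hypothetical helper; with manual management
    the location is entirely up to you.
    """
    env_dir = workspace / name
    # with_pip=False keeps the sketch quick; pass True to bootstrap pip as well.
    venv.create(env_dir, with_pip=False)
    return env_dir
```

Calling create_environment(Path.cwd()) would leave you with a .venv directory in the current workspace, while passing some other directory and name gives you the central-directory style discussed next.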

Where are the virtual environments kept?

Regardless of how much effort you put into managing your virtual environments, they have to be kept somewhere. One place to store them is locally in the workspace/directory where your source code is. This approach is typically preferred by folks who want everything related to a project in a single spot.

The other place to store all virtual environments is in some central directory for a user. Some people do this because they want their workspace to be devoid of any transient files and only represent the code in version control or what they would ship to production. Others like having all virtual environments in a single directory so it's easy to delete them all with a single rm command. Some people also like to reuse their virtual environments across various projects (which the environment helpers and manual management allow for), so they need a project-agnostic place to keep their virtual environments (this motivation can also bleed into folks trying to save space via virtual environment reuse). Finally, some people keep their code somewhere that is automatically backed up (e.g. a Dropbox-managed directory), and so they don't want the virtual environment backed up as well.

When I did a poll via Mastodon to figure out why people used a central directory approach, the majority of people did it that way because their tool happened to work that way or it was just habit (53%). The next biggest group kept their environments in a central directory for environment reuse (24%).

Obviously there are pros and cons to either approach, which is why pretty much every management tool lets you choose which storage location style you prefer. As for whether people store their environment(s) locally or globally, a Mastodon poll suggests 58% of folks keep them locally (with 12% not knowing or caring).

How many virtual environments are needed?

Another dimension to classifying a person's virtual environment workflow is how many virtual environments they end up creating. And to be clear, this is in reference to environments that get directly created in one of the above-mentioned ways and not by e.g. nox or tox as a side-effect of running some other tool that isn't related to virtual environment management.

One approach is having a single virtual environment. If it's kept in the workspace it's in a .venv directory by convention (as much as a convention can be established for this, but all of the fully managed tools use this convention). If you work with only a single version of Python this can obviously suffice. People who have to work on multiple versions of Python also can use this approach and rely on tools like nox or tox to handle other Python versions when e.g. running tests. You can even "cheat" and use the virtual environments that nox and tox create when you suddenly need another virtual environment for another version of Python than the primary one you created.

Another approach is to create multiple virtual environments for various Python versions. The key point with this approach is that the only differing thing between the virtual environments is the Python version; the list of top-level dependencies is consistent across the various virtual environments.

The last approach is multiple virtual environments with differing dependencies. At this point there's technically nothing tying the environments together beyond perhaps the code you are working on potentially being installed in all of the environments as an editable install (but that's not required).

When I polled on Mastodon about these three options, a single environment was on top by a lot (68%). The second largest group of people have multiple environments that differ by Python version (18%).

Why do I care about this?

You might be wondering why I put so much time into a somewhat esoteric topic where people have already found workflows that they are happy with. It turns out I have a personal project and a work project whose usefulness to people is directly impacted by how their virtual environment workflow is ultimately classified.

The personal project is Python Launcher for Unix. That tool tries to do the lazy/smart thing and run the appropriate Python interpreter when you type py at your terminal. In order to make that happen it needs some way to deduce what that lazy/smart choice should be, else it isn't useful to a person.

The work project is the Python extension for VS Code. A key thing the extension does is let someone specify what environment they are using for their workspace. For a smooth experience, that requires not only deducing which environment is the best one to use so it can be selected automatically, but also finding all of your applicable environments so you can switch between them as desired.

Both of these projects can only do the lazy/smart thing for their users if they can figure out what environment(s) you want to use implicitly. The easiest one to support is when you have one environment in your workspace in a directory named .venv (whether you use a full management tool or manage by hand). The absolute worst is when you have multiple environments that all have different dependencies and are stored in a central directory that you manage manually. Let's look at each classification to see how it helps/hurts in finding what environments are useful to your workspace.

When you use a fully managed tool, you can typically ask that tool to tell you where the environment is. When you use a helper, there is usually some algorithm it follows to decide where to store the environment. In both cases, though, you need to write custom code to support that tool and its unique way of being queried as to whether the environments exist (let alone even knowing that the tool exists and thus needs to be supported). Manually creating environments lends no support for this unless you put the environment in a directory named .venv in your workspace, as that's supported by both of my projects.
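The .venv case is easy to support because the check is tiny. Below is a minimal sketch of it, not the actual code from either project. The function name find_workspace_venv is hypothetical, and it relies on the fact that environments created by venv and virtualenv contain a pyvenv.cfg file at their root.

```python
from pathlib import Path
from typing import Optional


def find_workspace_venv(workspace: Path, name: str = ".venv") -> Optional[Path]:
    """Return the workspace's ``.venv`` directory if it looks like an environment.

    ``find_workspace_venv`` is a hypothetical helper, not an API of the
    Python Launcher or the VS Code extension.
    """
    candidate = workspace / name
    # A pyvenv.cfg file at the root marks the directory as a virtual environment.
    if (candidate / "pyvenv.cfg").is_file():
        return candidate
    return None
```

Anything beyond this single well-known location is where the tool-specific custom code described above starts to creep in.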

Where you store your environments impacts how the tool is supposed to know where to look. In the workspace is the easiest since there's a direct connection of the environment(s) to the workspace, especially if it follows the .venv naming convention. Putting the environments in a central directory can only be supported if you used a tool that follows some algorithm that you can also follow. That also requires custom code to re-implement that tool's algorithm or at least know that the tool exists so you can ask it about the environment(s) it created. There's also no standard reverse mapping of environment to workspace, so if you didn't use a full management tool then you can't infer what environment is associated with what workspace(s) simply by looking at the virtual environment.

The number of environments you have for your workspace mostly becomes a problem when a tool can't tell why each one exists. When there's a single environment, that's easy as your options are rather limited. 😉 When your environments differ by only Python version, you can infer that the one with the newest Python version is probably what you want (unless you somehow signal you want a different Python version). But if the environments differ by what's installed, how is a tool to know what you want without you actively participating in the selection?
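For the "differ only by Python version" case, the newest-version heuristic could be sketched as follows. This is an illustration under assumptions rather than what either project does: the helper names are hypothetical, and it assumes each environment's pyvenv.cfg carries a `version = X.Y.Z` line, which is what the venv module writes.

```python
from pathlib import Path
from typing import Iterable, Optional, Tuple


def python_version(env_dir: Path) -> Optional[Tuple[int, ...]]:
    """Read the version recorded in pyvenv.cfg (assumes a ``version`` key)."""
    try:
        text = (env_dir / "pyvenv.cfg").read_text(encoding="utf-8")
    except OSError:
        return None
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "version":
            try:
                return tuple(int(part) for part in value.strip().split("."))
            except ValueError:
                return None
    return None


def newest_environment(env_dirs: Iterable[Path]) -> Optional[Path]:
    """Pick the environment with the highest Python version, if any is readable."""
    best: Optional[Tuple[Tuple[int, ...], Path]] = None
    for env_dir in env_dirs:
        version = python_version(env_dir)
        # Compare as integer tuples so that 3.12 sorts above 3.9.
        if version is not None and (best is None or version > best[0]):
            best = (version, env_dir)
    return best[1] if best else None
```

Nothing comparable exists for environments that differ by dependencies, since nothing in the environment records why it was created.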

Having laid this all out, you can see why a single environment in the workspace is the easiest to support, while having no way to differentiate between multiple environments stored somewhere non-standard is the hardest.

So what can be done to support more workflows beyond everyone keeping a single virtual environment in their workspace in a directory named .venv? One idea is to have the concept of a locator where tools and people can provide commands that can be discovered and run to get all the relevant environment details. That would at least solve the issue of finding all of the pertinent virtual environments for a workspace. And in the VS Code case, we are also considering letting other extensions tell us about what environments they know about (and even selecting them), so that would be another way to help users more easily work with their environments. There's also been the suggestion to support a .venv file that would contain nothing but an absolute path to a virtual environment (ignoring bikeshedding on the name).
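To show how small the .venv file suggestion would be to support, here is a hedged sketch. Nothing about it is standardized: the function name resolve_dot_venv is hypothetical, and the handling of whitespace and relative paths is my own guess at reasonable behaviour for a file containing only an absolute path.

```python
from pathlib import Path
from typing import Optional


def resolve_dot_venv(workspace: Path) -> Optional[Path]:
    """Resolve ``.venv`` as either an environment directory or a redirect file.

    Hypothetical sketch of the proposed (non-standard) ``.venv`` file idea.
    """
    marker = workspace / ".venv"
    if marker.is_dir():
        # The existing convention: the environment lives in the workspace.
        return marker
    if marker.is_file():
        # The proposed idea: the file holds an absolute path to an environment.
        target = Path(marker.read_text(encoding="utf-8").strip())
        if target.is_absolute() and target.is_dir():
            return target
    return None
```

With something like this, a workspace could stay free of environment files while still pointing tools at an environment kept in a central directory.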

But none of this is standard. Right now the best option I have is to propose these ideas, see if any of them get traction, and then implement them. Hopefully after that there's enough continued support that a PEP or something can be written to make some of this a true standard (e.g. the locator idea).