11 Aug 2023 3 min read packaging

Differentiating between writing down dependencies to use packages and for packages themselves

When my teammate Courtney evaluated various workflow tools for Python development to see how pip + venv compared, she came back with a recommendation on using pip's requirements files as a way to record what people installed (and thus needed to run their code). As someone who prefers to use standards rather than tool-specific or conventional solutions, I lamented how pip didn't have a way to install just dependencies listed in pyproject.toml and its project.dependencies key.

The immediate issue with my wish is that the specification says project.name and project.version are required. That makes using the [project] table in pyproject.toml feel clunky if all you want to do is use it to store the dependencies your code relies on to run, since you would need at least placeholder values for those two keys. The other issue is that it's fundamentally a misuse of that data, and for some reason it took a while for it to "click" in my head as to why. This blog post is meant to write down how I came to this conclusion.

For me, a lot of my coding is for packages that others will install. That means I'm regularly listing the dependencies of my code in project.dependencies and then installing those dependencies to get my code to run (i.e., py -m pip install -e .). It even gets to the point that my testing requirements typically get listed in project.optional-dependencies.tests since pyproject.toml doesn't have an innate way to list development requirements. So writing down my dependencies in project.dependencies initially made sense to me as a way to record runtime dependencies for any code.

But then I thought about what the [project] table in pyproject.toml is truly meant for: as a TOML representation for the core metadata of a distribution. In that regard, the fact that I install what's listed in project.dependencies during development is just something I do, but that's not how users of my code use that information. For users, this information is just statically written in a wheel via a METADATA file. As such, what's in the [project] table is just data in a human-writable format that gets written into another format for inclusion in a wheel by some build back-end, all without the build back-end actually doing anything based on what's listed. This is the "for packages" bit we often think about when we talk about packaging up some code.
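To make the "written into another format" point concrete, here is a minimal sketch of that transcription step. It assumes the [project] table has already been parsed into a dict, and the project name, version, and requirements are made up for illustration. A real build back-end validates far more and emits many more fields; the only point here is that the dependency strings are copied across, not acted on.

```python
# What a TOML parser would hand back for a hypothetical [project] table.
# The name, version, and requirements are placeholders for illustration.
project = {
    "name": "spam",
    "version": "1.0.0",
    "dependencies": ["requests>=2.31", "rich"],
    "optional-dependencies": {"tests": ["pytest"]},
}


def core_metadata(project: dict) -> str:
    """Transcribe a [project] table into METADATA-style header lines."""
    lines = [
        "Metadata-Version: 2.1",
        f"Name: {project['name']}",
        f"Version: {project['version']}",
    ]
    # Dependencies are copied verbatim; nothing is resolved or installed.
    for requirement in project.get("dependencies", []):
        lines.append(f"Requires-Dist: {requirement}")
    # Extras become a Provides-Extra line plus marker-qualified requirements.
    for extra, requirements in project.get("optional-dependencies", {}).items():
        lines.append(f"Provides-Extra: {extra}")
        for requirement in requirements:
            lines.append(f'Requires-Dist: {requirement}; extra == "{extra}"')
    return "\n".join(lines) + "\n"


metadata = core_metadata(project)
print(metadata)
```

Notice that the function never looks up what requests or rich themselves depend on. That work is left to whatever installer eventually reads the METADATA file.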

That's very different compared to specifying the dependencies your code needs to run. Those sorts of dependencies end up being input to a resolver to figure out the full dependency graph of things you need installed. Once you have that complete list of dependencies you can then pass that list to an installer so you have everything you need to run your code. This is the "using packages" bit we often think about when we talk about using packages with our code.

💡

A resolver is something that figures out, from a list of requirements, what all of your actual dependencies are. Every installer has one since the dependencies you specify have their own dependencies, which have their own dependencies, etc. So a resolver figures out everything you need to install to make sure all the code you want installed can run. Pip's resolver uses resolvelib.
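As a rough illustration of what "figuring out everything you need" means, here is a toy resolver. It is a sketch under heavy assumptions: the package names and the INDEX mapping are hypothetical, and versions are ignored entirely. It is not how resolvelib works, since a real resolver also has to pick versions that satisfy every constraint and backtrack when they conflict.

```python
# Hypothetical index: each package maps to its direct dependencies.
# The names are made up and versions are ignored entirely.
INDEX = {
    "web-app-lib": ["http-client", "templating"],
    "http-client": ["tls-certs", "url-parser"],
    "templating": ["markup-escape"],
    "tls-certs": [],
    "url-parser": [],
    "markup-escape": [],
}


def resolve(requirements: list[str], index: dict[str, list[str]]) -> list[str]:
    """Expand top-level requirements into every package that must be installed."""
    resolved: set[str] = set()
    pending = list(requirements)
    while pending:
        name = pending.pop()
        if name in resolved:
            continue
        resolved.add(name)
        # Each dependency brings along its own dependencies.
        pending.extend(index[name])
    return sorted(resolved)


to_install = resolve(["web-app-lib"], INDEX)
print(to_install)
```

One requirement went in and six packages came out. That expansion is the difference between a list that is an input to an algorithm and a list that is simply written down as-is.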

So, one purpose of the list of dependencies is to just write down some metadata that ends up in your wheel about what you need to make some package run, but that's it; it's something for build back-ends to write out in some file in a different format. The other is how to list what you need to be installed for your code to run; it's something for an installer to use as input into a resolver to figure out the complete list of dependencies your code needs. One is written down in some file as-is, the other is used as input into an algorithm to expand on the list.

The funny thing is that the reason these similar-looking bits of data mean very different things was explained to me over a decade ago. Back in 2013, Donald Stufft wrote a blog post entitled "setup.py vs requirements.txt", where he explained when to use which file. In the post, Donald argued that setup.py was where you put metadata for your package, while requirements.txt was where you wrote down what your app depended on. That made sense to me back when Donald wrote his post. But it turns out I had not thought about the fundamental differences, because Donald's case involved totally different file formats, whereas in my case it was going to be pyproject.toml regardless of the purpose.

So where does this realization leave us? At minimum, it means we still don't have a standardized replacement for pip's requirements files, which have become the way the community writes down the requirements some code has on various packages. It might also mean we either need to define a new table for pyproject.toml for specifying the requirements necessary to run your code, or we need a new file entirely separate from pyproject.toml for the purpose of writing down what's necessary to run your code.
