Differentiating between writing down dependencies to use packages and for packages themselves
When my teammate Courtney evaluated various workflow tools for Python development to see how pip + venv compared, she came back with a recommendation to use pip's requirements files as a way to record what people installed (and thus needed to run their code). As someone who prefers to use standards rather than tool-specific or conventional solutions, I lamented that pip didn't have a way to install just the dependencies listed in `pyproject.toml` under its `project.dependencies` key.
The immediate issue with my wish is that the specification says `project.name` and `project.version` are required. That makes using the `[project]` table in `pyproject.toml` feel clunky if all you want to do is use it to store the dependencies your code relies on to run, since you would need at least placeholder values for those two keys. The other issue is that it's fundamentally a misuse of that data, and for some reason it took a while for it to "click" in my head as to why. This blog post is meant to write down how I came to that conclusion.
For me, a lot of my coding is for packages that others will install. That means I'm regularly listing the dependencies of my code in `project.dependencies` and then installing those dependencies to get my code to run (i.e., `py -m pip install -e .`). It even gets to the point that my testing requirements typically get listed in `project.optional-dependencies.tests` since `pyproject.toml` doesn't have an innate way to list development requirements. So writing down my dependencies in `project.dependencies` initially made sense to me as a way to record runtime dependencies for any code.
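As a sketch of that package-focused workflow (the project name and version pins are purely illustrative), the relevant parts of such a `pyproject.toml` look something like this:

```toml
[project]
name = "mypackage"        # illustrative name
version = "1.0.0"
dependencies = [
    "requests>=2.28",     # what the package needs at runtime
]

[project.optional-dependencies]
tests = [
    "pytest",             # development-only requirements tucked into an extra
]
```

From there, `py -m pip install -e .[tests]` gets me an editable install of the package along with the testing requirements.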
But then I thought about what the `[project]` table in `pyproject.toml` is truly meant for: a TOML representation of the core metadata of a distribution. In that regard, the fact that I install what's listed in `project.dependencies` during development is just something I do, but that's not how users of my code use that information. For users, this information is just statically written into a wheel via a `METADATA` file. As such, what's in the `[project]` table is just data in a human-writable format that gets written into another format for inclusion in a wheel by some build back-end, all without the build back-end actually doing anything based on what's listed. This is the "for packages" bit we often think about when we talk about packaging up some code.
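As an illustration (trimmed for brevity), the `pyproject.toml` sketch above would get transcribed by a build back-end into lines like these in the wheel's `METADATA` file; nothing gets resolved or installed along the way:

```
Metadata-Version: 2.1
Name: mypackage
Version: 1.0.0
Provides-Extra: tests
Requires-Dist: requests>=2.28
Requires-Dist: pytest; extra == "tests"
```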
That's very different compared to specifying the dependencies your code needs to run. Those sorts of dependencies end up being input to a resolver to figure out the full dependency graph of things you need installed. Once you have that complete list of dependencies you can then pass that list to an installer so you have everything you need to run your code. This is the "using packages" bit we often think about when we talk about using packages with our code.
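That's the role pip's requirements files play today. A file as small as this (the contents are illustrative):

```
# requirements.txt -- the direct dependencies of the code I want to run
requests>=2.28
rich
```

gets handed to `py -m pip install -r requirements.txt`, at which point the resolver expands it into the full dependency graph (pulling in things like `urllib3`, `certifi`, `idna`, and `charset-normalizer` for `requests`) before anything is installed.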
So, one purpose of the list of dependencies is simply to write down some metadata that ends up in your wheel about what your package needs to run, but that's it; it's something for a build back-end to write out to a file in a different format. The other purpose is to list what needs to be installed for your code to run; it's something for an installer to feed into a resolver to figure out the complete set of dependencies your code needs. One is written down in a file as-is, the other is used as input to an algorithm that expands the list.
The funny thing is that the reasoning for why these similar-looking but very different bits of data are distinct was explained to me over a decade ago. Back in 2013, Donald Stufft wrote a blog post entitled "setup.py vs requirements.txt", where he explained when to use which file. In the post, Donald argued that `setup.py` was where you put metadata for your package, while `requirements.txt` was where you wrote down what your app depended on. That made sense to me back when Donald wrote his post. But it turns out I had not thought through the fundamental difference because Donald's case involved two totally different file formats, while in my case it was going to be `pyproject.toml` regardless of the purpose.
So where does this realization leave us? At minimum, it means we still don't have a standardized replacement for pip's requirements files, which have become the way the community writes down the requirements some code has on various packages. It might also mean we either need to define a new table in `pyproject.toml` for specifying the requirements necessary to run your code, or we need a new file entirely separate from `pyproject.toml` for the purpose of writing down what's necessary to run your code.
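Purely as a hypothetical sketch of the first option (nothing here is standardized; the table name and its shape are made up solely to illustrate the idea), such a table might look like:

```toml
# Hypothetical, non-standard table: a place to record what some code needs
# to run, without pretending to be a distributable project.
[run]
dependencies = [
    "requests>=2.28",
    "rich",
]
```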