12 Jun 2020 5 min read project management

Why I don't like SemVer anymore

Back in 2017 I wrote a blog post on how I manage version numbers. In that post I mentioned how I tried to follow semantic versioning. Over the subsequent 3 years I have come to the conclusion I actually don't like SemVer for my projects. It turns out I am not the only person to hold this opinion; Donald, Hynek and Bernat seem to agree with the general sentiment.

"But why don't you like it?"

Here's a thought experiment: you need to add a new warning to your Python package that tries to follow SemVer. Would that single change cause you to increase the major, minor, or micro version number? You might think it's a micro version bump since it isn't a new feature and doesn't break anything. You might think it's a minor version bump because it isn't exactly a bugfix. And you might think it's a major version bump because if someone ran their Python code with -W error you suddenly introduced a new exception which could break their code. I did a poll on Twitter and there was no consensus as to what the right answer was.

For a Python project using SemVer, adding a warning is:
— Brett Cannon (@brettsky) May 17, 2020

To me that speaks volumes as to why SemVer does not inherently work: someone's bugfix may be someone else's breaking change. Because in Python we can't statically define what an API change is, there will always be disagreement between you and your dependencies as to what a "feature" or "bugfix" truly is.

Exceptions are an especially tricky case. They don't outwardly change an API, but they certainly can break code if a user was being careful about what exceptions they were catching (this is why Java makes exceptions part of the declared API). And this isn't a hypothetical issue, either: Python's CI broke once because a project we relied on introduced a new warning in a bugfix release, and we run most code with -X dev or -W error to make sure we don't ship stale warnings out with Python itself. Since we pinned to the minor/feature version only, CI pulled the latest bugfix, and 💥, CI suddenly started failing for everyone.
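
To make the failure mode concrete, here is a minimal sketch of how a newly added warning turns into an exception once warnings are treated as errors. The `parse_config` function and its `legacy_key` check are hypothetical stand-ins for a dependency's bugfix release; `warnings.simplefilter("error")` is used to mimic what running with -W error does.

```python
import warnings


def parse_config(text):
    """Hypothetical dependency function whose bugfix release adds a warning."""
    if "legacy_key" in text:
        warnings.warn("legacy_key is deprecated", DeprecationWarning, stacklevel=2)
    return dict(line.split("=", 1) for line in text.splitlines() if "=" in line)


def caller_survives(text):
    """Return True if calling parse_config() did not raise."""
    try:
        parse_config(text)
    except DeprecationWarning:
        return False
    return True


# Warnings recorded but not raised: the caller keeps working.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ok_without_error_filter = caller_survives("legacy_key=1")

# Mimics -W error: the very same warning is now an exception.
with warnings.catch_warnings():
    warnings.simplefilter("error")
    ok_under_w_error = caller_survives("legacy_key=1")

print(ok_without_error_filter, len(caught), ok_under_w_error)
```

Nothing in the function's signature or return value changed, which is exactly why people disagree about which version number such a change should bump.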

And even if you were very diligent/broad with your interpretation to avoid accidentally breaking people with a bugfix release, bugs can still happen in a bugfix release. Just today a popular project accidentally did a micro release with a bug in it that broke a bunch of people. It obviously wasn't intentional, but it does happen, which means SemVer can't protect you from having to test whether a micro version is compatible with your code.

This also applies to avoiding major version changes. There's no guarantee that a major version will actually break you; it just might break you. But as I just mentioned, micro releases can do that, too. So why do we contort ourselves to fit SemVer, and rely on it when defining our acceptable dependency versions, when the numbers don't really have a consistent meaning between projects? That inconsistency makes the concept somewhat of a moot point.

Hopefully you're running CI to catch bugs in your project. But one kind of bug is not specifying your dependencies the way you need to in order to keep your code from breaking; getting that right helps smooth out the potential disagreement between you and your dependencies as to what a "bugfix" is. Now it may be frustrating when your CI turns red due to an external project (this is why running your test suite on a cron job is a good thing; FYI, GitHub Actions supports cron jobs), but unless you pin to exact versions of your dependencies you still have to check that they don't break you. You should be pinning exactly for apps; libraries/packages have to accept as wide a range as possible, basically setting a floor and skipping known buggy, intermediate versions.
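
As a rough illustration of "a floor plus exclusions, no cap" for a library, here is a small standard-library sketch. The package name in the comment, the floor, and the excluded release are all made up, and the parser only understands plain dotted numbers; real tooling would use the `packaging` project's specifier handling rather than this.

```python
def parse_release(version):
    """Turn a plain dotted version such as '1.4.0' into a comparable tuple.

    Trailing zeros are dropped so that '1.4' and '1.4.0' compare equal.
    Pre-, post-, and dev-release suffixes are deliberately not handled.
    """
    parts = [int(part) for part in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def is_acceptable(version, floor, excluded=()):
    """Accept anything at or above the floor that is not a known-bad release."""
    candidate = parse_release(version)
    if candidate < parse_release(floor):
        return False
    return candidate not in {parse_release(bad) for bad in excluded}


# Roughly what a requirement like "somepkg>=1.4,!=1.5.1" expresses.
FLOOR = "1.4"
KNOWN_BUGGY = ["1.5.1"]

for candidate in ["1.3.9", "1.4.0", "1.5.1", "1.5.2", "7.0"]:
    print(candidate, is_acceptable(candidate, FLOOR, KNOWN_BUGGY))
```

Note there is no upper bound anywhere: a hypothetical 7.0 is accepted until CI shows that it actually breaks something.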

Version numbers are your branching strategy as a sequence of numbers

I had a bit of an epiphany while thinking on this topic: version numbers are just a mapping of a sequence of digits to our branching strategy in source control. For instance, if you are doing SemVer then your X.Y.Z version maps to an X.Y branch where you're doing your current feature work, an X.Y.Z+1 branch for any bugfixes, and potentially an X+1.0.0 branch where you're doing some crazy new stuff. So you've got your next branch, main branch, and bugfix branch, and all three of those branches are alive and receiving updates.
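
The mapping can be sketched as a tiny function that takes the latest X.Y.Z release and says which version each live branch would ship next. The branch names come from the paragraph above; treating the main branch's next release as X.(Y+1).0 is my assumption for the sake of the example.

```python
def next_release_per_branch(current):
    """Map a released X.Y.Z version to the next release of each live branch."""
    major, minor, micro = (int(part) for part in current.split("."))
    return {
        "bugfix": f"{major}.{minor}.{micro + 1}",  # fixes only
        "main": f"{major}.{minor + 1}.0",          # current feature work (assumed)
        "next": f"{major + 1}.0.0",                # the crazy new stuff
    }


print(next_release_per_branch("3.8.2"))
```

Each digit only earns its place if there is a branch behind it that actually receives updates.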

And for projects that have those 3 kinds of branches going, the concept of SemVer makes much more sense. But how many projects are doing that? You have to be a pretty substantial project typically to have the throughput to justify that much project overhead. And you still have the disagreement of what a "bugfix" is.

I suspect there are a lot more projects that have a single bugfix branch and a main branch which has all feature work, whether it be massively backwards-incompatible or not. In that case why carry around two version numbers? This is how you end up with ZeroVer, where your major number stays 0 forever. But if you're doing that, why not just drop a digit and have your version be X.Y? PEP 440 supports it, and it would more truthfully represent your branching strategy in your version number. And I bet if you did this most people would recognize what the version represents due to the lack of the third digit.

And what about projects that only have a main branch? At that point you really just have an X version number that is monotonically increasing. Once again PEP 440 supports it, so why not! It still communicates your branching strategy of there being only a single branch at any one time. Now I know this is a bit too unconventional for some people, and so this is when people reach for CalVer and have a YYYY.X version numbering scheme. And if you are taking an approach like pip where you make one major release a year, that makes sense! I just personally don't know if I would want to shoehorn in CalVer if I didn't stick to such an annual release cadence. I have heard some people say they still like CalVer to know how long it has been since there has been a release, but if stuff is working does that really matter?

And just as a reminder in case you're looking at all of this and thinking it's a bit too much if you needed to do a release to fix a simple spelling mistake, PEP 440 has the concept of post releases for that exact situation.
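
To show how short version numbers and post releases order relative to each other, here is a deliberately simplified sort key. It only understands the forms discussed in this post (X, X.Y, X.Y.Z, each with an optional .postN) and is not a full PEP 440 implementation; for real projects the `Version` class in the `packaging` library is the thing to use.

```python
import re

# Simplified pattern: dotted release numbers plus an optional ".postN" suffix.
_SIMPLE_VERSION = re.compile(r"^(?P<release>\d+(?:\.\d+)*)(?:\.post(?P<post>\d+))?$")


def sort_key(version):
    """Return a tuple that orders these simple versions the way PEP 440 would."""
    match = _SIMPLE_VERSION.match(version)
    if match is None:
        raise ValueError(f"unsupported version for this sketch: {version!r}")
    release = [int(part) for part in match["release"].split(".")]
    while len(release) > 1 and release[-1] == 0:
        release.pop()  # treat "2" and "2.0" as the same release
    # A plain release sorts before any of its post releases.
    post = int(match["post"]) if match["post"] is not None else -1
    return (tuple(release), post)


print(sorted(["3", "2.post1", "2", "2.1"], key=sort_key))
```

A post release sorts after the release it corrects and before the next one, so a spelling fix for version 2 can ship as 2.post1 without burning a whole new number.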

Summary

SemVer isn't as straightforward as it sounds; we don't all agree on what a major, minor, or micro change really is
Your version number represents your branching strategy, so choose a versioning scheme that's appropriate for your branching and release strategy
Rely on CI, potentially on a cron job, to detect when a project breaks you instead of leaving it up to the project to try and make that call based on their interpretation of SemVer; you will inevitably disagree
Remember to pin your dependencies in your apps if you really don't want to have to worry about a dependency breaking you unexpectedly
Libraries/packages should be setting a floor, and if necessary excluding known buggy versions, but otherwise don't cap the maximum version as you can't predict future compatibility

This doesn't necessarily apply to other ecosystems

All of this advice coming from me does not necessarily apply to all other packaging ecosystems. Python's flat dependency management has its pros and cons, hence why some other ecosystems do things differently. For instance, because npm installs transitive dependencies independently for each of your direct dependencies, SemVer is potentially more useful there, but you still have a potential disagreement between you and your dependency, so pinning is still better (and thus why npm has package-lock.json).

Some other ecosystems can also statically enforce SemVer in a more structured way. For instance, Elm's compiler can statically tell when no API changed and whether an API was added or not. That allows them to compute how to bump a version number compared to the previous version. In that instance you're just checking if a bugfix broke you rather than whether an API change will cause you issues.
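
As a toy analogue of that idea in Python (not Elm's actual algorithm), suppose each release's public API could be summarized as a mapping of names to signature strings. The mappings and the `suggest_bump` helper below are hypothetical; the point is only that with a statically known API the bump can be computed rather than argued about.

```python
def suggest_bump(old_api, new_api):
    """Suggest which version component to bump from two API snapshots.

    Each snapshot maps a public name to a signature string.
    """
    removed_or_changed = any(
        name not in new_api or new_api[name] != signature
        for name, signature in old_api.items()
    )
    if removed_or_changed:
        return "major"
    if any(name not in old_api for name in new_api):
        return "minor"
    return "micro"


v1 = {"load": "(path: str) -> dict"}
v2 = {"load": "(path: str) -> dict", "dump": "(data: dict, path: str) -> None"}
v3 = {"load": "(path: str, strict: bool) -> dict"}

print(suggest_bump(v1, v1), suggest_bump(v1, v2), suggest_bump(v1, v3))
```

Even here a "micro" result only says the declared API is unchanged; behavior such as a new warning or exception is still invisible to this kind of check.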