Python 3 support on PyPI

At (and since) PyCon 2015, there has been interest in trying to get quantified numbers in relation to Python 3 adoption (see PyPI download numbers and uptake in the astronomy community). One number I am personally interested in is per-project adoption of Python 3. While the Python 3 Wall of Superpowers shows wide support for Python 3 with the top projects by download, I wanted to take a larger look at PyPI as a whole to measure project adoption. So I wrote a script that downloads the JSON data for every project on PyPI and then analyzes the data in various ways.

Methodology

I classified a project as supporting Python 3 if its trove classifiers said it did (in other words, putting a project in the "Python 3" bin means it at least supports Python 3 and may support Python 2 as well). The same goes for Python 2 (although a project that also supports Python 3 goes in the Python 3 bin instead), and all other cases are bucketed as "unknown". Unfortunately, the majority of projects don't have a trove classifier specifying which versions of Python they support. Because of this, and knowing that Python 2 is still the dominant version of Python, it might be tempting to automatically clump all unknown projects into the Python 2 bucket, but as caniusepython3's overrides file shows, Python 3 projects don't always specify their trove classifiers properly either. So while you can group all unknown projects into Python 2 as a simple worst-case scenario, doing so is conservative and thus not totally accurate (as is anything where human beings are relied upon to provide the data). Without doing a random sample I don't know how often people leave the trove classifier off of Python 3 projects, so I have erred on the conservative side and clumped all unknown projects into the Python 2 bucket in any adoption rate numbers cited below. Those numbers are calculated as Python 3 / (Python 2 + unknown), so each is a ratio rather than an overall percentage: 100% would indicate that half of all projects are in Python 3 and the other half are in Python 2.
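As a rough sketch of the binning described above (not the actual script), the classification could look something like the following. The function name `classify` and the bin labels are made up for illustration; the classifier strings follow the standard "Programming Language :: Python :: X" trove format.

```python
def classify(classifiers):
    """Bin a project as 'python3', 'python2', or 'unknown' from its trove classifiers.

    `classifiers` is a list of trove classifier strings, e.g. the
    `info.classifiers` list in a project's PyPI JSON metadata (an assumption
    about where the data comes from).
    """
    prefix = "Programming Language :: Python :: "
    # Keep only the part after the prefix, e.g. "3", "3.4", "2.7", "3 :: Only".
    versions = [c[len(prefix):].strip() for c in classifiers if c.startswith(prefix)]
    # Python 3 wins: a project that supports both 2 and 3 lands in the Python 3 bin.
    if any(v.startswith("3") for v in versions):
        return "python3"
    if any(v.startswith("2") for v in versions):
        return "python2"
    # No version-specific classifier at all.
    return "unknown"


print(classify(["Programming Language :: Python :: 2.7",
                "Programming Language :: Python :: 3.4"]))
```

Note that a bare "Programming Language :: Python" classifier says nothing about the version, so it falls into the unknown bin here.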

Another thing to realize about these numbers is that when one project switches to Python 3 it doesn't necessarily lead to only one other project porting, but possibly many. Think of when Django added Python 3 support: it didn't open the possibility of just one other project to port to Python 3 but hundreds. That means the impact of these numbers can be a little misleading since it is not going to be a linear porting rate due to network effects.

And finally, Python 3 will never hit 100%. This isn't just because there will be people who never port to Python 3 (because there will be), but also because some people have created separate projects for their Python 3 ports. I don't know how widespread this practice is, but it does mean in some instances the numbers actually cancel each other out because one project counts once for Python 2 support and then again for Python 3 support under a different name. In other words whatever number you want to consider meaning Python 3 has been adopted by the community, you need to make sure it's below 100%.

Some numbers

By release date

First I looked at every project on PyPI that has ever uploaded a release (i.e., created a project and uploaded some file):

  • Unknown: 34,447
  • Python 2: 8,064
  • Python 3: 11,377

This is a rather junk number since Python 2 is heavily weighted thanks to simply existing longer and thus having more abandoned projects. But even in this instance, Python 3 comes in at over 26% relative to everything else.
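To make the arithmetic concrete, here is a small sketch that applies the Python 3 / (Python 2 + unknown) ratio to the counts above and also shows the plain share of all projects for comparison. The function names are hypothetical and only there for illustration.

```python
def adoption_ratio(python3, python2, unknown):
    """Ratio used throughout this post: Python 3 / (Python 2 + unknown)."""
    return python3 / (python2 + unknown)


def share_of_all(python3, python2, unknown):
    """Plain share of all projects that land in the Python 3 bin."""
    return python3 / (python3 + python2 + unknown)


# Counts for every project that has ever uploaded a release.
counts = {"unknown": 34447, "python2": 8064, "python3": 11377}

ratio = adoption_ratio(counts["python3"], counts["python2"], counts["unknown"])
share = share_of_all(counts["python3"], counts["python2"], counts["unknown"])
print(f"ratio: {ratio:.1%}, share of all projects: {share:.1%}")
```

Because the ratio divides by everything that is not in the Python 3 bin, it is always higher than the share of all projects, and a ratio of 100% corresponds to a share of 50%.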

The next number I looked at is the number of projects which released a version in the last 2 years:

  • Unknown: 19,760
  • Python 2: 5,898
  • Python 3: 10,295

The Python 3 ratio is 40% with this filter. I would argue this is the bare minimum release cadence to consider a project not fully abandoned.

Now it's probably more reasonable to use projects that have been updated in the last year as the metric:

  • Unknown: 12,864
  • Python 2: 4,091
  • Python 3: 8,329

At about 49%, there is a clear increase in Python 3 support over the past year. This might correspond with Python 3.4.0 being released on 2014-03-16 which falls closer to being a year ago than two years ago.

What happens if you make the cut-off at 6 months?

  • Unknown: 8,183
  • Python 2: 2,809
  • Python 3: 6,134

That's over 55% for Python 3. This seems to suggest that more and more people are using Python 3 and that the growth from 2 years ago versus this past year is simply increased Python 3 support.

Now looking at just the latest release still runs into the issue of new projects that were simply uploaded and then never touched again, which is a similar issue to looking at all PyPI projects without any sort of time horizon. So what do the numbers look like if 2 releases have to be made within the time horizon? For instance, what if a project has to make 2 releases within the past year?

  • Unknown: 8,033
  • Python 2: 2,779
  • Python 3: 5,889

That puts Python 3 adoption at over 54% compared to Python 2. That's a measurable increase compared to projects that had to have only a single release within the past year.
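A filter like "at least 2 releases within the time horizon" can be sketched as below. This assumes each project's release upload dates are already available as `datetime` objects; the function names and the explicit `now` parameter are illustrative choices, not something taken from the actual script.

```python
from datetime import datetime, timedelta


def releases_within(release_dates, days, now):
    """Count the releases uploaded within the last `days` days before `now`."""
    cutoff = now - timedelta(days=days)
    return sum(1 for uploaded in release_dates if uploaded >= cutoff)


def is_active(release_dates, min_releases, days, now):
    """True if the project made at least `min_releases` releases in the window."""
    return releases_within(release_dates, days, now) >= min_releases


# Hypothetical project with three releases.
now = datetime(2015, 5, 1)
dates = [datetime(2015, 4, 1), datetime(2014, 9, 1), datetime(2013, 1, 1)]
print(is_active(dates, min_releases=2, days=365, now=now))
```

Passing `now` explicitly keeps the result reproducible when the same data is re-analyzed later.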

How about 2 releases in the last six months?

  • Unknown: 4,747
  • Python 2: 1,732
  • Python 3: 3,920

This gets us to 60% adoption for Python 3 compared to Python 2. Once again, an increase compared to the single release case. This continues to suggest that a decent number of active projects are supporting Python 3.

By downloads

Having looked at release dates and how recently two releases have occurred, you might have noticed that popularity has still not come into the picture. What happens if you take monthly download counts into consideration? Take for instance all projects that are downloaded more than 1,440 times a month (which is equivalent to the project being downloaded twice an hour):

  • Unknown: 4,115
  • Python 2: 1,215
  • Python 3: 3,332

Over 62%. Now download numbers are really unreliable as continuous integration can muck with the numbers, so this should be taken with a grain of salt. And to reiterate, for most people who tell me they have a dependency blocking them from moving to Python 3, the blocker is a long tail project and not a major one.

Now what if you use all of the measurement angles that I have suggested? What if you only look at projects that have had 2 releases in the past year which were downloaded at least 1,440 times in the past month?

  • Unknown: 2,241
  • Python 2: 868
  • Python 3: 2,465

Over 79% in this instance. Obviously the same issues with the other download-based numbers apply here, especially in the instance of people saying the long tail is their blocking dependency and not major projects. This also aligns with the Python 3 Wall of Superpowers suggesting that very active, popular projects have mostly added Python 3 support.
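The combined filter can be sketched as follows. Two assumptions are baked in: the download count is a single per-project monthly total, and "at least 1,440" is treated as an inclusive threshold. The names `MIN_MONTHLY_DOWNLOADS` and `passes_combined_filter` are made up for this example.

```python
from datetime import datetime, timedelta

# Twice an hour over a 30-day month: 2 * 24 * 30 = 1,440 downloads.
MIN_MONTHLY_DOWNLOADS = 2 * 24 * 30


def passes_combined_filter(release_dates, monthly_downloads, now):
    """True if a project made 2+ releases in the past year and is popular enough.

    `monthly_downloads` is assumed to be the project's total downloads over
    the past month.
    """
    cutoff = now - timedelta(days=365)
    recent = [uploaded for uploaded in release_dates if uploaded >= cutoff]
    return len(recent) >= 2 and monthly_downloads >= MIN_MONTHLY_DOWNLOADS


now = datetime(2015, 5, 1)
dates = [datetime(2015, 4, 1), datetime(2014, 9, 1)]
print(passes_combined_filter(dates, monthly_downloads=2000, now=now))
```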

By some definition of "actively maintained"

All these numbers are nice, but the definition of an "actively maintained" project is still rather loose. What if we tightened it up a bit more? One could argue that a project that has made two releases over the past year that were at least 90 days apart seems like an active project. This suggests a rough average of a release at least every six months and weeds out any quick releases due to bugs being found immediately after release. Those numbers look like:

  • Unknown: 3,158
  • Python 2: 1,188
  • Python 3: 2,786

That's 64%, which is more in the range of the measurements based on downloads than the strict latest release measurements. This suggests to me that a good number of fairly active projects have been ported (which probably also tend to be popular, either because of their activity or because their activity is driven by their popularity).
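One way to express the "two releases in the past year, at least 90 days apart" rule is to compare the earliest and latest release inside the window: if any pair is at least 90 days apart, that pair is too. The sketch below assumes release dates are `datetime` objects; the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta


def has_spaced_releases(release_dates, now, window_days=365, min_gap_days=90):
    """True if two releases inside the window are at least `min_gap_days` apart."""
    cutoff = now - timedelta(days=window_days)
    recent = sorted(uploaded for uploaded in release_dates if uploaded >= cutoff)
    if len(recent) < 2:
        return False
    # The earliest and latest releases in the window are the widest-spaced pair.
    return recent[-1] - recent[0] >= timedelta(days=min_gap_days)


now = datetime(2015, 5, 1)
print(has_spaced_releases([datetime(2015, 4, 1), datetime(2014, 9, 1)], now))
```

A project that shipped a release and a quick bug-fix follow-up a few days later would not pass this check on those two releases alone.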

New projects

And finally, the last way to slice the PyPI data is to look at new projects. Way back when Python 3.0 was released, Guido said that he hoped a majority of new projects would be using Python 3 within 5 years (which, based on how I'm calculating ratios, means hitting 100%). It's now over 6 years since Python 3.0's release in December 2008. How has that hope panned out?

Let's first look at projects created over the past year:

  • Unknown: 8,242
  • Python 2: 2,589
  • Python 3: 5,050

That's over 46%. It should also be mentioned that since there is no historical data to compare against this is entirely based on the oldest release for a project. If a project deletes old releases -- which would be bad as it would break people using older versions for some reason -- then that could cause them to look newer than they actually are (you also used to be able to re-upload a file under the same release number, but PyPI no longer supports that). But using this definition of project creation does allow for calculating the rate throughout the past which may be a nice way to graph the adoption rate (it also makes it easier to re-calculate values in case more robust adoption measurements are used, e.g., checking whether any uploaded wheel files support Python 3).
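Treating the oldest surviving upload as a project's creation date could be sketched like this. It assumes the `releases` mapping has the shape of PyPI's per-project JSON data (version string mapped to a list of file entries, each with an `upload_time` string such as "2014-01-02T03:04:05"); the function name is hypothetical.

```python
from datetime import datetime


def creation_date(releases):
    """Return the upload time of the oldest file across all releases, or None.

    `releases` maps version strings to lists of file dicts. Versions whose
    files were deleted show up as empty lists and contribute nothing, which
    is why deleting old releases makes a project look newer than it is.
    """
    uploads = [
        datetime.strptime(file_info["upload_time"], "%Y-%m-%dT%H:%M:%S")
        for files in releases.values()
        for file_info in files
    ]
    return min(uploads) if uploads else None


example = {
    "0.9": [],  # a release whose files were deleted
    "1.0": [{"upload_time": "2014-01-02T03:04:05"}],
    "1.1": [{"upload_time": "2015-02-03T04:05:06"}],
}
print(creation_date(example))
```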

What happens if we limit it to projects created in the past 6 months?

  • Unknown: 4,260
  • Python 2: 1,401
  • Python 3: 2,952

52%. Does this mean the rate of new projects supporting Python 3 is increasing? Let's just consider projects created in the past 30 days (which might be too noisy due to the small number of projects under consideration):

  • Unknown: 761
  • Python 2: 247
  • Python 3: 534

Over 52% still. So at the moment the percentage is close to constant, but we are also diving into numbers that are small enough that little fluctuations will throw things off.

Conclusion

A whole bunch of numbers that vary from 26% to 79% comparing Python 3 to Python 2 (remember: 100% would mean a 50/50 split between the two versions).

Based on all of this, I think there are a couple of statistics we can use to measure Python 3 adoption as a community. For new members of the Python community, the percentage of projects created within the last year seems like a good measure. For long tail acceptance, you can either measure from a set date like the release of Python 3.1.0 or use a sliding window like any project that has released in the past, e.g., 5 years; I'm not sure which way to measure will be the best nor what values to use to detect long tail adoption (feel free to hit me up on social media if you have any input on this). Major project support is mostly there so I don't think there is a need to measure that separately from the other two measurements. Maybe toss in one or two other metrics -- downloads from python.org? -- and we should have the metrics suite we want to measure Python 3 adoption as an overall community.