23 Feb 2014

My experience creating caniusepython3

Having been so heavily involved with the creation of Python 3, I'm constantly trying to figure out ways to help more of the community make the switch. Doing what I can to make Python 3 obviously superior to Python 2 only goes so far, though. At some point you have to start addressing the issues that are acting as blockers for people in general.

At this point I believe the biggest blocker for people is other people. =) What I mean by this is that I no longer hear people tell me they see no reason to switch to Python 3; instead they tell me they are blocked by a dependency that has not switched over. This then becomes a perpetual issue, as people check their dependencies once and never think to check them again to see if anything has changed. Manually going through a bunch of PyPI project pages to look at trove classifiers and such is not exactly fun and can be time-consuming, so I decided it was worth my time to automate this as best I could.

That led to me creating caniusepython3. The script takes a requirements file, a metadata file, or simply a comma-separated list of projects and then uses distlib to figure out which dependencies, explicit and implicit, still need to be ported. It will also tell you which leaf nodes of your dependency graph can be ported today, so you can go to those projects and request they switch to Python 3 (or, if they are inactive projects, start looking for alternatives or fork the project and port it yourself). And since it's just a command-line tool, you can run it regularly without much hassle to see which projects are currently holding you up.
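To make the "leaf nodes" idea concrete, here is a minimal sketch of the graph logic, assuming the dependency graph has already been fetched into a plain dict. The real tool gets its metadata via distlib; the function name and data shapes below are hypothetical and are not caniusepython3's API.

```python
def find_blockers(roots, graph, python3_ready):
    """Return (blockers, portable_now) for the projects reachable from roots.

    roots:          projects you depend on directly
    graph:          maps a project name to the names it depends on
                    (a missing key means "no dependencies")
    python3_ready:  names already known to support Python 3
    """
    # Walk the graph to collect every explicit and implicit dependency.
    reachable = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in reachable:
            continue  # already visited; also protects against cycles
        reachable.add(name)
        stack.extend(graph.get(name, ()))

    # Anything reachable that does not support Python 3 is a blocker.
    blockers = {name for name in reachable if name not in python3_ready}

    # A blocker whose own dependencies all support Python 3 is a leaf:
    # nothing stops it from being ported today.
    portable_now = {
        name for name in blockers
        if all(dep in python3_ready for dep in graph.get(name, ()))
    }
    return blockers, portable_now
```

Treating a project as portable today only when all of its own dependencies are ready is what makes it a leaf: porting it does not have to wait on anyone else.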

But this blog post isn't so much about caniusepython3 as about my experience creating a Python project from scratch and deploying it to PyPI.

GitHub + Travis is nice

Since I was the impetus behind Python moving to Mercurial, you might be surprised that I used GitHub for this. While I still do not regret using Mercurial for Python at all, I realize that GitHub has more community support. And since the GitHub for Mac app is nice and does exactly what I want by shielding me from ever even having to realize that git is being used, I'm okay with using GitHub.

But toss in Travis and it sweetens the pot. I have previously used drone.io for continuous integration, but I decided to give Travis a try this time and it was a rather smooth process. Setup was simple and they have enough Python version coverage to make me happy (2.6, 2.7, 3.2, and 3.3).

Python 3.4 + venv + ensurepip is VERY nice

I suspect most people know of virtualenv and pip at this point. Just in case you don't, though, virtualenv lets you create an isolated sandbox/virtual environment in which to install dependencies for your project (so it doesn't pollute your global installation), while pip is a Python project installer. One convenient feature of virtualenv is that it installs pip for you so that creating a virtualenv to work in gets you the basic tools needed to start populating your sandbox with your dependencies and get to work.

In Python 3.3 we actually pulled in virtualenv, cleaned things up a bit, and created the venv module in Python's standard library. While it's fantastic to be able to run python3 -m venv my_sandbox to create a virtual environment to play in, it doesn't pull in pip like the external project does. I have heard from people that they explicitly avoid venv because of this.

And this is one of the reasons why Python 3.4 introduced the ensurepip module which will be triggered during the installation of Python to make sure that pip is also installed. This means that venv now pulls in pip by default as of Python 3.4 which is rather handy and makes getting started that much simpler.
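The same thing is available from code through the venv module. A minimal sketch, assuming Python 3.4+ (make_sandbox is a hypothetical helper name):

```python
import os
import sys
import venv


def make_sandbox(path, with_pip=True):
    """Create a virtual environment at *path* and return its interpreter path.

    with_pip=True makes venv run ensurepip inside the new environment, which
    is what `python3 -m venv` does by default as of Python 3.4.
    """
    venv.EnvBuilder(with_pip=with_pip).create(path)
    # venv puts executables in Scripts\ on Windows and bin/ everywhere else
    if sys.platform == "win32":
        return os.path.join(path, "Scripts", "python.exe")
    return os.path.join(path, "bin", "python")
```

Passing with_pip=False reproduces the Python 3.3 behaviour of an environment without pip.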

Go ahead and use setuptools

I used to avoid setuptools and tried to always just use distutils. This was mostly because distutils was always available while setuptools wasn't. Then there was the whole setuptools not supporting Python 3, which led to the forking of the project into distribute and then the later unforking (so in case you didn't know, use setuptools and not distribute!). But at this point setuptools is used by practically everyone and so there's no reason to not use it and rely on its added feature set.

People don't always specify their Python version support

Speaking of metadata, not everyone gets it right. In order to show up on the list of projects that support Python 3, a project needs to set the Programming Language :: Python :: 3 trove classifier. Unfortunately some projects don't set the generic Python 3 classifier but do set a more specific one, e.g. Programming Language :: Python :: 3.3. Other projects set neither, which requires me to go in and find out whether I need to manually override them. And then of course there are the projects whose functionality has been added to Python 3's standard library at some point, e.g. unittest2. All of this only applies to released code (e.g. boto has Python 3 support in its VCS but it has not been officially released yet).
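Because of this, a check for Python 3 support has to accept version-specific classifiers as well as the generic one. A minimal sketch, assuming you already have a project's classifiers as a list of strings (supports_python3 is a hypothetical name, not caniusepython3's actual code):

```python
PY3_CLASSIFIER = "Programming Language :: Python :: 3"


def supports_python3(classifiers):
    """Return True if any trove classifier claims Python 3 support.

    Accepts the generic classifier, version-specific ones such as
    "Programming Language :: Python :: 3.3", and sub-classifiers such as
    "Programming Language :: Python :: 3 :: Only".
    """
    for classifier in classifiers:
        if (classifier == PY3_CLASSIFIER
                or classifier.startswith(PY3_CLASSIFIER + ".")
                or classifier.startswith(PY3_CLASSIFIER + " ::")):
            return True
    return False
```

Projects that set neither kind of classifier still come back False here, which is exactly the case manual overrides have to cover.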

All of this has led to an overrides.json file in caniusepython3 so that I can manually override what the metadata for the project says about Python 3 support. I obviously wish I didn't have to do this, so if you know of a project that needs to have its metadata updated then file a bug for that project to fix their setup (I plan to for the projects I have already come across but I just have not had the time to do it yet).

PSA: read package data through the loader

And because overrides.json exists (and always will, due to stdlib additions), I need a way to read data files in the project's package. Typically people just use open() on a path built from __file__, but that doesn't work if your package is inside a zip file (this is why setuptools has its zip_safe flag).

Well, something most people don't realize is that import loaders have optional APIs that let them read data. Formalized in importlib.abc.ResourceLoader, a loader can define a get_data() method which allows code to read data files no matter how everything is stored. zipimport has supported this API since it was introduced, and importlib officially completed the picture for loader support, although pkgutil had shims to support it as well for quite a while.

But the key piece is pkgutil.get_data(), which hides the work of resolving the resource path relative to the package's file location, makes sure to use pkgutil's shims in versions of Python that predate importlib, etc. This means you can conveniently read in package data while staying zip safe. Every project that needs to read data files should be using it.
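A minimal sketch of reading a JSON data file with pkgutil.get_data(); the package name and the load_overrides helper are hypothetical, and the file is assumed to be UTF-8 encoded JSON:

```python
import json
import pkgutil


def load_overrides(package, resource="overrides.json"):
    """Load a JSON data file that ships inside *package*.

    pkgutil.get_data() resolves *resource* relative to the package and asks
    the package's loader for the bytes, so this also works when the package
    is imported from a zip file.
    """
    raw = pkgutil.get_data(package, resource)
    if raw is None:
        # get_data() returns None if the package can't be found or its
        # loader doesn't support get_data()
        raise IOError("cannot read %s from %s" % (resource, package))
    return json.loads(raw.decode("utf-8"))
```

Because the path is resolved by the package's loader, the same call works whether the package is installed as a directory or imported from a zip file via zipimport.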

Python 3 sure is great

Whenever I start a project I always write it using the latest version of Python. Being involved in Python's development means that by the time a release comes out I have potentially been living with a new feature for 18 months, so it's old news to me and I just use it out of habit. Then, once I set a backwards-compatibility goal, I work backwards towards that version. This lets my code use as many new features as possible, instead of going in the reverse direction and potentially leaving out new features which will carry forward in the future. And specifically in the case of supporting both Python 2 and 3 in a new project, it lets me get my str/bytes situation straightened out from the beginning.

In the case of caniusepython3, I scaled back all the way to Python 2.6. People had asked for support going back to at least 2.7, but since the Python 2/3 porting HOWTO says Python 2.6 is a reasonable target, I figured I should follow my own advice and go that far back (disclaimer: I wrote the HOWTO; I should also mention the Python 3.4 version of the HOWTO is a thorough rewrite). So I started by supporting Python 3.3 exclusively and then worked backwards.

Supporting Python 3.2

Getting to Python 3.2 wasn't hard: all it required was adding a dependency on mock, since unittest.mock was only added to the stdlib in Python 3.3. Since Travis doesn't support Python 3.1 (nor does Django), I didn't worry about explicit Python 3.1 support. But thanks to the language moratorium (PEP 3003, which I have to disclose I co-wrote), the only differences are in the library, which I have to deal with for Python 2.7 anyway.

Supporting Python 2.7

Getting down to Python 2.7 wasn't difficult either. It required more dependencies (argparse and concurrent.futures), but that's no big deal. I also had to add the appropriate __future__ statements for unicode literals and the print function:

from __future__ import print_function
from __future__ import unicode_literals

The only real discrepancy I had to work around came from mocking out sys.stdout with an io.StringIO for testing purposes: io.StringIO wants a unicode object, but a bare print() call hands it a str object on Python 2. All I had to do was change to print('') and that solved everything thanks to the unicode_literals import.
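Here is a sketch of that testing pattern, assuming a hypothetical print_blockers() function that writes to stdout. On Python 2 it needs the PyPI mock backport, while Python 3.3+ uses unittest.mock:

```python
from __future__ import print_function
from __future__ import unicode_literals

import io
import sys

try:
    from unittest import mock  # Python 3.3+
except ImportError:
    import mock  # PyPI backport for older versions


def print_blockers(names):
    """Hypothetical CLI helper that writes one project name per line."""
    for name in names:
        print(name)
    # On Python 2, print() writes a byte-string newline unless one of its
    # arguments is unicode, and io.StringIO rejects byte strings, so a bare
    # print() fails there. print('') passes a unicode literal (thanks to
    # unicode_literals), which makes print() write a unicode newline.
    print('')


def capture_output(func, *args):
    """Run func with sys.stdout swapped for an in-memory text buffer."""
    fake_stdout = io.StringIO()
    with mock.patch.object(sys, "stdout", fake_stdout):
        func(*args)
    return fake_stdout.getvalue()
```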

Supporting Python 2.6

If you have not switched past Python 2.6, I feel bad for you. This was where I really had to scale back serious code in order to support an older version of Python. Having to drop set literals along with set and dict comprehensions sucked.
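For reference, here is what that rewriting looks like. The function names and values are made up for illustration, and the commented-out lines are the 2.7+ syntax that Python 2.6 rejects:

```python
# Python 2.7+/3.x spellings (SyntaxError on Python 2.6):
#   backports = {"argparse", "mock"}
#   lowered = {name.lower() for name in names}
#   by_name = {name.lower(): name for name in names}


def stdlib_backports():
    # set literal -> set() called on a list
    return set(["argparse", "mock"])


def lowered(names):
    # set comprehension -> set() called on a generator expression
    return set(name.lower() for name in names)


def by_name(names):
    # dict comprehension -> dict() called on a generator of (key, value) pairs
    return dict((name.lower(), name) for name in names)
```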

I also had to work around losing the richer assertion methods that unittest gained in Python 2.7 (and which unittest2 backports). Instead of adding yet another test dependency like mock, I added appropriate hasattr() checks so that I didn't lose the much better failure output on newer versions of Python.
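The hasattr() approach can be sketched as a small base class. CompatTestCase and assert_in are hypothetical names, and assertIn stands in for whichever richer assertion you rely on:

```python
import unittest


class CompatTestCase(unittest.TestCase):
    """Hypothetical base class: use the richer assertion when it exists."""

    def assert_in(self, member, container):
        if hasattr(self, "assertIn"):
            # Python 2.7+/3.x: failure output shows the member and the container
            self.assertIn(member, container)
        else:
            # Python 2.6: fall back to assertTrue with a hand-built message
            self.assertTrue(member in container,
                            "%r not found in %r" % (member, container))


class ExampleTests(CompatTestCase):
    def test_membership(self):
        self.assert_in("six", ["six", "distlib"])
```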

Finally, losing automatic field numbering in str.format() was annoying. All it required was going back and inserting a number, but it's one of those things where you think you have fixed everything, you run your tests, and then you discover that you just fail farther down in your code. This is why having good test coverage is important.
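The change itself is mechanical. Automatic numbering arrived in Python 2.7 and 3.1, so on 2.6 every replacement field needs an explicit index (describe is a hypothetical example function):

```python
def describe(project, version):
    # "{} supports Python {}".format(...) needs Python 2.7/3.1+;
    # Python 2.6 raises a ValueError for the empty field names.
    return "{0} supports Python {1}".format(project, version)
```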

I probably should have used a template to start from

Since I had not used setuptools before, there was a little bit of learning and basic boilerplate to go through. To make my life easier I probably should have just used cookiecutter and cookiecutter-pypackage to bootstrap my setup.py and such and worked from there. Another option would have been starting from the PyPA sample project.

Doing the right thing for putting a project on PyPI

Since the whole Python packaging ecosystem is actively being fixed, I needed a refresher on how to properly debut a new project on PyPI. The Python packaging user guide has a section about it but there were a couple of bits I had to figure out (I have filed bugs to get the doc updated so mentioning this here should be moot at some point).

What kinds of files to put up

If your project is pure Python code, you definitely want the traditional sdist package, created by python setup.py sdist. But the new thing to do is to also create a wheel. This requires using setuptools in your setup.py and installing wheel. With that you can run python setup.py bdist_wheel, which will create a .whl file for you (basically a zip file with a very specific directory and file layout).
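Since a wheel is just a zip archive, you can peek inside one with the standard library. A minimal sketch (the helper name is hypothetical); per PEP 427, the .dist-info directory holds metadata files such as METADATA, WHEEL, and RECORD:

```python
import zipfile


def wheel_metadata_files(path):
    """Return the sorted *.dist-info/ entries inside the wheel at *path*."""
    # A .whl file is an ordinary zip archive, so zipfile can open it directly.
    with zipfile.ZipFile(path) as whl:
        return sorted(
            name for name in whl.namelist()
            if name.split("/")[0].endswith(".dist-info")
        )
```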

If your project happens to support both Python 2 and 3, make sure to add a section to your setup.cfg declaring this, since wheel files declare which Python versions/APIs they support:

[wheel]
universal = 1
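With universal = 1, bdist_wheel produces a file tagged py2.py3-none-any, meaning any Python 2 or 3 interpreter, no ABI requirement, and any platform. A minimal sketch of pulling those tags out of a wheel filename following the PEP 427 naming convention (wheel_tags and the example filename are hypothetical):

```python
def wheel_tags(filename):
    """Split a wheel filename into (name, version, python, abi, platform).

    PEP 427 naming: {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 when the optional build tag is present
        raise ValueError("unexpected wheel filename: %r" % filename)
    name, version = parts[0], parts[1]
    python_tag, abi_tag, platform_tag = parts[-3:]
    return name, version, python_tag, abi_tag, platform_tag
```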

Registering and uploading to PyPI

Once you have created the files you want to upload, you will want to register your project name. This is a one-time event that you should do through the web UI on PyPI (unfortunately python setup.py register doesn't guarantee the use of SSL, so it may send your password insecurely).

Once all of that is done, you will want to upload your project's files using twine. There are three reasons to use twine over the old standby of, e.g., python setup.py sdist upload:

It conveniently separates the upload step from the project file creation step
Twine makes sure to use a secure connection to PyPI
If you use GPG, it makes signing project files dead-simple (either use its -a argument or pass in .asc files as arguments along with the files you want to upload)

With all of this you end up with a project listed on PyPI that offers both the old and new ways of distributing projects (sdist and wheel) and supports both Python 2 and 3.