The history behind the decision to move Python to GitHub

I asked on Twitter whether people would be interested in having me write down the history behind my decision to choose GitHub for Python's future development process, and people said "yes" (some literally), hence this blog post.

Background

My heavy participation with python-dev started with me writing the Python-Dev Summaries, with my first edition coming out on September 1, 2002. Writing those summaries while I took a year off from school between my bachelor's and master's degrees allowed me to really dive into Python's development process (I ended up authoring the Summaries for over two years in total). Not only did it give me detailed knowledge of everything going on with Python, it also led to me helping out, since I noticed instantly when something needed to be done. This was beneficial as it meant I was constantly able to pick up tasks that grew in complexity as I became more and more comfortable contributing.

This led to two key things for me. One, I came to deeply appreciate the opportunity that open source projects in general -- and Python specifically -- offered people like me who didn't have tons of programming experience but who had the drive, energy, and demeanour to contribute and learn from those contributions. Two, it led to me caring a lot about how Python's development process worked, not only so that it would be easy for someone to contribute for the first time and gain the excitement and experience of that first open source contribution, but also so that it would be as easy as possible for a core developer to work with someone and eventually accept their contribution.

All of this led to me ending up in charge of moving Python's Subversion repository and issue tracker off of SourceForge and over to svn.python.org and using Roundup for bugs.python.org in October 2006 ("fun" fact: originally the issue tracker was going to use JIRA and have it hosted by Atlassian, but pressure from the community -- including Richard Stallman and the FSF -- led to us using Roundup instead). This migration impacted me in two ways. One is that it established how I chose to handle decisions regarding competing solutions to an infrastructure problem. Essentially, I would put out a call for proposals on how to solve the problem, which helped remove any personal bias I had by not allowing me to make a proposal myself (in this case the proposals took the form of test instances of the proposed issue trackers). I then had a small group of people whose opinions I trusted provide feedback directly to me, while also allowing anyone else who wanted to chime in to say something on the appropriate mailing list. I then took all of the feedback, my experience with the test instances, and my general thoughts on everything and came to a decision. This is the same approach I took with migrating Python from svn.python.org to hg.python.org (that decision was made in 2009 and the switch occurred in 2011).

The second lesson I learned was about the dedication of volunteers. When the decision was made to switch to JIRA, one of the key attractions of that platform was that Atlassian was going to be hosting our instance and providing direct support (they were very involved in their proposal). But when the community started to protest over the idea of a closed-source, Java application, I publicly said that if we could get enough volunteers to manage our own Roundup instance then I would relent and use Roundup. A decent number of people did step forward, so we switched (the FSF offered to help put out the call for volunteers, but in the end I didn't take them up on the offer, as my own personal call for help at the time seemed to bring enough volunteers forward). What ended up happening, though, is that nearly none of those volunteers stuck around. At this point we have Ezio Melotti and R. David Murray to thank -- both core developers -- for keeping our issue tracker up and running all these years, and Upfront Systems for hosting it. That experience taught me that the people you can really count on are those who put the effort into the proposals themselves and those with a proven track record. People who come out of nowhere may have good intentions, but that doesn't guarantee they will actually follow through (which I honestly should have known based on my experience from the Python core sprints at PyCon US, where people used to regularly come to tackle a big problem, get part way to a solution, swear they will finish when they get home, and then never be heard from again).

What started it all

While my Ph.D. in computer science technically doesn't have a focus attached to it, unofficially it's software engineering. Being part of the software practices lab at the University of British Columbia meant I was regularly surrounded by people trying to figure out ways to make software development better, and it left me with a lasting interest in software development practices. It also means that I'm acutely aware of when Python's own development process starts to trail current best practices or becomes burdensome.

By 2014 it had become obvious to some of us that the Python development process had in fact become a burden. The rate at which patches were being submitted was much greater than the rate at which they were being reviewed. This was leading to external contributors getting frustrated: they would put in the effort to write a patch but would sometimes end up waiting years for a review from a core developer. And since the Python community is so friendly, people were being very polite and just waiting for someone to notice that their patch had been sitting there instead of asking if someone could look at it (we have subsequently told people to email python-dev if their patch sits for too long on the issue tracker). For an open source project, this is a bad situation to be in: if external contributors stop participating, the project slowly dies as core developers themselves stop participating and there is no one to replace them.

While a few of us openly lamented the problem, Nick Coghlan decided to publicly acknowledge it by creating PEP 474 in July 2014. In that document, Nick proposed moving to Kallithea to host our Mercurial code and to provide us with code review integration (the latter was desired because bugs.python.org uses a custom fork of Rietveld that was not being actively maintained). The PEP didn't go anywhere, though, as Nick had a ton of other projects to work on and didn't have pre-existing buy-in on changing the development process.

Fast-forward to November 2014, and Nick decided to try to move things forward yet again by asking if we should move ancillary repositories to Bitbucket (for Python, the ancillary repositories are for things like the PEPs, benchmarks, etc.; basically all the repositories that are not the cpython repository and are contributed to by more than one person). In the ensuing discussion, Guido countered Nick's idea with moving to GitHub, as had been proposed by Donald Stufft earlier in that same discussion. To help force a decision, Guido pointed out that two of the three repositories being discussed were primarily managed by me, and thus I should make the call; he was simply going to wait until I did (this whole discussion was reported on by LWN).

Having the decision squarely on my shoulders (I could have said no, but when it comes to Python I usually don't, as my wife will attest), I decided to seize the opportunity and use it as my way to try to modernize Python's development process. I wrote a vision document in December 2014 where I spelled out exactly what my ideal development process would be for Python itself; worrying about only the ancillary repositories and punting on the key repository everyone truly cared about was simply ignoring the real problem at hand. What I wanted was a development process that was as simple as possible for core developers to use when reviewing external contributions. This meant things like automated testing for patches, code coverage, patch merging through the browser, etc. My pithy summary of what I wanted was the ability to review an external contribution -- from submission to commit -- all on a tablet while at a beach with WiFi (which Vancouver actually has, so this wasn't entirely a silly request). My thinking was that if we got the process to be that simple, core developers could do a review at lunch time while at work, or when they had some down time at home, without having to be on some special machine that had their SSH keys installed on it. Basically, I wanted the development process to become so streamlined that reviewing an external contribution was something one did to relax (that is definitely not the case now; it's a hassle at the moment and not something I at least do for fun, but more out of a feeling of obligation).

The process

The decision process started with me asking for draft PEPs by February 2015. The PEPs were initially to act as a way for me to know who was proposing what and who was going to be willing to put in the effort to help. By the deadline I had Nick's PEP 474 and Donald Stufft's PEP 481 proposing GitHub (and optionally Phabricator, but in the end that was dropped). I then wanted final PEPs by PyCon US 2015 in April, with the hope of making a decision by May 1.

But May 1 came and went with no decision. What happened was that I ended up job hunting after PyCon US, leaving Google, moving across Canada to live in Vancouver again, and joining the Python team at Microsoft. Then, with the typical time sink that is moving into a new place, re-establishing oneself in a "new" town (having done my Ph.D. in Vancouver, and my wife having been born and raised there, it was just re-adjusting to being back), and starting a new job, I didn't get back around to working towards a decision until September. Thanks to my new job I was able to start putting in some work time on this, so that's when I asked for test instances of the cpython repository on the two proposed platforms by October 31, 2015 -- Halloween in North America -- with a goal of making a decision by January 1, 2016. What quickly happened, though, is that Barry Warsaw asked if he could toss GitLab in for consideration, and Nick Coghlan subsequently said he would rather back GitLab than Kallithea (due to maturity issues with the projects). With Donald having no issues with the switch, Nick technically only tweaking his pre-existing proposal, and Barry having a track record of following through, I allowed the pivot of Nick's proposal.

I spent November 2015 playing with the test instances, as did various other people who provided feedback to me either privately or on the core-workflow mailing list. The first three weeks of December were spent addressing questions people had about the two approaches. And then it was January 1, 2016.

The decision

On New Year's Day, I announced that I had chosen GitHub over GitLab. There were multiple reasons why I made that decision. One was that GitHub has basically built a social network of open source contributors. That led to various core developers telling me that they were already comfortable with GitHub and were hoping it would win. It also means that there is more tooling already available for use with GitHub, which ties into the goal of automating the development process as much as possible while cutting back on the infrastructure maintained for the Python development team.

Two, there was no killer feature that GitLab had. Now, some would argue that the fact GitLab is open source is its killer feature. But to me, the development process is more important than worrying about whether a cloud-based service publishes its source code. If someone worries to that extent then they shouldn't, e.g., have a Gmail address that they use when developing open source. This is especially not a worry as GitHub is not a walled garden; its extensive API allows for downloading any and all data that goes into the platform. Since GitHub is going to be used for repository hosting and code reviews, that means we just need to back up the code review data, since Git is a distributed version control system and thus carries its entire history around with it in every (full) clone.
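
As an aside, "downloading any and all data" here really does just mean hitting the public API. Below is a minimal sketch of what archiving a repository's pull request review comments could look like, assuming the third-party requests package and an unauthenticated request against an illustrative repository; none of this is part of any actual migration plan.

```python
# A rough sketch of backing up pull request review comments via GitHub's REST API.
# Assumes the third-party `requests` package; "python/peps" is purely illustrative.
import json

import requests

REVIEW_COMMENTS_URL = "https://api.github.com/repos/python/peps/pulls/comments"


def fetch_review_comments(url=REVIEW_COMMENTS_URL):
    """Yield every review comment in the repository, following pagination."""
    params = {"per_page": 100}
    while url:
        response = requests.get(url, params=params,
                                headers={"Accept": "application/vnd.github+json"})
        response.raise_for_status()
        yield from response.json()
        # GitHub advertises the next page (if any) in the Link header,
        # which requests exposes through response.links.
        url = response.links.get("next", {}).get("url")
        params = None  # the "next" URL already carries its query string


if __name__ == "__main__":
    comments = list(fetch_review_comments())
    with open("review-comments-backup.json", "w") as backup:
        json.dump(comments, backup, indent=2)
    print("Archived", len(comments), "review comments")
```

A real backup would authenticate (unauthenticated requests are heavily rate-limited) and cover issues, reviews, and the like as well, but the point stands: everything that goes into the platform can be pulled back out.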

Lastly, our BDFL prefers GitHub. Since the beginning of this whole decision process, Guido let it be known that he thought GitHub was the best choice. For me that means something, as I want to make sure that Guido feels comfortable, not frustrated, contributing to his own programming language. Now, Guido would be the first person to tell you that his frequency of contributions is low enough that his opinion shouldn't count for more than that of any other infrequent core developer. But I wanted to make sure that Guido's engagement stays as high as possible, so his preference mattered to me.

What's next

With the decision made, discussions have started on the core-workflow mailing list to work out exactly how we are going to manage this migration. I'm currently writing a PEP that will outline all of the steps necessary to migrate each of the repositories we wish to move over. We will then begin to tackle each of the steps and slowly migrate the various repositories to GitHub. I'm hoping we can complete the migration this year, but it may spill into 2017 (this is being driven by volunteers and there are a lot of people with varying opinions, so it is simply not going to happen quickly).

Regardless of how long it takes, I'm optimistic it will pay off in the end. I have some ideas on how to leverage the benefits we get from GitHub to get our development process working smoothly enough that we could have something like an SLA with the Python community for how long we will take to address an external contribution. If we can manage to become responsive to external contributions again, then all of this will have been worth it.