How to port 100,000 lines of Python 2 to Python 3
TLDR: Use python-future.
Last summer I led the conversion of a 100KLOC Python 2 web application to Python 3.
The application is called “Accelerate” - the backbone of operations at my employer, MassChallenge, a global startup accelerator. It handles every stage of a running accelerator program:
- account creation for entrepreneurs and experts (mentors)
- startup applications
- online and in-person judging of applications
- coordination of one-on-one meetings with mentors during the program
- generation of reports used by judges selecting cash award recipients
So, it’s a mission-critical app. With EOL looming for the versions we were using of both Django (1.11) and Python (2.7), we dove into the migration work about a year ago. Done in parallel with our usual work of maintenance, bugfixes, and enhancements, it took about three months.
After looking into How People Are Doing This, I settled on the python-future project and its futurize tool. It was a good fit for us because we did not want the disruption of a single Big Rewrite project. We couldn’t stop the world for a rewrite, and the longer a branch lives, the more merge-conflict hassles you will have. Futurize can get you to Python 3 by way of an intermediate state that still runs on Python 2.
This allowed us to do a couple of big merges along the way, from the Python 3 branch into the main line. Much less disruptive and conflict-ridden than trying to do one big merge at the end.
First, we did what they call “Stage 1 conversion”, which tackles things that will break in Python 3 but can be easily converted to a 2/3 friendly form. As the docs say, “the goal for this stage is to create most of the diff for the entire porting process, but without introducing any bugs.”
So, after Stage 1 your application still won’t run cleanly under Python 3, but basic syntax and library issues are taken care of, and nothing is broken for Python 2.
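To make "2/3 friendly form" concrete, here is a small sketch of the kind of rewrite Stage 1 (the futurize --stage1 pass) produces. The functions and names are hypothetical, not taken from Accelerate, and the Python 2-only originals are shown in comments. The exact set of rewrites depends on your code, so treat this as an illustration rather than a list of what the tool will do.

```python
# Hypothetical helpers, written in the form that runs on both Python 2 and 3.
# Each "was:" comment shows the Python 2-only spelling it replaces.


def describe(settings, key):
    """Return the repr of settings[key], or None if the key is absent."""
    # was: if settings.has_key(key):
    if key in settings:
        # was: return `settings[key]`
        return repr(settings[key])
    return None


def parse_port(raw, default=8000):
    """Parse a port number, falling back to a default on bad input."""
    try:
        return int(raw)
    # was: except ValueError, exc:
    except ValueError as exc:
        # exc is bound with "as", the only spelling Python 3 accepts.
        _ = exc
        return default
```

None of these changes alters behavior on Python 2, which is why this stage can be merged to the main line early.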
Then we did Stage 2; the end result of that is “Python 3-style code that [also] runs on Python 2 with the help of the appropriate builtins and utilities in [the] future [package].”
The most interesting bit there is builtins, which contains rewritten versions of 18 builtins that provide Python 3 semantics. These are:
ascii, bytes, chr, dict, filter, hex, input, int, map, next, oct, open, pow, range, round, str, super, zip
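A quick way to see what "Python 3 semantics" means for a few of these names is the sketch below. On Python 3 the import resolves to the standard library's builtins module; on Python 2 it would resolve to the backports, assuming the future package is installed. The function name is made up for illustration.

```python
from builtins import bytes, dict, map, range, round, str


def py3_semantics_demo():
    """Check a handful of behaviors that differ from native Python 2."""
    return {
        # Indexing bytes yields an int, not a one-character string.
        "bytes_index_is_int": bytes(b"abc")[0] == 97,
        # map and range are lazy; they no longer build lists.
        "map_is_lazy": not isinstance(map(abs, [-1, 2]), list),
        "range_is_lazy": not isinstance(range(3), list),
        # dict.keys() is a view, not a list.
        "keys_is_view": not isinstance(dict(a=1).keys(), list),
        # round() rounds halves to the nearest even number.
        "round_half_to_even": round(2.5) == 2,
        # str is the text type, so unicode literals are str instances.
        "str_is_text": isinstance(u"caf\xe9", str),
    }
```

If every value in the returned dict is True, the module is getting Python 3 behavior for those names.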
Then came Stage 3, the longest and most challenging (several weeks of work for me), summarized in a single sentence in the futurize docs:
After running futurize, we recommend first running your tests on Python 3 and making further code changes until they pass on Python 3.
Ah. A mere matter of programming.
A prerequisite for a successful effort of this type, in my book, is excellent test coverage. We were at about 95% line coverage at the beginning of this work. The beginning of Stage 3 is basically watching your test suite explode.
I had to touch about 25% of our 2400 unit tests to complete that work. Most of the issues were around string handling. A lot of our tests were checking for string (str) values in Django HttpResponse.content, which under Python 3 is a bytestring (bytes). So, under Python 3 a lot of those tests just threw TypeError. In almost all those cases, the fix was simply to use the Django test framework’s assertContains(response, text) method, which reconciles str and bytes pretty seamlessly.
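The failure mode is easy to reproduce without Django. The sketch below uses a hypothetical FakeResponse class as a stand-in for HttpResponse (it is not Django code) to show why the old-style check blows up on Python 3 and what a text-aware comparison looks like.

```python
class FakeResponse(object):
    """Hypothetical stand-in for django.http.HttpResponse."""

    def __init__(self, body, charset="utf-8"):
        self.charset = charset
        # Like HttpResponse.content, this is bytes on Python 3.
        self.content = body.encode(charset)


def naive_check(response, text):
    """The Python 2-era test style: look for a str inside the content."""
    # On Python 3 this raises TypeError, because text is str
    # and response.content is bytes.
    return text in response.content


def contains_text(response, text):
    """Decode the body first, then compare text with text."""
    return text in response.content.decode(response.charset)


response = FakeResponse(u"<h1>Welcome, judges</h1>")
```

In a real Django test case, self.assertContains(response, "Welcome") takes care of the comparison for you, and also checks the response status code (200 by default).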
After fixing up the test suite, we did exhaustive manual QA and caught a few things that needed fixing. A significant part of the Stage 3 effort, and one that is surprisingly little discussed, is dependencies, which can be a major pain point in this process. While most of our many dependencies worked fine with Python 3 once updated to their latest version, a number of them did not.
For those, we had to 1) find substitutes, 2) rework our application to let us drop the problematic dependency (my favorite), or 3) patch for compatibility.
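This is not the process we followed, but one rough way to get an early list of suspects is to look at the trove classifiers each installed package declares. The sketch below assumes Python 3.8+ (for importlib.metadata) and a scratch environment with your dependencies installed; the function names are made up. A missing classifier is only a hint, since plenty of packages work fine on Python 3 without declaring it.

```python
from importlib import metadata

PY3_PREFIX = "Programming Language :: Python :: 3"


def declares_py3(classifiers):
    """True if any trove classifier claims Python 3 support."""
    return any(c.startswith(PY3_PREFIX) for c in classifiers)


def packages_without_py3_classifier():
    """Names of installed distributions that declare no Python 3 classifier."""
    missing = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        classifiers = dist.metadata.get_all("Classifier") or []
        # Skip distributions with broken metadata (no name recorded).
        if name and not declares_py3(classifiers):
            missing.add(name)
    return sorted(missing)
```

Each name it returns is a candidate for a closer look: check for a newer release first, then decide between substituting, dropping, or patching.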
Spreading the word
Python 2 is now officially EOL, but of course there is lots of Python 2 code running in production out there. I suspect the know-how for this kind of conversion will be relevant for many years. I gave talks on this at Django Boston last fall, and NERD Summit this spring.
Here’s a link to a video of the more recent of those two talks - the link skips to the end of my “five minute” intro that runs for nine minutes.
Footnote: After we completed the work I describe in this post, we moved on to the Django upgrade, settling on version 2.2, which was the newest LTS at the time. The Python 3 conversion taught us a lot about managing the Django upgrade, and it went very smoothly.