E-Scribe : a programmer’s blog

About Me

I'm Paul Bissex. I build web applications using open source software, especially Django. Started my career doing graphic design for newspapers and magazines in the '90s. Then wrote tech commentary and reviews for Wired, Salon, Chicago Tribune, and others you never heard of. Then I built operations software at a photography school. Then I helped big media serve 40 million pages a day. Then I worked on a translation services API doing millions of dollars of business. Now I'm building the core platform of a global startup accelerator. Feel free to email me.

Book

I co-wrote "Python Web Development with Django". It was the first book to cover the long-awaited Django 1.0. Published by Addison-Wesley and still in print!

Colophon

Built using Django, served with gunicorn and nginx. The database is SQLite. Hosted on a FreeBSD VPS at Johncompanies.com. Comment-spam protection by Akismet.

How to port 100,000 lines of Python 2 to Python 3

TLDR: Use python-future.

The Project

Last summer I led the conversion of a 100KLOC Python 2 web application to Python 3.

The application is called "Accelerate". It is the backbone of operations at my employer, MassChallenge, a global startup accelerator, and it handles every stage of a running accelerator program.

So, it's a mission-critical app. With EOL looming for the versions we were using of both Django (1.11) and Python (2.7), we dove into the migration work about a year ago. Done in parallel with our usual work of maintenance, bugfixes, and enhancements, it took about three months.

After looking into How People Are Doing This, I settled on the python-future project and its futurize tool. It was a good fit for us because we did not want the disruption of a single Big Rewrite project. We couldn't stop the world for a rewrite, and the longer a long-running branch lives, the more merge-conflict hassle you will have. Futurize can get you to Python 3 by way of an intermediate state that still runs on Python 2.

This allowed us to do a couple big merges along the way, from the Python 3 branch into the main line. Much less disruptive and conflict-ridden than trying to do one big merge at the end.

Stage 1

First, we did what they call "Stage 1 conversion", which tackles things that will break in Python 3 but can be easily converted to a 2/3 friendly form. As the docs say, “the goal for this stage is to create most of the diff for the entire porting process, but without introducing any bugs.”

So, after Stage 1 your application doesn't yet run cleanly under Python 3, but basic syntax and library issues are taken care of, and nothing is broken for Python 2.
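To make that concrete, here is a small sketch of the kind of code a Stage 1-style pass leaves behind. The function `parse_count` is invented for illustration and is not from our codebase; the Python 2 originals are shown in comments. Exactly which fixers futurize applies depends on its version and options, so treat this as representative rather than as literal tool output.

```python
# Illustrative only: a hypothetical module after a Stage 1-style cleanup.
# These __future__ imports do nothing on Python 3; on Python 2 they turn
# print into a function and make imports absolute by default.
from __future__ import absolute_import, print_function

import sys


def parse_count(raw, defaults):
    """Return raw as an int, falling back to defaults["count"] if present."""
    try:
        return int(raw)
    # Python 2 only:  except ValueError, err:
    except ValueError as err:
        # Python 2 only:  print >>sys.stderr, "bad count:", err
        print("bad count:", err, file=sys.stderr)
        # Python 2 only:  if defaults.has_key("count"):
        if "count" in defaults:
            return defaults["count"]
        return None


if __name__ == "__main__":
    print(parse_count("42", {}))
    print(parse_count("forty-two", {"count": 0}))
```

Every line in the "after" form is valid on both Python 2.7 and Python 3, which is why this stage can be merged into the main line without breaking anything.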

Stage 2

Then we did Stage 2; the end result of that is “Python 3-style code that [also] runs on Python 2 with the help of the appropriate builtins and utilities in future.”

The most interesting bit there is the builtins module, which contains rewritten versions of 18 builtins that provide Python 3 semantics. These are: ascii, bytes, chr, dict, filter, hex, input, int, map, next, oct, open, pow, range, round, str, super, zip.
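A quick sketch of what "Python 3 semantics" means for a few of those names. The sample values are made up for illustration. On Python 3 the `builtins` import below resolves to the standard library; on Python 2 with the `future` package installed, the same import is meant to pull in the backported versions, so the results should match.

```python
# On Python 3, `builtins` is a standard-library module. On Python 2, the
# `future` package provides a module of the same name with backports.
from builtins import bytes, map, range, round, zip

# bytes: indexing yields an int, not a one-character string.
first = bytes(b"abc")[0]

# round: ties go to the nearest even number ("banker's rounding").
rounded = round(2.5)

# map and zip: lazy iterators rather than fully built lists.
squares = map(lambda n: n * n, range(3))
pairs = zip("ab", [1, 2])
squares_is_list = isinstance(squares, list)

print(first, rounded, squares_is_list, list(squares), list(pairs))
# Under Python 3 this prints: 97 2 False [0, 1, 4] [('a', 1), ('b', 2)]
```

Native Python 2 builtins would give different answers for several of these (for example, `round(2.5)` returns `3.0` there), which is the kind of silent behavior change Stage 2 is meant to surface early.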

Stage 3

Then came Stage 3, the longest and most challenging (several weeks of work for me), summarized in a single sentence in the futurize docs:

“After running futurize, we recommend first running your tests on Python 3 and making further code changes until they pass on Python 3.”

Ah. A mere matter of programming.

Fixing tests

A prerequisite for a successful effort of this type, in my book, is excellent test coverage. We were at about 95% line coverage at the beginning of this work. The beginning of Stage 3 is basically watching your test suite explode.

I had to touch about 25% of our 2400 unit tests to complete that work. Most of the issues were around string handling. A lot of our tests were checking for string (str) values in Django HttpResponse.content — which under Python 3 is a bytestring (bytes). So, under Python 3 a lot of those tests just threw TypeError. In almost all those cases, the fix was simply to use the Django test framework's assertContains(response, text) method, which reconciles str and bytes pretty seamlessly.
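The failure mode is easy to reproduce without Django. In the sketch below, `FakeResponse` and `contains_text` are hypothetical stand-ins written for this post, not Django classes or Django's implementation; they only mimic the fact that the response body is bytes under Python 3. In a real Django test you would call `assertContains(response, text)` instead of a helper like this.

```python
import unittest


class FakeResponse(object):
    """Hypothetical stand-in for an HTTP response whose content is bytes."""

    def __init__(self, text, charset="utf-8"):
        self.charset = charset
        self.content = text.encode(charset)


def contains_text(response, text):
    """Compare as text by decoding the response body first."""
    return text in response.content.decode(response.charset)


class GreetingTest(unittest.TestCase):
    def setUp(self):
        self.response = FakeResponse("<h1>Hello, world</h1>")

    def test_naive_check_raises(self):
        # The Python 2-era check: searching for a str inside bytes.
        with self.assertRaises(TypeError):
            "Hello" in self.response.content

    def test_decoded_check_passes(self):
        self.assertTrue(contains_text(self.response, "Hello"))
```

Saved as a module, this can be run under Python 3 with `python -m unittest`; both tests should pass, the first one because the naive membership check raises `TypeError`.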

Dependencies

After fixing up the test suite, we did exhaustive manual QA and caught a few things that needed fixing. A significant part of the Stage 3 effort, and one that is surprisingly little discussed, is dependencies, which can be a major pain point in this process. While most of our many dependencies worked fine with Python 3 once updated to their latest versions, a number of them did not.

For those, we had to 1) find substitutes, 2) rework our application to let us drop the problematic dependency (my favorite), or 3) patch for compatibility.


Spreading the word

Python 2 is now officially EOL, but of course there is lots of Python 2 code running in production out there. I suspect the know-how for this kind of conversion will be relevant for many years. I gave talks on this at Django Boston last fall, and NERD Summit this spring.

The thumbnail here links to the video recording of the more recent of those two talks. (My "five minute" intro extends to 9:10; skip to there if you want to just dive into the meat of the talk.)

Footnote: After we completed the work I describe in this post, we moved on to the Django upgrade, settling on version 2.2, which was the newest LTS at the time. The Python 3 conversion taught us a lot about managing the Django upgrade, and it went very smoothly.


Wednesday, April 15th, 2020