@brettcannon
Created November 21, 2014 19:59
Rewrite of porting HOWTO for Python 2/3 source-compatibility

Porting Python 2 Code to Python 3

Author: Brett Cannon

Abstract

With Python 3 being the future of Python while Python 2 is still in active use, it is good to have your project available for both major releases of Python. This guide is meant to help you figure out how best to support both Python 2 & 3 simultaneously.

If you are looking to port an extension module instead of pure Python code, please see the cporting-howto document (Porting Extension Modules to Python 3).

If you would like to read one core Python developer's take on why Python 3 came into existence, you can read Nick Coghlan's Python 3 Q & A.

For help with porting, you can email the python-porting mailing list with questions.

The Short Explanation

To make your project single-source Python 2/3 compatible, the basic steps are:

  1. Update your code to drop support for Python 2.5 or older (supporting only Python 2.7 is ideal)
  2. Make sure you have good test coverage (coverage.py can help)
  3. Learn the differences between Python 2 & 3
  4. Use Modernize or Futurize to update your code
  5. Use Pylint to help make sure you don't regress on your Python 3 support
  6. Use caniusepython3 to find out which of your dependencies are blocking your use of Python 3
  7. Once your dependencies are no longer blocking you, use continuous integration to make sure you stay compatible with Python 2 & 3 (tox can help test against multiple versions of Python)

If you are dropping support for Python 2 entirely, then after you learn the differences between Python 2 & 3 you can run 2to3 over your code and skip the rest of the steps outlined above.

Details

A key point about supporting Python 2 & 3 simultaneously is that you can start today! Even if your dependencies do not support Python 3 yet, that does not mean you can't modernize your code now to support Python 3. Most changes required to support Python 3 lead to cleaner code that follows newer practices, which makes for a better code base overall, even in a Python 2 context.

Another key point is that modernizing your Python 2 code to also support Python 3 is largely automated for you. While you might have to make some API decisions thanks to Python 3 clarifying text data versus binary data, the lower-level work is now mostly done for you.

Keep those key points in mind while you read on about the details of porting your code to support Python 2 & 3 simultaneously.

Drop support for Python 2.5 and older (at least)

While you can make code that supports Python 2.5 also work with Python 3, it is much easier if you only have to support Python 2.6 or newer (and easier still if you only have to support Python 2.7). If dropping Python 2.5 is not an option, then the six project can help you support Python 2.5 & 3 simultaneously. Do realize, though, that nearly all the projects listed in this HOWTO will not be available to you.

If you are able to only support Python 2.6 or newer, then the required changes to your code should continue to look and feel like idiomatic Python code. At worst you will have to use a function instead of a method in some instances or have to import a function instead of using a built-in one, but otherwise the overall transformation should not feel foreign to you.

Make sure you specify the proper version support in your setup.py file

In your setup.py file you should have the proper trove classifiers specifying which versions of Python you support. As your project does not support Python 3 yet, you should at least specify Programming Language :: Python :: 2 :: Only. Ideally you should also specify each major/minor version of Python that you do support, e.g. Programming Language :: Python :: 2.7.
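As a rough sketch, the metadata a Python-2-only setup.py might pass to setuptools could look like the following. The project name and version are hypothetical placeholders, and the keyword arguments are kept in a plain dict so the snippet runs without setuptools installed.

```python
# Sketch of setup.py metadata for a project that does not support Python 3 yet.
# "example-project" and "0.1" are hypothetical placeholders.
SETUP_KWARGS = dict(
    name="example-project",
    version="0.1",
    classifiers=[
        "Programming Language :: Python :: 2 :: Only",
        "Programming Language :: Python :: 2.7",
    ],
)

# In the real setup.py you would then call:
#     from setuptools import setup
#     setup(**SETUP_KWARGS)
```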

Have good test coverage using coverage.py

Once you have your code supporting the oldest version of Python 2 you want it to, you will want to make sure your test suite has good coverage. A good rule of thumb is that you want to be confident enough in your test suite that any failures appearing after tools rewrite your code are actual bugs in the tools and not in your code. If you want a number to aim for, try to get over 80% coverage (and don't feel bad if you can't easily get past 90%). If you don't already have a tool to measure test coverage, then coverage.py is recommended.

Learn the differences between Python 2 & 3

Once you have your code well-tested, you are ready to begin porting it to Python 3! But to fully understand how your code is going to change and what to look out for while you code, you will want to learn what Python 3 changes compared to Python 2. Typically the two best ways of doing that are reading the "What's New" document for each release of Python 3 and the Porting to Python 3 book (which is free online).

Update your code using Modernize or Futurize

Once you feel like you know what is different in Python 3 compared to Python 2, it's time to update your code! You have a choice between two tools in porting your code automatically: Modernize and Futurize. Which tool you choose will depend on how much like Python 3 you want your code to be. Futurize does its best to make Python 3 idioms and practices exist in Python 2, e.g. backporting the bytes type from Python 3 so that you have semantic parity between the major versions of Python. Modernize, on the other hand, is more conservative and targets a Python 2/3 subset of Python, relying on six to help provide compatibility.

Regardless of which tool you choose, it will update your code to run under Python 3 while staying compatible with the version of Python 2 you started with. The resulting code should run under Python 2 with semantics equivalent to those it had before the transformation. Depending on how conservative you want to be, you may want to run the tool over your test suite first and visually inspect the diff to make sure the transformation is accurate. After you have transformed your test suite and verified that all the tests still pass as expected, you can transform your application code knowing that any tests which fail are translation failures.

Unfortunately, the tools can't automate everything needed to work under Python 3, so there are a handful of things you will need to update manually to get full Python 3 support (which of these steps are necessary varies between the tools). Luckily, there are only a couple of things to watch out for and update manually.

Division

In Python 3, 5 / 2 == 2.5 and not 2: dividing one int by another with / results in a float. This change has actually been planned since Python 2.2, which was released in 2001. Since then users have been encouraged to add from __future__ import division to any and all files which use the / and // operators, or to run the interpreter with the -Q flag. If you have not been doing this, then you will need to go through your code and do two things:

  1. Add from __future__ import division to your files
  2. Update each division operator as necessary to either use // for floor division or keep using / and expect a float
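A minimal sketch of the two steps above, assuming a module with two hypothetical helpers: the __future__ import comes first, then each division is made explicit about whether it wants a float (/) or a floor result (//). On Python 3 the import is a harmless no-op.

```python
from __future__ import division  # no-op on Python 3; enables true division on Python 2

def mean(values):
    # Hypothetical helper: / is true division, so the result is a float
    # even when every input is an int.
    return sum(values) / len(values)

def middle_index(seq):
    # Hypothetical helper: // is floor division, giving an int that is
    # safe to use as a list index.
    return len(seq) // 2

print(5 / 2)                          # 2.5
print(5 // 2)                         # 2
print(mean([1, 2]))                   # 1.5
print(middle_index(["a", "b", "c"]))  # 1
```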

The reason that / isn't simply translated to // automatically is that if an object defines its own __div__ (or __truediv__) method but not __floordiv__, then your code would begin to fail.
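To illustrate why a blind rewrite of / to // is unsafe, here is a sketch using a hypothetical JoinablePath class that overloads / to join path components (similar in spirit to how pathlib uses /) but defines no floor-division method. Rewriting its / to // turns working code into a TypeError.

```python
class JoinablePath(object):
    """Hypothetical type that overloads / to join path components."""

    def __init__(self, text):
        self.text = text

    def __truediv__(self, other):
        return JoinablePath(self.text + "/" + other)

    # Python 2 without the division __future__ import dispatches / to __div__.
    __div__ = __truediv__

joined = JoinablePath("usr") / "lib"
print(joined.text)  # usr/lib

try:
    JoinablePath("usr") // "lib"  # no __floordiv__ is defined, so this fails
except TypeError as exc:
    print("TypeError:", exc)
```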

Text versus binary data

In Python 2 you could use the str type for both text and binary data. Unfortunately, this confluence of two different concepts could lead to brittle code which sometimes worked for either kind of data and sometimes did not. It could also lead to confusing APIs if people didn't explicitly state whether something that accepted str expected text, binary data, or either. This especially complicated the situation for anyone supporting multiple languages, as APIs often didn't bother to explicitly support unicode, which is needed for the many languages that are not Latin-1 compatible.

To make the distinction between text and binary data clearer and more pronounced, Python 3 did what most languages created in the age of the internet have done and made text and binary data distinct types that cannot blindly be mixed together. For any code that deals only with text or only with binary data, this separation doesn't pose an issue. But for code that has to deal with both, it does mean you might now have to care about when you are using text versus binary data, which is why this cannot be entirely automated.

To start, you will need to decide which APIs take text and which take binary data (it is highly recommended that you don't design APIs that can take both because, as stated earlier, such code is difficult to keep working). In Python 2 this means making sure the APIs that take text can work with unicode, and those that work with binary data work with the bytes type from Python 3, whose methods are a subset of those on str in Python 2 (the bytes type in Python 2 is an alias for str). Usually the biggest issue is realizing which methods exist for which types in Python 2 & 3 (for text that's unicode in Python 2 and str in Python 3; for binary data that's str/bytes in Python 2 and bytes in Python 3). The following table lists the unique methods of each data type across Python 2 & 3.

Text data      Binary data
-------------  -------------
-              decode
encode         -
format         -
isdecimal      -
isnumeric      -
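A quick way to sanity-check the table on the Python 3 side is to ask each type whether it has the method. This sketch only confirms Python 3 behavior; on Python 2 the str (bytes) type does have encode and format, which is why the table describes the portable rule rather than what Python 2 alone would allow.

```python
TEXT_ONLY = ["encode", "format", "isdecimal", "isnumeric"]
BINARY_ONLY = ["decode"]

for name in TEXT_ONLY + BINARY_ONLY:
    # Prints whether text (str) and binary data (bytes) expose the method.
    print(name, "text:", hasattr(u"", name), "binary:", hasattr(b"", name))
```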


Making the distinction easier to handle can be accomplished by encoding and decoding between binary data and text at the edges of your code. This means that when you receive binary data you know to be text, you decode it immediately. And if your code needs to send text as binary data, you encode it as late as possible. This allows your code to work with only text internally and thus eliminates having to keep track of what type of data you are working with.
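Here is a minimal sketch of decoding at the edges, assuming UTF-8 as the external encoding and using in-memory byte streams in place of real files or sockets. The function names are hypothetical; the point is that only read_greeting and write_greeting ever touch bytes, while shout works purely on text.

```python
import io

def read_greeting(stream, encoding="utf-8"):
    # Edge of the program: bytes come in and are decoded to text immediately.
    return stream.read().decode(encoding)

def shout(text):
    # Internal logic: works only with text, never with bytes.
    return text.upper() + u"!"

def write_greeting(stream, text, encoding="utf-8"):
    # Edge of the program: text is encoded to bytes as late as possible.
    stream.write(text.encode(encoding))

incoming = io.BytesIO(u"h\u00e9llo".encode("utf-8"))
outgoing = io.BytesIO()
write_greeting(outgoing, shout(read_greeting(incoming)))
print(outgoing.getvalue())
```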

The next issue is making sure you know whether the string literals in your code represent text or binary data. At minimum you should add a b prefix to any literal that represents binary data. For text you should either use the from __future__ import unicode_literals statement or add a u prefix to the text literal.
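A short sketch of marking literals, assuming a module that opts into unicode_literals. The constant names are illustrative; the byte string is the standard PNG file signature, used here simply as an example of binary data. The u prefix is accepted on Python 2 and again on Python 3.3+.

```python
from __future__ import unicode_literals  # unprefixed literals become text on Python 2

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # binary data: always use the b prefix
GREETING = "hello"                   # text, thanks to unicode_literals on Python 2
FAREWELL = u"goodbye"                # text via an explicit u prefix

print(type(PNG_SIGNATURE).__name__)  # bytes on Python 3 (str on Python 2)
print(type(GREETING).__name__)       # str on Python 3 (unicode on Python 2)
```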

Finally, the indexing of binary data requires careful handling (slicing does not require any special handling). In Python 2, b'123'[1] == b'2' while in Python 3 b'123'[1] == 50. Because binary data is simply a collection of binary numbers, Python 3 returns the integer value for the byte you index on. But in Python 2, because bytes == str, indexing returns a one-item slice of bytes. This means you will need to choose how you want to normalize this. If you want the Python 3 behavior, note that wrapping the index in ord() only works on Python 2 (on Python 3 the index is already an int and ord() raises TypeError); instead, the six project provides six.indexbytes(b'123', 1), which returns 50, and indexing a bytearray returns an integer on both versions: bytearray(b'123')[1] == 50. For the Python 2 behavior, use a one-item slice, which works the same on both versions: b'123'[1:2] == b'2'.
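The two normalizations can be wrapped in small helpers. This is a sketch with hypothetical names, not part of any library: byte_at relies on bytearray indexing returning an int on both Python 2 and 3, and byte_slice_at relies on slicing returning bytes on both. Note that byte_at copies the whole input, and byte_slice_at returns b'' rather than raising for an out-of-range index.

```python
def byte_at(data, index):
    # Hypothetical helper (Python 3 behavior): indexing a bytearray
    # yields an int on both Python 2 and 3.
    return bytearray(data)[index]

def byte_slice_at(data, index):
    # Hypothetical helper (Python 2 behavior): a one-item slice yields
    # bytes on both versions. Negative indexes are normalized first,
    # because data[-1:0] would be empty.
    if index < 0:
        index += len(data)
    return data[index:index + 1]

print(byte_at(b"123", 1))         # 50
print(byte_slice_at(b"123", 1))   # b'2' on Python 3
print(byte_slice_at(b"123", -1))  # b'3' on Python 3
```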

To summarize:

  1. Decide which of your APIs take text and which take binary data
  2. Make code that works with text use unicode, and code that works with binary data use bytes, in Python 2 (see the table above for which methods you cannot use on each type)
  3. Decode binary data to text as soon as possible, encode text as binary data as late as possible
  4. Be careful when indexing binary data

Prevent compatibility regressions using Pylint

Once you have fully translated your code to be compatible with Python 3, you will want to make sure your code doesn't regress and stop working under Python 3. This is especially true if you have a dependency which is blocking you from actually running under Python 3.

You can use the Pylint project and its --py3k flag to lint your code to receive warnings when your code begins to deviate from Python 3 compatibility. This also prevents you from having to run Modernize or Futurize over your code regularly to catch compatibility regressions.

To further help with staying compatible, any new modules you create should have at least the following block of code at the top:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
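As a sketch of what that header buys you, here is a tiny module that starts with it and uses a hypothetical describe_ratio helper: / is true division and the format string is text on both major versions, and print is a function, so keyword arguments like end= are available.

```python
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals

def describe_ratio(numerator, denominator):
    # Hypothetical helper: / is true division under the division import,
    # and the format string is text under unicode_literals.
    return "ratio={0}".format(numerator / denominator)

# print is a function, so keyword arguments such as end= work.
print(describe_ratio(3, 4), end="\n")  # ratio=0.75
```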

Check which dependencies block your transition with caniusepython3

After you have made your code compatible with Python 3, you should begin to care about whether your dependencies have also been ported. The caniusepython3 project was created to help you determine which projects, directly or indirectly, are blocking you from supporting Python 3. There is both a command-line tool and a web interface at https://caniusepython3.com.

The project also provides code which you can integrate into your test suite so that you will have a failing test when you no longer have dependencies blocking you from using Python 3. This allows you to avoid having to manually check your dependencies and to be notified quickly when you can start running on Python 3.

Update your setup.py file to denote Python 3 compatibility

Once your code works under Python 3, you should update the classifiers in your setup.py to contain Programming Language :: Python :: 3 and to not specify sole Python 2 support. This will tell anyone using your code that you support Python 2 and 3. Ideally you will also want to add classifiers for each major/minor version of Python you now support.
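Continuing the earlier hypothetical setup.py metadata, a sketch of the updated classifiers might look like this. The specific minor versions listed are assumptions; list the ones you actually test against. The "2 :: Only" classifier is gone, and generic plus version-specific Python 3 classifiers are added.

```python
# Sketch of setup.py metadata after the port; name, version, and the exact
# minor versions listed are hypothetical placeholders.
SETUP_KWARGS = dict(
    name="example-project",
    version="0.2",
    classifiers=[
        "Programming Language :: Python :: 2",
        "Programming Language :: Python :: 2.7",
        "Programming Language :: Python :: 3",
        "Programming Language :: Python :: 3.4",
    ],
)

# In the real setup.py:
#     from setuptools import setup
#     setup(**SETUP_KWARGS)
```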

Use continuous integration and tox to stay compatible

Once you are able to fully run under Python 3 you will want to make sure your code always works under both Python 2 & 3. Probably the best tool for running your tests under multiple Python interpreters is tox. You can then integrate tox with your continuous integration system so that you never accidentally break Python 2 or 3 support.

And that's it! At this point your code base is compatible with both Python 2 and 3 simultaneously. Your testing will also be set up so that you don't accidentally break Python 2 or 3 compatibility regardless of which version you typically run your tests under while developing.

Dropping Python 2 support completely

If you are able to fully drop support for Python 2, then the steps required to transition to Python 3 simplify greatly.

  1. Update your code to only support Python 2.7
  2. Make sure you have good test coverage (coverage.py can help)
  3. Learn the differences between Python 2 & 3
  4. Use 2to3 to rewrite your code to run only under Python 3

After this, your code will fully support Python 3 but will no longer run under Python 2. You should also update the classifiers in your setup.py to contain Programming Language :: Python :: 3 :: Only.
