@yyyyyyyan
Created December 13, 2018 17:17

Before starting with Python, one question always comes to mind: which version of the language should I learn? A perhaps harder question comes from those who already program in Python: which version should I base my programs on?

With most other programming languages, this question may seem a bit unusual, because what works in one version usually also works in the next. The point is that the two major versions of Python (2 and 3, the only ones in use) have crucial differences. Python 3 is not backward compatible, and this ends up causing confusion and doubt among us developers.

But what does this "backward incompatibility" mean? Basically, it means that code written for Python 2 might not work when run by the Python 3 interpreter. Python 3 was released in late 2008, just over 8 years after Python 2, with a clear intention: to fix design flaws in the language.

Knowing this, what do we do? We have a dilemma: what do we learn? And what do we use? In this post we will discuss this, explaining the differences between the versions and arguing for more Pythonic code.

Main differences between Python 2 and Python 3

First of all, we need to understand what actually changes from one version to the other. I have divided the changes into categories to make them easier to follow.

Subtle changes

The first differences between the two versions of Python in use are simple and, at the same time, very troublesome.

The first example is the classic confusion we have already discussed: the differences between the functions raw_input() and input(). For a developer unfamiliar with the version changes, this can even cause a serious security problem, because in Python 2 input() evaluates whatever the user types as a Python expression.

In addition to this, we have the cases of print and file. In Python 2, print is not a function but a statement, so we can use it without parentheses:

https://gist.github.com/18d76c80015bf8f373ab8f2d9f2fc135

And everything works fine! In Python 3, however, print() is in fact a function and requires parentheses to be called. Let's see what happens when we try to use print as we did in Python 2:

image alt text

An exception indicating the missing parentheses! Fortunately, this is very simple to solve, because the print statement also accepts an expression wrapped in parentheses. So basically, we can do this:

https://gist.github.com/5b1ff7ef7a8d56e9dade0d43bfbee913

This is valid in both Python 2 and Python 3.
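As a rough sketch of what the function form allows (assuming Python 3; the helper name render is only illustrative), print() accepts keyword arguments such as sep and end. Note that under Python 2 without from __future__ import print_function, print("a", "b") prints a tuple, so the parenthesized form is only a drop-in replacement when a single value is printed.

```python
import io
from contextlib import redirect_stdout


def render(*parts):
    """Capture what print() writes for the given parts, joined by ' | '."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        # sep and end are keyword arguments of the print() function.
        print(*parts, sep=" | ", end="\n")
    return buffer.getvalue()


print(render("Python 2", "Python 3"), end="")
```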

Last but not least, we have the file constructor, used in Python 2 to open files. In Python 3, this constructor was simply removed. The recommendation is therefore to use the open() function regardless of the Python version, for compatibility.
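A minimal sketch of the portable approach, using open() inside a with block. The helper name and the temporary file are illustrative assumptions, not part of the original post:

```python
import os
import tempfile


def write_then_read(text):
    """Write text to a temporary file with open() and read it back."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # open() exists in both Python 2 and Python 3; file() does not.
        with open(path, "w") as handle:
            handle.write(text)
        with open(path) as handle:
            return handle.read()
    finally:
        os.remove(path)


print(write_then_read("hello"))
```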

Lazy Evaluation

We've already seen what Python iterators are and how useful they can be compared to lists. Because of that, the Python developers and the community started to support and prefer iterators over lists.

Some examples have already come up when we talked about functional programming in Python, such as the changes to map() and filter(), which now return iterators instead of lists. But what else?

Another interesting example, which may confuse developers used to only one of the two versions, is the range() function.

In Python 2, range() generates an arithmetic progression as a list, so its size is limited by the computer's memory. Let's test it:

image alt text

Besides it, we also had xrange(), which did the same but computed only one value at a time, on demand; in other words, lazy evaluation.

In Python 3, xrange() was renamed to range() and the original list-returning behavior of range() was removed (it can easily be reproduced with list(range())):

image alt text

Thus, calling list methods on the result of range(), or building huge ranges and risking running out of memory, can cause portability issues between the versions.
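The lazy behavior can be sketched as follows, assuming Python 3 (under Python 2 the first line would try to build the whole list in memory):

```python
import sys

lazy = range(10**9)             # no billion-element list is built here
print(len(lazy), lazy[-1])      # length and indexing work without materializing
print(sys.getsizeof(lazy))      # a small, constant-size object

small = list(range(5))          # ask for a real list only when one is needed
small.append(5)                 # list methods work on the list, not on range
print(small)
```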

Comparison Methods

Classic comparison, which had been changing in Python since version 2.1 with the introduction of rich comparison, was finally settled in Python 3 with the removal of the built-in function cmp() and the special method __cmp__().

Those old features followed the traditional comparison behavior found in many programming languages, such as Java, with three kinds of return value: -1, 0 and 1, meaning that an object is smaller than, equal to or greater than another, respectively.
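If the three-way result is still needed in Python 3, it can be rebuilt from the rich comparison operators. The sketch below uses the expression (a > b) - (a < b) together with functools.cmp_to_key from the standard library; the function is named cmp here only to mirror the removed built-in:

```python
from functools import cmp_to_key


def cmp(a, b):
    """Return -1, 0 or 1, like the built-in that Python 3 removed."""
    return (a > b) - (a < b)


print(cmp(1, 2), cmp(2, 2), cmp(3, 2))

# An old-style comparison function can still drive a sort via cmp_to_key.
print(sorted([3, 1, 2], key=cmp_to_key(cmp)))
```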

Regarding rich comparison, Python 3 brought another important change. The Python 2 documentation for the rich comparison methods makes it clear that no operator is implied by another. So, != is not automatically the opposite of ==:

image alt text

However, this changed in Python 3: by default, != now returns the opposite of ==. Thus:

image alt text
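A small sketch of the Python 3 behavior, using a hypothetical Version class that defines only __eq__; the != operator is derived from it automatically:

```python
class Version:
    """Illustrative class that defines equality only."""

    def __init__(self, major):
        self.major = major

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self.major == other.major


print(Version(3) == Version(3))
print(Version(3) != Version(3))   # derived from __eq__ in Python 3
print(Version(2) != Version(3))
```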

Regarding comparison, there's another significant change in Python 3, which fixed a weird behavior present in Python 2.

In Python 2, all objects were ordinally comparable to each other. Yes, all of them. This means that checking whether a string is greater than an integer, or whether a float is smaller than a list, would work and return True or False, depending on the test.

The logic behind it wasn't random, but that doesn't make it justifiable, since it lets developer mistakes go unnoticed.

The first rule of this type of comparison is that any instance of a class defined by the user was always smaller than any built-in object. We can easily test this in the Python interpreter:

image alt text

The second rule is that numeric types were always smaller than other objects. Let’s test this:

image alt text

The last and main rule is that with other types of objects (neither created by the user nor numeric) the comparison is made through alphabetic order based on the name of the types. Thus:

image alt text

The result was True because this test is equivalent to comparing the names of the two types as strings, so:

image alt text

Because this causes unexpected results in the code, this behavior was removed in Python 3. Now, trying to order two objects that have no natural ordering between them raises a TypeError exception.
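The Python 3 behavior can be sketched like this (the helper name try_order is illustrative):

```python
def try_order(a, b):
    """Return a < b, or a short description if the types cannot be ordered."""
    try:
        return a < b
    except TypeError as error:
        return "TypeError: {}".format(error)


print(try_order(1, 2.5))     # numbers still compare across int and float
print(try_order("10", 9))    # str vs int has no natural order in Python 3
```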

Numbers

Some structural changes regarding numeric types can be seen in the new version. The first one is related to the types themselves, such as the removal of the long type in Python 3.

In Python 2, the int type was limited, meaning that it had a maximum value. Beyond this value, a number becomes another type, long, represented by an L after the digits:

image alt text

In Python 3 this difference no longer exists: the two types were unified into one, int.

Thus, int has lost its limit and large numbers are no longer marked with the suffix L.
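A quick sketch of the unified type, assuming Python 3 (sys.maxsize is used because sys.maxint no longer exists there):

```python
import sys

big = sys.maxsize + 1          # one past the largest native index value
huge = 2 ** 100

print(type(big).__name__)      # int on Python 3 (Python 2 would report long)
print(huge)                    # printed without any L suffix
```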

Division of integers

Regarding numbers, there is still one fundamental change to understand, because it can seriously break our code. It affects integer division.

In Python 2, every division between two integers resulted in another integer:

image alt text

In Python 3, however, this behavior changed significantly: a division between two integers now results in a float:

image alt text

Both Python versions also have the // operator, which follows the integer division behavior of Python 2:

image alt text

Thus, it's recommended to follow a pattern that behaves the same in both versions: either use the // operator, which results in an int, or convert one of the integers to float, which results in a float:

image alt text
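A minimal sketch of the two portable patterns just described; the function name is illustrative, and the code is written so that it gives the same results under Python 2 and Python 3:

```python
def divide_both_ways(a, b):
    """Return (floor division, true division) for two integers."""
    floored = a // b          # int result in both versions
    exact = float(a) / b      # float result in both versions
    return floored, exact


print(divide_both_ways(7, 2))
print(-7 // 2)                # floor division rounds toward negative infinity
```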

Text Types

One of the most important changes in Python 3 concerns the types available to store "text".

"But why text in quotes?"

Let's get this straight!

In Python 2, we have two main types to store text: the built-in string (str) and unicode, identified by a u before the quotes:

image alt text

All this means two fundamental things: first, **string doesn't support non-ASCII (Unicode) characters**; second, there isn't a proper type to store bytes.

Because of that, in our community, we say that string stores bytes while unicode is what really stores text.

Believing that string works with text can cause problems in even the simplest application, such as one that prints accented text:

https://gist.github.com/fe3362f4976eaf110e30c23ebcf2746e

Look what happens when we run this file with the Python 2 interpreter:

https://gist.github.com/7f7c124fc361ae2bcf519b1e077bba81

An exception indicating that no encoding was declared, which means that we need to declare an encoding in any source file containing non-ASCII characters.

In Python 3, things changed a lot. The unicode type became the string type (renamed to str), and the old byte-based str no longer exists. This allows some interesting things, such as using Unicode characters in identifier names:

image alt text

Note that using non-ASCII characters in identifier names, although interesting and sometimes fun, isn't recommended if we want to share our code, since there's a risk of incompatibility with other systems and environments.

Besides that, a specific type was added to handle bytes, the bytes class. Its literals are written with the prefix b:

image alt text
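The relationship between the two Python 3 types can be sketched as follows. The example assumes Python 3 and UTF-8 as the chosen encoding; the accented word is written with an escape so the snippet itself stays ASCII:

```python
text = "caf\u00e9"               # str: a sequence of Unicode characters
data = text.encode("utf-8")      # bytes: the encoded representation

print(len(text), len(data))      # characters vs. bytes
print(data.decode("utf-8") == text)
print(type(text).__name__, type(data).__name__)
```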

For a developer who doesn't know about the bytes class, this can cause problems, for example when identifying a file type.

Taking GIF files as an example, which (almost) always start with GIF89a, let's see what GIF identification code that works in Python 2 looks like. We're going to open the file in binary read mode and check its first bytes:

https://gist.github.com/fd8e7de4b7b6e6e12ae23b573939f6e5

When we run the code in the Python 2 interpreter:

https://gist.github.com/24b5cf036784f10b68d0445c565eddca

Now look what happens when we run it in Python 3:

https://gist.github.com/ad2ed0ceec95e186ff7e6e9123e830b2

Wow! There's a clear difference in how Python 3 handles bytes with the new class. The data read from the file is of type bytes, while the identifier we compare it against is a str, and comparing the two returns False. Look what happens when we write the identifier as bytes:

https://gist.github.com/69a2d8a65f1c26fb34c3d9c69b99a4a6

And now:

https://gist.github.com/0566c95ec083362535599ed5e84bfdd6
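As a self-contained sketch of the same idea (not the code from the gists above), the check below compares the first six bytes against bytes literals. The function name and the fake demo file are assumptions made for illustration; GIF87a is the older signature that the "almost" refers to:

```python
import os
import tempfile

GIF_SIGNATURES = (b"GIF87a", b"GIF89a")


def looks_like_gif(path):
    """Return True if the file starts with a known GIF signature."""
    with open(path, "rb") as handle:     # binary mode yields bytes
        header = handle.read(6)
    return header in GIF_SIGNATURES      # bytes compared with bytes


# Demo with a fake file that only carries the signature.
fd, demo_path = tempfile.mkstemp(suffix=".gif")
with os.fdopen(fd, "wb") as handle:
    handle.write(b"GIF89a" + b"\x00" * 10)
print(looks_like_gif(demo_path))
os.remove(demo_path)
```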

Syntax

Besides the structural and logical changes, there were also some significant syntax changes from Python 2 to Python 3. Some things were added, others removed. Let's focus on what was changed.

Exception Handling

Some of the syntax for exception handling, more specifically raise and except, was changed. In Python 2, to raise an exception with a given message, we can use the following syntax:

image alt text

Look what happens when we try to use the same syntax in Python 3:

image alt text

We do get an exception, but it has nothing to do with what we expected! It is a syntax error, because this form of raise isn't allowed in Python 3. Instead, we have to call the exception class like a function, passing the error message as an argument:

image alt text

Now it works. Fortunately, this syntax works in both Python 3 and Python 2, so it's recommended to always use it, regardless of the version we are coding for, to increase the compatibility of our code.
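A short sketch of the portable form, with a hypothetical parse_age function standing in for real validation code:

```python
def parse_age(value):
    """Convert value to a non-negative int, raising ValueError otherwise."""
    age = int(value)
    if age < 0:
        # Call the exception class and pass the message as an argument.
        raise ValueError("age must be non-negative, got {}".format(age))
    return age


try:
    parse_age("-5")
except ValueError as error:
    print(error)
```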

For except, the change was also subtle. In Python 2, we can use the following syntax in a try/except block:

https://gist.github.com/56ed8df7e68aaff93b3d5b14e01a7cf4

When we run this code in the Python 2 interpreter:

https://gist.github.com/350cbd08719129ff2ccbc5048b735e5d

Right! But what if we run it in Python 3? Let's check what happens:

https://gist.github.com/68eb93e48a015048afc71a45cdd480b5

For the code to work in Python 3, we have to use the keyword as:

https://gist.github.com/bd4680ea827c5ba61765f7d673718d80

Now:

https://gist.github.com/ac73425ed0e3c68d26cd2b7d936e6bce

Right! This code, just like the raise example, works in both Python 3 and Python 2 (2.6 or later), which is great for us developers.
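The as form also combines cleanly with a tuple of exception types, which the old comma syntax made ambiguous. A minimal sketch, with an illustrative function name:

```python
def safe_ratio(a, b):
    """Divide a by b, reporting the exception type on failure."""
    try:
        return a / b
    except (ZeroDivisionError, TypeError) as error:
        # The parentheses group the exception types; `as` binds the instance.
        return "failed: {}".format(type(error).__name__)


print(safe_ratio(6, 3))
print(safe_ratio(1, 0))
print(safe_ratio(1, "x"))
```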

Indentation

Python is, by design, very strict about the indentation of our code. We don't mark the start and end of code blocks with delimiters such as curly braces ({}); blocks are identified by their indentation.

This is advantageous because it encourages good design and indentation practices. Indentation is a mandatory syntax rule, but there are further recommendations about it in the Style Guide for Python Code, PEP 8.

PEP 8 states that 4 spaces are preferable to a TAB for indentation. In the end, it's at the discretion of the developer. But above all, the guide teaches us that the best code is consistent code, meaning that it's better to use only TABs for indentation than to switch between TABs and spaces.

However, in Python 2, there isn't a restriction on this: in the same code block, we can alternate between TABs and spaces. In Python 3, this changed and such inconsistency raises an exception:

https://gist.github.com/9c3b3ac4a7ebc3297f2cd197cda661ce
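The check can be reproduced without creating a file by compiling a source string. This is a sketch assuming Python 3; the source text and the helper name are made up for illustration, with one line indented by a TAB and the next by eight spaces:

```python
MIXED_SOURCE = "if True:\n\tx = 1\n        y = 2\n"
SPACES_ONLY = "if True:\n    x = 1\n    y = 2\n"


def indentation_problem(source):
    """Return a description of a TabError in source, or None if it compiles."""
    try:
        compile(source, "<example>", "exec")
    except TabError as error:
        return "TabError: {}".format(error.msg)
    return None


print(indentation_problem(MIXED_SOURCE))
print(indentation_problem(SPACES_ONLY))
```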

Those are some of the differences between the two versions that can confuse an unaware developer.

If you want to check the official list, Python documents the changes of each version in its documentation pages. And now what?

Which version to learn? (And yet, which version to teach?)

For someone just starting with Python, which version should they learn? Which version should they focus on?

Python 3 is known to be a step forward for the language, since it was developed to improve on the previous version. Of course there are arguments in favor of Python 2, since some of its characteristics served certain needs better. But it's still hard to deny the improvements brought by Python 3.

We can also see Python 3 as a fix for Python 2. So beginners are advised to learn Python 3, so that they aren't confused by the mistakes of the previous version and avoid learning something considered wrong and outdated.

But it's a fact that Python 2 code still exists and will continue to exist for a long time, so it's important for a Python developer to know the differences between the two versions and how to deal with both, just in case.

Now that we have some knowledge of Python, which version should we choose to work with?

Which version to work with?

When coding, we have to be realistic and consider the market and how Python is used at companies and workplaces. A 2014 survey of Python developers can help us understand which version prevails, or at least did four years ago.

It's recommended to prioritize the latest version whenever possible. So, if you're starting a new personal project, prefer Python 3.

Starting a new project in Python 3 avoids problems of the previous version that have since been fixed. It also gives you better long-term support, since Python 2 is expected to be discontinued starting in 2020, losing part of its support.

Migrating existing Python 2 code to Python 3 is also recommended. This is what major projects initially developed in Python 2, such as Requests and BeautifulSoup, are doing.

We know that migrating may not be so simple because of technical and bureaucratic issues. A company might not always provide the necessary support, which can cause problems at the start of the migration.

So maintaining Python 2 code ends up being almost mandatory, and it can be an advantage for those who are already familiar with it.

Either way, we want our code to be compatible with most systems, reaching a larger audience.

The ideal would be to write code that works in both Python 2 and Python 3, running successfully in either interpreter. Can we do that?

Making your program compatible with both versions

Before we learn how to make our code more widely usable, we need to decide whom we want to support, because this process grows more complex with every additional detail, system and version we take into account.

Python's own documentation recommends not supporting any version earlier than 2.7, if possible. This makes code portability easier.

If this isn't an option, there are community projects, such as six, that can help us keep our code compatible.

With this in mind, we will consider only Python 2.7 and later.

Some tools, techniques and practices can help us port our code. Let's understand and test some of them so we can decide on the best options.

Writing code compatible with both versions

Most of the time it's still perfectly possible to write code that works in both versions, because the change is not total.

To do that we just need to be careful, remembering to use, whenever possible, functions and syntax that work in both Python 2 and Python 3. A good example is print, which in Python 2 can also be used with parentheses.

The problem is that, in some cases, the versions are really incompatible, as with input() and raw_input(). How can we deal with that?

One approach is to check which version of Python is running. We can do that through the attribute version_info.major from the sys module:

https://gist.github.com/ba10a30b82d205b85f832a6919e185fb

Apparently, this code works. But what if Python 4 is released? The major version won't be equal to 3, and the program will try to use raw_input(), which probably won't exist. It's better to base our check on Python 2:

https://gist.github.com/db09121e5caba583dd0daefcd979a8a0
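A sketch of what such a check might look like (this is not the code from the gist above; the names PY2 and read_line are assumptions made for illustration):

```python
import sys

# True only on Python 2, so any later major version takes the else branch.
PY2 = sys.version_info[0] == 2

if PY2:
    read_line = raw_input   # only evaluated when running on Python 2
else:
    read_line = input

print(PY2, read_line.__name__)
```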

This works better. But checking the version instead of the functionality can become a problem, because the language keeps changing and a feature may be added, removed or restored in a later version. Relying on version detection can break compatibility with future changes in the language.

Plus, there's a highly cited quote in the Python community from one of the most important developers in history, Grace Hopper: "It's easier to ask forgiveness than permission".

With that in mind, we can do feature detection through Python's error handling:

https://gist.github.com/97aacabc066dfa5ce1a405ad9423e19f
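The "ask forgiveness" style can be sketched as below. Again, this is an illustration rather than the gist's code; read_line and ask are hypothetical names, and the reader parameter exists only so the function can be exercised without a real terminal:

```python
try:
    read_line = raw_input    # exists only on Python 2
except NameError:
    read_line = input        # Python 3: input() already returns a string


def ask(prompt, reader=read_line):
    """Prompt for a line of text and return it without surrounding spaces."""
    return reader(prompt).strip()


# Usage with a stand-in reader instead of real keyboard input.
print(ask("Name: ", reader=lambda prompt: "  Ana  "))
```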

And now we have the code for this situation!

Conclusion

In this post, we saw some of the key differences between Python versions 2 and 3, and understood why these differences exist and what they affect.

In addition, we discussed which version we should prioritize, concluding that, overall, Python 3 is preferable from now on.

We were also able to work out some techniques to improve the compatibility of our code, making it usable with both versions.

So, did you like the content? Tell me what you think about it and what version you use the most in the comments!
