manojpandey/gotchas.md

## gotchas.md

Attack of Pythons

Among computer programmers, a “gotcha” is a feature of a programming language that is likely to trick you by behaving differently from what you expect.
Just as a fly or a mosquito can “bite” you, we say that a gotcha can “bite” you.
So, let's examine some of Python's gotchas!


### List repetition with nested lists

>>> m=[[0]*3]*2
>>> for i in range(3):
...     m[0][i]=1
>>> print m
# expected 
[[1, 1, 1], [0, 0, 0]]
# but it prints
[[1, 1, 1], [1, 1, 1]] 
This is a bit devilish, but quite obvious once you understand what you're doing. [[0]*3]*2 first creates a list with three zeros, then builds an outer list that holds two references to that same inner list. Repeating a list with * does not create new lists with the same contents. It references the one existing list several times. So when you change one row, they all change.
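
A common fix is to build each inner list separately, for example with a list comprehension, so that every row is a distinct object. The sketch below uses Python 3 syntax, unlike the Python 2 examples in this note. The names shared and independent are only for illustration.

```python
# Shared rows: the outer list holds two references to ONE inner list.
shared = [[0] * 3] * 2
shared[0][0] = 1
print(shared)                  # [[1, 0, 0], [1, 0, 0]]
print(shared[0] is shared[1])  # True

# Independent rows: the comprehension evaluates [0] * 3 once per row.
independent = [[0] * 3 for _ in range(2)]
independent[0][0] = 1
print(independent)                       # [[1, 0, 0], [0, 0, 0]]
print(independent[0] is independent[1])  # False
```

Repeating [0] * 3 inside each row is harmless, because ints are immutable. Only the outer repetition of a mutable list causes the sharing.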


### Don't mix spaces and tabs


Just don't. You would cry.


### Inconsistent indentation
Many newbies come from languages where whitespace "doesn't matter". They are in for a rude surprise when they find out the hard way that Python punishes inconsistent indentation.
Solution: Indent consistently. Use all spaces or all tabs, but don't mix them. Use a decent editor!
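
In Python 3, the compiler rejects indentation that mixes tabs and spaces ambiguously and raises TabError, a subclass of IndentationError. The minimal sketch below demonstrates this by compiling small source strings. The helper indentation_error is hypothetical and exists only for this example.

```python
def indentation_error(source):
    """Hypothetical helper: return the TabError message, or None if source compiles."""
    try:
        compile(source, "<snippet>", "exec")
    except TabError as exc:  # TabError subclasses IndentationError
        return exc.msg
    return None

# Line 2 is indented with a tab, line 3 with eight spaces.
mixed = "if True:\n\tx = 1\n        y = 2\n"
clean = "if True:\n    x = 1\n    y = 2\n"

print(indentation_error(mixed))  # a message about inconsistent tabs and spaces
print(indentation_error(clean))  # None
```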

### Explicit type cast of strings

>>> float('infinity')
inf
>>> float('NaN')
nan
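These special spellings are accepted case-insensitively, along with inf and -inf. That means float() on its own is not a strict "is this a number?" check. A string that is not a valid number at all is rejected, though: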
>>> float('random-string')
ValueError: could not convert string to float: random-string
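
Because float() accepts the special spellings, one possible approach is to parse first and then reject non-finite results with math.isfinite. The sketch below is in Python 3 and assumes that only finite numbers count as valid input. parse_finite_float is a hypothetical name.

```python
import math

def parse_finite_float(text):
    """Hypothetical helper: convert text to float, rejecting inf/nan spellings."""
    value = float(text)  # may raise ValueError, e.g. for 'random-string'
    if not math.isfinite(value):
        raise ValueError("not a finite number: %r" % (text,))
    return value

print(parse_finite_float("2.5"))  # 2.5
for bad in ("infinity", "NaN", "random-string"):
    try:
        parse_finite_float(bad)
    except ValueError as exc:
        print(bad, "->", exc)
```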

### Equal numbers as dictionary keys

One common error in Python is:

>>> a={}
>>> a[1.0]= 'quora'
>>> a[1]= 'facebook'
>>> a[1.0]
'facebook'
It happens because 1.0 and 1 compare equal (1 == 1.0) and also have the same hash value, so the dictionary treats them as the same key:
>>> hash(1)==hash(1.0)
True
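
A quick check in Python 3 makes the collision visible: the dictionary ends up with a single entry. bool is a subclass of int, so True collides with 1 in the same way. The printed key 1.0 reflects what CPython does (it keeps the first key object), so treat it as observed behavior rather than a language guarantee.

```python
d = {}
d[1.0] = 'quora'
d[1] = 'facebook'   # 1 == 1.0 and hash(1) == hash(1.0): same entry
print(len(d))       # 1
print(d)            # {1.0: 'facebook'} on CPython: the first key object is kept

# bool is a subclass of int, so True collides with 1 as well.
e = {1: 'int'}
e[True] = 'bool'
print(e)            # {1: 'bool'}
```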

### Another Gotcha: is with integers

>>> a = 256
>>> b = 256
>>> a is b
True           # This is an expected result
>>> a = 257
>>> b = 257
>>> a is b
False          # What happened here? Why is this False?
>>> 257 is 257
True           # Yet the literal numbers compare properly
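Reason: is tests object identity, not value equality. CPython keeps an array of integer objects for all integers between -5 and 256. When you create an int in that range, you actually just get back a reference to the existing object, so a is b happens to be True. A value such as 257 is created as a new object each time you type it at the prompt, so a and b end up as two different objects. 257 is 257 is True only because both literals sit in the same statement, which is compiled with a single shared constant. (The CPython C API documentation that describes this cache even remarks that it should therefore be possible to change the value of 1 from C code, and that the behaviour in that case is undefined.) All of this is an implementation detail, and Python 3.8 and later emit a SyntaxWarning for is with a literal. Compare numbers with ==. The id() values show the difference: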
>>> a = 256
>>> b = 256
>>> id(a)
9987148
>>> id(b)
9987148
>>> a = 257
>>> b = 257
>>> id(a)
11662816
>>> id(b)
11662828
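
The practical rule is to compare numbers with == and to keep is for identity checks such as x is None. The small Python 3 sketch below builds the values with int("257") at run time, which avoids any constant sharing by the compiler. The result of is here is whatever the implementation happens to do.

```python
x = int("257")   # created at run time, so no shared compile-time constant
y = int("257")
print(x == y)    # True: equal values
print(x is y)    # identity: implementation-dependent, typically False on CPython

value = None
print(value is None)  # True: is is the right test for singletons like None
```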

### Local Variable Optimization

This flaw bites people fairly often. Consider the following function. What do you think happens when you run it?
	i=1
	def f():
	    print "i=",i
	    i = i + 1 
	
	f()
You might expect it to print "i=1" and increase the value of i to 2. In fact, you get this:
UnboundLocalError: local variable 'i' referenced before assignment

What's going on here?
Python's source-to-bytecode compiler tries to optimize accesses to local variables. It decides that a variable is local if it's ever assigned a value in a function. Without the assignment i = i + 1, Python would assume that i is a global and generate code that accessed the variable as a global at the module level. When the assignment is present, the bytecode compiler generates different code that assumes i is a local variable; local variables are assigned consecutive numbers in an array so they can be retrieved more quickly. The print statement, therefore, gets compiled to look for the local i, which doesn't exist yet, and dies with the NameError exception. (In Python 1.6 and later, this raises a different exception, UnboundLocalError; it's hoped that this makes the problem a bit clearer.)
The fix for this problem is to declare i as a global in your function, like this:
	i=1
	def f():
	    global i
	    print "i=",i
	    i = i + 1 
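
Here is the same fix in Python 3 syntax, alongside an alternative that avoids the global entirely by passing the value in and returning the new one. Both function names are hypothetical. The second version is a common style choice, not something the original text prescribes.

```python
i = 1

def f_global():
    global i          # rebind the module-level name instead of a local
    print("i=", i)
    i = i + 1

def f_pure(i):
    # Alternative: receive the value as a parameter and return the new one.
    print("i=", i)
    return i + 1

f_global()        # prints: i= 1
print(i)          # 2
i = f_pure(i)     # prints: i= 2
print(i)          # 3
```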

### A comma at the end of a print statement prevents a newline from being written... but appears to write a trailing space

It doesn't really write a trailing space -- it just causes an immediately subsequent print statement to write a leading space!
The Python Language Reference Manual says, about the print statement,

A "\n" character is written at the end, unless the print statement ends with a comma.
But it also says that if two print statements in succession write to stdout, and the first one ends with a comma (and so doesn't write a trailing newline), then the second one prepends a leading space to its output. (See section "6.6 The print statement" in the Python Reference Manual. Thanks to Marcus Rubenstein and Hans Meine for pointing this out to me.)

So
	for i in range(10): print "*",
produces
* * * * * * * * * *

If you want to print a string without any trailing characters at all, your best bet is to use sys.stdout.write()
	import sys
	for i in range(10): sys.stdout.write("*")
produces
**********
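
In Python 3, print is a function and the trailing-comma trick no longer exists. Its end and sep keyword arguments control what is written after and between the items. The minimal sketch below writes into an io.StringIO buffer so the result can be inspected.

```python
import io

buf = io.StringIO()
for _ in range(10):
    print("*", end="", file=buf)   # end="" suppresses the newline
stars = buf.getvalue()
print(stars)                       # **********

spaced = " ".join(["*"] * 10)      # explicit separator, no trailing space
print(spaced)                      # * * * * * * * * * *
```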


### Catching Multiple Exceptions

Sometimes you want to catch multiple exceptions in one except clause. An obvious idiom seems to be:
>>> try:
... 	#something that raises an error...
... except IndexError, ValueError:
... 	# expects to catch IndexError and ValueError
... 	# wrong!
This doesn't work though... the reason becomes clear when comparing this to:
>>> try:
...     1/0
... except ZeroDivisionError, e:
...     print e
...     
integer division or modulo by zero
The first "argument" in the except clause is the exception class, the second one is an optional name, which will be used to bind the actual exception instance that has been raised. So, in the erroneous code above, the except clause catches an IndexError, and binds the name ValueError to the exception instance. Probably not what we want. ;-)
This works better:
	try:
	    int("not a number")  # raises ValueError
	except (IndexError, ValueError):
	    # does catch IndexError and ValueError
	    pass
Solution: When catching multiple exceptions in one except clause, use parentheses to create a tuple with exceptions.
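
Below is a runnable Python 3 sketch of the parenthesized form, combined with as to bind the exception instance. The except ... as name spelling also works from Python 2.6 on. The lookup function is hypothetical, chosen so that one input triggers IndexError and another triggers ValueError.

```python
def lookup(items, index_text):
    """Hypothetical helper: items[int(index_text)], or None on a bad index or number."""
    try:
        return items[int(index_text)]
    except (IndexError, ValueError) as exc:  # a tuple of classes, then a name
        print("caught", type(exc).__name__)
        return None

print(lookup([10, 20, 30], "1"))  # 20
print(lookup([10, 20, 30], "7"))  # prints "caught IndexError", then None
print(lookup([10, 20, 30], "x"))  # prints "caught ValueError", then None
```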

### The += operator

In languages like C, augmented assignment operators like += are a shorthand for a longer expression. For example,
x += 42;

is syntactic sugar for
x = x + 42;

So, you might think that it's the same in Python. Sure enough, it seems that way at first:
	a = 1
	a = a + 42
	# a is 43
	a = 1
	a += 42
	# a is 43
However, for mutable objects, x += y is not necessarily the same as x = x + y. Consider lists:
	>>> z = [1, 2, 3]
	>>> id(z)
	24213240
	>>> z += [4]
	>>> id(z)
	24213240
	>>> z = z + [5]
	>>> id(z)
	24226184
x += y changes the list in place, having the same effect as the extend method. x = x + y creates a new list and rebinds x to it, which is something else. This small difference can lead to subtle, hard-to-catch bugs whenever another name refers to the same list.
Not only that, it also leads to surprising behavior when mixing mutable and immutable containers:
	>>> t = ([],)
	>>> t[0] += [2, 3]
	Traceback (most recent call last):
	  File "<input>", line 1, in <module>
	TypeError: object doesn't support item assignment
	>>> t
	([2, 3],)
Sure enough, tuples don't support item assignment -- but after applying the +=, the list inside the tuple did change! The reason is again that += changes in-place. The item assignment doesn't work, but when the exception occurs, the item has already been changed in place.
This is one pitfall that I personally consider a wart.
Solution: depending on your stance on this, you can: avoid += altogether; use it for integers only; or just live with it. :-)
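
The difference shows up as soon as a second name refers to the same list. In the minimal Python 3 sketch below, after b = a, += mutates the shared list, while a = a + [...] builds a new list and leaves b alone. An immutable type such as a tuple has no in-place version, so += falls back to creating a new object and rebinding.

```python
a = [1, 2, 3]
b = a            # b and a name the same list object

a += [4]         # in place (list.__iadd__): b sees the change
print(b)         # [1, 2, 3, 4]

a = a + [5]      # builds a new list and rebinds a; b is untouched
print(a)         # [1, 2, 3, 4, 5]
print(b)         # [1, 2, 3, 4]

# Tuples are immutable, so += creates a new tuple and rebinds s only.
s = t = (1, 2)
s += (3,)
print(s)         # (1, 2, 3)
print(t)         # (1, 2)
```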

### Class attributes vs instance attributes

Another (small) pitfall is that self.foo can refer to two things: the instance attribute foo, or, in absence of that, the class attribute foo. Compare:
>>> class Foo:
...     a = 42
...     def __init__(self):
...         self.a = 43
...     
>>> f = Foo()
>>> f.a
43
and
>>> class Foo:
...     a = 42
...     
>>> f = Foo()
>>> f.a
42
In the first example, f.a refers to the instance attribute, with value 43. It overrides the class attribute a with value 42. In the second example, there is no instance attribute a, so f.a refers to the class attribute.
The following code combines the two:
>>> class Foo:
...     
...     bar = []
...     def __init__(self, x):
...         self.bar = self.bar + [x]
...     
>>> f = Foo(42)
>>> g = Foo(100)
>>> f.bar
[42]
>>> g.bar
[100]
Finally,
>>> class Foo:
...     bar = []
...     def __init__(self, x):
...         self.bar += [x]
...

>>> f=Foo(42)

>>> f.bar
[42]

>>> g=Foo(100)

>>> g.bar
[42, 100]

>>> f.bar
[42, 100]
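Again, the reason for this behavior is that self.bar += something is not the same as self.bar = self.bar + something. No instance attribute exists yet, so self.bar first finds the class attribute Foo.bar. += then extends that list in place and binds the instance attribute self.bar to the very same list. So f and g update one shared list.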
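
You can confirm this by inspecting the instance dictionary after construction. In the Python 3 sketch below, vars(f) shows that += did create an instance attribute named bar, but that attribute is bound to the very same list object as Foo.bar.

```python
class Foo:
    bar = []

    def __init__(self, x):
        self.bar += [x]   # mutates Foo.bar in place, then sets self.bar to it

f = Foo(42)
g = Foo(100)
print('bar' in vars(f))   # True: an instance attribute now exists...
print(f.bar is Foo.bar)   # True: ...but it is the shared class list
print(Foo.bar)            # [42, 100]
```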

### Mutable default arguments

This one bites beginners over and over again. It's really another variant of the shared-mutable-object problem, combined with the unexpected behavior of default arguments. Consider this function:
>>> def popo(x=[]):
...     x.append(666)
...     print x
     
>>> popo([1, 2, 3])
[1, 2, 3, 666]

>>> x = [1, 2]

>>> popo(x)
[1, 2, 666]

>>> x
[1, 2, 666]
This was expected. But now:
>>> popo()
[666]

>>> popo()
[666, 666]

>>> popo()
[666, 666, 666]
Maybe you expected that the output would be [666] in all cases... after all, when popo() is called without arguments, it takes [] as the default argument for x, right? Wrong. The default value is evaluated once, when the function is defined, not each time it's called. (In other words, for a function f(x=[]), the expression [] is not re-evaluated on every call. It was evaluated once, when we defined f, and every call that omits x reuses that same list object.) So if the default is a mutable object and it has changed, the next call receives this same list, with its new contents, as its default argument.
Solution: This behavior can occasionally be useful. In general, just watch out for unwanted side effects.
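
Inspecting __defaults__ reveals the single shared default object. The usual way to get a fresh list per call is to use None as a sentinel default. The sketch below is in Python 3. popo_shared is a hypothetical copy of popo from above, and popo is rewritten here with the sentinel.

```python
def popo_shared(x=[]):
    x.append(666)
    return x

popo_shared()
popo_shared()
print(popo_shared.__defaults__)  # ([666, 666],): the one default list, mutated

def popo(x=None):
    if x is None:      # None as a sentinel: build a fresh list per call
        x = []
    x.append(666)
    return x

print(popo())            # [666]
print(popo())            # [666]
print(popo([1, 2, 3]))   # [1, 2, 3, 666]
```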

### String concatenation

This is a different kind of pitfall. In many languages, concatenating strings with the + operator or something similar might be quite efficient.
But in Python it can be highly inefficient. Since Python strings are immutable, each concatenation creates a new string and copies the old contents into it, and the old string is thrown away. Building a long string piece by piece in a loop can therefore take time roughly proportional to the square of its length. Using + or += is fine for a few concatenations, but it's usually not recommended in a loop.
Solution: If at all possible, create a list of values, then use string.join (or the join() method) to glue them together as one long string. Sometimes this can result in dramatic speedups.
To illustrate this, a simple benchmark. (timeit is a simple function that runs another function and returns how long it took to complete, in seconds.)
>>> def f():
...     s = ""
...     for i in range(100000):
...         s = s + "abcdefg"[i % 7]
...     
>>> timeit(f)
23.7819999456

>>> def g():
...     z = []
...     for i in range(100000):
...         z.append("abcdefg"[i % 7])
...     return ''.join(z)
...     
>>> timeit(g)
0.343000054359
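
The timeit described above is a simple helper. The standard library's timeit module can run a comparable benchmark, as sketched below in Python 3. The absolute numbers depend on your machine and interpreter. Recent CPython versions can sometimes extend a string in place for s = s + x, so the gap may be much smaller than the one shown above. ''.join is still the form that stays linear-time across Python implementations.

```python
import timeit

def concat_plus(n=100000):
    s = ""
    for i in range(n):
        s = s + "abcdefg"[i % 7]
    return s

def concat_join(n=100000):
    parts = []
    for i in range(n):
        parts.append("abcdefg"[i % 7])
    return "".join(parts)

assert concat_plus(1000) == concat_join(1000)  # same result either way

# number=5 runs each function five times; the totals are machine-dependent.
print("plus:", timeit.timeit(concat_plus, number=5))
print("join:", timeit.timeit(concat_join, number=5))
```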

### Don't use index to loop over a sequence

Don't:
	for i in range(len(tab)):
	    print tab[i]
Do:
	for elem in tab:
	    print elem
A for loop iterates over the elements directly, so you rarely need to manage an index yourself.
Use enumerate if you really need both the index and the element:
	for i, elem in enumerate(tab):
	    print i, elem

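### Float range values

Older Python 2 releases truncated a float argument to range instead of rejecting it. The values of range(end_val) were then not only strictly smaller than end_val, but strictly smaller than int(end_val). For a float argument to range, this might be an unexpected result:
	list(range(2.89))
	[0, 1]
In Python 3, range accepts only integers, and range(2.89) raises TypeError: 'float' object cannot be interpreted as an integer.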
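
In Python 3, range rejects a float outright with TypeError. If a float bound really is intended, a reasonable approach is to state the rounding explicitly. The sketch below contrasts truncation with int() and rounding up with math.ceil(), which returns an int in Python 3.

```python
import math

end_val = 2.89
try:
    range(end_val)
except TypeError as exc:
    print("TypeError:", exc)

print(list(range(int(end_val))))        # [0, 1]     truncate toward zero
print(list(range(math.ceil(end_val))))  # [0, 1, 2]  round up
```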

### Class variables vs instance variables

Another common mistake is using class variables when you want instance variables. Most of the time this doesn't cause problems, but if the value is mutable it causes surprises.
	class Foo(object):
	    x = {}
But:
>>> f1 = Foo()
>>> f2 = Foo()
>>> f1.x['a'] = 'b'
>>> f2.x
{'a': 'b'}
You almost always want instance variables, which require you to assign inside __init__:
	class Foo(object):
	    def __init__(self):
	        self.x = {}

### List slicing

List slicing has caused me a lot of grief. I actually consider the following behavior a bug.
Define a list x:
>>> x = [10, 20, 30, 40, 50]

Access index 2:
>>> x[2]
30

As you expect.
Slice the list from index 2 to the end of the list:
>>> x[2:]
[30, 40, 50]

As you expect.
Access index 7:
>>> x[7]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: list index out of range
Again, as you expect.

However, try to slice the list from index 7 until the end of the list:
>>> x[7:]
[]

WAT

The remedy is to check your indices explicitly (and test carefully) whenever you rely on list slicing. I wish I'd just get an error instead. It would be much easier to debug.
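
Slicing clamps out-of-range bounds to the length of the sequence by design, so x[7:] is an empty slice rather than an error. If you prefer the strict behavior, one option is a small wrapper. strict_slice below is a hypothetical helper (Python 3), and its rule of allowing a start equal to len(seq) is an assumption.

```python
def strict_slice(seq, start, stop=None):
    """Hypothetical helper: seq[start:stop], but IndexError if start is out of range.

    A start equal to len(seq) is allowed and yields an empty tail.
    """
    n = len(seq)
    if not -n <= start <= n:
        raise IndexError("slice start %d out of range for length %d" % (start, n))
    return seq[start:stop]

x = [10, 20, 30, 40, 50]
print(strict_slice(x, 2))  # [30, 40, 50]
try:
    strict_slice(x, 7)
except IndexError as exc:
    print("IndexError:", exc)
```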

### Do not check if you can, just do it and handle the error

Pythonistas usually say

"It's easier to ask for forgiveness than permission".

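Don't:
	import os
	if os.path.isfile(file_path):
	    f = open(file_path)
	else:
	    pass  # do something
Besides being less idiomatic, the check is racy: the file can be removed or made unreadable between isfile() and open().
Do:
	try:
	    f = open(file_path)
	except OSError as e:  # Python 3; on Python 2, open() raises IOError instead
	    pass  # do something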
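
The function below turns the "Do" version into complete, runnable Python 3: it attempts the open and handles the failure instead of checking first. read_text and its default parameter are hypothetical. Catching OSError covers FileNotFoundError, PermissionError and related errors in Python 3.

```python
import os
import tempfile

def read_text(path, default=None):
    """Hypothetical helper: the file's contents, or default if it cannot be read."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:  # FileNotFoundError, PermissionError, ... in Python 3
        return default

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.txt")
    print(read_text(path, default="<missing>"))  # <missing>
    with open(path, "w") as f:
        f.write("hello")
    print(read_text(path))  # hello
```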