
@tjjjwxzq
Last active May 31, 2018 08:55
Python 2.7 Notes



Basic Concepts, Common Beginner Pitfalls, and Best Practices

Seriously, learn to use the docs

No internet access during exams. Your memory isn't infallible. Your notes aren't perfect. The docs will be hosted on a local server. Please please please learn to use them.

In all honesty, the Python docs aren't the best, which makes it all the more important that you familiarize yourself with them, so you know where to look when you need to look something up. (To prove my point, try typing string methods into the "Quick Search" bar and see whether you intuitively know which result to click on to get what you want.)

These pages will be your best friends:

  1. Built-in Functions

If you need to look up functions like input(), sum(), int(), max(), round() etc.

  2. Built-in Types

If you need to look up string (str) methods, list methods, dictionary (dict) methods and operations, and file methods.

  3. Data Structures

More explicit documentation on list methods and list comprehensions.

See, it ain't that hard to learn to use the docs.

Variables and scope

You can't just have values floating around; you need handles on them. That's what variables are: names that are bound to values.

I got this nice imagery from Eloquent JavaScript: think of variables as octopus tentacles grasping values. They are not the values themselves. You can reassign variables, so the same tentacle (name) ends up grasping (bound to) a different value.

x = 10
x = 2

Pictorially:

![variables are tentacles](https://raw.githubusercontent.com/tjjjwxzq/Digital-World-TA-Materials/master/Python-2.7-Notes/variable%20octopus.jpg)

This mental model is important for understanding how python passes object references by value (more on that later).

Also python has this cool feature of multiple assignment:

x,y,z = 1,2,3
print x # prints 1
print y # prints 2
print z # prints 3
# naise
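Multiple assignment is really tuple unpacking, which is what makes the classic one-line swap work. A small sketch (variable names are just for illustration), written with print() so it runs under both python 2.7 and python 3:

```python
from __future__ import print_function

a, b = 1, 2
a, b = b, a  # the right-hand side (b, a) is built first, then unpacked into a and b
print(a, b)  # 2 1

first, second, third = [10, 20, 30]  # any iterable of the right length can be unpacked
print(first, second, third)          # 10 20 30

try:
    x, y = [1, 2, 3]  # three values but only two names
except ValueError as err:
    print("ValueError:", err)
```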

Another important concept is variable scope. This is the region where python will look for variable bindings when you reference a variable:

# top-level, global scope
x = 10
# I can access the variable x anywhere in my script
# because it is a global variable

# this is still in global scope
print x

# this is within a local function scope
# but if python can't find anything in the local scope
# it will check any enclosing function scopes
# and then the global scope
def f():
  print x # x here is found in the global scope

f() # prints 10

def g():
  x = 3
  def h():
    print x # this finds x in the enclosing function scope
  h()

g() # prints 3

# this is again a local function scope
# but within the local scope I have defined another variable x
# this is different from the global variable x
# and won't affect the value of the global variable
def g():
  x = 2
  print x # this x is found in the local function scope

g() # prints 2
print x # prints 10, this is the global variable x

# code blocks like functions and classes introduce a new scope
class Dog:
  x = 0
  # this is again a different x

  # unlike function blocks, the scope of a class block
  # does not extend to blocks nested inside it (like methods)
  def woof(self):
    print x # this finds x in the global scope, not in the class scope

  def woof2(self):
    print self.x # this references the x defined in the class scope

dog = Dog()
dog.woof() # prints 10
dog.woof2() # prints 0
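One consequence of these rules trips up a lot of people: if a function assigns to a name anywhere in its body, python treats that name as local for the whole function, so reading it before the assignment raises UnboundLocalError instead of falling back to the global. A sketch with made-up names, runnable under python 2.7 and 3:

```python
from __future__ import print_function

counter = 10  # global variable (name made up for this example)

def read_global():
    # only reads counter, so python finds it in the global scope
    return counter

def shadow_global():
    # assigning makes counter local here; the global one is untouched
    counter = 99
    return counter

def broken_increment():
    # counter is assigned in this function, so it is local for the WHOLE body;
    # reading it on the right-hand side before it has a value fails
    counter = counter + 1
    return counter

print(read_global())    # 10
print(shadow_global())  # 99
print(counter)          # 10
try:
    broken_increment()
except UnboundLocalError as err:
    print("UnboundLocalError:", err)
```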

You damn well better name your variables properly

I have taken the liberty of giving single-character variable and function names for demonstration purposes, but this is in general A VERY BAD idea. You should always always always name your variables something descriptive (though not tediously long). Unless you are writing a very short and simple program and know what you are doing (or such single-character names are conventional, as with the looping variables i, j, k), littering your code with crappy variable names like a, b, c, d, x, y, z, and even worse, reusing those same crappy names for different and completely unrelated purposes, is a surefire way to screw yourself over, start a debugging nightmare, and piss off anybody who has to read your code (and they do mark your code manually, so if you want partial credit...).

Also, you will make python sad.

[Image: sad python]

YOU DON'T WANT TO MAKE PYTHON SAD, DO YOU????

Assignment = operator vs Equality == operator

The assignment = operator is for assigning a value to a variable. The equality == operator compares two values and returns a boolean value (True or False). You really shouldn't be confusing them at this point.
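A small sketch of the two side by side. Conveniently, python refuses to run an if statement that uses = where == was meant; compile() is used here only so the demo can show that SyntaxError without crashing:

```python
from __future__ import print_function

x = 5          # assignment: binds the name x to the value 5
print(x == 5)  # comparison: True
print(x == 6)  # comparison: False

# using = inside a condition is a SyntaxError in python,
# so this particular mix-up is caught before the program even runs
try:
    compile("if x = 5:\n    pass\n", "<example>", "exec")
except SyntaxError:
    print("SyntaxError, as expected")
```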

Arithmetic

Be wary of integer division

Remember that in python 2.x, dividing two ints with the / operator does integer (floored) division:

print 3/4 # prints 0
print 3.0/4 # prints 0.75

Make sure you make one of the values a float if you want normal division.
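If you want / to always do true division in python 2, you can opt in with from __future__ import division, which has to go at the very top of the file (true division is already the default in python 3). A sketch:

```python
from __future__ import division, print_function  # must be the first statement in the file

print(3 / 4)         # 0.75: true division, even though both operands are ints
print(3 // 4)        # 0: // is floored division, for when you really want it
print(-7 // 2)       # -4: floored means rounded towards negative infinity, not towards 0
print(float(3) / 4)  # 0.75: the other fix, convert one operand to a float
```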

Be wary of multiplying bracketed expressions

Yes, when you write algebraic expressions down with pen and paper and put bracketed stuff side by side, we know the multiplication sign is implied. Python is not as smart as you (sorry):

print (10+5)(10-5) # throws TypeError, `int` object is not callable
                   # bonus points for figuring out why
                   # otherwise just don't do this

# you forgot your * d'oh
print (10+5)*(10-5) # prints 75

Be wary of floating point arithmetic

This is a very good article on the issue

The upshot is that we count with 10 fingers while our computer counts with 2 (stupid computer!), and many base 10 fractions (like 0.1) can't be represented exactly in binary, so python stores the closest value it can. Still, the tiny errors can add up, as some of you might have noticed when you tried to repeatedly increment a float and found that the end result was a little off from what you expected.

This has consequences when comparing floats. In general, either round your floats to the precision you care about, or don't compare for direct equality, but rather a range within a threshold you care about:

print round(4.9999987214,2) == 5.0 # prints True
x = 4.999827213
print abs(x-5.0) <=  0.005 # prints True
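Here is a sketch of the "iteratively increment a float" problem and the threshold comparison above, wrapped in a helper; almost_equal and its default tolerance are made up for illustration, not a library function:

```python
from __future__ import print_function

total = 0.0
for _ in range(10):
    total += 0.1  # 0.1 can't be stored exactly in binary

print(total == 1.0)  # False: the tiny errors have added up
print(repr(total))   # 0.9999999999999999

def almost_equal(a, b, tolerance=1e-9):
    """Hypothetical helper: True if a and b are within tolerance of each other."""
    return abs(a - b) <= tolerance

print(almost_equal(total, 1.0))  # True
print(round(total, 2) == 1.0)    # True
```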

Indentation makes a difference

In python indentation is syntactically significant.

So there was this question about checking whether a number is prime:

# what's the difference between this
def is_prime(num):
  if num > 0 and num <= 3:
    return True
  for i in range(2, num):
    if num % i == 0:
      return False
    else:
      return True

# and this?
def is_prime2(num):
  if num > 0 and num <= 3:
    return True
  for i in range(2, num):
    if num % i == 0:
      return False
  return True
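To answer the question: in the first version the else is attached to the if inside the loop, so the function returns on the very first iteration and only ever tests divisibility by 2 (is_prime(9) comes out True). In the second, return True sits outside the loop, so it only runs once every candidate divisor has been tried. The sketch below checks this, and also adds a num < 2 guard, since both versions above report 1 as prime; the function names are made up for illustration:

```python
from __future__ import print_function

def is_prime_first_version(num):
    # same logic as is_prime above
    if num > 0 and num <= 3:
        return True
    for i in range(2, num):
        if num % i == 0:
            return False
        else:
            return True  # runs on the FIRST iteration, so only 2 is ever tried

def is_prime_fixed(num):
    if num < 2:  # 0, 1 and negative numbers are not prime
        return False
    for i in range(2, num):
        if num % i == 0:
            return False
    return True  # reached only after every divisor has been tried

print(is_prime_first_version(9))                    # True (wrong!)
print(is_prime_fixed(9))                            # False
print([n for n in range(20) if is_prime_fixed(n)])  # [2, 3, 5, 7, 11, 13, 17, 19]
```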

Why doesn't my code do anything?

Better make sure it's not something facepalm-worthy before you ask...

You forgot to call your function (d'oh)

So you happily define a function, run your code, and are baffled when nothing happens.

Remember, functions are factories, right? When you define a function, you build the factory. But if you don't order from the factory, why are you even expecting anything?

If you want something from your factory, you have to order from it. If you want something from your function, you have to call it. How do you call a function?

functionname(arguments)

The parentheses are key to calling the function. (And get the number of arguments right too, or python will throw a TypeError.)

You forgot to print the output (d'oh)

Okay, you defined your function and you called it. But your program still isn't doing anything!

Except that it is. If only your vision could penetrate that 6th Generation Intel Core i7 processor and see how hard its transistors are chugging away to evaluate your ifs and fors and +s and -s. Alas, not being Clark Kent, you have to resign yourself to putting print statements in your code.

So that, you know, you can actually see the stuff your program is outputting. We all know the interactive console/command line is lamer than super vision, but it's all we've got.

print vs return

This is a very common misconception. "But aren't print and return the same thing???"

![python says no](https://raw.githubusercontent.com/tjjjwxzq/Digital-World-TA-Materials/master/Python-2.7-Notes/angry-python.jpg)

Observe:

def f(x):
  print x
  # this function doesn't have an explicit return statement
  # so it returns None by default

f(2) # prints 2
print f(2)
# prints
# 2
# None
# This is because print prints the value of the expression passed to it,
# and a function call evaluates to the function's return value.
# To get that value, python first has to run f(2), which executes
# the print statement inside the function and prints 2.
# Then the outer print prints the return value, None.

# see how this is different
def g(x):
  return x

g(2) # you won't see any output on your console
print g(2) # prints 2
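The practical difference: a returned value can be used in further computation, while a printed value is just shown on screen and lost to the rest of your program. A sketch with made-up function names:

```python
from __future__ import print_function

def add_one_and_print(x):
    print(x + 1)  # shows the value on the console...
    # ...but with no return statement the function returns None

def add_one_and_return(x):
    return x + 1  # hands the value back to the caller

result = add_one_and_return(2) * 10  # the return value can be used further
print(result)                        # 30

printed = add_one_and_print(2)  # prints 3
print(printed)                  # None
```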

while vs for loops

When should you use one over the other?

Generally, we use while loops when we don't know beforehand how many times we need to loop, but we keep checking the looping condition in each loop and exit when the condition becomes false.

Use for i in range(numloops) when you already know the number of times you need to loop (numloops). You can also use for to iterate over iterables like lists, strings, tuples, dictionaries, files etc.
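A sketch contrasting the two. The function names are made up, and the Collatz rule (halve even numbers, turn odd n into 3n+1) is just a convenient example of a loop whose length you can't predict in advance:

```python
def collatz_steps(n):
    """Count the steps the Collatz rule takes to reach 1 (assumes n >= 1)."""
    steps = 0
    while n != 1:  # no way to know beforehand how many iterations this needs
        if n % 2 == 0:
            n = n // 2  # // keeps n an int in both python 2 and 3
        else:
            n = 3 * n + 1
        steps += 1
    return steps

def sum_of_squares(numloops):
    total = 0
    for i in range(numloops):  # here the number of iterations is known up front
        total += i * i
    return total

print(collatz_steps(6))   # 8: 6, 3, 10, 5, 16, 8, 4, 2, 1
print(sum_of_squares(4))  # 14: 0 + 1 + 4 + 9
```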

Single vs Double-quoted strings

There is no difference between using either in Python (unlike for example Ruby or Java). It might be slightly more convenient to use double-quotes for strings like this:

"I'm using double-quotes 'cos my string has single-quotes in it!"

# though you could just escape those single quotes like so
'I\'m using single-quotes so I have to escape them!'

Comparisons and 'Falsey' Values

Unlike some languages (e.g. Java), python lets you test conditions on non-Boolean values as well:

astring = ''

if not astring:
  astring = "something"

print astring # prints something

The empty string evaluated to a false value in the condition. Python has this notion of 'truthy' and 'falsey' values, which behave like True and False when used in a condition; that's convenient, as you've seen above.

None, False, zero of any numeric type (0,0L,0.0,0+0j), empty sequences (strings, lists, tuples) or mappings (dictionaries) ("", [], (), {}) evaluate to false. Everything else evaluates to true*.

* This is not strictly speaking correct, but good enough for now. For the full picture, head to the docs
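To check what a given value counts as, pass it to the built-in bool(). A quick sketch; note the gotchas in the second list, like the string "0" and the list [0], which are truthy simply because they are non-empty:

```python
from __future__ import print_function

falsey = [None, False, 0, 0.0, 0j, "", [], (), {}]
truthy = [1, -1, 0.1, "0", " ", "False", [0], (None,), {0: 0}]

for value in falsey + truthy:
    # bool() tells you what a value counts as in an if or while condition
    print(repr(value), bool(value))
```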

Appending to Lists

Just be wary of the difference between alist.append(item) and alist += item. The latter is equivalent to alist.extend(item), i.e. item has to itself be an iterable (usually a list), and each of its elements gets added to alist.

a = [1,2,3]
a.append(4)
print a # prints [1,2,3,4]

a += 5 # throws TypeError: 'int' object is not iterable
a.extend(5) # throws TypeError: 'int' object is not iterable

a += [5] # make sure you are adding a list
a.extend([6]) # likewise
print a # prints [1,2,3,4,5,6]

# if you want to add in sublists, then use append
a.append([7])
print a # prints [1,2,3,4,5,6,[7]]
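Since += accepts any iterable, not just lists, adding a string with it can surprise you: the string gets split into its characters. A small sketch:

```python
from __future__ import print_function

letters = ["x"]
letters += "ab"  # += extends with EACH item of the iterable
print(letters)   # ['x', 'a', 'b']

words = ["x"]
words.append("ab")  # append adds the whole object as a single item
print(words)        # ['x', 'ab']
```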

Python passes object-references by value

This is by far the trickiest concept to grasp. It's the reason you can inadvertently end up mutating a list that you've passed to a function or assigned to another variable (innocently thinking that you've passed a copy and that the original list will be untouched). It's also why you have to explicitly copy your lists when you pass them around, if you don't want to mess up your original list:

a = [1,2,3]
b = a # let me operate on b so I don't end up changing a, because I might need it later

b[0] = 0
print b # prints [0,2,3]
print a # prints [0,2,3]; oops, a was changed as well!

# remedy this by explicitly copying your list
a = [1,2,3]
b = a[:] # remember this slice notation returns a copy of the whole list

b[0] = 0
print b # prints [0,2,3]
print a # prints [1,2,3] # that's more like it

This concept is closely tied to the notion of mutable and immutable types in python, so let's use our octopus tentacles to dissect it.

![tentacleception](https://raw.githubusercontent.com/tjjjwxzq/Digital-World-TA-Materials/master/Python-2.7-Notes/tentacleception.jpg)

So now you know why, to truly copy a list containing nested lists/dictionaries/sets/any mutable type, you have to use copy.deepcopy():

import copy

# no copying
a = [1,2, [1,2,3], (1,2,3), {1:1, 2:2, 3:3}]
b = a # no copy
b[0] = 0 # you changed a as well
print a # prints [0,2, [1,2,3], (1,2,3), {1:1, 2:2, 3:3}]

# shallow copying
a = [1,2, [1,2,3], (1,2,3), {1:1, 2:2, 3:3}]
b = a[:] # shallow copy, equivalent to copy.copy(a)
b[0] = 0 # you didn't change a
print a # prints [1,2, [1,2,3], (1,2,3), {1:1, 2:2, 3:3}]
b[-1][3] = 4 # now you changed a, because b contains a reference to the same dictionary as a
print a # prints [1,2, [1,2,3], (1,2,3), {1:1, 2:2, 3:4}]
# also to note
# do you think you can do this:
# b[3][0] = 0

# deep copying
a = [1,2, [1,2,3], (1,2,3), {1:1, 2:2, 3:3}]
b = copy.deepcopy(a) # deep copy, copies everything, even with multiple levels of nesting
b[0] = 0 # you didn't change a
print a # prints [1,2, [1,2,3], (1,2,3), {1:1, 2:2, 3:3}]
b[-1][3] = 4 #  you still didn't change a
print a # prints [1,2, [1,2,3], (1,2,3), {1:1, 2:2, 3:3}]
# whew, safe at last

You should now know why to be wary when passing lists or other mutable types into functions as arguments. You may inadvertently end up modifying your original list inside your function, because python passes object references by value, not copies of the objects themselves. With immutable types this is okay, because you can't change them anyway. With mutable types, this means a lot of weird shit can happen if you're not careful. You have been warned.
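A sketch (with made-up function names) of the three things that can happen to a list you pass into a function: rebinding the parameter doesn't affect the caller, mutating it does, and copying first protects the caller's list:

```python
from __future__ import print_function

def rebind(numbers):
    numbers = [9, 9]  # only the local name is rebound; the caller sees nothing

def append_zero_to_copy(numbers):
    numbers = numbers[:]  # rebind the local name to a shallow copy first
    numbers.append(0)
    return numbers

def append_zero_in_place(numbers):
    # numbers refers to the SAME list object the caller passed in
    numbers.append(0)

original = [1, 2, 3]
rebind(original)
print(original)                       # [1, 2, 3]
print(append_zero_to_copy(original))  # [1, 2, 3, 0]
print(original)                       # [1, 2, 3]
append_zero_in_place(original)
print(original)                       # [1, 2, 3, 0]
```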

If you need more explanation, here's a pretty good (albeit tentacle-less) article.

Weird patterns I've seen (please don't do this)

Nesting functions

Why you do this:

def somefunc():
  # do stuff
  def anotherfunc():
    # do completely unrelated stuff

There are good reasons to use nested functions, none of which you will encounter at this point.

Please, don't do this.

Conditionally defining functions

Why you do this:

if somecondition:
  def somefunc():
    # do stuff

If you're doing this, I think you're confused and actually want the conditional inside the function body.

There are good reasons to conditionally define functions, none of which you will encounter at this point.

Please, don't do this.

Superfluous pass statements

Why you do this:

def somefunc():
  pass
  x = 10
  print x
  ...
  ...
  # do stuff

Why you do this:

if somecondition:
  pass
else:
  # do stuff

You realize you could have just done this:

if not somecondition:
  # do stuff

right?

There are no good reasons for superfluous pass statements.

Please, get rid of them.

Numeric Types

Python has four numeric types: int, long, float and complex. To create values of each type:

x = 10    # x is an int
x = 10L   # x is a long
x = 10.0  # x is a float
x = 10+0j # x is a complex

You can carry out arithmetic operations with a mixture of these types, and python will return a result of the broadest type involved (complex is broader than float, which is broader than long, which is broader than int).

The arithmetic operations are:

# x and y are variables with some numeric value

x + y   # addition
x - y   # subtraction
x * y   # multiplication
x / y   # division; note that if x and y are ints, this will be integer division which is equivalent to floored division
x // y  # floored division
x % y   # remainder of division; modulo
x ** y  # x to the power of y

(You may be wondering about the other ways of getting x to the power of y, like pow(x,y) and math.pow(x,y). If you're confused, just stick with x ** y.)
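For the curious, a quick sketch of how the three compare: the built-in pow() with two arguments behaves like **, its optional third argument takes a modulus, and math.pow() always works in floats:

```python
from __future__ import print_function
import math

print(2 ** 10)           # 1024
print(pow(2, 10))        # 1024: same as 2 ** 10
print(pow(2, 10, 1000))  # 24: the third argument gives (2 ** 10) % 1000
print(math.pow(2, 10))   # 1024.0: math.pow() converts its arguments to float
```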

To get the absolute value, use the built-in function abs():

x = -10
print abs(x) # prints 10

To round off floating point values, use the built-in function round():

x = 10.575
print round(x,2) # prints 10.57; not exactly what we expect
                 # because of limitations of floating point arithmetic

To convert between numeric types, use the built-in functions int(), long(), float(), and complex():

x = 15.6
print int(x) # prints 15
print long(x) # prints 15; the L suffix only appears in repr(), e.g. when echoed at the interactive prompt
print complex(x) # prints (15.6+0j)

You can also use those functions to convert a string literal into those numeric types, provided the string literal is valid (or ValueError will be thrown):

s = "123"
print int(s) # prints 123
print float(s) # prints 123.0
print long(s) # prints 123 (repr() would show 123L)
print complex(s) # prints (123+0j)

s = "abc"
print int(s) # throws ValueError

s = "14 + 5j"
print int(s) # throws ValueError
print complex(s) # throws ValueError; the spaces make a difference!
s = "14+5j"
print complex(s) # prints (14+5j)
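When the string comes from a user, you usually want to handle that ValueError rather than crash. A minimal sketch; parse_int is a hypothetical helper name, not a built-in:

```python
from __future__ import print_function

def parse_int(text):
    """Hypothetical helper: int(text), or None if text isn't a valid integer."""
    try:
        return int(text)
    except ValueError:
        return None

print(parse_int("123"))   # 123
print(parse_int(" 42 "))  # 42: int() ignores surrounding whitespace
print(parse_int("4.2"))   # None: not an integer literal
print(parse_int("abc"))   # None
```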

Strings

(This section is on things specific to strings. For operations that you can apply to strings as iterables, see Iterables)

Useful Operations

String concatenation and 'multiplication':

s = ""
s += "abc"
print s # prints abc

s = "4"
s *= 3
print s # prints 444

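Checking whether a string contains a character or another substring, with the in operator:

# say I want to count the number of vowels in a long string
longstring = "programming is fun"
count = 0
for char in longstring:
  if char in "aeiou":
    count += 1
print count # prints 5
print "gram" in longstring # prints True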

String Formatting

This is a very good article.
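As a minimal sketch of the two formatting styles python 2.7 supports, the older % operator and the str.format() method (the names and values are made up):

```python
from __future__ import print_function

name, age = "Alice", 20

old_style = "%s is %d years old" % (name, age)        # %s: string, %d: integer
new_style = "{0} is {1} years old".format(name, age)  # {0}, {1}: positional fields
two_places = "{0:.2f}".format(3.14159)                # .2f: float with 2 decimal places

print(old_style)   # Alice is 20 years old
print(new_style)   # Alice is 20 years old
print(two_places)  # 3.14
```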

Flow Control

if statements

Use if statements to test conditions. This can be followed by zero or more elifs and optionally an else.

programmingisfun = True
programmingisboring = False

if programmingisfun:
  print "Yay"
elif programmingisboring:
  print "But why"
else:
  print "So what do you think of programming?"

while loops

Repeat a block of code until the while condition becomes false:

suxatprogramming = True
level = 0
while suxatprogramming:
  # practicepracticepracticepractice
  level += 1
  if level >= 1000:
    suxatprogramming = False

for loops

Iterate over an iterable/sequence. The basic idiom, if you know beforehand the number of times you need to loop, is to iterate over the list returned by the range() function:

numloops = 10

for i in range(numloops):
  print i # do stuff; here, print 0 to 9

break and continue statements

break breaks out of the smallest enclosing for or while loop. You usually use it when you achieve a condition that makes you want to prematurely terminate the loop or together with a while True loop:

# breaking from a for loop
newnumlist = []
oldnumlist = [1,3,5,12,30,13,4,56,2]
for num in oldnumlist:
  newnumlist.append(num)
  if sum(newnumlist) > 10:
    break
print newnumlist # prints [1,3,5,12]

# breaking from a while True loop
# typically you use this idiom when there are many potential conditions
# you want to test for. So instead of testing for the conditions after
# the while loop statement, test them in the loop body and break
from random import randint
while True:
  x = randint(1,6)
  y = randint(1,6)

  if x == 6:
    break
  if x + y == 5:
    break

continue allows you to move on to the next iteration of the loop even before the current iteration has completed fully:

astring = "abcdefg"

for char in astring:
  if char == "d":
    continue
  print char

# prints
# a
# b
# c
# e
# f
# g

pass statement

This statement does nothing. Use it as filler when you don't need your code to do anything but syntactically need a statement. It's commonly used in the skeleton code given to you:

def yourfunction():
  pass
  # add your code here and please remove the pass afterwards

If it's not there, python will scream about a SyntaxError (specifically an IndentationError, which is a kind of SyntaxError).

Handy when you are in the midst of building up a complex program, and are thinking at an abstract level about the kinds of functions you'd need to implement. You can just put down your function definitions and leave the bodies empty with a pass statement:

# making a super complex tictactoe game

# prompts user for row number
def prompt_row():
  pass

# prompts user for col number
def prompt_col():
  pass

# plays piece on board
def play_piece(board, piece, coords):
  pass

else clause after for and while loops

Not necessary but good to know.
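A for (or while) loop can have an else clause that runs only if the loop finished without hitting break. A sketch with a made-up function that searches for the first negative number:

```python
from __future__ import print_function

def first_negative(numbers):
    """Made-up example: return the first negative number, or None if there isn't one."""
    for number in numbers:
        if number < 0:
            found = number
            break
    else:
        # only runs if the loop ran to completion WITHOUT hitting break
        found = None
    return found

print(first_negative([3, -1, -5]))  # -1
print(first_negative([1, 2]))       # None
```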

User Input

To get user input from standard input, use the built-in input() or raw_input() functions.

promptmsg = "Enter something"
x = raw_input(promptmsg)
# x is a string
print x # prints whatever the user input

The difference between the two functions is that input() tries to evaluate whatever the user types as if it were Python code:

print raw_input("Hi!")
# say the user types "ghi"
# then ghi is printed
print input("Hi!")
# say the user types "ghi"
# then we get a NameError: name 'ghi' is not defined
# because Python tries to interpret the input as Python code

In other words, input(msg) is equivalent to eval(raw_input(msg)), and there's not much reason to use it. In fact, that behaviour is gone in python 3, where raw_input() was renamed to input().

Functions

Defining and Calling Functions

Think of functions as mobile factories - they take in some kind of input (passed in as arguments to the function) and produce an output (the return value of the function). You can build factories (define functions) with the following syntax:

def functionname(arg1,arg2...):
  # function body
  # do something
  # return something

Functions allow you to compartmentalize and reuse your code. If you find yourself repeating the same or a very similar block of code lots of times, maybe you should encapsulate it in a function.

So now we've got our factory, but it won't do anything unless we order something from it (call the function):

def f(x):
  print x

# if I don't call the function I just defined this snippet of code won't do anything!
f(1) # prints 1

You call a function by adding parentheses after the function name and passing in whatever arguments it takes. Some functions don't take any arguments, so you just add a set of empty parentheses:

def f():
  print "howdy"

f() # prints "howdy"

Note that adding the parentheses is crucial to actually calling functions, because in python functions are just objects that you can pass around like any other value:

def f():
  print "howdy"

print f # prints <function f at someidnumber>; nope, it doesn't print "howdy" and None
f() # prints "howdy"

See the difference?

Function Parameters, Arguments and Return Values

Arguments are the values you pass into a function when you call it. For example:

def f(x):
  print x

f("hello") # "hello" is my argument

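You also hear people talking about function parameters. What's the difference between arguments and parameters? Parameters are the names listed in the function definition (x in def f(x):), while arguments are the actual values you pass in when you call the function ("hello" in f("hello")).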

Python has a handful of handy features (default arguments, keyword arguments etc.) regarding function arguments. Check the docs if you're interested.
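A quick sketch of default and keyword arguments, using a made-up greet() function:

```python
from __future__ import print_function

def greet(name, greeting="Hello", punctuation="!"):
    """Made-up example: greeting and punctuation have default values."""
    return greeting + ", " + name + punctuation

print(greet("Bob"))                      # Hello, Bob!  (both defaults used)
print(greet("Bob", "Hi"))                # Hi, Bob!     (greeting passed by position)
print(greet("Bob", punctuation="?"))     # Hello, Bob?  (keyword argument skips greeting)
print(greet(greeting="Yo", name="Bob"))  # Yo, Bob!     (keywords can come in any order)
```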

Functions return something. To specify the return value use the return statement:

def f(x):
  return x +1

print f(2) # prints 3

If a return statement is not executed (i.e. there isn't one, or the logical flow of the function is such that none of its return statements are reached), then the function returns None by default:

def f(x):
  x += 1

print f(2) # prints None

Function scope

Functions are considered code blocks in Python, meaning they introduce a new scope for any variables defined within them:

# x here is a top-level global variable
x = 10

def f():
  # this x is in the function's local scope
  x = 2
  print x

f() # prints 2
# outside the function body I'm back in global scope
# the variable x I reference here is the global variable
print x # prints 10

You can't access variables local to a function block in the global scope:

def f():
  x = 3

print x # this will throw a NameError: name 'x' is not defined

The scope of variables defined in function blocks extends to any code blocks enclosed by that function block, such as nested functions:

def f():
  x = 3

  def innerf():
    # I'm still able to access x here
    print x

  innerf()

f() # prints 3

(Note that at this point, you have hardly any/no good reasons to use nested functions, so you really shouldn't. Just do all your function definitions in the top-level of your script. If you're really interested, read up on 'closures')

Anonymous (Lambda) functions

When defining a function the normal way with the def keyword, you have to give the function an explicit name, which you can use to reference the function later. Sometimes, however, you just need a short little function that you use only once. Python lets you use the lambda keyword to create these anonymous functions. A common use case is passing them as the key argument to built-in functions like sorted() and max():

listoftups = [(1,2,3), (2,3), (4,5), (1,3)]
# I want to get the maximum tuple, based on the sum of its first two values
print max(listoftups, key=lambda x: x[0] + x[1]) # prints (4,5)

The lambda syntax is as follows:

lambda arguments: expression

The value of the expression will be returned by the lambda function.

More examples:

f = lambda x,y: x + y
print f(2,3) # prints 5

Iterables (String, List, Tuple, Set)

Conceptually iterables are python objects you can iterate (ie. loop) over, accessing each of its elements ("it is capable of returning its members one at a time"). The iterables you should be familiar with are sequence types such as lists, tuples and strings, and some other non-sequence types like dictionaries and files (though they will be treated here in their own section).

Basically, if you can loop over something with for x in something:, you've got an iterable on your hands.

For a more technically exact definition of python iterables, head to the docs

Indexing and Slice Notation

To get one item in an ordered iterable, specify it by its index (remember that the index starts from 0)

someiterable[i] # get the i'th item of someiterable

Indices can be negative, which means that you count from the back of the iterable. A common idiom is getting the last item in the iterable:

someiterable[-1] # get the last item of someiterable

Note that among these iterables, only strings, lists and tuples support indexing by position (since they are ordered sequence types).

The slice syntax

Python's slice notation allows you to get a subset of the iterable's items (eg. part of a list, or a substring)

someiterable[start:stop:step]

The slice syntax is similar to that of the built-in range() function: you specify the starting index, the stopping index, and optionally the step. Note that the subset will contain items from and including start up to but not including stop (thus up to and including stop-1).

Examples:

"abcdefg"[2:5] # gets the substring "cde"

If start is omitted, the subset starts from the first item:

"abcdefg"[:5] # gets the substring "abcde"

If stop is omitted, the subset goes up to (and includes) the last item:

"abcdefg"[1:] # gets the substring "bcdefg"

A common idiom is to get a copy of a whole list like so:

somelist = ["a","b","c"]
copylist = somelist[:] # gets a copy of somelist and assigns it to copylist

(Remember it only really makes sense to do this for mutable iterables like lists, because you don't want to modify the original list. Since immutable types like strings and tuples can't be modified, there's no need to copy them: you can't change the original anyway. Two equal strings such as string1 = "abc" and string2 = "abc" may even point to the same string object, though that's an implementation detail you shouldn't rely on.)

If either start or stop are out of range, the slice will get as many items as is possible:

"abcdefg"[2:19] # gets the substring "cdefg"
"abcdefg"[-10:5] # gets the substring "abcde"

If the slice doesn't end up selecting anything, we get an empty subset:

"abcdefg"[4:3] # gets an empty string ""

We can specify negative indices in slice notation as well:

"abcdefg"[1:-1] # gets the substring "bcdef"

If step is specified and positive, we get a subset of items with the indices: [start, start + step, start+2*step...] till the largest start + i*step smaller than stop:

"abcdefg"[::2] # gets the substring "aceg"
"abcdefg"[1::2] # gets the substring "bdf"
"abcdefg"[:-1:3] # gets the substring "ad"

If step is negative, we get a subset of items with the indices: [start, start + step, start +2*step... ] till the smallest start + i*step larger than stop

"abcdefg"[::-1] # gets the reversed string "gfedcba"
"abcdefg"[-1::-2] # gets the substring "geca"
"abcdefg"[4:-4:-1] # gets the substring "e"

Assigning to slices

You can only assign to slices of mutable types (like lists!), and you can only assign iterables to a slice. The length of the slice doesn't have to be the same as the length of the iterable you assign to it. Basically, imagine cutting away that slice and pasting in the assigned iterable, whatever its length. For example:

a = range(10) # range(10) returns the list [0,1,2,3,4,5,6,7,8,9]
a[3:6] = ["a","b","c"] # slice and assigned iterable of the same length: a becomes [0,1,2,"a","b","c",6,7,8,9]
a = range(10)
a[3:6] = ["a","b"] # shorter iterable: a becomes [0,1,2,"a","b",6,7,8,9]
a = range(10)
a[3:6] = ["a","b","c","d","e"] # longer iterable: a becomes [0,1,2,"a","b","c","d","e",6,7,8,9]
a = range(10)
a[3:6] = ("a","b") # we can assign a tuple too: a becomes [0,1,2,"a","b",6,7,8,9]
a = range(10)
a[3:6] = "ab" # we can assign a string as well: a becomes [0,1,2,"a","b",6,7,8,9]

Assigning to an empty slice is a quick way to insert values into a list:

a = range(10)
a[4:3] = "abc" # now a becomes [0,1,2,3,"a","b","c",4,5,6,7,8,9] (the assigned iterable is inserted into the start index)
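A runnable sketch of a few of the slices above. It uses list(range(10)), since in python 3 range() no longer returns a list; the variable names are just for illustration:

```python
from __future__ import print_function

text = "abcdefg"
print(text[2:5])      # cde
print(text[::-1])     # gfedcba
print(text[4:-4:-1])  # e

replaced = list(range(10))  # list() is needed in python 3, harmless in python 2
replaced[3:6] = ["a", "b"]  # cut out 3 items, paste in 2
print(replaced)             # [0, 1, 2, 'a', 'b', 6, 7, 8, 9]

inserted = list(range(10))
inserted[4:4] = "xy"        # empty slice at index 4: a pure insertion
print(inserted)             # [0, 1, 2, 3, 'x', 'y', 4, 5, 6, 7, 8, 9]
```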

Iteration and Reverse Iteration

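iterable = "abc" # any iterable works here
for item in iterable:
  print item # prints a, b, c on separate lines

# to iterate in reverse, use the built-in function reversed()
for item in reversed(iterable):
  print item # prints c, b, a on separate lines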

Get the Length

len(iterable)

Get the Max and Min

max(iterable)
min(iterable)

If you want to specify your own ordering function, you can pass in the key argument: max(iterable, key=somefunction). For example, with a dictionary d:

d = {"a": 3, "b": 7, "c": 5}
print max(d, key=lambda k: d[k]) # prints b

By passing in a lambda (anonymous) function as the key argument, I'm asking max to call that function on each element of the iterable before finding the maximum. So I'm basically finding the key that corresponds to the maximum of the dictionary's values.

Note that to get the maximum of the dictionary's keys, I can just call max(d) (calling d.keys(), though intuitive, is superfluous)

Another common idiom is to get the maximum string by length

max(str1,str2,str3,...strn, key=len) # remember len() is a built-in function

Otherwise, max(str1,str2,str3...strn) returns the maximum by lexicographical order.
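A runnable sketch of the key argument with a made-up dictionary of scores. dict.get works as a key function too, since it takes a key and returns its value:

```python
from __future__ import print_function

scores = {"ann": 3, "bob": 7, "cat": 5}      # made-up data
print(max(scores, key=lambda k: scores[k]))  # bob: the key with the largest value
print(max(scores, key=scores.get))           # bob: same thing via dict.get
print(max(scores))                           # cat: without key, the keys themselves are compared

words = ["apple", "fig", "banana"]
print(max(words, key=len))  # banana: longest
print(min(words, key=len))  # fig: shortest
print(max(words))           # fig: lexicographically largest
```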

Sum an iterable

sum(iterable, [start])

Used to sum a list of numbers (it won't work on a list of strings; use the more efficient ''.join(listofstrings) instead). You can specify an optional argument start as a value to add to the sum (defaults to 0).
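A short sketch of sum() with and without start, and of join() as the way to glue strings together:

```python
from __future__ import print_function

print(sum([1, 2, 3]))            # 6
print(sum([1, 2, 3], 10))        # 16: the start value 10 plus the items
print("".join(["a", "b", "c"]))  # abc
print(", ".join(["a", "b"]))     # a, b: the string you call join on goes between items

try:
    sum(["a", "b"], "")  # sum() refuses to add up strings
except TypeError as err:
    print("TypeError:", err)
```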

Sorting an iterable

There are two ways to go about it (for lists) - use the built-in sorted() function:

sorted(alist,[key,[reverse]])

or use the list method list.sort():

alist.sort([key,[reverse]])

The optional keyword arguments key and reverse are the same for both sorted() and list.sort(), and can be quite handy. The difference between the two is that sorted() returns a new sorted list while list.sort() modifies the list in place and returns None. No real reason to use one over another, as long as you know what you're doing. Maybe sorted() is a teeny bit more convenient.

The biggest difference is probably that sorted() can be called on any iterable (strings, tuples, sets, dictionaries, files), while list.sort() can (obviously) be only called on lists. But sorted() always returns a list.

(If you're wondering how calling sorted() on a dictionary or a file works, the former basically returns a sorted list of the dictionary keys, the latter a sorted list of lines in the file)

Examples:

alist = ["def", "ghi", "abc"]
print sorted(alist) # prints ["abc", "def","ghi"]
print alist # prints ["def", "ghi", "abc"]; the original list has not been modified
print alist.sort() # prints None
print alist # prints ["abc", "def", "ghi"]; the original list has been modified

The key and reverse keyword arguments are useful to know. Say you want to sort a list of strings, but not by the default lexicographical comparison but by their lengths, then you could do:

strings = ["efg", "hi", "abcd"]
# normal sorted() without arguments sorts lexicographically
print sorted(strings) # prints ["abcd", "efg", "hi"]
# but I can sort by string length instead
print sorted(strings, key=len) # prints ["hi", "efg", "abcd"]

Basically the key argument specifies a function that is called on each item in the iterable and returns a value that is used to compare each item. If key is not specified the item itself is used directly for comparison.
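The key function can be any one-argument function, including one you write yourself. A sketch with made-up (name, score) pairs sorted by score; by_score is a hypothetical helper name:

```python
scores = [("mary", 72), ("john", 95), ("joseph", 88)]

def by_score(pair):
    # return the value sorted() should compare: the second element
    return pair[1]

ascending = sorted(scores, key=by_score)                 # mary, joseph, john
descending = sorted(scores, key=by_score, reverse=True)  # john, joseph, mary
# without key, tuples are compared element by element, i.e. by name first
by_name = sorted(scores)                                 # john, joseph, mary
```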

The reverse keyword argument is clearly for reversing the sort:

astring = "abcdefg"
print sorted(astring) # prints ["a","b","c","d","e","f","g"]
print sorted(astring, reverse=True) # prints ["g","f","e","d","c","b","a"]
# if you want to get back a string, just use str.join()
print "".join(sorted(astring, reverse=True)) # prints "gfedcba"

Reversing an iterable

There are a few ways. First, if what you actually want is descending order (rather than the original order backwards), use sorted(reverse=True):

tup = (12,5,12,4,55,2)
print sorted(tup, reverse=True) # prints [55,12,12,5,4,2]

Second calling list.reverse() on a list (remember this works only for lists!):

alist = [12,5,12,4,55,2]
# like list.sort(), list.reverse() modifies the list in place and returns None
print alist.reverse() # prints None
print alist # prints [2,55,4,12,5,12]

Third, and probably the most elegant, using slice notation:

astring = "abcdefg"
print astring[::-1] # prints "gfedcba"

(Just note that sorted() is still the most general since it can be called on dictionaries and file objects as well, though those are very uncommon use cases. And it always returns a list.)

Be careful with the reversed() built-in function: it returns a reverse iterator, not a reversed list or string, which most of the time is not what you want. Use reversed() for iteration like so:

astring = "adfsdfsdf"
for char in reversed(astring):
  print char

print reversed(astring) # prints  <reversed object at someidnumber >; nope, totally not what you were expecting
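To turn the iterator from reversed() into something you can print or compare, pass it to list() or str.join(). A sketch comparing the approaches on the list used above; note that reversing puts the original order backwards, which is not the same as sorting in descending order:

```python
alist = [12, 5, 12, 4, 55, 2]
astring = "abcdefg"

by_slice = alist[::-1]                    # [2, 55, 4, 12, 5, 12], a new list
by_reversed = list(reversed(alist))       # same result, built from the iterator
descending = sorted(alist, reverse=True)  # [55, 12, 12, 5, 4, 2]: sorted, not reversed

string_back = "".join(reversed(astring))  # "gfedcba"

in_place = list(alist)  # a copy, so alist itself stays untouched
in_place.reverse()      # modifies in_place and returns None
```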

See this PEP for more on reverse iteration.

Rearranging a set of iterables - getting the ith element into the ith iterable (like transposing a matrix, or pairing two lists)

Use zip(iterable1, iterable2...) which returns a list of tuples (which can be easily converted into a list of lists, if desired, using something like [list(tup) for tup in zip(iterable1,iterable2...)])

Here's a potential use-case: suppose you have a list of items and you want to merge every other item together. For example:

strings = ["hi", 1, "bye",2, "sad", 5, "angry", 6]
# suppose I want to transform strings into this list: ["hi1", "bye2", "sad5", "angry6"]
strings = zip(strings[::2], strings[1::2]) # this is the same as zip(["hi","bye","sad","angry"], [1,2,5,6])
# now strings is [("hi",1), ("bye",2), ("sad",5),("angry",6)]
strings = [tup[0] + str(tup[1]) for tup in strings]
# now strings is ["hi1", "bye2", "sad5", "angry6"]
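The "transposing a matrix" case from the heading works the same way: unpacking the rows with * passes each row to zip() as a separate argument. Also worth knowing: zip() stops at the shortest iterable. A sketch with a made-up matrix; list() is only there so it behaves the same under Python 3, where zip() returns an iterator:

```python
matrix = [[1, 2, 3],
          [4, 5, 6]]

# zip(*matrix) is the same as zip([1, 2, 3], [4, 5, 6])
transposed = [list(column) for column in zip(*matrix)]  # [[1, 4], [2, 5], [3, 6]]

# the extra item in the longer list is silently dropped
pairs = list(zip(["a", "b", "c"], [1, 2]))  # [("a", 1), ("b", 2)]
```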

Checking if all items in an iterable are True

all(iterable) # returns True if all items in iterable are True

The items do not literally have to be the boolean values True or False, since python has the notion of truthy and falsy values. For example, I have a list of strings, and I want to check if all of them are non-empty:

strings = ["acb", "" ,"adf", "sdfs"]

if all(strings):
  print "No empty strings!"
# nothing gets printed

Or maybe you want to check if all numbers in a list are equal to a certain value:

numlist = [0,0,0,0,0]

print all([num == 0 for num in numlist]) # prints True

# note that the following is different
print all(numlist) # prints False, since 0 is a falsy value

Checking if any items in an iterable are True

any(iterable) # returns True if at least one item in the iterable is True

The items do not literally have to be the boolean values True or False, since python has the notion of truthy and falsy values.

Say I have a list of students and their grades and want to praise all of them once at least one of them gets an 'A' (because I'm a nice person):

students = {"john":"B", "mary":"B", "joseph":"A"}

if any([grade == "A" for grade in students.values()]):
  print "Good job guys!"
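Two details not covered above: you can pass a generator expression (a list comprehension without the square brackets) instead of building a list first, and both functions have fixed answers for an empty iterable. A sketch reusing the made-up data from above:

```python
numlist = [0, 0, 0, 0, 0]
students = {"john": "B", "mary": "B", "joseph": "A"}

all_zero = all(num == 0 for num in numlist)                       # True
someone_got_a = any(grade == "A" for grade in students.values())  # True

empty_all = all([])  # True: there is no falsy item
empty_any = any([])  # False: there is no truthy item
```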

Mapping Lists (applying a function to all elements of the list)

map(function, iterable)

map() applies function to every item in iterable and returns a list of the results.

For example, suppose I have a list of strings and I want to get a list of their lengths:

strings =["first", "second","third"]
print map(len, strings) # prints [5,6,5]

A common idiom is to convert a list of string numbers into floats or ints:

scores = ["1.3", "5.6", "9.0"]
print map(float, scores) # prints [1.3, 5.6, 9.0]

Note that you can also do this using list comprehensions, which is generally considered more 'pythonic':

strings = ["first", "second", "third"]
print [len(s) for s in strings] # prints [5,6,5]

For the most part, stick with list comprehensions, though there are a few uncommon cases where the choice matters. More on why one way over the other.

Filtering Lists (getting certain values that fulfill some condition)

filter(function, iterable)

filter() applies function to each element in iterable and returns a list of the elements for which function returns a true value.

For example, suppose I have a list of integers and I want to extract only the even ones:

numlist = [1,2,3,4,5,7,13,4,5,61,50,98]
print filter(lambda x: x%2==0, numlist) # prints [2,4,4,50,98]

As with the map() function, you can accomplish the same thing with the if clause in list comprehensions, also generally considered more 'pythonic':

numlist = [1,2,4,5,1,2,4,5,3,1,0,0,5,5,4]
maxnum = max(numlist) # maxnum is 5
print len([num for num in numlist if num ==maxnum]) # prints 4, the number of 5's in numlist
# this is just a convoluted way of doing numlist.count(maxnum), just for demonstration purposes
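To check that the comprehension forms do the same job as map() and filter(), here is a sketch comparing them on the data used above. The list() calls are only there so the comparison also holds under Python 3, where map() and filter() return iterators instead of lists:

```python
strings = ["first", "second", "third"]
numlist = [1, 2, 3, 4, 5, 7, 13, 4, 5, 61, 50, 98]

lengths_map = list(map(len, strings))
lengths_comp = [len(s) for s in strings]          # [5, 6, 5]

evens_filter = list(filter(lambda x: x % 2 == 0, numlist))
evens_comp = [x for x in numlist if x % 2 == 0]   # [2, 4, 4, 50, 98]

# a comprehension can map and filter in one go: squares of the even numbers
even_squares = [x ** 2 for x in numlist if x % 2 == 0]  # [4, 16, 16, 2500, 9604]
```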

List Comprehension

List comprehension is python's really cool syntactic sugar for constructing lists out of some other iterable. For example:

cubes = [x**3 for x in range(10)] # creates the list [0,1,8,27,...,729]

In general a list comprehension is of the form:

[ (expression) for clause [zero or more for or if clauses]]

In the example, expression was x**3, and there was only one for clause: for x in range(10).

Additional for and if clauses

More complex list comprehensions might use additional for and if clauses. For example, the if clause can be used for filtering an iterable:

strings = ["abd", "abc", "bcd", "gdf"]
# get only the strings that contain 'a'
strings = [s for s in strings if "a" in s] # strings is now ["abd","abc"]

Using additional for clauses is the same as creating nested for loops:

z = [num1*num2 for num1 in range(3) for num2 in range(2)] # z is now [0,0,0,1,0,2]

is equivalent to:

z = []
for num1 in range(3):
  for num2 in range(2):
    z.append(num1*num2)

Note how the for clauses are evaluated in order, from left to right: the first for clause becomes the outer loop.

We may combine additional for and if clauses:

names = ["finch", "ebot","amigo","pi"]
descriptions = ["is fun", "is cute", "is educational", "is cheap"]
dwstuff = [name +" "+ desc for name in names for desc in descriptions if name != "amigo" and desc != "is cheap"]

is equivalent to

names = ["finch", "ebot","amigo","pi"]
descriptions = ["is fun", "is cute", "is educational", "is cheap"]
dwstuff = []
for name in names:
  for desc in descriptions:
    if name!= "amigo" and desc != "is cheap":
      dwstuff.append(name + " " + desc)

Note how the clauses are all evaluated in order. Note that using the if clause in list comprehensions is different from using a ternary operator (aka conditional expression), ie. x if condition else y in the list comprehension expression:

names = ["finch", "ebot","amigo","pi"]
dwstuff1 = ["cute" if name=="finch" else "ugly" for name in names] # dwstuff1 is now ["cute","ugly","ugly","ugly"]
# this does something completely different
dwstuff2 = ["cute" for name in names if name == "finch"] # dwstuff2 is now ["cute"]
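The two can also be combined: the if clause at the end decides which items are kept, and the conditional expression at the front decides what each kept item turns into. A small sketch with made-up numbers:

```python
# range(6) gives 0..5; the if clause keeps 3, 4 and 5; the expression labels them
labels = ["even" if n % 2 == 0 else "odd" for n in range(6) if n > 2]
# labels is ["odd", "even", "odd"]
```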

Nested List Comprehensions

You can nest list comprehensions:

colours = ["red","blue","green","yellow"]
print [ [colour[i] for i in range(2)] for colour in colours] # prints [["r", "e"], ["b","l"], ["g","r"],["y","e"]]

Note that this is different from using additional for clauses. You are basically creating a list of lists here.
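Side by side on the same made-up data: an extra for clause gives one flat list, while a nested comprehension keeps the list-of-lists shape.

```python
matrix = [[1, 2], [3, 4], [5, 6]]

# additional for clause: like two nested for loops appending to one list
flat = [x for row in matrix for x in row]           # [1, 2, 3, 4, 5, 6]

# nested list comprehension: the inner comprehension builds one list per row
doubled = [[x * 2 for x in row] for row in matrix]  # [[2, 4], [6, 8], [10, 12]]
```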

Checking if an Iterable contains a certain element

char = "a"
astring = "abcdefg"
if char in astring:
  print char

That nifty in keyword.
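What in checks depends on the type: substrings for strings, items for lists, tuples and sets, and keys (not values) for dictionaries. A sketch with made-up values:

```python
in_string = "bcd" in "abcdefg"      # True: checks for a substring
in_list = 3 in [1, 2, 3]            # True: checks the items
grades = {"john": "B", "joseph": "A"}
key_found = "john" in grades        # True: dictionaries check their keys
value_found = "A" in grades         # False: "A" is a value, not a key
in_values = "A" in grades.values()  # True: search the values explicitly
missing = "z" not in "abcdefg"      # True: not in is the negation
```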

Iterating through Lists

for item in alist:
  # do something

It's as simple as that, but there's a catch: what if you want to modify the list items while you loop?

alist = ["ham","egg","sausage"]
for item in alist:
  print "The item is", item
  item = "spam"
  print "The item is now", item

print alist # prints ["ham", "egg", "sausage"]; whoops, what went wrong?

The problem is that item is just another name (another tentacle) bound to the value of the current list item. Assigning a new value to item rebinds that name to the new value; it doesn't touch the list at all. Imagine that this:

for item in alist:
  item = "spam"

is actually like this:

for i in range(len(alist)):
  item = alist[i] # item now stores the value of alist[i]
  item = "spam" # item now stores the value 'spam'; this doesn't affect the value of alist[i]

So to modify list elements while looping, you need to assign to them through their index, which means iterating with the index.

Note that Thou Shalt Not Delete/Add Elements to A List During Iteration. You have been warned.

Iterating with the index

for i, item in enumerate(iterable):
  # do something

Sometimes you want to iterate through an iterable while accessing the index for that iteration. You could just do something like:

for i in range(len(iterable)):
  # do something

But this is kinda ugly and not as explicit about what you actually want to do, which is to iterate over an iterable while accessing the current index:

for i, item in enumerate(iterable):
  print "index", i
  print "item", item

"For the index and item in the enumeration of the iterable". Reads like English, see?

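Putting this together with the loop-variable problem from the previous section: enumerate() hands you the index, and assigning through the index is what actually changes the list. A sketch reusing the ham-and-eggs list (upper-casing each item is just an example change):

```python
alist = ["ham", "egg", "sausage"]

for i, item in enumerate(alist):
    alist[i] = item.upper()  # assigns to the list slot, not to the loop variable

# enumerate() also takes an optional start value for the index
numbered = list(enumerate(alist, 1))  # [(1, "HAM"), (2, "EGG"), (3, "SAUSAGE")]
```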
When to use Tuples versus Lists

Stack Overflow is your friend.

Nifty String Methods

Most commonly used:

  1. str.strip() - returns a copy of the string with leading and trailing whitespace removed

  2. str.split(delimiter) - returns a list of strings with the original string split by the specified delimiter

  3. str.replace(old,new) - returns a copy of the string with all instances of the old substring replaced by the new substring

  4. str.count(substr) - returns the number of non-overlapping occurrences of substr in the string

Head to the docs for more.
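A quick sketch of the four methods above on a made-up string, including two details that trip people up: split() with no argument splits on runs of whitespace, and count() does not count overlapping matches.

```python
raw = "  ham,egg,,spam  \n"

cleaned = raw.strip()                  # "ham,egg,,spam": whitespace and \n gone from both ends
fields = cleaned.split(",")            # ["ham", "egg", "", "spam"]: the empty field is kept
words = "spam  and\teggs".split()      # ["spam", "and", "eggs"]: splits on any run of whitespace
replaced = cleaned.replace(",,", ",")  # "ham,egg,spam"
overlap = "banana".count("ana")        # 1: occurrences do not overlap
```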

Nifty List Methods

They're all used quite commonly.

Head to the docs for all.
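Since the list methods aren't spelled out here, a sketch of the common ones on a made-up list. Like list.sort() and list.reverse(), the methods that modify the list in place return None; pop() and index() give back a value.

```python
items = ["ham", "egg"]

items.append("spam")          # ["ham", "egg", "spam"]
items.extend(["egg", "tea"])  # ["ham", "egg", "spam", "egg", "tea"]
items.insert(0, "toast")      # ["toast", "ham", "egg", "spam", "egg", "tea"]
items.remove("egg")           # removes the first "egg" only
last = items.pop()            # removes and returns the last item, "tea"
position = items.index("spam")  # 2
egg_count = items.count("egg")  # 1
```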

Nifty Tuple Syntax

For a 0-element tuple, the parentheses are key:

zerotup = ()
print zerotup # prints ()
print type(zerotup) # prints <type 'tuple'>

For single element tuples, the comma is key (not the parentheses):

onetup = 1,
print onetup # prints (1,)
print type(onetup) # prints <type 'tuple'>

# this is not a tuple
notatup = (1)
print notatup # prints 1
print type(notatup) # prints <type 'int'>

For multi-element tuples, the commas between elements are key:

# these are all the same
tup = 1,2,3
tup = 1,2,3,
tup = (1,2,3)

Be wary when comparing tuples: the == operator binds more tightly than the commas that build a tuple:

def f(x):
  return x, x

print f(2) == 2,2 # prints False 2

# what's happening is something like this
# print (f(2) == 2), 2
# what you want is this:

print f(2) == (2,2) # prints True

# what do you think the following prints?
print 2,2 == 2,2
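Assigning the expression to a variable makes the grouping visible, without the print statement treating the commas as separators. A sketch using the same f:

```python
def f(x):
    return x, x

grouped_wrong = f(2) == 2, 2    # parsed as ((f(2) == 2), 2), i.e. (False, 2)
grouped_right = f(2) == (2, 2)  # True
```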

Nifty Set Operations

A set is an unordered collection with no duplicates.

Say you have a list with multiple occurrences of various items. If you want to remove all the duplicate items, you could use a for loop:

newlist = []
for item in oldlist:
  if item not in newlist:
    newlist.append(item)

Or you could convert the list into a set:

set(alist)

(You can convert the set back into a list using list() if need be)

Obviously using set() is more elegant, but the difference is that the order of your elements is not preserved (since it's an unordered collection), while the first method using the for loop preserves them. Usually this is not important.
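If you need both (no duplicates and the original order), one common pattern, sketched here as a hypothetical helper unique_in_order, keeps the for loop but remembers what it has already seen in a set, since membership tests on a set are much faster than on a list for long inputs. It assumes the items are hashable (numbers, strings, tuples), as set() does:

```python
def unique_in_order(oldlist):
    seen = set()
    newlist = []
    for item in oldlist:
        if item not in seen:  # set membership: a hash lookup
            seen.add(item)
            newlist.append(item)
    return newlist

result = unique_in_order([3, 1, 3, 2, 1])  # [3, 1, 2]
```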

Sets can be initialized using curly braces (but note that an empty set should be initialized with set(); {} initializes an empty dictionary):

a = {1,2,3,4,5}
print a # prints set([1,2,3,4,5])
b = {1,2,1}
print b # prints set([1,2])

Difference

a = {1,2,3,4,5,6}
b = {3,5,6,7,8,9,10}
# get elements in a that are not in b
print a - b # prints set([1,2,4])

Union

a = {1,2,3,4,5,6}
b = {3,5,6,7,8,9,10}
# get elements in either a or b
print a | b # prints set([1,2,3,4,5,6,7,8,9,10])

Intersection

a = {1,2,3,4,5,6}
b = {3,5,6,7,8,9,10}
# get elements in both a and b
print a & b # prints set([3,5,6])

Symmetric Difference

a = {1,2,3,4,5,6}
b = {3,5,6,7,8,9,10}
# get elements in either a or b but not both
# ie. the union minus the intersection
print a ^ b # prints set([1,2,4,7,8,9,10])
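Each operator also has a method form. One difference worth knowing: the methods accept any iterable as the argument, while the operators need sets on both sides. A sketch with the same a and b:

```python
a = {1, 2, 3, 4, 5, 6}
b = {3, 5, 6, 7, 8, 9, 10}

diff = a.difference(b)                       # same as a - b
union = a.union(b)                           # same as a | b
common = a.intersection(b)                   # same as a & b
either_not_both = a.symmetric_difference(b)  # same as a ^ b

with_list = a.union([11, 12])  # the method happily takes a list
try:
    a | [11, 12]               # the operator does not
    operator_took_list = True
except TypeError:
    operator_took_list = False
```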

Mappings (Dictionary)

Converting a dictionary to a list of (key,value) pairs

Use d.items() where d is your dictionary. Returns a list of (key,value) tuples

This can also be used for iterating through the key-value pairs:

for k,v in d.items():
  print "Key is", k, "Value is", v

Dictionary with default value

Use collections.defaultdict. The constructor takes a factory callable (ie. a function) that returns the default value when called with no arguments. When you access a key that isn't in the dictionary yet, the defaultdict calls the factory, inserts the returned default under that key and returns it, instead of raising a KeyError as a normal dictionary would. Better explained with examples than anything else:

Common patterns include setting the default value to an empty list:

import collections
d = collections.defaultdict(list)
for key in somelist:
  d[key].append("wut")

(the list function constructs an empty list when passed no arguments)

Setting the default value to 0 (for counting):

import collections
d = collections.defaultdict(int)
for key in somelist:
  d[key] +=1

(the int function returns 0 when passed no arguments)

Of course you could do it in a slightly more roundabout way with a normal dictionary:

d ={}
for key in somelist:
  if d.get(key) is None:
    d[key] = 1
  else:
    d[key] += 1

Note that you have to test whether the key exists with d.get() because if you try to access a non-existent key with d[key] Python will raise a KeyError (though you could do a try/except if you really really want. Just go with the defaultdict please)
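A runnable sketch of both counting approaches on a made-up list of words, to check they agree. The plain-dict version uses d.get(key, 0), the two-argument form of get() that returns a default instead of None, which removes the if/else. The last loop shows the defaultdict(list) grouping pattern:

```python
import collections

somelist = ["egg", "spam", "egg", "ham", "spam", "egg"]

counts = collections.defaultdict(int)
for key in somelist:
    counts[key] += 1  # a missing key starts at int(), which is 0

plain = {}
for key in somelist:
    plain[key] = plain.get(key, 0) + 1  # get() returns 0 for a missing key

groups = collections.defaultdict(list)
for word in somelist:
    groups[word[0]].append(word)  # group the words by first letter
```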

Iterating through dictionary keys

It's good practice to always do

for key in d:

as opposed to

for key in d.keys():

Though they seem to do the same thing, the first is a little faster and uses less memory, though the difference obviously won't show up for the size of dictionaries you'll be dealing with. Still, if for nothing else, save yourself some typing.

For the interested: in Python 2, d.keys() builds a brand new list of the dictionary keys and then iterates through that, and building that list takes extra time and memory (O(n) for n items, if you are familiar with asymptotic notation). Iterating over d directly skips that step, though both loops are still O(n) overall. Where the asymptotic difference really bites is membership testing: key in d does an efficient hash lookup in O(1) average time (this fast lookup is what motivates the hash table data structure, of which Python's dictionary is a specific implementation), while key in d.keys() builds the list and then searches it item by item in O(n) time.

File IO

Note: if you are using Canopy, make sure you are running it from the correct directory relative to the file you are trying to open, or you will get an IOError. Do this by right-clicking the Python pane (the interactive prompt) and selecting Keep working directory synced to current file or something to that effect. You can also change your working directory manually if need be

When accessing files there's the notion of a current pointer which points to where you are in your file right now. Calling certain methods on the file object will cause the pointer to move, and if you are not aware of this it can cause some confusion.

Opening files

open(filename,[mode])

The filename can be specified as a relative path or as an absolute path. (In the former case, ensure your working directory is set correctly, or the relative path might be broken). The most convenient is usually to have the file you are trying to open and your python script in the same directory.

The mode can be specified as one of:

  1. r - read only

  2. w - write only, with truncation, meaning you will overwrite whatever was originally in the file; also creates the file if it doesn't exist

  3. a - append only, so previous content won't be overwritten; also creates the file if it doesn't exist

  4. r+ - read and write, with the initial position at the start of the file

  5. w+ - read and write, with truncation, meaning previous file content is erased when the file is opened; also creates the file if it doesn't exist

  6. a+ - read and write, with writes always going to the end of the file; also creates the file if it doesn't exist

If the mode is not specified it defaults to r

(There is another optional buffering argument but you won't need it)

Reading from Files

You can read the entire contents of the file as a single string:

f = open("somefile.txt")
filestring = f.read()

Doing so moves the file's current pointer to the end of file. If you want to process the file's contents again, you have to reposition the pointer to the beginning using f.seek(0)

You can also read a single line:

aline = f.readline()

A line is demarcated by a newline \n character and this character is also kept by readline() (ie., in the example above, aline will be a string containing a \n at the end). Using readline() will move the file's current pointer to the start of the next line.

You can also get all lines at once as a list of strings:

lines = f.readlines() # lines is a list of strings, eg. ["line1\n","line2\n"...]

The newline character is likewise kept. The current pointer will be moved to the end of the file.

If you need to go through each line of the file, you could definitely call f.readline() repeatedly until the EOF (end of file) is reached (whereupon f.readline() will return an empty string), but you really shouldn't (it's not pythonic). Just iterate through a file like so:

for line in f:
  #do something

(Yes, a file object is both an iterable and an iterator.)

Should you need to iterate through the file again, remember to reset the current pointer to the start with f.seek(0)
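A self-contained sketch of the reading methods and the current pointer. It first writes a small throwaway file into a temporary directory (the file name notes.txt is made up) so there is something to read:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "notes.txt")  # hypothetical file name
f = open(path, "w")
f.write("line1\nline2\nline3\n")
f.close()

f = open(path)       # mode defaults to "r"
whole = f.read()     # "line1\nline2\nline3\n"; the pointer is now at the end
again = f.read()     # "": nothing left to read from the end
f.seek(0)            # back to the start
first = f.readline() # "line1\n": the newline is kept
f.seek(0)            # back to the start again before iterating
lines = [line.strip() for line in f]  # ["line1", "line2", "line3"]
f.close()
```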

Moving the Current Pointer

f.seek(byteoffset)

Moves the current pointer byteoffset bytes from the start of the file. Normally you'd just need to move to the beginning of the file, using:

f.seek(0)

Writing to Files

Note that file output is buffered, so unless you flush or close the output stream, the string may not actually be written to file. So remember to always f.close() your files!

To write a string:

f.write(somestring)

To write a list of strings:

f.writelines(listofstrings)

Note that writelines() doesn't add a newline character. It's just named to match readlines(). For example:

f.writelines(["some","lines","hoho"]) # produces a file with the text: somelineshoho
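So if you want one string per line, add the newline characters yourself, for example with a list comprehension. A sketch that writes to a throwaway file (made-up name) and reads it back:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines.txt")  # hypothetical file name
words = ["some", "lines", "hoho"]

f = open(path, "w")
f.writelines([word + "\n" for word in words])  # newline added to each string
f.close()                                      # flushes the buffered output

f = open(path)
contents = f.read()  # "some\nlines\nhoho\n"
f.close()
```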

Closing Files

f.close()

You should make it a habit to always close files that you open, even though the python garbage collector will close the file for you when it destroys the file object (you won't have control over this and the exact behaviour varies over python versions so you shouldn't depend on it). Actually a better way to handle file resources is to use the with statement:

with open("somefile.txt","w") as f:
  # do stuff with your file
  f.write("writing")

This will ensure that your file is always closed. This is good to know, though you don't really need to know about it at this point.
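A sketch showing that the file really is closed once the with block ends, using the file object's closed attribute (again on a throwaway file with a made-up name):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "somefile.txt")  # hypothetical file name

with open(path, "w") as f:
    f.write("writing")
    open_inside = not f.closed  # True while inside the block
closed_after = f.closed         # True: the with statement closed the file

with open(path) as f:
    contents = f.read()         # "writing"
```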

Object-Oriented Programming (OOP)

OOP is a programming paradigm that logically encapsulates code in objects. Objects couple data (in the form of fields/attributes) with functions or procedures (methods).

You can think of the OOP paradigm as the way you intuitively view the world too: you classify the objects in the world based on properties they have (attributes) and the things they can do (methods). For example, a dog might have brown fur, an age and a weight, and it can run, bark, eat etc.

How might we write a Dog class?

class Dog:

  # This is a special method called the constructor
  # You initialize instance variables/attributes here
  def __init__(self, color, age, weight):
    self.color = color
    self.age = age
    self.weight = weight

  # A method to run
  def run(self, distance):
    self.weight -= 0.1 * distance # dogs burn calories too!

  # A method to bark
  def bark(self):
    print "Woof woof!"

  # A method to eat
  def eat(self, amount):
    self.weight += amount

Classes and Instances

The form of OOP that python employs is class-based, meaning that you define classes and work with objects that are instances of those classes.

You can think of a class as your own custom type. Python has its own built-in types such as int, str, list etc. But sometimes you want to work with your own type, your own category of things, such as the Dog class we saw above.

A class is like a general category, but most times you don't want to work with the category per se, you want to work with an instance of that category (a real, concrete Fido, if you will, the one that slobbers all over your face in the morning; not some abstract Platonic notion of 'a dog'). There can be many many instances of a single class.

Constructors and Instantiating Objects

How do we get this concrete instance? We have to instantiate/construct it, by calling the special method known as the constructor:

# Assuming I've already defined my Dog class
# as in the previous snippet of code

fido = Dog("yellow", 5, 30) # let me instantiate a Dog, and store it in the variable fido

To call the constructor, we just do classname(arguments). Just like any other function call, we have to pass it arguments. But where in our class did we define this constructor? This is where constructors are a little different from the normal functions you've seen so far. Let's go back to the definition of our Dog class:

class Dog:

  # This is our constructor definition, right here
  def __init__(self, color, age, weight):
    self.color = color
    self.age = age
    self.weight = weight

Even though you called your constructor by classname(arguments), when you defined your constructor, you used the name __init__. That's just how it is. The __init__ method is what is called a magic method in python. These are methods which you can define in your classes which python treats somewhat differently from normal methods. They add some 'magical' behavior, so to speak. The constructor, or __init__ method is by far the most important.

A constructor allows you to initialize your instance (hence the name __init__). This means assigning values to your object's attributes when you create it. The attributes of a Dog instance are its color, age, and weight. Note how these instance attributes are prefixed with self., like self.color, self.age etc.
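A sketch of two instances of a cut-down version of the Dog class from above, showing that each instance keeps its own attributes (the values are made up):

```python
class Dog:
    def __init__(self, color, age, weight):
        self.color = color
        self.age = age
        self.weight = weight

    def run(self, distance):
        self.weight -= 0.1 * distance

fido = Dog("yellow", 5, 30)
rex = Dog("black", 2, 20)
fido.run(10)  # only fido's weight changes: fido.weight is now 29.0, rex.weight is still 20
```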

That Magical Word self

What is self? Notice how all our method definitions have self as the first parameter. This is because a reference to the instance you are calling the method on is being implicitly passed to the method as the first argument, whenever you call methods on object instances using dot notation:

# Remember our Dog class had a bark method
class Dog:
  # ... (the constructor and the other methods, as before)

  def bark(self):
    print "woof woof!"

dog = Dog("yellow", 5, 30)
dog.bark() # here I'm calling the method bark on dog,
           # which is an instance of Dog

How is it that when I call dog.bark() I don't pass any arguments to bark(), but in the definition of bark I actually have a parameter self?

This is because when calling dog.bark(), we are actually calling a bound method, ie. the bark method is bound to the instance dog, and so python implicitly passes a reference to the instance dog into the self parameter of bark. Thus we don't explicitly pass in arguments to bark when we do dog.bark().

Try this (in continuation with the previous code-block, ie. I assume that I've already defined the Dog class, and instantiated dog):

print dog.bark # prints <bound method Dog.bark of <__main__.Dog instance at 0xsomehexaddress>>
print Dog.bark # prints <unbound method Dog.bark>
b = dog.bark # remember we can pass methods around as well
print b.__self__ # prints <__main__.Dog instance at 0xsomehexaddress>
                 # we can get back the object to which the method was bound!

dog.bark() # prints "woof woof!"
Dog.bark() # throws this error:
           # TypeError: unbound method bark() must be called
           # with Dog instance as first argument
           # (got nothing instead)
Dog.bark(dog) # prints "woof woof!"

So we see how python is passing a reference to the instance on which the method is called, under the hood, and that reference is stored in the self parameter.
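Calling the method on the instance and passing the instance explicitly to the method looked up on the class are equivalent. A sketch where bark returns its string instead of printing it, so the two calls can be compared (the name attribute is made up for this example):

```python
class Dog:
    def __init__(self, name):
        self.name = name

    def bark(self):
        return self.name + " says woof woof!"

dog = Dog("Fido")
via_instance = dog.bark()               # Python passes dog as self for us
via_class = Dog.bark(dog)               # we pass dog as self ourselves
same_object = dog.bark.__self__ is dog  # the bound method remembers its instance
```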

So that's the magic behind self. And actually, it really isn't very magical at all. Consider this:

class Dog:
  def __init__(trolololol, color, age, weight):
    trolololol.color = color
    trolololol.age = age
    trolololol.weight = weight

  def bark(hohohoho):
    print "woof woof I'm a " + str(hohohoho.age) + " year old dog!"

dog = Dog("yellow",10, 40)
dog.bark() # prints "woof woof I'm a 10 year old dog!"

Works just like normal. Except, who the hell would ever write that kind of code??? Yep, only Digital World TAs. We have superpowers. #beadigitalworldTA

Class Attributes

Most of the time, we only need attributes that belong to specific instances of a class. For example, the color, age and weight attributes are particular to a specific instance of Dog. Two different Dog instances should keep track of their own color, age and weight.

fido = Dog("black", 10, 40)
dingo = Dog("yellow", 2, 20)

But what if there are some attributes that should be global to all instances of Dog? For example, suppose we would like to have an attribute that tracks the threshold weight over which a dog would be considered obese.


Logically speaking, this should not be an attribute that belongs to each instance of Dog. This threshold weight should be the same for all instances of Dog. It should be an attribute of the Dog class itself, a class attribute.

class Dog:
  # here we initialize a class attribute, in the class body
  obeseweight = 30

  def __init__(self, color, age, weight):
    self.color = color
    self.age = age
    self.weight = weight

  def check_obese(self):
    # we could also write this as
    # if self.weight > Dog.obeseweight:
    # but it's less elegant and maintainable
    if self.weight > self.__class__.obeseweight:
      return True
    return False

dog = Dog("yellow", 10, 40)
print dog.check_obese() # prints True

Note how we initialized the class attribute, and accessed it within our method definitions. Try this instead:

class Dog:
  # here we initialize a class attribute, in the class body
  obeseweight = 30

  def __init__(self, color, age, weight):
    self.color = color
    self.age = age
    self.weight = weight

  def check_obese(self):
    if self.weight > obeseweight: # what happens here?
      return True
    return False

dog = Dog("yellow", 10, 40)
print dog.check_obese() # throws NameError: global name 'obeseweight' is not defined

The following error is thrown: NameError: global name 'obeseweight' is not defined. Remember from our discussion of variable scope, class definitions introduce a new code block with a new scope, but this scope does not extend to nested blocks, that is, it does not extend to the methods that you define in your class definitions. That's why you get that error thrown: obeseweight is not defined within your method scope.

Now try this:

class Dog:
  # here we initialize a class attribute, in the class body
  obeseweight = 30

  def __init__(self, color, age, weight):
    self.color = color
    self.age = age
    self.weight = weight

  def check_obese(self):
    if self.weight > self.obeseweight: # we can access it like an instance attribute too?
      return True
    return False

dog = Dog("yellow", 10, 40)
print dog.check_obese() # prints True

This works too. Note how you can actually access the class attribute through the instance as well. This is convenient but can be quite confusing. Python keeps track of an instance namespace and a class namespace. If you try to access an attribute through an instance, python checks the instance namespace first. If it doesn't find anything, it checks the class namespace. So the above code is able to access the obeseweight class attribute through self.obeseweight.

Now you know how to properly access class attributes. But what about updating class attributes? This is even trickier. Suppose we want to have a class attribute that keeps track of the number of instances:

class Dog:
  count = 0

  def __init__(self, color,age,weight):
    self.color = color
    self.age = age
    self.weight = weight
    self.__class__.count += 1

dog1 = Dog("yellow", 10, 30)
dog2 = Dog("black", 10, 40)
dog3 = Dog("black", 10, 40)
print Dog.count # prints 3
print dog1.count # prints 3
print dog2.count # prints 3
print dog3.count # prints 3

This works fine, but how about this:

class Dog:
  count = 0

  def __init__(self, color,age,weight):
    self.color = color
    self.age = age
    self.weight = weight
    self.count += 1

dog1 = Dog("yellow", 10, 30)
dog2 = Dog("black", 10, 40)
dog3 = Dog("black", 10, 40)
print Dog.count # prints 0; WHUT HAAPPPENED 
print dog1.count # prints 1
print dog2.count # prints 1
print dog3.count # prints 1

If the class attribute is updated by accessing the class (as we did with self.__class__.count += 1), then the new value is set on the class (and hence seen by all instances). But self.count += 1 is really self.count = self.count + 1: the right-hand side finds count in the class namespace (0, since the instance namespace has no count yet), and the assignment then stores the result, 1, in the instance namespace. In effect you have created an instance variable of the same name that overrides the class variable, and the class variable itself never changes.
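You can watch this happen by looking at the namespaces directly: every class and instance keeps its attributes in a dictionary called __dict__. A sketch with a cut-down Dog class:

```python
class Dog:
    count = 0

    def __init__(self):
        self.count += 1  # reads Dog.count (0), then creates the instance attribute self.count = 1

dog1 = Dog()
dog2 = Dog()

class_count = Dog.count                     # 0: the class attribute never changed
instance_counts = (dog1.count, dog2.count)  # (1, 1)
shadowed = "count" in dog1.__dict__         # True: dog1 now has its own count
```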

Inheritance

One of the main features of OOP is inheritance. This allows code to be easily reused and makes your program more modular and maintainable.

The idea is that you have parent classes which provide a set of baseline functionalities, and you have child classes which inherit these from the parent class, but extend them with functionalities specific to each child.

Back to our Dog class, we have this base parent class that defines all these basic attributes and methods:

class Dog:
  obeseweight = 30

  def __init__(self, color, age, weight):
    self.color = color
    self.age = age
    self.weight = weight

  def run(self, distance):
    self.weight -= 0.1 * distance # dogs burn calories too!

  def bark(self):
    print "Woof woof!"

  def eat(self, amount):
    self.weight += amount

  def check_obese(self):
    if self.weight > self.__class__.obeseweight:
        return True
    return False

But there are many kinds of dogs! And it doesn't make sense to have the same obeseweight threshold for all breeds of dogs. So let's create some child classes, also known as subclasses:

class Chihuahua(Dog): # this is the way of saying class
                      # Chihuahua inherits from class Dog
  obeseweight = 5 # override the parent class attribute

class BerneseMountainDog(Dog):
  obeseweight = 55

chi = Chihuahua("yellow", 1, 6)
bernese = BerneseMountainDog("black", 4, 40)
genericdog = Dog("white",10,20)

# child classes inherit methods from their parent class!
chi.bark() # prints "Woof woof!"
print chi.check_obese() # prints True
print bernese.check_obese() # prints False
print genericdog.check_obese() # prints False

Using inheritance is way better than copying and pasting your Dog class code into your Chihuahua and BerneseMountainDog classes.

A slightly subtle point to note - what if the check_obese method had been written like this instead:

  def check_obese(self):
    # hard-code the reference Dog class attribute
    if self.weight > Dog.obeseweight:
        return True
    return False

then

print chi.check_obese() # prints False
print bernese.check_obese() # prints True

This is why we should avoid hard-coding the class name when referencing class attributes. Instead, access them through the instance's built-in __class__ attribute (the instance being self), i.e. self.__class__.classattribute.
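To see both lookups side by side, here is a trimmed-down sketch. It gives Dog only a weight attribute and adds a hypothetical check_obese_hardcoded method purely for comparison. The only difference between the two methods is where the threshold is looked up.

```python
class Dog:
    obeseweight = 30

    def __init__(self, weight):
        self.weight = weight

    def check_obese_hardcoded(self):
        # always uses Dog's threshold, even for subclasses
        return self.weight > Dog.obeseweight

    def check_obese(self):
        # uses the threshold of the instance's actual class
        return self.weight > self.__class__.obeseweight

class Chihuahua(Dog):
    obeseweight = 5

chi = Chihuahua(6)
chi.check_obese_hardcoded()  # False: compares 6 > 30
chi.check_obese()            # True: compares 6 > 5
```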

We've seen how we can override a parent class attribute. What of overriding or extending a parent class method?

class Chihuahua(Dog):
  obeseweight = 5

  # we want to override the bark method
  def bark(self):
    print "Squeak squeak!"

chi = Chihuahua("yellow",10, 3)
chi.bark() # prints "Squeak squeak!" instead of "Woof woof!"

When you define a method in the child class with the same name as a method defined in the parent class, the child class method overrides the parent class method. But what if we don't want to completely override the parent class method, and only want to extend it? That is, we still want the base functionality provided by the parent method, but we want to add more to it. For example, suppose that for Chihuahuas we want an extra attribute called hairlength:

class Chihuahua(Dog):
  obeseweight = 5

  # we want to extend the constructor
  def __init__(self, color, age, weight, hairlength):
    # call the parent class constructor
    Dog.__init__(self, color,age,weight)
    # note that hard-coding the parent class name works but
    # isn't best practice

    # add the extra stuff specific to this child class
    self.hairlength = hairlength

  # we want to override the bark method
  def bark(self):
    print "Squeak squeak!"

chi = Chihuahua("yellow", 3, 5, "short")
print chi.hairlength # prints "short"

A short digression on Old-style and New-style Classes

This stuff is slightly tricky, but it's good extra knowledge.

Notice how we hard-coded the parent class name Dog in order to access the parent class __init__ method. That isn't ideal, but it is the only way to access the parent class method when dealing with so-called 'old-style' classes. New-style classes, on the other hand, are more flexible. They let us use the built-in super() to get a handle on the parent class without explicitly naming it.

New-style classes must inherit from the built-in object class:

class Dog(object): # Dog is a new-style class
  obeseweight = 30

  def __init__(self, color, age, weight):
    self.color = color
    self.age = age
    self.weight = weight

  def run(self, distance):
    self.weight -= 0.1 * distance

  def bark(self):
    print "Woof woof!"

  def eat(self, amount):
    self.weight += amount

  def check_obese(self):
    if self.weight > self.__class__.obeseweight:
        return True
    return False

class Chihuahua(Dog):
  obeseweight = 5

  # we want to extend the constructor
  def __init__(self, color, age, weight, hairlength):
    # call the parent class constructor
    # using super instead of hard-coding the name
    super(Chihuahua, self).__init__(color, age, weight)
    # no explicit self here: super() returns an object already bound to self

    # add the extra stuff specific to this child class
    self.hairlength = hairlength

  # we want to override the bark method
  def bark(self):
    print "Squeak squeak!"

chi = Chihuahua("yellow", 3, 5, "short")
print chi.hairlength # prints "short"
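The call super(Chihuahua, self) returns an object that is already bound to self. That is why self is not passed again in the __init__ call. The sketch below is trimmed down to a single weight attribute. It includes a hypothetical BrokenChihuahua class that does pass self twice, and records whether doing so fails.

```python
class Dog(object):
    def __init__(self, weight):
        self.weight = weight

class Chihuahua(Dog):
    def __init__(self, weight, hairlength):
        # super() already binds self, so pass only the remaining arguments
        super(Chihuahua, self).__init__(weight)
        self.hairlength = hairlength

class BrokenChihuahua(Dog):
    def __init__(self, weight, hairlength):
        # passing self again means Dog.__init__ gets one argument too many
        super(BrokenChihuahua, self).__init__(self, weight)
        self.hairlength = hairlength

chi = Chihuahua(3, "short")

try:
    BrokenChihuahua(3, "short")
    broken_failed = False
except TypeError:
    broken_failed = True  # Dog.__init__ received too many arguments
```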

What's the difference between old-style and new-style classes?

class OldStyle:
  pass

class NewStyle(object):
  pass

print type(OldStyle) # prints <type 'classobj'>
print type(NewStyle) # prints <type 'type'>
print type(int) # prints <type 'type'>

New-style classes are fully fledged custom types, on the same footing as built-in Python types like int. They are also more flexible (super(), for example, only works with new-style classes). In Python 3.x, all classes are new-style.

State Machines

You've been using the Digital World library (libdw) to implement state machines in Python so far. If you understood that whole deal about inheritance, you should realize by now that you've been using inheritance all along to create your state machine classes:

import libdw.sm as sm

class MySM(sm.SM): # you are inheriting from the base SM class
                   # defined in the libdw.sm module

  # you are overriding the default class attribute
  # defined in the base SM class, which initializes
  # startState as None by default
  startState = "somestartstate"

  # you are overriding the getNextValues method in the base class
  def getNextValues(self, state, inp):
    # do stuff, then make sure you return a tuple of
    # (nextState, output)
    nextState, output = state, inp  # placeholder: replace with your logic
    return nextState, output

Tackling State Machine questions

Don't jump straight into writing code, unless you already know you won't confuse yourself. Think at a higher level first. Draw out a state machine diagram. Define your states and inputs clearly. Then the coding becomes trivial.

What should your state be? One way to figure this out is to ask: what information do I need to remember? The inputs and outputs are usually specified in the question.
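As a worked example, consider a machine that outputs True whenever the current input and the previous input are both 1. The only thing it needs to remember is the previous input, so that becomes the state. The sketch below runs without libdw. It uses a hypothetical MiniSM base class with a simplified transduce loop. This is a stand-in written for illustration, not the real libdw source, although it follows the same getNextValues(state, inp) -> (nextState, output) convention.

```python
class MiniSM(object):
    """Hypothetical stand-in for libdw's sm.SM (not the real code)."""
    startState = None

    def transduce(self, inputs):
        # feed inputs one at a time, threading the state through
        state = self.startState
        outputs = []
        for inp in inputs:
            state, output = self.getNextValues(state, inp)
            outputs.append(output)
        return outputs


class TwoOnesInARow(MiniSM):
    # what must we remember? only the previous input (0 to start)
    startState = 0

    def getNextValues(self, state, inp):
        # output True when this input and the previous one are both 1
        output = (state == 1 and inp == 1)
        nextState = inp
        return nextState, output


detector = TwoOnesInARow()
detector.transduce([1, 1, 0, 1, 1, 1])
# -> [False, True, False, False, True, True]
```

Once the state is pinned down as "the last input", getNextValues reduces to two lines.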

Common Mistakes

Unbound Local Error

Often in getNextValues you end up building an if/elif/else tree for all the condition-checking (what's my state, what's my input?). You assign the variables nextState and output in some of these conditional branches but not in others. Then you get this kind of error and become saddened:

import libdw.sm as sm

class MySM(sm.SM):

  def getNextValues(self, state, inp):
    if state == 0:
      nextState = 1
      output = 1
    elif state == 1:
      nextState = 0
      output = 0

    return nextState, output

mySM = MySM()
mySM.transduce([0,0,1])

# you get this error message
#Traceback (most recent call last):
#  File "yourfile.py", line 13, in <module>
#    mySM.transduce([0,0,1])
#  File "libdw/sm.py", line 147, in transduce
#  File "libdw/sm.py", line 101, in step
#  File "yourfile.py", line 10, in getNextValues
#    return nextState, output
#UnboundLocalError: local variable 'nextState' referenced before assignment

The problem is that there are some conditional paths on which nextState and output are never assigned. Your return statement, which runs no matter what, still references both of them. Think about what happens if your state is not 0 or 1 but, say, 2: nextState and output will never have been assigned before the return. In fact, this example never sets startState, so the machine starts in state None. Its very first step takes exactly such a path.

But, you protest, what if my state machine is never going to be in any state other than 0 or 1? Well, Python doesn't know that. It happily runs your function. As soon as execution actually takes a path where nextState or output was never assigned, it raises UnboundLocalError instead of letting you return an undefined value.
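The sketch below shows that the error is raised only at runtime, and only on the bad path. It uses a plain function as a stand-in for getNextValues, so no libdw is needed. Python accepts the definition without complaint, and calls with state 0 or 1 work fine.

```python
def next_values(state):
    if state == 0:
        nextState, output = 1, 1
    elif state == 1:
        nextState, output = 0, 0
    # no branch for any other state
    return nextState, output

next_values(0)  # (1, 1): both names were assigned
next_values(1)  # (0, 0): both names were assigned

try:
    next_values(None)  # neither branch runs
    raised = False
except UnboundLocalError:
    raised = True
```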

So if you're really sure that states 0 and 1 are the only states you will ever need, just change the elif state == 1 to else. That will fix things for this particular problem.

But more often than not you will be building much more complicated conditional trees than this, and rather than having to ensure that your nextState and output are defined under every possible condition, just initialize them at the very top of the function:

def getNextValues(self, state, inp):
  nextState, output = None, None

  if state == 0:
    nextState, output = 1, 1
  elif state ==1:
    nextState, output = 0, 0

  return nextState, output

Return value is None

Some of you don't like to define nextState and output variables, and prefer to directly return under your if/elif/else blocks, which is also perfectly valid, except sometimes this happens:

import libdw.sm as sm

class MySM(sm.SM):

  def getNextValues(self, state, inp):
    if state == 0:
        return 1,1
    elif state == 1:
        return 0,0


mySM = MySM()
mySM.transduce([0,0,1])

# you get this error message
#Traceback (most recent call last):
#  File "test.py", line 62, in <module>
#    mySM.transduce([0,0,1])
#  File "libdw/sm.py", line 147, in transduce
#  File "libdw/sm.py", line 101, in step
#TypeError: 'NoneType' object is not iterable

This happens for similar reasons to the UnboundLocalError: there are some conditional paths on which you never explicitly return a value. A function that ends without reaching a return statement returns None. But transduce expects getNextValues to return a tuple (which is an iterable) of the next state and the output. So when libdw tries to split the result into those two parts, you get that error.
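Here is a standalone sketch of the same failure, without libdw. A plain function stands in for getNextValues. The unpacking line imitates what happens when the result is split into a state and an output; libdw's own code is not reproduced here.

```python
def next_values(state):
    if state == 0:
        return 1, 1
    elif state == 1:
        return 0, 0
    # falling off the end here is an implicit "return None"

result = next_values(None)  # result is None

try:
    nextState, output = next_values(None)  # unpacking None fails
    unpack_failed = False
except TypeError:
    unpack_failed = True
```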

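In this particular case, the culprit is that startState is None by default. The very first call to getNextValues therefore gets state None, which matches neither branch. You can fix it by defining startState = 0 in your class, so the machine is always in either state 0 or state 1.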
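The fix can be checked without libdw. In the sketch below, BrokenSM and FixedSM are hypothetical classes, and run is a hypothetical, simplified transduce loop (not libdw's actual code). BrokenSM mirrors the default startState of None described above. FixedSM only changes startState to 0.

```python
class BrokenSM:
    startState = None  # mirrors the default described above

    def getNextValues(self, state, inp):
        if state == 0:
            return 1, 1
        elif state == 1:
            return 0, 0

class FixedSM(BrokenSM):
    startState = 0  # every reachable state is now 0 or 1

def run(machine, inputs):
    # hypothetical simplified transduce loop
    state = machine.startState
    outputs = []
    for inp in inputs:
        state, output = machine.getNextValues(state, inp)
        outputs.append(output)
    return outputs

run(FixedSM(), [0, 0, 1])  # -> [1, 0, 1]
```

For BrokenSM, the first call already returns None. That is exactly what triggers the TypeError above.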

Got libdw source code or not?

Suuuuure - here ya go. You'll be mainly interested in the libdw.sm module. (That's Prof Oka's github repo by the way)
