martin-rdz/python_data_structures.md

## python_data_structures.md

      
Python data structures in the context of scientific programming

Built-in

Several data structures are natively available in Python, such as tuples, lists and dictionaries.
All of them can hold any Python object, for example floats and strings, but also other tuples, lists and dictionaries.
atuple = (11, 21)
alist = [1, 3, 4, 7, 12, 16, 19, 27]
The main difference is that lists are mutable, while tuples are immutable [1].
A single item can be accessed by putting its index in brackets [i]. Indexing is zero-based.
print(atuple[1])
print(alist[4])
print(alist[-2]) # negative indices count from the end, with -1 as the last element
Slicing is done with a colon; the start index is included, the stop index is excluded:
print(alist[2:])  # everything from index 2 (the 3rd element) on
print(alist[:4])  # the first 4 elements (index 4 is excluded)
print(alist[2:6]) # elements at indices 2 to 5
print(alist[::3]) # every 3rd element
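The half-open slicing rule can be checked with a small sketch. The names `head`, `tail`, `middle`, `every_third` and `backwards` are chosen here for illustration only. Because the stop index is excluded, two slices that share a boundary split a list without overlap.

```python
alist = [1, 3, 4, 7, 12, 16, 19, 27]

head = alist[:4]           # indices 0..3
tail = alist[4:]           # indices 4..7
print(head + tail == alist)   # => True, no element is lost or duplicated

middle = alist[2:6]        # indices 2..5, so 6 - 2 = 4 elements
print(middle)              # => [4, 7, 12, 16]

every_third = alist[::3]   # indices 0, 3, 6
print(every_third)         # => [1, 7, 19]

backwards = alist[::-1]    # a negative step walks the list in reverse
print(backwards[0])        # => 27
```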
Several methods for modifying lists are built in:
alist.append(43) # => [1, 3, 4, 7, 12, 16, 19, 27, 43]
alist.pop()      # => return last element and remove it from list
alist.pop(3)     # => return and remove element with index 3

alist.insert(4, 300) # insert 300 at index 4
alist[4] = 300       # replace element at index 4 by 300

len(alist)       # length of the list

newlist = [7, 12] + [24, 65]  # combine two lists
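To make the difference between mutable lists and immutable tuples concrete, here is a minimal sketch. The names `combined`, `longer_tuple` and `tuple_error` are illustrative. List methods change the list in place, the `+` operator builds a new object, and a tuple rejects item assignment.

```python
alist = [1, 3, 4, 7]
atuple = (11, 21)

result = alist.append(12)      # changes alist in place and returns None
print(alist, result)           # => [1, 3, 4, 7, 12] None

combined = alist + [16]        # builds a new list, alist itself is unchanged
print(len(alist), len(combined))   # => 5 6

longer_tuple = atuple + (31,)  # tuples can be combined into a new tuple ...
try:
    atuple[1] = 5              # ... but cannot be changed in place
except TypeError as err:
    tuple_error = str(err)
print(tuple_error)
```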
Lists can also be nested:
nestedlist = [[1,2,3], [6,7,8]]
print(nestedlist[1][2])
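A nested list is simply a list of rows, so a column cannot be addressed directly. One possible way around this, assuming a list comprehension is acceptable, is sketched below (the names `column` and `flat` are illustrative):

```python
nestedlist = [[1, 2, 3], [6, 7, 8]]

# a "column" has to be collected row by row
column = [row[2] for row in nestedlist]
print(column)   # => [3, 8]

# flatten the nested list into a single list
flat = [value for row in nestedlist for value in row]
print(flat)     # => [1, 2, 3, 6, 7, 8]
```

This kind of bookkeeping is one motivation for the numpy arrays described further down.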
Further reading on lists: [2]
Dictionaries hold key-value pairs, where keys can be any hashable object (typically integers or strings) and values can be any Python object
adict = {'clouds': ["Ac", "Ns", "Ci"],
         'colors': ["red", "blue"]}

print(adict['clouds'])
print(adict['clouds'][1])
The type of a variable can be queried with type():
print(type(adict))
print(type(adict["clouds"]))

print(adict.keys())      # list all the keys
print(adict.values())    # list all the values
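A few more everyday dictionary operations, as a hedged sketch: the `'okta'` entry and the `'wind'` key are made up for this example and are not part of the original data.

```python
adict = {'clouds': ["Ac", "Ns", "Ci"],
         'colors': ["red", "blue"]}

adict['okta'] = 5                 # add a new key (hypothetical entry)

has_clouds = 'clouds' in adict    # membership test on the keys
print(has_clouds)                 # => True

wind = adict.get('wind', [])      # default value instead of a KeyError
print(wind)                       # => []

# iterate over key-value pairs
for key, value in adict.items():
    print(key, type(value).__name__)
```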
Multidimensional arrays with numpy

The Python standard library has no fast, convenient support for multidimensional arrays.
The numpy package (http://www.numpy.org/) implements n-dimensional arrays and
routines to do calculations on them. The implementation is fast, and the
linear algebra operations are based on BLAS and LAPACK. Numpy is also used by
a variety of other libraries (matplotlib, netCDF4, ...).
Further reading: Python Data Science Handbook [3],
numpy for matlab users [4] and the numpy documentation [5].
import numpy as np

a = np.array([1, 5, 3, 7, 9, 4, 10])  # create a new numpy array from a list
print(type(a))   # => <class 'numpy.ndarray'>

print(a.shape)                     # display the dimensions of this array
a[4] = 20    # indexing is mostly similar to lists

b = np.array([[1,2,3],     # example for a 3x3 array
              [4,5,6],
              [7,8,9]])
print(b.shape)
print(b[0,0])              # when indexing, different dimensions are separated by ,
print(b[1,:])              # all elements along index 1 of the first dimension
print(np.transpose(b))     # example for calculations on b
It is also possible to select only the values that fulfill a certain condition:
print(a[a>5]) # which is a shorthand for
print(a[np.where(a>5)])
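The condition itself is an ordinary boolean array, which the following sketch makes explicit. The names `mask`, `selected`, `indices`, `replaced` and `between` are illustrative. Note that `np.where` takes either the condition alone (returning indices) or the condition plus two alternatives.

```python
import numpy as np

a = np.array([1, 5, 3, 7, 9, 4, 10])

mask = a > 5                      # boolean array with the same shape as a
print(mask)                       # => [False False False  True  True False  True]

selected = a[mask]                # => [ 7  9 10]
indices = np.where(mask)[0]       # positions where the condition holds => [3 4 6]
replaced = np.where(mask, a, 0)   # keep a where True, else 0 => [ 0  0  0  7  9  0 10]

# conditions are combined element-wise with & and |, each part in parentheses
between = a[(a > 3) & (a < 10)]   # => [5 7 9 4]
```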
Calculations can be done with normal Python operators or with numpy functions:
print(a + 5)
print(a / np.array([2,3,1,4,10,7,1]))
print(np.sum(b))
print(np.sum(b, axis=1)) # sum just along a specified axis
print(np.sum(b, axis=0))
print(np.mean(b))
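The `axis` argument names the dimension that is collapsed by the reduction. A short sketch for the 3x3 example (the names `col_sums`, `row_sums` and `row_means` are illustrative):

```python
import numpy as np

b = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

col_sums = np.sum(b, axis=0)    # collapse the rows, one value per column
print(col_sums)                 # => [12 15 18]

row_sums = np.sum(b, axis=1)    # collapse the columns, one value per row
print(row_sums)                 # => [ 6 15 24]

row_means = np.mean(b, axis=1)  # the same rule applies to other reductions
print(row_means)                # => [2. 5. 8.]
```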
Some standard arrays can be generated conveniently:
np.zeros((3,4))    # array filled with zeros of shape 3, 4
np.ones((3,4))     # array filled with ones of shape 3, 4
np.arange(5)       # => numpy array containing [0, 1, 2, 3, 4]
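A few related constructors, shown as a sketch rather than a complete list. The values are arbitrary; `np.full` and `np.linspace` are not mentioned in the text above and are added here for illustration.

```python
import numpy as np

zeros = np.zeros((3, 4))
print(zeros.shape, zeros.dtype)        # => (3, 4) float64

counts = np.zeros((3, 4), dtype=int)   # the data type can be set explicitly
filled = np.full((2, 2), 7.5)          # array filled with an arbitrary value

steps = np.arange(0, 1, 0.25)          # start, stop, step; stop is excluded
print(steps)                           # => [0.   0.25 0.5  0.75]

grid = np.linspace(0, 1, 5)            # start, stop, number of points; stop is included
print(grid)                            # => [0.   0.25 0.5  0.75 1.  ]
```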
It is possible to convert a numpy array back to a (nested) python list
b.tolist()
# => [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
Several functions can be used to join arrays.
np.concatenate() joins arrays along an existing axis:
np.concatenate((a, a))
# => array([ 1,  5,  3,  7,  9,  4, 10,  1,  5,  3,  7,  9,  4, 10])

c = np.arange(6).reshape(2,3)
# => array([[0, 1, 2],
#           [3, 4, 5]])

np.concatenate((c, c), axis=0)
# => array([[0, 1, 2],
#           [3, 4, 5],
#           [0, 1, 2],
#           [3, 4, 5]])

np.concatenate((c, c), axis=1)
# => array([[0, 1, 2, 0, 1, 2],
#           [3, 4, 5, 3, 4, 5]])
Another option is np.append(). The two arrays must have the same shape, except along
the axis that is appended to. If no axis is given, both arrays are flattened before being combined into a 1d array.
np.append([[1, 2, 3], [4, 5, 6]], [[7, 8, 9]], axis=0)
# => array([[1, 2, 3],
#           [4, 5, 6],
#           [7, 8, 9]])

np.append([[1, 2, 3], [4, 5, 6]], [[7, 8, 9]])
# => array([1, 2, 3, 4, 5, 6, 7, 8, 9])
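Unlike list.append(), np.append() does not modify the array in place but returns a new one. The sketch below illustrates this. When values arrive one at a time, one common pattern (an assumption about the use case, not something prescribed by numpy) is to collect them in a list and convert once at the end.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.append(a, 4)       # returns a new array
print(a)                  # => [1 2 3]   (unchanged)
print(b)                  # => [1 2 3 4]

# collect values in a list first, convert to an array once
values = []
for i in range(5):
    values.append(i ** 2)
squares = np.array(values)
print(squares)            # => [ 0  1  4  9 16]
```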
Appending along a non-existing axis (here axis 2) does not work with np.concatenate().
For that, the function np.stack() may be used, which always creates a new axis:
np.stack((a, a))
# => array([[ 1,  5,  3,  7,  9,  4, 10],
#           [ 1,  5,  3,  7,  9,  4, 10]])

np.stack((c, c))
# will produce an array with shape (2, 2, 3):
# => array([[[0, 1, 2],
#            [3, 4, 5]],
#           [[0, 1, 2],
#            [3, 4, 5]]])
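As a sketch of the difference: np.concatenate() fails for an axis the inputs do not have, while np.stack() accepts an `axis` argument that sets where the new axis is inserted. The variable names are illustrative.

```python
import numpy as np

c = np.arange(6).reshape(2, 3)

try:
    np.concatenate((c, c), axis=2)        # c has only axes 0 and 1
except ValueError as err:                 # numpy's AxisError is a subclass of ValueError
    concat_failed = True
    print("concatenate failed:", err)

stacked_first = np.stack((c, c))          # new axis in front (default axis=0)
stacked_middle = np.stack((c, c), axis=1)
stacked_last = np.stack((c, c), axis=-1)  # new axis at the end

print(stacked_first.shape)    # => (2, 2, 3)
print(stacked_middle.shape)   # => (2, 2, 3)
print(stacked_last.shape)     # => (2, 3, 2)
```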
The functions np.hstack, np.vstack and np.dstack are shorthands for joining arrays
along the second (horizontal), first (vertical) and third (depth) dimension, not necessarily creating a new axis:
np.hstack((c, c))
# => array([[0, 1, 2, 0, 1, 2],
#           [3, 4, 5, 3, 4, 5]])

np.vstack((c, c))
# => array([[0, 1, 2],
#           [3, 4, 5],
#           [0, 1, 2],
#           [3, 4, 5]])

np.dstack((c, c))
# not equal to np.stack((c, c)), produces shape (2, 3, 2)
# => array([[[0, 0],
#            [1, 1],
#            [2, 2]],
#           [[3, 3],
#            [4, 4],
#            [5, 5]]])
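For 2d input these shorthands can be expressed with np.concatenate() and np.stack(), and comparing shapes is a quick way to see which axis was used. The following sketch also shows the behavior for 1d input, where a new axis is only created if needed (the array `v` is made up for this example).

```python
import numpy as np

c = np.arange(6).reshape(2, 3)

# equivalents for 2d arrays
same_h = np.array_equal(np.hstack((c, c)), np.concatenate((c, c), axis=1))
same_v = np.array_equal(np.vstack((c, c)), np.concatenate((c, c), axis=0))
same_d = np.array_equal(np.dstack((c, c)), np.stack((c, c), axis=-1))
print(same_h, same_v, same_d)     # => True True True

# 1d input
v = np.array([1, 5, 3])
print(np.hstack((v, v)).shape)    # => (6,)      joined along the only axis
print(np.vstack((v, v)).shape)    # => (2, 3)    each array becomes a row
print(np.dstack((v, v)).shape)    # => (1, 3, 2)
```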
Masked arrays

Numpy can also handle masks and fill values internally through masked arrays
(np.ma.array). For most use cases these masked arrays behave like normal numpy arrays.
However, some calculations may be slower.
An overview is provided here [6]:
d = np.array([1,2,3,99,3,6,1])
np.ma.masked_where(d==99, d)
# => masked_array(data=[1, 2, 3, --, 3, 6, 1],
#                 mask=[False, False, False,  True, False, False, False],
#           fill_value=999999)
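The practical benefit is that masked entries are skipped in calculations. A minimal sketch, assuming 99 is the placeholder for missing data as in the example above (the name `masked` and the fill value -1 are illustrative):

```python
import numpy as np

d = np.array([1, 2, 3, 99, 3, 6, 1])
masked = np.ma.masked_where(d == 99, d)

print(d.mean())             # plain mean includes the placeholder: 115 / 7
print(masked.mean())        # masked mean ignores it: 16 / 6

print(masked.count())       # => 6, number of unmasked values
print(masked.compressed())  # => [1 2 3 3 6 1], unmasked values as a plain array
print(masked.filled(-1))    # => [ 1  2  3 -1  3  6  1], masked entries replaced
```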
Footnotes


[1] This works: alist[1] = 5, but this does not: atuple[1] = 5
[2] https://www.digitalocean.com/community/tutorials/understanding-lists-in-python-3
[3] https://jakevdp.github.io/PythonDataScienceHandbook/
[4] https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html
[5] https://docs.scipy.org/doc/numpy/
[6] https://docs.scipy.org/doc/numpy/reference/maskedarray.generic.html