@martin-rdz
Last active September 14, 2019 08:20
Short overview on python built-in data structures and numpy arrays

Python data structures in the context of scientific programming

Built-in

Several data structures are available natively in python, such as tuples, lists and dictionaries. All of them can hold any python object, for example floats and strings, but also other tuples, lists and dictionaries.

atuple = (11, 21)
alist = [1, 3, 4, 7, 12, 16, 19, 27]

The main difference is that lists are mutable, while tuples are immutable [1]. A single item is accessed by putting its index in brackets [i]. Indexing is zero-based.

print(atuple[1])
print(alist[4])
print(alist[-2]) # negative values count from the end, with -1 as the last element
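
The difference between mutable and immutable can be illustrated with a minimal sketch. The flag tuple_is_immutable is a hypothetical name that exists only for this example.

```python
atuple = (11, 21)
alist = [1, 3, 4, 7, 12, 16, 19, 27]

alist[0] = 100                 # fine: a list can be changed in place
print(alist[0])                # 100

tuple_is_immutable = False     # hypothetical flag, only for this illustration
try:
    atuple[0] = 100            # tuples do not support item assignment
except TypeError as err:
    tuple_is_immutable = True
    print("TypeError:", err)

print(atuple)                  # (11, 21), unchanged
```

To "change" a tuple, a new one has to be built, for example with atuple + (31,).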

Slicing can be done by a colon :

print(alist[2:])  # everything from index 2 (the 3rd element) on
print(alist[:4])  # everything up to index 4 (excluded), i.e. the first 4 elements
print(alist[2:6]) # elements with index 2 to 5
print(alist[::3]) # every 3rd element
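
A few more slicing cases, as a short sketch: negative indices and negative steps work in slices too, and slicing a list returns a new list. The name part is chosen only for this example.

```python
alist = [1, 3, 4, 7, 12, 16, 19, 27]

print(alist[-3:])    # [16, 19, 27], the last three elements
print(alist[::-1])   # [27, 19, 16, 12, 7, 4, 3, 1], reversed copy

part = alist[2:6]    # [4, 7, 12, 16]
part[0] = 0          # modifies only the copy
print(alist[2])      # 4, the original list is unchanged
```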

Several functions to modify lists are built-in

alist.append(43) # => [1, 3, 4, 7, 12, 16, 19, 27, 43]
alist.pop()      # => return last element and remove it from list
alist.pop(3)     # => return and remove element with index 3

alist.insert(4, 300) # insert 300 at index 4
alist[4] = 300       # replace element at index 4 by 300

len(alist)       # length of the list

newlist = [7, 12] + [24, 65]  # combine two lists

Lists can also be nested

nestedlist = [[1,2,3], [6,7,8]]
print(nestedlist[1][2])

Further reading on lists: [2]

Dictionaries hold key-value pairs, where keys can be any hashable object (typically integers or strings) and values can be any python object

adict = {'clouds': ["Ac", "Ns", "Ci"],
         'colors': ["red", "blue"]}

print(adict['clouds'])     # the key is a string, so it needs quotes
print(adict['clouds'][1])

The type of a variable can be queried by

print(type(adict))
print(type(adict["clouds"]))

print(adict.keys())      # list all the keys
print(adict.values())    # list all the values
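
Some further common dictionary operations, as a brief sketch. The keys 'heights' and 'winds' are made up for this example.

```python
adict = {'clouds': ["Ac", "Ns", "Ci"],
         'colors': ["red", "blue"]}

adict['heights'] = [2.5, 4.0]      # add a new key-value pair
print('colors' in adict)           # True, membership is tested on the keys
print(adict.get('winds', []))      # [], a default instead of a KeyError

for key, value in adict.items():   # iterate over key-value pairs
    print(key, len(value))
```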

Multidimensional arrays with numpy

The python standard library has no fast and convenient support for multidimensional arrays. The numpy package (http://www.numpy.org/) implements n-dimensional arrays and routines to do calculations on them. The implementation is fast, and the linear algebra operations are based on BLAS and LAPACK. Numpy is also used by a variety of other libraries (matplotlib, netCDF4, ...). Further reading: Python Data Science Handbook [3], numpy for matlab users [4] and numpy documentation [5].

import numpy as np

a = np.array([1, 5, 3, 7, 9, 4, 10])  # generate a new numpy array from a list
print(type(a))   # => <class 'numpy.ndarray'>

print(a.shape)                     # display the dimensions of this array
a[4] = 20    # indexing is mostly similar to lists

b = np.array([[1,2,3],     # example for a 3x3 array
              [4,5,6],
              [7,8,9]])
print(b.shape)
print(b[0,0])              # when indexing, different dimensions are separated by ,
print(b[1,:])              # all elements along index 1 of the first dimension
print(np.transpose(b))     # example for calculations on b

It is also possible to select only the values that fulfill a certain condition

print(a[a>5]) # which is a shorthand for
print(a[np.where(a>5)])
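
To make the selection step explicit, here is a small sketch of what happens behind a[a>5]. np.where accepts either the condition alone or the condition plus two alternatives; the names mask and idx and the fill value -1 are chosen only for this example.

```python
import numpy as np

a = np.array([1, 5, 3, 7, 9, 4, 10])

mask = a > 5                   # boolean array, True where the condition holds
print(mask)                    # [False False False  True  True False  True]
print(a[mask])                 # [ 7  9 10]

idx = np.where(a > 5)          # tuple of index arrays, one per dimension
print(idx[0])                  # [3 4 6]
print(a[idx])                  # [ 7  9 10]

# three-argument form: take a where the condition holds, otherwise -1
print(np.where(a > 5, a, -1))  # [-1 -1 -1  7  9 -1 10]
```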

Calculations can be done with normal python operators or with numpy functions:

print(a + 5)
print(a / np.array([2,3,1,4,10,7,1]))
print(np.sum(b))
print(np.sum(b, axis=1)) # sum just along a specified axis
print(np.sum(b, axis=0))
print(np.mean(b))
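
The effect of the axis argument is easiest to see with the results written out. The following sketch uses the 3x3 array b from above; the name anomaly is introduced only for this example.

```python
import numpy as np

b = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

print(np.sum(b))             # 45, over all elements
print(np.sum(b, axis=0))     # [12 15 18], the first axis is collapsed (column sums)
print(np.sum(b, axis=1))     # [ 6 15 24], the second axis is collapsed (row sums)
print(np.mean(b, axis=0))    # [4. 5. 6.]

# the 1d array of column means is subtracted from every row of b
anomaly = b - np.mean(b, axis=0)
print(anomaly)
```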

Some standard arrays can be generated conveniently

np.zeros((3,4))    # array filled with zeros of shape 3, 4
np.ones((3,4)) 
np.arange(5)       # => numpy array containing [0, 1, 2, 3, 4]

It is possible to convert a numpy array back to a (nested) python list

b.tolist()
# => [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

Several functions allow appending to an array. np.concatenate() joins arrays along an existing axis

np.concatenate((a, a))
# => array([ 1,  5,  3,  7,  9,  4, 10,  1,  5,  3,  7,  9,  4, 10])

c = np.arange(6).reshape(2,3)
# => array([[0, 1, 2],
#           [3, 4, 5]])

np.concatenate((c, c), axis=0)
# => array([[0, 1, 2],
#           [3, 4, 5],
#           [0, 1, 2],
#           [3, 4, 5]])

np.concatenate((c, c), axis=1)
# => array([[0, 1, 2, 0, 1, 2],
#           [3, 4, 5, 3, 4, 5]])

Another option is np.append(). The two arrays must have the same shape, except along the axis to which is appended. If no axis is given, both arrays are flattened before they are combined into a 1d array.

np.append([[1, 2, 3], [4, 5, 6]], [[7, 8, 9]], axis=0)
# => array([[1, 2, 3],
#           [4, 5, 6],
#           [7, 8, 9]])

np.append([[1, 2, 3], [4, 5, 6]], [[7, 8, 9]])
# => array([1, 2, 3, 4, 5, 6, 7, 8, 9])
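
The shape requirement of np.append() can be checked with a short sketch. The arrays x, row and col are made up for this example. Note that np.append() returns a new array and leaves its inputs unchanged.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])     # shape (2, 3)
row = np.array([[7, 8, 9]])              # shape (1, 3)
col = np.array([[7], [8]])               # shape (2, 1)

print(np.append(x, row, axis=0).shape)   # (3, 3)
print(np.append(x, col, axis=1).shape)   # (2, 4)
print(x.shape)                           # (2, 3), x itself is not modified

try:
    # a 1d array does not match the two dimensions of x
    np.append(x, np.array([7, 8, 9]), axis=0)
except ValueError as err:
    print("ValueError:", err)
```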

Appending to a non-existent axis 2 does not work with np.concatenate(). For that, the function np.stack() may be used, which always creates a new axis

np.stack((a, a))
# => array([[ 1,  5,  3,  7,  9,  4, 10],
#           [ 1,  5,  3,  7,  9,  4, 10]])

np.stack((c, c))
# will produce an array with shape (2, 2, 3):
# => array([[[0, 1, 2],
#            [3, 4, 5]],
#           [[0, 1, 2],
#            [3, 4, 5]]])
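
By default np.stack() puts the new axis first. Its axis argument controls where the new axis is inserted, as this small sketch with the array c from above shows:

```python
import numpy as np

c = np.arange(6).reshape(2, 3)

print(np.stack((c, c), axis=0).shape)    # (2, 2, 3), new axis first (the default)
print(np.stack((c, c), axis=1).shape)    # (2, 2, 3), new axis in the middle
print(np.stack((c, c), axis=-1).shape)   # (2, 3, 2), new axis last
```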

The functions np.hstack, np.vstack and np.dstack are shorthands for appending along the second, first and third dimension, respectively, for inputs with at least two dimensions (not necessarily creating a new axis)

np.hstack((c, c))
# => array([[0, 1, 2, 0, 1, 2],
#           [3, 4, 5, 3, 4, 5]])

np.vstack((c, c))
# => array([[0, 1, 2],
#           [3, 4, 5],
#           [0, 1, 2],
#           [3, 4, 5]])

np.dstack((c, c))
# not equal to np.stack((c, c)), produces shape (2, 3, 2)
# => array([[[0, 0],
#            [1, 1],
#            [2, 2]],
#           [[3, 3],
#            [4, 4],
#            [5, 5]]])
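
For 2d inputs the relation to np.concatenate() can be verified directly. With 1d inputs the three functions behave differently from each other, as the following sketch shows; the short array a1 is made up for this example.

```python
import numpy as np

c = np.arange(6).reshape(2, 3)

# for 2d arrays: vstack joins along axis 0, hstack along axis 1
print(np.array_equal(np.vstack((c, c)), np.concatenate((c, c), axis=0)))  # True
print(np.array_equal(np.hstack((c, c)), np.concatenate((c, c), axis=1)))  # True

a1 = np.array([1, 5, 3])
print(np.hstack((a1, a1)).shape)   # (6,), joined along the only axis
print(np.vstack((a1, a1)).shape)   # (2, 3), each input becomes a row
print(np.dstack((a1, a1)).shape)   # (1, 3, 2), inputs are promoted to 3d first
```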

Masked arrays

Numpy can also handle masks and fill values internally through np.ma.array. For most use cases these masked arrays behave like normal numpy arrays, although some calculations may be slower. An overview is provided here [6].

d = np.array([1,2,3,99,3,6,1])
np.ma.masked_where(d==99, d)
# => masked_array(data = [1 2 3 -- 3 6 1],
#                 mask = [False False False  True False False False],
#           fill_value = 999999)
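
The practical benefit shows up in calculations, where masked elements are left out. A minimal sketch, in which the name md and the fill value -1 are chosen only for this example:

```python
import numpy as np

d = np.array([1, 2, 3, 99, 3, 6, 1])
md = np.ma.masked_where(d == 99, d)

print(d.mean())          # about 16.43, the 99 is included
print(md.mean())         # about 2.67, the masked element is ignored
print(md.count())        # 6, number of unmasked elements
print(md.filled(-1))     # [ 1  2  3 -1  3  6  1], plain array with -1 for masked values
print(md.compressed())   # [1 2 3 3 6 1], only the unmasked values
```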

Footnotes
