Python natively provides several data structures, such as tuples, lists and dictionaries. All of them can hold any Python object, for example floats and strings, but also other tuples, lists and dictionaries.
atuple = (11, 21)
alist = [1, 3, 4, 7, 12, 16, 19, 27]
The main difference is that lists are mutable, while tuples are immutable [1].
A single item can be queried by putting its index in brackets, e.g. [i]. Zero-based indexing is used.
print(atuple[1])
print(alist[4])
print(alist[-2]) # negative indices count from the end, with -1 as the last element
Slicing is done with a colon (:):
print(alist[2:]) # everything from index 2 (the 3rd element) on
print(alist[:4]) # everything up to index 4 (exclusive), i.e. the first 4 elements
print(alist[2:6]) # elements from index 2 up to index 6 (exclusive)
print(alist[::3]) # every 3rd element
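As a small sketch of the slicing rules above: a slice of a list returns a new list, so changing the slice leaves the original untouched. The variable names first_three, last_two and reversed_list are only illustrative.

```python
alist = [1, 3, 4, 7, 12, 16, 19, 27]

first_three = alist[:3]      # indices 0, 1, 2
last_two = alist[-2:]        # the final two elements
reversed_list = alist[::-1]  # a negative step walks backwards

first_three[0] = 99          # modifies only the new list
print(first_three)           # [99, 3, 4]
print(last_two)              # [19, 27]
print(reversed_list)         # [27, 19, 16, 12, 7, 4, 3, 1]
print(alist[0])              # 1, the original list is unchanged
```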
Several operations to modify lists are built in:
alist.append(43) # => [1, 3, 4, 7, 12, 16, 19, 27, 43]
alist.pop() # => return last element and remove it from list
alist.pop(3) # => return and remove element with index 3
alist.insert(4, 300) # insert 300 at index 4
alist[4] = 300 # replace element at index 4 by 300
len(alist) # length of the list
newlist = [7, 12] + [24, 65] # combine two lists
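To make the effect of each operation easier to follow, the sketch below applies them one after another to a fresh list and notes the state after each step. The value 500 is chosen only to distinguish the replacement from the preceding insert.

```python
alist = [1, 3, 4, 7, 12, 16, 19, 27]

alist.append(43)         # [1, 3, 4, 7, 12, 16, 19, 27, 43]
last = alist.pop()       # last == 43, list is back to 8 elements
removed = alist.pop(3)   # removed == 7 -> [1, 3, 4, 12, 16, 19, 27]
alist.insert(4, 300)     # [1, 3, 4, 12, 300, 16, 19, 27]
alist[4] = 500           # [1, 3, 4, 12, 500, 16, 19, 27]

print(alist)
print(len(alist))        # 8
```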
Lists can also be nested
nestedlist = [[1,2,3], [6,7,8]]
print(nestedlist[1][2])
Further reading on lists: [2]
Dictionaries hold key-value pairs. Keys can be any hashable object (typically strings or integers), and values can be any Python object.
adict = {'clouds': ["Ac", "Ns", "Ci"],
         'colors': ["red", "blue"]}
print(adict['clouds'])
print(adict['clouds'][1])
The type of a variable can be queried with type():
print(type(adict))
print(type(adict["clouds"]))
print(adict.keys()) # list all the keys
print(adict.values()) # list all the values
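A few more common dictionary operations, as a minimal sketch. The key 'heights' with its values and the key 'winds' are made up for illustration.

```python
adict = {'clouds': ["Ac", "Ns", "Ci"],
         'colors': ["red", "blue"]}

adict['heights'] = [2.5, 1.0, 8.0]   # add a new key-value pair (hypothetical values)
print('colors' in adict)             # True, membership is tested on the keys
print(adict.get('winds'))            # None, .get() avoids a KeyError for missing keys

# iterate over keys and values together
for key, value in adict.items():
    print(key, len(value))
```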
The Python standard library has no good and fast support for multidimensional arrays. The numpy package (http://www.numpy.org/) implements n-dimensional arrays and routines to do calculations on them. The implementation is rather fast, and the linear algebra operations are based on BLAS and LAPACK. Numpy is also used by a variety of other libraries (matplotlib, netCDF4, ...). Further reading: Python Data Science Handbook [3], numpy for matlab users [4] and the numpy documentation [5].
import numpy as np
a = np.array([1, 5, 3, 7, 9, 4, 10]) # generate a new numpy array from a list
print(type(a)) # => <class 'numpy.ndarray'>
print(a.shape) # display the dimensions of this array
a[4] = 20 # indexing is mostly similar to lists
b = np.array([[1,2,3], # example for a 3x3 array
[4,5,6],
[7,8,9]])
print(b.shape)
print(b[0,0]) # when indexing, different dimensions are separated by ,
print(b[1,:]) # all elements along index 1 of the first dimension
print(np.transpose(b)) # example for calculations on b
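One place where numpy indexing differs from lists: basic slices of an array are views on the same data, not copies. The following sketch illustrates this on the 3x3 array from above; the names column, block, row_view and row_copy are illustrative.

```python
import numpy as np

b = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

column = b[:, 1]        # second column: 2, 5, 8
block = b[0:2, 1:3]     # rows 0-1, columns 1-2: [[2, 3], [5, 6]]

# Unlike list slices, basic numpy slices are views on the same data.
row_view = b[0, :]
row_view[0] = 100       # also changes b[0, 0]
print(b[0, 0])          # 100

row_copy = b[1, :].copy()   # an explicit copy is independent of b
row_copy[0] = -1
print(b[1, 0])          # 4
```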
It is also possible to select only the values that fulfill a certain condition
print(a[a>5]) # which is a shorthand for
print(a[np.where(a>5)])
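The condition itself evaluates to a boolean array, which can be stored and combined. A brief sketch, assuming a holds the values after the assignment a[4] = 20; the replacement value 0 in the three-argument form of np.where is an arbitrary choice.

```python
import numpy as np

a = np.array([1, 5, 3, 7, 20, 4, 10])

mask = a > 5                       # boolean array with the same shape as a
selected = a[mask]                 # elements 7, 20, 10
in_range = a[(a > 3) & (a < 10)]   # combine conditions with & and |: 5, 7, 4

# With three arguments, np.where keeps a where the condition holds
# and substitutes the third argument elsewhere.
replaced = np.where(a > 5, a, 0)   # 0, 0, 0, 7, 20, 0, 10

print(selected, in_range, replaced)
```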
Calculations can be done with normal Python operators or with numpy functions:
print(a + 5)
print(a / np.array([2,3,1,4,10,7,1]))
print(np.sum(b))
print(np.sum(b, axis=1)) # sum just along a specified axis
print(np.sum(b, axis=0))
print(np.mean(b))
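The meaning of the axis argument is easiest to see on concrete numbers. The sketch below uses the 3x3 array b from above; subtracting the column means in the last step relies on numpy's broadcasting of a 1-d array against each row.

```python
import numpy as np

b = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

row_sums = np.sum(b, axis=1)     # collapses the columns: 6, 15, 24
col_sums = np.sum(b, axis=0)     # collapses the rows: 12, 15, 18
col_means = np.mean(b, axis=0)   # 4.0, 5.0, 6.0

# The 1-d array of column means is applied to every row of b.
centered = b - col_means
print(centered)
```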
Some standard arrays can be generated conveniently
np.zeros((3,4)) # array filled with zeros of shape 3, 4
np.ones((3,4))
np.arange(5) # => numpy array containing [0, 1, 2, 3, 4]
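A few variations on these generators, as a short sketch: the dtype argument and np.linspace are not covered above but belong to the same family of numpy functions.

```python
import numpy as np

zeros = np.zeros((3, 4))                  # float64 by default
int_zeros = np.zeros((3, 4), dtype=int)   # request an integer array instead
steps = np.arange(2, 10, 3)               # start, stop (exclusive), step: 2, 5, 8
grid = np.linspace(0.0, 1.0, 5)           # 5 evenly spaced values, both ends included

print(zeros.shape, zeros.dtype)
print(steps)
print(grid)
```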
It is possible to convert a numpy array back to a (nested) python list
b.tolist()
# => [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
Several functions allow appending to an array. np.concatenate() appends along an existing axis
np.concatenate((a, a))
# => array([ 1, 5, 3, 7, 20, 4, 10, 1, 5, 3, 7, 20, 4, 10])
c = np.arange(6).reshape(2,3)
# => array([[0, 1, 2],
# [3, 4, 5]])
np.concatenate((c, c), axis=0)
# => array([[0, 1, 2],
# [3, 4, 5],
# [0, 1, 2],
# [3, 4, 5]])
np.concatenate((c, c), axis=1)
# => array([[0, 1, 2, 0, 1, 2],
# [3, 4, 5, 3, 4, 5]])
Another option is np.append(). The two arrays must have the same shape, except along the axis to which values are appended. If no axis is given, the arrays are flattened before being combined into a 1-d array.
np.append([[1, 2, 3], [4, 5, 6]], [[7, 8, 9]], axis=0)
# => array([[1, 2, 3],
# [4, 5, 6],
# [7, 8, 9]])
np.append([[1, 2, 3], [4, 5, 6]], [[7, 8, 9]])
# => array([1, 2, 3, 4, 5, 6, 7, 8, 9])
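Unlike list.append(), np.append() does not modify the array in place but returns a new array. The sketch below shows this, followed by a common alternative for building an array step by step: collect the values in a Python list and convert once at the end. The squares are just example data.

```python
import numpy as np

a = np.array([1, 2, 3])
result = np.append(a, [4, 5])   # returns a new array
print(a)                        # a still holds 1, 2, 3
print(result)                   # 1, 2, 3, 4, 5

# Collect values in a list and convert once, instead of
# calling np.append() in every iteration.
values = []
for i in range(5):
    values.append(i ** 2)
squares = np.array(values)      # 0, 1, 4, 9, 16
print(squares)
```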
Appending along an axis that does not exist yet (e.g. axis 2 for the 2-d array c) does not work with np.concatenate. For that, the function np.stack() may be used, which always creates a new axis
np.stack((a, a))
# => array([[ 1, 5, 3, 7, 20, 4, 10],
#           [ 1, 5, 3, 7, 20, 4, 10]])
np.stack((c, c))
# will produce an array with shape (2, 2, 3):
# => array([[[0, 1, 2],
# [3, 4, 5]],
# [[0, 1, 2],
# [3, 4, 5]]])
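np.stack() also accepts an axis argument that determines where the new axis is inserted. A short sketch on the 2x3 array c from above, comparing the resulting shapes:

```python
import numpy as np

c = np.arange(6).reshape(2, 3)

front = np.stack((c, c), axis=0)    # new axis first (the default)
middle = np.stack((c, c), axis=1)   # new axis in the middle
back = np.stack((c, c), axis=2)     # new axis last

print(front.shape)    # (2, 2, 3)
print(middle.shape)   # (2, 2, 3)
print(back.shape)     # (2, 3, 2)
```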
The functions np.vstack, np.hstack and np.dstack are shorthands for appending along the first, second and third dimension, respectively (not necessarily creating a new axis)
np.hstack((c, c))
# => array([[0, 1, 2, 0, 1, 2],
# [3, 4, 5, 3, 4, 5]])
np.vstack((c, c))
# => array([[0, 1, 2],
# [3, 4, 5],
# [0, 1, 2],
# [3, 4, 5]])
np.dstack((c, c))
# not equal to np.stack((c, c)), produces shape (2, 3, 2)
# => array([[[0, 0],
# [1, 1],
# [2, 2]],
# [[3, 3],
# [4, 4],
# [5, 5]]])
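For 2-d input these shorthands can be checked against np.concatenate and np.stack directly. The sketch below also shows how they treat 1-d arrays, where the results differ more noticeably; the array a here is a small example, not the one used earlier.

```python
import numpy as np

c = np.arange(6).reshape(2, 3)

same_v = np.array_equal(np.vstack((c, c)), np.concatenate((c, c), axis=0))
same_h = np.array_equal(np.hstack((c, c)), np.concatenate((c, c), axis=1))
same_d = np.array_equal(np.dstack((c, c)), np.stack((c, c), axis=2))
print(same_v, same_h, same_d)      # True True True

# For 1-d input, missing dimensions are added where needed.
a = np.array([1, 2, 3])
print(np.hstack((a, a)).shape)     # (6,)
print(np.vstack((a, a)).shape)     # (2, 3)
print(np.dstack((a, a)).shape)     # (1, 3, 2)
```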
Numpy can also handle masks and fill values internally through np.ma.array. For most use cases these masked arrays behave like normal numpy arrays. However, the performance of some calculations might be worse. An overview is provided here [6]
d = np.array([1,2,3,99,3,6,1])
np.ma.masked_where(d==99, d)
# => masked_array(data = [1 2 3 -- 3 6 1],
# mask = [False False False True False False False],
# fill_value = 999999)
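Masked elements are ignored in calculations, which is the main reason to use a masked array. A minimal sketch using the array d from above, assuming 99 marks a missing value; the fill value -1 passed to filled() is arbitrary.

```python
import numpy as np

d = np.array([1, 2, 3, 99, 3, 6, 1])
md = np.ma.masked_where(d == 99, d)

print(d.mean())        # mean over all 7 values, including the 99
print(md.mean())       # mean over the 6 unmasked values only
print(md.count())      # 6, the number of unmasked elements

plain = md.filled(-1)  # back to a normal array, masked entries replaced by -1
print(plain)
```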
- [1] This works: alist[1] = 5, but this does not: atuple[1] = 5 (it raises a TypeError)
- [2] https://www.digitalocean.com/community/tutorials/understanding-lists-in-python-3
- [3] https://jakevdp.github.io/PythonDataScienceHandbook/
- [4] https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html
- [5] https://docs.scipy.org/doc/numpy/
- [6] https://docs.scipy.org/doc/numpy/reference/maskedarray.generic.html