Skip to content

Instantly share code, notes, and snippets.

@dkua
Created November 12, 2012 05:09
Show Gist options
  • Save dkua/4057618 to your computer and use it in GitHub Desktop.
Save dkua/4057618 to your computer and use it in GitHub Desktop.
{
"metadata": {
"name": "pyconca"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 2,
"metadata": {
"slideshow": {
"slide_start": true
}
},
"source": "<h1><center>Introduction to Numerical and Scientific Computing with Python</center></h1>\n<h3><center>David Warde-Farley</center></h3>\n<h4><center>Laboratoire d'Informatique des Syst\u00e8mes Adaptifs (LISA)<br/>Departement d'Informatique et de Recherche Operationelle (DIRO)<br />Universit\u00e9 de Montr\u00e9al</center></h4>"
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"source": "This tutorial is available as an IPython notebook.\n\nGet the notebook from:\n\n* http://bit.ly/QwFqFX\n\nFollow along at:\n\n* http://bit.ly/UAKFjH"
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"source": "<h2>IPython and the IPython Notebook</h2>\n[IPython](http://www.ipython.org/) is an enhanced interactive shell for Python, including an HTML notebook interface in which this tutorial is available (and presented, using an experimental slideshow mode written by Matthias Bussonnier)."
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"source": "IPython provides\n\n * Interactive plotting by managing a GUI event loop thread, or by inline browser display\n * Easy object introspection via tab completion\n * Real time syntax highlighting (Qt console and HTML notebook only)\n * `obj?` docstring lookup, `objjQuery171032988427649252117_1352570802922` code lookup (where available)\n * Easy history management, logging capabilities\n * Parallel computation capabilities (not covered here)\n * Much more!\n\nIt is by far the most pleasant way to do interactive scientific work in Python, but is useful to any Python developer."
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"source": "<h2>The HTML Notebook</h2>\n\nIn order to run or re-run a cell, use Shift-Enter or Ctrl-Enter."
},
{
"cell_type": "code",
"collapsed": false,
"input": "# Enables inline plotting with matplotlib, imports a bunch of good stuff\n%pylab inline",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "plot(range(50), [x ** 2 for x in range(50)], '--')",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "heading",
"level": 2,
"metadata": {
"slideshow": {
"slide_start": true
}
},
"source": "NumPy: The fundamental package"
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"source": "Python is a fairly good language for scientific work. It is easy to learn, binds easily to C code. But, insofar as most scientific computing is numerical in nature, it has two problems:"
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"source": " * It is slow. In particular, function call overhead is quite high, and loops are fairly poorly performing.\n * Its basic data structures are optimized for use cases other than storing large quantities of numerical data. `array.array` is an option but it is clunky and not very featureful (see above)\n\n**NumPy** (the merging of former attempts at this, **Numeric** and **numarray**) solves this problem by providing an full-featured N-dimensional array object, and a few utility libraries (basic linear algebra, FFT, polynomials, pseudorandom numbers). It forms the basis for essentially all other Python packages in the scientific ecosystem."
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"source": "<h2>Diving In</h2>\n\nAt first glance, NumPy arrays appear similar to (nested) Python sequences:"
},
{
"cell_type": "code",
"collapsed": false,
"input": "import numpy as np\nnp.array([0, 2, 4])",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "np.array([[2, 9, 3], [2, 5, 7]])",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"source": "Things are a little bit subtler, however."
},
{
"cell_type": "code",
"collapsed": false,
"input": "a = np.array([[5, 2], [3, 9]])",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "a.dtype",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "b = np.array([[5., 2], [3, 9]])",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "b",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "b.dtype",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"source": "<h2>NumPy arrays in a nutshell</h2>\n\nNumPy arrays are\n\n* *explicitly typed** (every element shares a common, usually numeric, machine-level type),\n* homogeneous* (only one type of atomic element per array),\n* N-dimensional grids (no ragged dimensions -- necessary for rapid index computations)\n* Backed by a real chunk of memory"
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"source": "Fundamentally important attributes:\n\n* `ndim` -- the number of dimensions\n* `shape` -- a tuple of integers; the length along each dimension (`len(x.shape) == x.ndim`)\n* `dtype` -- a descriptor of the individual elements of the array"
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"source": "<h2><tt>dtype</tt>s in more detail</h2>\n\nUnlike Python builtin types, integers have fixed precision, and not all floating point numbers are doubles!"
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"source": "Built-in NumPy numeric dtype\n\n* `int8`, `int16`, `int32`, `int64`, `uint*` variants\n* `float32`, `float64` (a.k.a. single and double precision, or `float` and `double` in C)\n* `bool`\n* string/Unicode (with bounded character length) and object arrays are also supported\n\nIn all multi-byte cases, NumPy supports both big- and little-endian modes."
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"source": "**Compound dtypes** can be created, with field names:"
},
{
"cell_type": "code",
"collapsed": false,
"input": "dt = np.dtype([('name', (str, 8)), ('age', 'int8')])\nnp.array([('David', 27), ('Joan', 70)], dtype=dt)",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "arr = _; arr['age']",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"source": "Can be quite powerful for tabular data; integrates with [PyTables](http://pytables.github.com/) HDF5 tables. [Pandas](http://pandas.pydata.org/) pushes this idea even further with a (NumPy-based) object called the DataFrame.\n\nWe will mostly focus on numeric arrays."
},
{
"cell_type": "heading",
"level": 2,
"metadata": {
"slideshow": {
"slide_start": true
}
},
"source": "Other ways to create NumPy arrays"
},
{
"cell_type": "code",
"collapsed": false,
"input": "np.zeros((2, 3), dtype=np.float32) # all of these functions take a dtype keyword",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "np.empty((2, 5), dtype='>i2') # terse string codes too",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "heading",
"level": 2,
"metadata": {
"slideshow": {
"slide_start": true
}
},
"source": "Even more ways to create NumPy arrays"
},
{
"cell_type": "code",
"collapsed": false,
"input": "np.eye(4, dtype='complex64') # Identity matrix",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "np.arange(3, 10, 2) # Same arguments, behaviour as range()/xrange()",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "np.linspace(0, 1, 5) # 5 values linearly between 0 and 1 inclusive",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "np.logspace(-5, 0, 3) # args are exponent (base 10, or `base` keyword)",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"source": "<h2>ufuncs</h2>\n \nNumPy includes many *universal functions*, or *ufuncs*, that operate on every element in an array.\n\nThe loop is performed in C, so it's quite fast."
},
{
"cell_type": "code",
"collapsed": false,
"input": "np.sqrt(np.arange(20))",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"source": "<h2>A brief diversion into plotting</h2>\n\n<a href=\"http://matplotlib.sourceforge.net/\">matplotlib</a> contains two interfaces to creating plots: one object-oriented and one procedural (MATLAB-like) that maintains module-level state."
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"source": "In a production application that requires a lot of customization of plots, the OO API is the way to go."
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"source": "... **But** the procedural interface is conceptually simple and the fastest way from idea to plot."
},
{
"cell_type": "code",
"collapsed": false,
"input": "x = np.linspace(-1.5, 1.5, 50)\ny = np.tanh(x)\nplot(x, y)",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"source": "Multiple `plot()` commands on the same active figure will draw all of them on the same axes."
},
{
"cell_type": "code",
"collapsed": false,
"input": "x = np.linspace(-2 * np.pi, 2 * np.pi, 50)\ny1 = np.sin(x)\ny2 = np.cos(x)\nplot(x, y1)\nplot(x, y2, '--') # Dashed line",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "x, y = numpy.random.normal(size=(2, 600)) # You can unpack along first axis!\nscatter(x, y)",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "_ = hist(x, 20)",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"source": "<h2>Basic indexing/slicing</h2>\n\nAnother big difference between NumPy arrays and nested sequences: much more powerful indexing."
},
{
"cell_type": "code",
"collapsed": false,
"input": "a = np.random.uniform(size=(5, 3, 2))\na[0, 0, 0]",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "a[0, 0, :]",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "a[0, :2, :]",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "a[1:5:2, 2, :] # Mixing single index, slicing",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"source": "<h2>Views</h2>\n\nNote that when slicing and indexing arrays like with single indices and slices, *no data copy is performed*. For example:"
},
{
"cell_type": "code",
"collapsed": false,
"input": "b = a[0, 1:, :]\nprint b.shape",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "b[0, 0] = 1; b[0, 1] = 2; b[1, 0] = 3; b[1, 1] = 4",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"source": "Now, if we look at the same slice of `a`:"
},
{
"cell_type": "code",
"collapsed": false,
"input": "a[0, 1:, :]",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"source": "This is a very good thing for memory efficiency and speed, but can lead to unexpected bugs for newcomers. The `copy` method is your friend."
},
{
"cell_type": "heading",
"level": 2,
"metadata": {
"slideshow": {
"slide_start": true
}
},
"source": "Exercise 1"
},
{
"cell_type": "code",
"collapsed": false,
"input": "a = array([[ 1, 2, 3, 4],\n [ 5, 6, 7, 8],\n [ 9, 10, 11, 12],\n [13, 14, 15, 16]])",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"source": "Index into the array to pull out the 2x2 sub-array corresponding to 2, 4, 6 and 8."
},
{
"cell_type": "code",
"collapsed": false,
"input": "",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"source": "Set the 9, 11, 13 and 15 elements to 0. Note that you can use these indexing expressions on the left hand side of an assignment."
},
{
"cell_type": "code",
"collapsed": false,
"input": "",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"source": "<h2>Advanced Indexing</h2>\n\nAdvanced or ``fancy'' indexing allows one to use arrays or lists to indexing along a given dimension. In contrast to basic indexing and slicing, advanced indexing **always** creates a copy."
},
{
"cell_type": "code",
"collapsed": false,
"input": "b = np.arange(15).reshape((5, 3))\nb",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "b[[0, 1, 4]] # Implicitly b[[0, 1, 4], :]",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"source": "<h2>Multiple axes and advanced indexing</h2>\n\n"
},
{
"cell_type": "code",
"collapsed": false,
"input": "b[[1, 4], [0, 2]]",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"source": "We're not limited to one-dimensional index arrays along each axis."
},
{
"cell_type": "code",
"collapsed": false,
"input": "b[[[1, 4], [2, 1]], [[2, 1], [1, 2]]]",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"source": "Note that deterministic but unintuitive things happen when you mix advanced indexing and slicing. My advice: *just say no*."
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"source": "<h2>Exercise 2</h2>\n\nPull out the values 1, 2, 3, 4, 5 from `c` in ascending order using advanced indexing."
},
{
"cell_type": "code",
"collapsed": false,
"input": "c = np.array([[4, 9, 2, 5],\n [1, 6, 3, 7]])",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"source": "<h2>Elementwise operations</h2>\n\nPart of the real power of NumPy lies in how easily and quickly (at C speed) you can manipulate the values in whole arrays at once."
},
{
"cell_type": "code",
"collapsed": false,
"input": "5 * np.arange(4) + 3",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"source": "Arithmetic with scalars applies to the entire array. Array-array operations work as expected (elementwise) when things have the same shape."
},
{
"cell_type": "code",
"collapsed": false,
"input": "np.arange(4) + np.array([3, 5, 3, 1])",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"source": "<h2>Broadcasting</h2>\n\nWhen two arrays **don't** have the same shape, NumPy will attempt to **broadcast** them to the same shape. This first and foremost means that any dimensions with length 1 in one array are **repeated along that axis** for the purposes of the operation taking place. This is best illustrated with an example."
},
{
"cell_type": "code",
"collapsed": false,
"input": "row = np.array([[3, 6, 9]])\nrow",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "identity = np.eye(3)\nidentity",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "row.shape",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "identity.shape",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "row + identity",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "col = np.array([[3],\n [6],\n [9]])\ncol.shape",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "col + identity",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"source": "<h2>More Broadcasting Fun</h2>\n\nIf two arrays in an operation do not have the same number of dimensions, NumPy follows a simple rule to try and broadcast them:\n\n<h2><center><em>Right pad the shape of the array with less dimensions with 1s,<br/>then proceed as above.</em></center></h2>\n\n(*) You can think of a scalar as a 0-dimensional array with shape `()`, and this unifies broadcasting and array-scalar operations."
},
{
"cell_type": "code",
"collapsed": false,
"input": "one_d = np.array([3, 6, 9])\none_d.shape",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "one_d + np.eye(3)",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"source": "<h2>What if I want the other way?</h2>\n\nOften you want to add singleton axes to an existing array of lower dimension. The easiest method is to index with `np.newaxis` (which is actually defined as `None`):"
},
{
"cell_type": "code",
"collapsed": false,
"input": "one_d[:, np.newaxis].shape",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "one_d[:, np.newaxis] + np.eye(3)",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"source": "The `reshape` method/function also works. "
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"source": "<h2>Reductions</h2>\n\nIn many, many settings, we care about some property of a group of elements. We call these generally *reductions*."
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"source": "NumPy arrays support a number of commonly required reductions as methods:\n\n* `min`, `max` -- Minimum and maximum element\n* `argmin`, `argmax` -- *index* of the minimum/maximum element\n* `sum` -- sum of elements\n* `prod` -- product of elements\n* `mean` -- mean element\n* `var`, `std` -- variance, standard deviation\n* `all` -- do all elements evaluate boolean `True`?\n* `any` -- do any elements evaluate boolean `True`?\n\nThey are also available as top-level functions in the `numpy` module.\n\nDefaults to *all elements*, but an `axis` argument allows reduction over a given axis."
},
{
"cell_type": "code",
"collapsed": false,
"input": "r = numpy.random.uniform(size=(50, 20))",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "r.mean()",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "r.mean(axis=1)",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"source": "<h2>Exercise 3</h2>\n\nWrite a function `normalize_vectors` that takes a 2D array and divides each row by the norm of that row, where the norm is defined as $\\sqrt{x_1^2 + x_2^2 + \\ldots + x_n^2}$."
},
{
"cell_type": "code",
"collapsed": false,
"input": "def normalize_vectors(x):\n \n \n \n pass",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"source": "<h2>Elementwise comparisons</h2>\n\nComparing NumPy arrays with each other, or with a scalar, operates elementwise on the arrays. All of the same broadcasting rules apply."
},
{
"cell_type": "code",
"collapsed": false,
"input": "a = np.array([[4, 9],\n [5, 3]])\n\na > 4",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"source": "Boolean arrays can be used for indexing, as well:"
},
{
"cell_type": "code",
"collapsed": false,
"input": "a[a > 4]",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "(a > 0).all()",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "(a < 2).any()",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"source": "<h2>Exercise 4</h2>\n\nA **binomial** probability distribution with parameters $n$ and $p$ describes the number of times a coin lands heads given $n$ independent coin flips, where each coin flip has probability $p$ of turning up heads. Using `numpy.random.uniform` to generate uniform random numbers between 0 and 1, write a function `binomial` that generates samples from a binomial distribution with given $n$ and $p$ parameters.\n\n**Hint:** Given that you have access to uniform random numbers in $[0, 1]$, would you go about generating a binary coin flip such that heads (call it `True`) occurs with the right probability $p$?"
},
{
"cell_type": "code",
"collapsed": false,
"input": "def binomial(size, n, p):\n \"\"\"\n `size` is a shape tuple indicating the desired shape\n of the output array.\n\n `n` is the number of coin flips.\n `p` is the probability of a coin landing heads.\n \"\"\"\n \n \n \n \n pass",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"source": "<h2>That's the basics!</h2>\n\nWith these tools you should be able to tackle a wide variety of numerical tasks;\n\n* `numpy.dot` is the matrix multiply operator. the `dot` method also exists, allowing one to write the (slightly) more natural `a.dot(b)`\n* `numpy.linalg` includes several basic utility functions for solving linear systems, matrix decompositions and least squares fits.\n* `numpy.fft` contains implementations of the fast Fourier transform (not the fastest, but suitably licensed)."
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"source": "But NumPy is just the beginning. "
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"source": "<h2>Exercise 5</h2>"
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"source": "A common operation in working with probabilities is to divide several positive values by their sum, so that the divided values sum to 1. For a variety of reasons (chief among them avoiding *underflow* in intermediate computations), we typically work with the logarithms of the actual values. In order to compute the normalizer of $x_1, x_2, \\ldots, x_N$, we require to compute $\\log\\sum_{i=1}^N \\exp(x_i)$. This poses a problem because $\\exp(a)$ might get very big for large values of $a$ and overflow. However we can side step the problem by noting the following:\n\n$$\\log \\sum_{i=1}^N \\exp(x_i) = \\log \\left(m\\sum_{i=1}^N \\frac{1}{m}\\exp(x_i)\\right) = \\log m + \\log \\sum_{i=1}^N \\exp(x_i - \\log m)$$\n\nSo we can choose $\\log m$, the thing that gets subtracted, to be large enough that $\\exp$ on the result won't overflow. A good choice is the maximum element.\n\nWrite a function `logsumexp` that implements this trick to calculate these quantities without overflow. Make `axis` an argument that chooses the axis over which to sum."
},
{
"cell_type": "code",
"collapsed": false,
"input": "def logsumexp(x, axis=None):\n \n\n \n pass",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"source": "<h2>Performance tips</h2>\n\nBecause of the way that the interpreter evaluation works, intermediate expressions allocate lots of temporaries. Also costs you speed."
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"source": "`(a + b) / (c + d + e)` allocates 5 arrays, including the final result\n\n* With broadcasting in the mix, this can be bad news memory-wise -- gigantic temporaries!"
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"source": "One NumPy-only way to get around this: extended assignment operators allow you to do *in-place operations*"
},
{
"cell_type": "code",
"collapsed": false,
"input": "a = np.array([[3, 9, 6], [2, 5, 7]])\na += np.array([2, 1, 0])\na **= 2\na",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"source": "<h2>... But that's only half the story</h2>"
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"source": "Consider the following:"
},
{
"cell_type": "code",
"collapsed": false,
"input": "def fooz(a, b, c):\n return sqrt(a) + b ** 2 + 5 * c\n\nfooz(np.arange(50), np.arange(0, 100, 2), np.arange(0, 200, 4))",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"source": "This performs several loops over each of `a`, `b` and `c`."
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"source": "Modern CPU caches *hate* this."
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"source": "<h2>Enter Cython</h2>\n\n[Cython](http://www.cython.org/) is a Python-like language that compiles to C extension modules. It allows for *explicit C type annotations*; with loop index variables, it translates loops down to C."
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"source": "A Google Summer of Code project in 2008 by Dag Sverre Seljebotn (now a core Cython developer) added NumPy array support."
},
{
"cell_type": "code",
"collapsed": false,
"input": "%load_ext cythonmagic",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "%%cython\ncimport numpy as np\nimport numpy as np\ncdef extern from \"math.h\":\n double sqrt(double x)\n\ndef fooz_cy(np.ndarray[np.float64_t, ndim=1] a,\n np.ndarray[np.float64_t, ndim=1] b,\n np.ndarray[np.float64_t, ndim=1] c):\n cdef np.npy_intp i\n cdef np.ndarray[np.float64_t, ndim=1] out = np.empty_like(a)\n for i in range(a.shape[0]):\n out[i] = sqrt(a[i]) + b[i] * b[i] + 5. * c[i]\n return out",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "a = np.arange(500, dtype=np.float64)\nb = np.arange(0, 1000, 2, dtype=np.float64)\nc = np.arange(0, 2000, 4, dtype=np.float64)",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "timeit fooz(a, b, c)",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "timeit fooz_cy(a, b, c)",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_start": true
}
},
"source": "If we still have time: live discussion of more advanced techniques/idioms, Theano, etc."
},
{
"cell_type": "code",
"collapsed": false,
"input": "",
"language": "python",
"metadata": {
"slideshow": {
"slide_start": false
}
},
"outputs": []
}
],
"metadata": {}
}
]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment