Created
November 12, 2012 05:09
-
-
Save dkua/4057618 to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"metadata": { | |
"name": "pyconca" | |
}, | |
"nbformat": 3, | |
"nbformat_minor": 0, | |
"worksheets": [ | |
{ | |
"cells": [ | |
{ | |
"cell_type": "heading", | |
"level": 2, | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"source": "<h1><center>Introduction to Numerical and Scientific Computing with Python</center></h1>\n<h3><center>David Warde-Farley</center></h3>\n<h4><center>Laboratoire d'Informatique des Syst\u00e8mes Adaptifs (LISA)<br/>Departement d'Informatique et de Recherche Operationelle (DIRO)<br />Universit\u00e9 de Montr\u00e9al</center></h4>" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"source": "This tutorial is available as an IPython notebook.\n\nGet the notebook from:\n\n* http://bit.ly/QwFqFX\n\nFollow along at:\n\n* http://bit.ly/UAKFjH" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"source": "<h2>IPython and the IPython Notebook</h2>\n[IPython](http://www.ipython.org/) is an enhanced interactive shell for Python, including an HTML notebook interface in which this tutorial is available (and presented, using an experimental slideshow mode written by Matthias Bussonnier)." | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"source": "IPython provides\n\n * Interactive plotting by managing a GUI event loop thread, or by inline browser display\n * Easy object introspection via tab completion\n * Real time syntax highlighting (Qt console and HTML notebook only)\n * `obj?` docstring lookup, `objjQuery171032988427649252117_1352570802922` code lookup (where available)\n * Easy history management, logging capabilities\n * Parallel computation capabilities (not covered here)\n * Much more!\n\nIt is by far the most pleasant way to do interactive scientific work in Python, but is useful to any Python developer." | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"source": "<h2>The HTML Notebook</h2>\n\nIn order to run or re-run a cell, use Shift-Enter or Ctrl-Enter." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "# Enables inline plotting with matplotlib, imports a bunch of good stuff\n%pylab inline", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "plot(range(50), [x ** 2 for x in range(50)], '--')", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "heading", | |
"level": 2, | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"source": "NumPy: The fundamental package" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"source": "Python is a fairly good language for scientific work. It is easy to learn, binds easily to C code. But, insofar as most scientific computing is numerical in nature, it has two problems:" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"source": " * It is slow. In particular, function call overhead is quite high, and loops are fairly poorly performing.\n * Its basic data structures are optimized for use cases other than storing large quantities of numerical data. `array.array` is an option but it is clunky and not very featureful (see above)\n\n**NumPy** (the merging of former attempts at this, **Numeric** and **numarray**) solves this problem by providing an full-featured N-dimensional array object, and a few utility libraries (basic linear algebra, FFT, polynomials, pseudorandom numbers). It forms the basis for essentially all other Python packages in the scientific ecosystem." | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"source": "<h2>Diving In</h2>\n\nAt first glance, NumPy arrays appear similar to (nested) Python sequences:" | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "import numpy as np\nnp.array([0, 2, 4])", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "np.array([[2, 9, 3], [2, 5, 7]])", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"source": "Things are a little bit subtler, however." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "a = np.array([[5, 2], [3, 9]])", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "a.dtype", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "b = np.array([[5., 2], [3, 9]])", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "b", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "b.dtype", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"source": "<h2>NumPy arrays in a nutshell</h2>\n\nNumPy arrays are\n\n* *explicitly typed** (every element shares a common, usually numeric, machine-level type),\n* homogeneous* (only one type of atomic element per array),\n* N-dimensional grids (no ragged dimensions -- necessary for rapid index computations)\n* Backed by a real chunk of memory" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"source": "Fundamentally important attributes:\n\n* `ndim` -- the number of dimensions\n* `shape` -- a tuple of integers; the length along each dimension (`len(x.shape) == x.ndim`)\n* `dtype` -- a descriptor of the individual elements of the array" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"source": "<h2><tt>dtype</tt>s in more detail</h2>\n\nUnlike Python builtin types, integers have fixed precision, and not all floating point numbers are doubles!" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"source": "Built-in NumPy numeric dtype\n\n* `int8`, `int16`, `int32`, `int64`, `uint*` variants\n* `float32`, `float64` (a.k.a. single and double precision, or `float` and `double` in C)\n* `bool`\n* string/Unicode (with bounded character length) and object arrays are also supported\n\nIn all multi-byte cases, NumPy supports both big- and little-endian modes." | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"source": "**Compound dtypes** can be created, with field names:" | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "dt = np.dtype([('name', (str, 8)), ('age', 'int8')])\nnp.array([('David', 27), ('Joan', 70)], dtype=dt)", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "arr = _; arr['age']", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"source": "Can be quite powerful for tabular data; integrates with [PyTables](http://pytables.github.com/) HDF5 tables. [Pandas](http://pandas.pydata.org/) pushes this idea even further with a (NumPy-based) object called the DataFrame.\n\nWe will mostly focus on numeric arrays." | |
}, | |
{ | |
"cell_type": "heading", | |
"level": 2, | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"source": "Other ways to create NumPy arrays" | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "np.zeros((2, 3), dtype=np.float32) # all of these functions take a dtype keyword", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "np.empty((2, 5), dtype='>i2') # terse string codes too", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "heading", | |
"level": 2, | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"source": "Even more ways to create NumPy arrays" | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "np.eye(4, dtype='complex64') # Identity matrix", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "np.arange(3, 10, 2) # Same arguments, behaviour as range()/xrange()", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "np.linspace(0, 1, 5) # 5 values linearly between 0 and 1 inclusive", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "np.logspace(-5, 0, 3) # args are exponent (base 10, or `base` keyword)", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"source": "<h2>ufuncs</h2>\n \nNumPy includes many *universal functions*, or *ufuncs*, that operate on every element in an array.\n\nThe loop is performed in C, so it's quite fast." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "np.sqrt(np.arange(20))", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"source": "<h2>A brief diversion into plotting</h2>\n\n<a href=\"http://matplotlib.sourceforge.net/\">matplotlib</a> contains two interfaces to creating plots: one object-oriented and one procedural (MATLAB-like) that maintains module-level state." | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"source": "In a production application that requires a lot of customization of plots, the OO API is the way to go." | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"source": "... **But** the procedural interface is conceptually simple and the fastest way from idea to plot." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "x = np.linspace(-1.5, 1.5, 50)\ny = np.tanh(x)\nplot(x, y)", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"source": "Multiple `plot()` commands on the same active figure will draw all of them on the same axes." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "x = np.linspace(-2 * np.pi, 2 * np.pi, 50)\ny1 = np.sin(x)\ny2 = np.cos(x)\nplot(x, y1)\nplot(x, y2, '--') # Dashed line", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "x, y = numpy.random.normal(size=(2, 600)) # You can unpack along first axis!\nscatter(x, y)", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "_ = hist(x, 20)", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"source": "<h2>Basic indexing/slicing</h2>\n\nAnother big difference between NumPy arrays and nested sequences: much more powerful indexing." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "a = np.random.uniform(size=(5, 3, 2))\na[0, 0, 0]", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "a[0, 0, :]", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "a[0, :2, :]", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "a[1:5:2, 2, :] # Mixing single index, slicing", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"source": "<h2>Views</h2>\n\nNote that when slicing and indexing arrays like with single indices and slices, *no data copy is performed*. For example:" | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "b = a[0, 1:, :]\nprint b.shape", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "b[0, 0] = 1; b[0, 1] = 2; b[1, 0] = 3; b[1, 1] = 4", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"source": "Now, if we look at the same slice of `a`:" | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "a[0, 1:, :]", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"source": "This is a very good thing for memory efficiency and speed, but can lead to unexpected bugs for newcomers. The `copy` method is your friend." | |
}, | |
{ | |
"cell_type": "heading", | |
"level": 2, | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"source": "Exercise 1" | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "a = array([[ 1, 2, 3, 4],\n [ 5, 6, 7, 8],\n [ 9, 10, 11, 12],\n [13, 14, 15, 16]])", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"source": "Index into the array to pull out the 2x2 sub-array corresponding to 2, 4, 6 and 8." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"source": "Set the 9, 11, 13 and 15 elements to 0. Note that you can use these indexing expressions on the left hand side of an assignment." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"source": "<h2>Advanced Indexing</h2>\n\nAdvanced or ``fancy'' indexing allows one to use arrays or lists to indexing along a given dimension. In contrast to basic indexing and slicing, advanced indexing **always** creates a copy." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "b = np.arange(15).reshape((5, 3))\nb", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "b[[0, 1, 4]] # Implicitly b[[0, 1, 4], :]", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"source": "<h2>Multiple axes and advanced indexing</h2>\n\n" | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "b[[1, 4], [0, 2]]", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"source": "We're not limited to one-dimensional index arrays along each axis." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "b[[[1, 4], [2, 1]], [[2, 1], [1, 2]]]", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"source": "Note that deterministic but unintuitive things happen when you mix advanced indexing and slicing. My advice: *just say no*." | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"source": "<h2>Exercise 2</h2>\n\nPull out the values 1, 2, 3, 4, 5 from `c` in ascending order using advanced indexing." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "c = np.array([[4, 9, 2, 5],\n [1, 6, 3, 7]])", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"source": "<h2>Elementwise operations</h2>\n\nPart of the real power of NumPy lies in how easily and quickly (at C speed) you can manipulate the values in whole arrays at once." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "5 * np.arange(4) + 3", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"source": "Arithmetic with scalars applies to the entire array. Array-array operations work as expected (elementwise) when things have the same shape." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "np.arange(4) + np.array([3, 5, 3, 1])", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"source": "<h2>Broadcasting</h2>\n\nWhen two arrays **don't** have the same shape, NumPy will attempt to **broadcast** them to the same shape. This first and foremost means that any dimensions with length 1 in one array are **repeated along that axis** for the purposes of the operation taking place. This is best illustrated with an example." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "row = np.array([[3, 6, 9]])\nrow", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "identity = np.eye(3)\nidentity", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "row.shape", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "identity.shape", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "row + identity", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "col = np.array([[3],\n [6],\n [9]])\ncol.shape", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "col + identity", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"source": "<h2>More Broadcasting Fun</h2>\n\nIf two arrays in an operation do not have the same number of dimensions, NumPy follows a simple rule to try and broadcast them:\n\n<h2><center><em>Right pad the shape of the array with less dimensions with 1s,<br/>then proceed as above.</em></center></h2>\n\n(*) You can think of a scalar as a 0-dimensional array with shape `()`, and this unifies broadcasting and array-scalar operations." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "one_d = np.array([3, 6, 9])\none_d.shape", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "one_d + np.eye(3)", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"source": "<h2>What if I want the other way?</h2>\n\nOften you want to add singleton axes to an existing array of lower dimension. The easiest method is to index with `np.newaxis` (which is actually defined as `None`):" | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "one_d[:, np.newaxis].shape", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "one_d[:, np.newaxis] + np.eye(3)", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"source": "The `reshape` method/function also works. " | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"source": "<h2>Reductions</h2>\n\nIn many, many settings, we care about some property of a group of elements. We call these generally *reductions*." | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"source": "NumPy arrays support a number of commonly required reductions as methods:\n\n* `min`, `max` -- Minimum and maximum element\n* `argmin`, `argmax` -- *index* of the minimum/maximum element\n* `sum` -- sum of elements\n* `prod` -- product of elements\n* `mean` -- mean element\n* `var`, `std` -- variance, standard deviation\n* `all` -- do all elements evaluate boolean `True`?\n* `any` -- do any elements evaluate boolean `True`?\n\nThey are also available as top-level functions in the `numpy` module.\n\nDefaults to *all elements*, but an `axis` argument allows reduction over a given axis." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "r = numpy.random.uniform(size=(50, 20))", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "r.mean()", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "r.mean(axis=1)", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"source": "<h2>Exercise 3</h2>\n\nWrite a function `normalize_vectors` that takes a 2D array and divides each row by the norm of that row, where the norm is defined as $\\sqrt{x_1^2 + x_2^2 + \\ldots + x_n^2}$." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "def normalize_vectors(x):\n \n \n \n pass", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"source": "<h2>Elementwise comparisons</h2>\n\nComparing NumPy arrays with each other, or with a scalar, operates elementwise on the arrays. All of the same broadcasting rules apply." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "a = np.array([[4, 9],\n [5, 3]])\n\na > 4", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"source": "Boolean arrays can be used for indexing, as well:" | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "a[a > 4]", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "(a > 0).all()", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "(a < 2).any()", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"source": "<h2>Exercise 4</h2>\n\nA **binomial** probability distribution with parameters $n$ and $p$ describes the number of times a coin lands heads given $n$ independent coin flips, where each coin flip has probability $p$ of turning up heads. Using `numpy.random.uniform` to generate uniform random numbers between 0 and 1, write a function `binomial` that generates samples from a binomial distribution with given $n$ and $p$ parameters.\n\n**Hint:** Given that you have access to uniform random numbers in $[0, 1]$, would you go about generating a binary coin flip such that heads (call it `True`) occurs with the right probability $p$?" | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "def binomial(size, n, p):\n \"\"\"\n `size` is a shape tuple indicating the desired shape\n of the output array.\n\n `n` is the number of coin flips.\n `p` is the probability of a coin landing heads.\n \"\"\"\n \n \n \n \n pass", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"source": "<h2>That's the basics!</h2>\n\nWith these tools you should be able to tackle a wide variety of numerical tasks;\n\n* `numpy.dot` is the matrix multiply operator. the `dot` method also exists, allowing one to write the (slightly) more natural `a.dot(b)`\n* `numpy.linalg` includes several basic utility functions for solving linear systems, matrix decompositions and least squares fits.\n* `numpy.fft` contains implementations of the fast Fourier transform (not the fastest, but suitably licensed)." | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"source": "But NumPy is just the beginning. " | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"source": "<h2>Exercise 5</h2>" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"source": "A common operation in working with probabilities is to divide several positive values by their sum, so that the divided values sum to 1. For a variety of reasons (chief among them avoiding *underflow* in intermediate computations), we typically work with the logarithms of the actual values. In order to compute the normalizer of $x_1, x_2, \\ldots, x_N$, we require to compute $\\log\\sum_{i=1}^N \\exp(x_i)$. This poses a problem because $\\exp(a)$ might get very big for large values of $a$ and overflow. However we can side step the problem by noting the following:\n\n$$\\log \\sum_{i=1}^N \\exp(x_i) = \\log \\left(m\\sum_{i=1}^N \\frac{1}{m}\\exp(x_i)\\right) = \\log m + \\log \\sum_{i=1}^N \\exp(x_i - \\log m)$$\n\nSo we can choose $\\log m$, the thing that gets subtracted, to be large enough that $\\exp$ on the result won't overflow. A good choice is the maximum element.\n\nWrite a function `logsumexp` that implements this trick to calculate these quantities without overflow. Make `axis` an argument that chooses the axis over which to sum." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "def logsumexp(x, axis=None):\n \n\n \n pass", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"source": "<h2>Performance tips</h2>\n\nBecause of the way that the interpreter evaluation works, intermediate expressions allocate lots of temporaries. Also costs you speed." | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"source": "`(a + b) / (c + d + e)` allocates 5 arrays, including the final result\n\n* With broadcasting in the mix, this can be bad news memory-wise -- gigantic temporaries!" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"source": "One NumPy-only way to get around this: extended assignment operators allow you to do *in-place operations*" | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "a = np.array([[3, 9, 6], [2, 5, 7]])\na += np.array([2, 1, 0])\na **= 2\na", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"source": "<h2>... But that's only half the story</h2>" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"source": "Consider the following:" | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "def fooz(a, b, c):\n return sqrt(a) + b ** 2 + 5 * c\n\nfooz(np.arange(50), np.arange(0, 100, 2), np.arange(0, 200, 4))", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"source": "This performs several loops over each of `a`, `b` and `c`." | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"source": "Modern CPU caches *hate* this." | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"source": "<h2>Enter Cython</h2>\n\n[Cython](http://www.cython.org/) is a Python-like language that compiles to C extension modules. It allows for *explicit C type annotations*; with loop index variables, it translates loops down to C." | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"source": "A Google Summer of Code project in 2008 by Dag Sverre Seljebotn (now a core Cython developer) added NumPy array support." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "%load_ext cythonmagic", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "%%cython\ncimport numpy as np\nimport numpy as np\ncdef extern from \"math.h\":\n double sqrt(double x)\n\ndef fooz_cy(np.ndarray[np.float64_t, ndim=1] a,\n np.ndarray[np.float64_t, ndim=1] b,\n np.ndarray[np.float64_t, ndim=1] c):\n cdef np.npy_intp i\n cdef np.ndarray[np.float64_t, ndim=1] out = np.empty_like(a)\n for i in range(a.shape[0]):\n out[i] = sqrt(a[i]) + b[i] * b[i] + 5. * c[i]\n return out", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "a = np.arange(500, dtype=np.float64)\nb = np.arange(0, 1000, 2, dtype=np.float64)\nc = np.arange(0, 2000, 4, dtype=np.float64)", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "timeit fooz(a, b, c)", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "timeit fooz_cy(a, b, c)", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_start": true | |
} | |
}, | |
"source": "If we still have time: live discussion of more advanced techniques/idioms, Theano, etc." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "", | |
"language": "python", | |
"metadata": { | |
"slideshow": { | |
"slide_start": false | |
} | |
}, | |
"outputs": [] | |
} | |
], | |
"metadata": {} | |
} | |
] | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment