@vollerts
Last active December 20, 2015 22:09
Python | For Econometrics and data analysis. Based on "Python for Econometrics" by Kevin Sheppard.
************************************************************************************************************
*** Installation on Ubuntu 12.04 ***************************************************************************
************************************************************************************************************
# Dependencies
sudo apt-get install build-essential
sudo apt-get install libreadline-dev libncursesw5-dev libssl-dev libsqlite3-dev tk-dev libgdbm-dev libc6-dev libbz2-dev
# Download Python
cd ~/Downloads/
wget http://python.org/ftp/python/2.7.5/Python-2.7.5.tgz
# Extract Python
tar -xvf Python-2.7.5.tgz
cd Python-2.7.5
# install Python
./configure
make
sudo make altinstall
# install scipy numpy matplotlib etc.
sudo apt-get install python-numpy python-scipy python-matplotlib ipython ipython-notebook python-pandas python-sympy python-nose
# install easy_install
sudo apt-get install python-setuptools python-dev build-essential
# install IPython PyZMQ Pygments
sudo easy_install iPython
sudo easy_install PyZMQ
sudo easy_install Pygments
# install dependencies for spyder
sudo easy_install pyflakes
sudo easy_install rope
sudo easy_install sphinx
sudo easy_install pylint
sudo easy_install pep8
************************************************************************************************************
*** Built-in Data Types ************************************************************************************
************************************************************************************************************
*** core data types ****************************************************************************************
integer (32 or 64 bits depending on compiler)
long integer ('x = 1L')
float (64 bit, equivalent to double in C/C++)
complex
boolean
string (str)
*** strings and slicing strings ******
y = 'some string'
print(y)
str[:] => Returns all str
str[-i] => Returns letter n-i
str[i] => Returns letter i
str[-i:] => Returns letters n-i,...,n-1
str[i:] => Returns letters i,...,n-1
str[:-i] => Returns letters 0,...,n-i-1
str[:i] => Returns letters 0,...,i-1
str[-j:-i:] => Returns letters n-j,...,n-i-1
str[i:j:] => Returns letters i,...,j-1
str[j:i:-1] => Returns letters j,j-1,...,i+1
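The slicing rules above can be checked on a concrete string. This is a minimal sketch; the string and the variable names are illustrative only and it should run under both Python 2.7 and Python 3.

```python
# Illustrative string with n = 12 letters, indexed 0,...,11
s = 'econometrics'
n = len(s)

first_three = s[:3]      # letters 0,...,2
last_three = s[-3:]      # letters n-3,...,n-1
all_but_last = s[:-1]    # letters 0,...,n-2
middle = s[2:5]          # letters 2,...,4
backwards = s[5:2:-1]    # letters 5,4,3 (stops before index 2)

print(first_three, last_three, all_but_last, middle, backwards)
```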
*** lists **************************************************************************************************
*** empty list***
>>> x = []
>>> type(x)
builtins.list
*** one dimensional ***
x=[1,2,3,4]
*** 2-dimensional list (list of lists) ***
x = [[1,2,3,4], [5,6,7,8]]
*** Jagged list, not rectangular ***
x = [[1,2,3,4] , [5,6,7]]
*** Mixed data types ***
x = [1,1.0,1+0j,'one',None,True]
*** slicing lists ***
x[:] => Return all x
x[i] => Return x at position i
x[i:] => Return x[i] to x[n-1]
x[:i] => Return x[0] to x[i-1]
x[i:j:] => Return x[i] to x[j-1]
x[-i] => Return x[n-i]
x[-i:] => Return x[n-i] to x[n-1]
x[:-i] => Return x[0] to x[n-i-1]
x[-j:-i:] => Return x[n-j] to x[n-i-1]
*** slicing for multidimensionals ***
x = [[1,2,3,4], [5,6,7,8]]
x[0][1]
x[1][1:4]
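A short sketch of the list slicing rules, including the nested case. The lists are illustrative; the point is that a negative index -i is just n-i, and that a list of lists is sliced one level at a time.

```python
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
n = len(x)

head = x[:3]            # x[0] to x[2]
tail = x[-3:]           # x[n-3] to x[n-1]
without_tail = x[:-3]   # x[0] to x[n-4]

# List of lists: the first index selects the inner list,
# the second index or slice is applied to that inner list
nested = [[1, 2, 3, 4], [5, 6, 7, 8]]
one_element = nested[0][1]
inner_slice = nested[1][1:4]

print(head, tail, without_tail, one_element, inner_slice)
```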
*** list functions *****************************************************************************************
list.append(x,value) => x.append(value) => Appends value to the end of the list.
len(x) => -- => Returns the number of elements in the list.
list.extend(x,list) => x.extend(list) => Appends the values in list to the existing list.
list.pop(x,index) => x.pop(index) => Removes and returns the value in position index.
list.remove(x,value) => x.remove(value) => Removes the first occurrence of value from the list.
list.count(x,value) => x.count(value) => Counts the number of occurrences of value in the list.
del x[i] => delete at position i
del x[:i] => delete x[0] to x[i-1]
del x[i:] => delete x[i] to x[n-1]
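The list functions above, applied one after another to a small illustrative list. The comments track the contents of x after each step.

```python
x = [1, 2, 3]
x.append(4)          # [1, 2, 3, 4]
x.extend([2, 5])     # [1, 2, 3, 4, 2, 5]
popped = x.pop(0)    # removes and returns 1 -> [2, 3, 4, 2, 5]
x.remove(2)          # removes the first 2 only -> [3, 4, 2, 5]
twos = x.count(2)    # one 2 is left
del x[:2]            # deletes x[0] and x[1] -> [2, 5]

print(x, popped, twos, len(x))
```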
*** Tuples ***
Refresher on lists (it has been a long time...): https://www.khanacademy.org/science/computer-science/v/python-lists
Tuples are immutable. The only methods are index() and count().
((1,2,)) => tuple
(1,) => tuple => COMMA!!! otherwise (1) is assigned as int
([1,2]) => list
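A minimal sketch of the tuple behaviour described above: the trailing comma for a single element, immutability, and the two available methods. All names are illustrative.

```python
t = (1, 2, 2, 3)
single = (1,)        # tuple with one element
not_a_tuple = (1)    # just the integer 1 in parentheses

# Tuples are immutable, so item assignment raises TypeError
try:
    t[0] = 10
    mutated = True
except TypeError:
    mutated = False

print(type(single), type(not_a_tuple), mutated, t.count(2), t.index(3))
```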
*** xrange ***
Finds all integers x starting with a such that a ≤ x < b and where two consecutive values are separated by i: xrange(a,b,i)
x = xrange(10) => declare range. Can take up to three parameters (a, b, i)
print(x) => xrange(10)
list(x) => converts xrange to a list: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
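xrange exists only in Python 2. The sketch below assumes Python 3, where the built-in range plays the same lazy role; the values a=2, b=11, i=3 are illustrative.

```python
# range(a, b, i): integers x with a <= x < b, consecutive values separated by i
r = range(2, 11, 3)
values = list(r)            # materialize the lazy range as a list

# With a single argument the range starts at 0 and the step is 1
first_ten = list(range(10))

print(values, first_ten)
```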
*** Dictionaries ***
Pass options into functions via key and value
>>> data = {'key1': 1234, 'key2' : [1,2]}
>>> data['key1']
1234
Assign new value to key
>>> data['key1'] = 'xyz'
>>> data['key1']
'xyz'
Adding new key value pair
>>> data['key3'] = 'abc'
>>> data
{'key1': 'xyz', 'key2': [1, 2], 'key3': 'abc'}
Deleting key value pair
>>> del data['key1']
>>> data
{'key2': [1, 2], 'key3': 'abc'}
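One way to pass options into a function via keys and values is dictionary unpacking with `**`. The function `describe` and its option names `digits` and `label` are hypothetical, made up for this sketch.

```python
def describe(series, **options):
    """Hypothetical helper: format the mean of series using optional settings."""
    digits = options.get('digits', 2)     # default: 2 decimal places
    label = options.get('label', 'mean')  # default label
    mean = sum(series) / float(len(series))
    return '%s = %.*f' % (label, digits, mean)

options = {'digits': 3, 'label': 'average'}
with_options = describe([1.0, 2.0, 4.0], **options)   # keys become keyword arguments
with_defaults = describe([1.0, 2.0, 4.0])

print(with_options)
print(with_defaults)
```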
*** Set (set, frozen) Functions ****************************************************************************
Sets are collections which contain all unique elements of a collection. set and frozenset differ only in that the latter is immutable (and therefore hashable, so it can be used as a dictionary key).
set.add(x,element) => x.add(element) => Adds element to a set.
len(x) => - => Returns the number of elements in the set.
set.difference(x,set) => x.difference(set) => Returns the elements in x which are not in set.
set.intersection(x,set) => x.intersection(set) => Returns the elements of x which are also in set.
set.remove(x,element) => x.remove(element) => Removes element from the set.
set.union(x,set) => x.union(set) => Returns the set containing all elements of x and set.
Define a set
>>> x = set(['MSFT','GOOG','AAPL','HPQ'])
>>> x
set(['GOOG', 'AAPL', 'HPQ', 'MSFT'])
Add an element
>>> x.add('CSCO')
>>> x
set(['GOOG', 'AAPL', 'CSCO', 'HPQ', 'MSFT'])
Search for common elements in 2 sets
>>> y = set(['XOM', 'GOOG'])
>>> x.intersection(y)
set(['GOOG'])
merge 2 sets into 1
>>> x = x.union(y)
>>> x
set(['GOOG', 'AAPL', 'XOM', 'CSCO', 'HPQ', 'MSFT'])
Remove an element from a set
>>> x.remove('XOM')
>>> x
set(['GOOG', 'AAPL', 'CSCO', 'HPQ', 'MSFT'])
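A small sketch contrasting set and frozenset. The ticker symbols follow the example above; the dictionary `lookup` and its value are illustrative.

```python
tickers = set(['MSFT', 'GOOG', 'AAPL'])
others = frozenset(['GOOG', 'XOM'])

common = tickers.intersection(others)    # elements in both
only_mine = tickers.difference(others)   # elements of tickers not in others

# frozenset is immutable: it has no add method at all
try:
    others.add('CSCO')
    frozen_changed = True
except AttributeError:
    frozen_changed = False

# Because it is immutable, a frozenset can be used as a dictionary key
lookup = {others: 'search and energy'}

print(common, only_mine, frozen_changed)
```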
************************************************************************************************************
*** Memory, Arrays, Matrices *******************************************************************************
************************************************************************************************************
*** Memory Management in Python ***
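after y = x both names point to the same object in memory => verify with id()
>>> x = 1
>>> y = x
>>> id(x)
82970264L
>>> id(y)
82970264L
Lists are mutable: after y = x both names still refer to the same list, so changing x[0] also changes y. A slice copy y = x[:] creates a new list, but the copy is shallow, so mutable elements inside the list (e.g. nested lists) are still shared.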
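The difference between a second name, a shallow copy and a deep copy of a list can be sketched as follows. The variable names are illustrative; copy.deepcopy comes from the standard library.

```python
import copy

x = [1, [2, 3]]
alias = x                  # second name for the same list
shallow = x[:]             # new outer list, inner list still shared
deep = copy.deepcopy(x)    # fully independent copy

x[0] = 99                  # visible through alias only
x[1][0] = -2               # visible through alias and shallow (shared inner list)

print(id(x) == id(alias), alias, shallow, deep)
```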
*** Arrays *************************************************************************************************
*** Initializing an array ***
>>> x = [0.0, 1, 2, 3, 4]
>>> y = array(x)
>>> y
array([ 0., 1., 2., 3., 4.])
2-dimensional array
>>> y = array([[0.0, 1, 2, 3, 4], [5, 6, 7, 8, 9]])
>>> y
array([[ 0., 1., 2., 3., 4.],
[ 5., 6., 7., 8., 9.]])
>>> shape(y)
(2L, 5L)
*** 3-dimensional array ***
>>> y = array([[[1,2],[3,4]],[[5,6],[7,8]]])
>>> y
array([[[1, 2],[3, 4]],[[5, 6],[7, 8]]])
>>> shape(y)
(2L, 2L, 2L)
*** Array dtypes ***
Find the dtype
>>> x = [0, 1, 2, 3, 4] # Integers
>>> y = array(x)
>>> y.dtype
dtype('int32')
NumPy attempts to find the smallest data type which can represent the data when constructing an array. It is possible to force NumPy to use a particular dtype by passing another argument, dtype=datatype, to array(). Assign dtype:
>>> x = [0, 1, 2, 3, 4] # Integers
>>> y = array(x, dtype='float64')
>>> y.dtype
dtype('float64')
*** Matrix ***
matrix() => converts a 1- or 2-dimensional array to a matrix (copies the data)
mat() => converts a 1- or 2-dimensional array to a matrix without copying (faster)
asmatrix() => same as mat()
*** Arrays, Matrices and Memory Management ***
ALWAYS COPY so the data pointer changes as well!!
>>> x = array([[0.0, 1.0],[2.0,3.0]])
>>> y = copy(x)
>>> id(x)
130166048L
>>> id(y)
130165952L
Assignments from functions which change the value automatically create a copy.
>>> x = array([[0.0, 1.0],[2.0,3.0]])
>>> y = x
>>> id(x)
130166816L
>>> id(y)
130166816L
>>> y = x + 1.0
>>> y
array([[ 1., 2.], [ 3., 4.]])
>>> id(y)
130167008L
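Why copying matters for arrays can be sketched by comparing a second name, a slice and an explicit copy. This assumes NumPy is installed and imported as np; np.shares_memory is used only to make the sharing visible.

```python
import numpy as np

x = np.array([[0.0, 1.0], [2.0, 3.0]])
alias = x                 # same object, same data
view = x[0, :]            # new array object, but it looks at x's data
independent = x.copy()    # new object with its own data

x[0, 0] = -1.0            # shows up in alias and view, not in the copy

print(alias[0, 0], view[0], independent[0, 0])
print(np.shares_memory(x, view), np.shares_memory(x, independent))
```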
*** Create a complex Matrix ***
>>> x = array([[1.0,2.0,3.0],[4.0,5.0,6.0],[7.0,8.0,9.0]])
>>> x
array([[ 1., 2., 3.],
[ 4., 5., 6.],
[ 7., 8., 9.]])
*** Concatenate Matrices ***
>>> x = array([[1.0,2.0],[3.0,4.0]])
>>> y = array([[5.0,6.0],[7.0,8.0]])
>>> z = concatenate((x,y),axis = 0)
>>> z = vstack((x,y)) # Same as z = concatenate((x,y),axis = 0)
>>> z
array([[ 1., 2.],
[ 3., 4.],
[ 5., 6.],
[ 7., 8.]])
>>> z = concatenate((x,y),axis = 1)
>>> z = hstack((x,y)) # Same as z = concatenate((x,y),axis = 1)
>>> z
array([[ 1., 2., 5., 6.],
[ 3., 4., 7., 8.]])
*** Accessing Elements of Array (Slicing) ***
>>> y = array([[0.0, 1, 2, 3, 4],[5, 6, 7, 8, 9]])
>>> y
array([[ 0., 1., 2., 3., 4.],
[ 5., 6., 7., 8., 9.]])
>>> y[0,:] # Row 0, all columns
array([ 0., 1., 2., 3., 4.])
>>> y[:,0] # all rows, column 0
array([ 0., 5.])
>>> y[0,0:3] # Row 0, columns 0 to 2
array([ 0., 1., 2.])
>>> y[0:,3:] # Row 0 and 1, columns 3 and 4
array([[ 3., 4.],
[ 8., 9.]])
>>> y = array([[[1.0,2],[3,4]],[[5,6],[7,8]]])
>>> y[0,:,:] # Panel 0 of 3D y
array([[1., 2.],
[3., 4.]])
>>> y[0] # Same as y[0,:,:]
array([[1., 2.],
[3., 4.]])
>>> y[0,0,:] # Row 0 of panel 0
array([1., 2.])
>>> y[0,1,0] # Panel 0, row 1, column 0
3.0
*** Linear Slicing using flat ***
Assigns index to each element.
>>> y = reshape(arange(25.0),(5,5))
>>> y
array([[ 0., 1., 2., 3., 4.],
[ 5., 6., 7., 8., 9.],
[ 10., 11., 12., 13., 14.],
[ 15., 16., 17., 18., 19.],
[ 20., 21., 22., 23., 24.]])
>>> y[0]
array([ 0., 1., 2., 3., 4.])
>>> y.flat[0]
0.0
>>> y[6] # Error
IndexError: index out of bounds
>>> y.flat[6]
6.0
>>> y.flat[:]
array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.,
11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21.,
22., 23., 24.])
*** Import Modules ***
Import only specific functions, otherwise you may have conflicting names!
>>> from module import * # Imports everything from module; avoid
>>> from pylab import load # Will import load only
>>> from numpy import array, matrix # Will not import the load from NumPy
>>> import pylab as pl
>>> import scipy as sp
>>> import numpy as np
*** Calling Functions ***
>>> y = var(x)
>>> mean(y)
or
>>> mean(var(x))
*** Required Arguments ***
without argument
>>> array([[1.0,2.0],[3.0,4.0]])
array([[ 1., 2.],
[ 3., 4.]])
with argument
>>> array([[1.0,2.0],[3.0,4.0]], 'int32')
array([[1, 2],
[3, 4]])
*** Keyword Arguments ***
array(object=[[1.0,2.0],[3.0,4.0]])
array([[1.0,2.0],[3.0,4.0]], dtype=None, copy=True, order=None, subok=False, ndmin=0)
>>> array(dtype='complex64', object = [[1.0,2.0],[3.0,4.0]], copy=True)
array([[ 1.+0.j, 2.+0.j],
[ 3.+0.j, 4.+0.j]], dtype=complex64)
************************************************************************************************************
*** Math ***************************************************************************************************
************************************************************************************************************
*** Operators ***
+ Addition
- Subtraction
* Multiplication
/ Division (Left divide)
** Exponentiation
>>> x = 9
>>> y = 5
>>> (type(x), type(y))
(int, int)
>>> x/y # Integer division in Python 2
1
>>> float(x)/y
1.8
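The transcript above shows Python 2 behaviour, where / between two integers truncates. The sketch below uses operators that give the same result under both Python 2.7 and Python 3: // for the floor quotient and an explicit float conversion for the true quotient.

```python
x, y = 9, 5

floor_quotient = x // y         # integer (floor) division
true_quotient = float(x) / y    # floating point division
remainder = x % y               # modulo

print(floor_quotient, true_quotient, remainder)
```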
*** Broadcasting ***
Under the normal rules of array mathematics, addition and subtraction are only defined for arrays with the same shape or between an array and a scalar. For example, there is no obvious method to add a 5-element vector and a 5 by 4 matrix. NumPy uses a technique called broadcasting to allow mathematical operations on arrays (and matrices) which would not be compatible under the normal rules of array mathematics.
[0] If one array has fewer dimensions, it is treated as having the same number of dimensions as the larger array by prepending 1s to its shape.
[1] Arrays are only broadcastable if, along each axis i, either (a) they have the same dimension or (b) one of them has dimension 1.
>>> x = reshape(arange(15),(3,5))
>>> x
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
>>> y = 5
>>> x + y - x
array([[5, 5, 5, 5, 5],
[5, 5, 5, 5, 5],
[5, 5, 5, 5, 5]])
>>> y = arange(5)
>>> y
array([0, 1, 2, 3, 4])
>>> x + y - x
array([[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4]])
>>> y = arange(3)
>>> y
array([0, 1, 2])
>>> x + y - x # Error
ValueError: operands could not be broadcast together with shapes (3,5) (3)
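The two broadcasting rules can be sketched with the same shapes as in the transcript. This assumes NumPy imported as np; the reshape of the 3-element vector into a 3 by 1 column is one possible way to make it broadcastable against a 3 by 5 array.

```python
import numpy as np

x = np.arange(15).reshape(3, 5)
row = np.arange(5)     # shape (5,)  -> treated as (1, 5), broadcastable
col = np.arange(3)     # shape (3,)  -> treated as (1, 3), not broadcastable

added_row = x + row    # row is added to every row of x

try:
    x + col
    failed = False
except ValueError:
    failed = True      # (3, 5) and (1, 3) disagree on the last axis

# As a (3, 1) column the same values broadcast across the columns of x
added_col = x + col.reshape(3, 1)

print(added_row.shape, failed, added_col.shape)
```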
*** Addition & Subtraction ***
Element-by-element, as usual; arrays must be broadcastable.
*** Array Multiplication ***
For arrays, * is element-by-element multiplication and arrays must be broadcastable. For matrices, * is matrix multiplication as defined by linear algebra, and there is no broadcasting.
dot(x,y) => matrix multiplication of two arrays
*** Array and Matrix Division ***
Division is always element-by-element, and the rules of broadcasting are used.
*** Array Exponentiation ***
Array exponentiation operates element-by-element, and the rules of broadcasting are used.
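*** Matrix Multiplication ***
For matrices, x*y is matrix multiplication; use multiply for element-by-element multiplication.
z=multiply(x,y)
*** Matrix Exponentiation ***
Can only be used on square matrices
*** Matrix Transpose ***
transpose() or shortcut .T
>>> x = asmatrix(randn(2,2))
>>> xpx1 = x.T * x
>>> xpx2 = x.transpose() * x
>>> xpx3 = transpose(x) * x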
*** Operator Precedence ***
() Parentheses
** Exponentiation
+,- Unary Plus, Unary Minus
*,/,% Multiply, Divide, Modulo
+,- Addition and Subtraction
<,<=,>,>= Comparison operators
==, != Equality operators
=,+=,-=,/=,*=,**= Assignment Operators
is, is not Identity Operators
in, not in Membership Operators
and, or,not Logical Operators
************************************************************************************************************
*** Basic Functions ****************************************************************************************
************************************************************************************************************
*** linspace ***
>>> x = linspace(0, 10, 11)
>>> x
array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])
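*** logspace ***
>>> x = logspace(0, 10, 11)
>>> x
array([ 1.00000000e+00, 1.00000000e+01, 1.00000000e+02, 1.00000000e+03, 1.00000000e+04, 1.00000000e+05, 1.00000000e+06, 1.00000000e+07, 1.00000000e+08, 1.00000000e+09, 1.00000000e+10])
*** arange ***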
>>> x = arange(11)
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> x = arange(11.0)
array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])
>>> x = arange(4, 10, 1.25)
array([ 4. , 5.25, 6.5 , 7.75, 9. ])
*** meshgrid ***
broadcasts two vectors into grids when plotting functions in 3 dimensions
>>> x = arange(5)
>>> y = arange(3)
>>> X,Y = meshgrid(x,y)
>>> X
array([[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4]])
>>> Y
array([[0, 0, 0, 0, 0],
[1, 1, 1, 1, 1],
[2, 2, 2, 2, 2]])
*** r_ ***
generates one dimensional row array
>>> r_[0:10:1] # arange equiv
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> r_[0:10:.5] # arange equiv
array([0., 0.5, 1., 1.5, 2., 2.5, 3., 3.5, 4., 4.5, 5., 5.5, 6., 6.5, 7., 7.5, 8., 8.5, 9., 9.5])
>>> r_[0:10:5j] # linspace equiv, includes end point
array([ 0. , 2.5, 5. , 7.5, 10. ])
*** c_ ***
generates a column array (2-dimensional with one column)
>>> c_[0:5:2]
array([[0],
[2],
[4]])
>>> c_[1:5:4j]
array([[ 1. ],
[ 2.33333333],
[ 3.66666667],
[ 5. ]])
*** ix_ ***
constructs an n-dimensional open mesh from n 1-dimensional lists or arrays
>>> x = reshape(arange(25.0),(5,5))
>>> x
array([[ 0., 1., 2., 3., 4.],
[ 5., 6., 7., 8., 9.],
[ 10., 11., 12., 13., 14.],
[ 15., 16., 17., 18., 19.],
[ 20., 21., 22., 23., 24.]])
>>> x[ix_([2,3],[0,1,2])] # Rows 2 & 3, cols 0, 1 and 2
array([[ 10., 11., 12.],
[ 15., 16., 17.]])
*** mgrid ***
like ogrid, but returns fully expanded (dense) grids; better for vectorized code
>>> mgrid[0:3,0:2:.5]
array([[[0., 0., 0., 0.],
[1., 1., 1., 1.],
[2., 2., 2., 2.]],
[[ 0. , 0.5, 1. , 1.5],
[0., 0.5, 1., 1.5],
[ 0. , 0.5, 1. , 1.5]]])
>>> mgrid[0:3:3j,0:2:5j]
array([[[0., 0., 0., 0., 0.],
[ 1.5, 1.5, 1.5, 1.5, 1.5],
[3., 3., 3., 3., 3.]],
[[0., 0.5, 1., 1.5, 2.],
[0., 0.5, 1., 1.5, 2.],
[0., 0.5, 1., 1.5, 2.]]])
*** ogrid ***
like mgrid, but returns an open (not expanded) mesh; better for loops
>>> ogrid[0:3,0:2:.5]
[array([[ 0.],
[ 1.],
[ 2.]]), array([[ 0. , 0.5, 1. , 1.5]])]
>>> ogrid[0:3:3j,0:2:5j]
[array([[ 0. ],
[ 1.5],
[ 3. ]]),
array([[ 0. , 0.5, 1. , 1.5, 2. ]])]
*** around, round ***
>>> x= randn(3)
array([ 0.60675173, -0.3361189 , -0.56688485])
>>> around(x)
array([ 1., 0., -1.])
>>> around(x, 2)
array([ 0.61, -0.34, -0.57])
*** floor ***
rounds to the next smallest integer
>>> x= randn(3)
array([ 0.60675173, -0.3361189 , -0.56688485])
>>> floor(x)
array([ 0., -1., -1.])
*** ceil ***
rounds to the next largest integer
>>> x= randn(3)
array([ 0.60675173, -0.3361189 , -0.56688485])
>>> ceil(x)
array([ 1., -0., -0.])
************************************************************************************************************
*** Mathematics ********************************************************************************************
************************************************************************************************************
*** sum, cumsum ***
sums all elements in an array
>>> x= randn(3,4)
>>> x
array([[-0.08542071, -2.05598312, 2.1114733 , 0.7986635 ],
[-0.17576066, 0.83327885, -0.64064119, -0.25631728],
[-0.38226593, -1.09519101, 0.29416551, 0.03059909]])
>>> sum(x) # all elements
-0.62339964288008698
>>> sum(x, 0) # Down rows, 4 elements
array([-0.6434473 , -2.31789529, 1.76499762, 0.57294532])
>>> sum(x, 1) # Across columns, 3 elements
array([ 0.76873297, -0.23944028, -1.15269233])
>>> cumsum(x,0) # Down rows
array([[-0.08542071, -2.05598312, 2.1114733 , 0.7986635 ],
[-0.26118137, -1.22270427, 1.47083211, 0.54234622],
[-0.6434473 , -2.31789529, 1.76499762, 0.57294532]])
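The axis argument is easier to follow with round numbers than with random draws. A small sketch, assuming NumPy imported as np; the 3 by 4 array is illustrative.

```python
import numpy as np

x = np.array([[1.0, 2.0, 3.0, 4.0],
              [5.0, 6.0, 7.0, 8.0],
              [9.0, 10.0, 11.0, 12.0]])

total = x.sum()                # all 12 elements
down_rows = x.sum(axis=0)      # one sum per column, 4 elements
across_cols = x.sum(axis=1)    # one sum per row, 3 elements
running = x.cumsum(axis=0)     # cumulative sum down the rows, same shape as x

print(total, down_rows, across_cols)
print(running)
```

The last row of the cumulative sum along axis 0 equals the plain sum along axis 0.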
*** prod, cumprod ***
work identically to sum and cumsum, except that the product and cumulative product are returned. prod and cumprod can be called as functions or methods.
*** diff ***
computes the finite difference on a vector (also array), and so returns n-1 elements when used on an n element vector
>>> x= randn(3,4)
>>> x
array([[-0.08542071, -2.05598312, 2.1114733 , 0.7986635 ],
[-0.17576066, 0.83327885, -0.64064119, -0.25631728],
[-0.38226593, -1.09519101, 0.29416551, 0.03059909]])
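>>> diff(x) # Same as diff(x,1), differences along the last axis
array([[-1.97056241, 4.16745642, -1.3128098 ],
[ 1.00903951, -1.47392004, 0.38432391],
[-0.71292508, 1.38935652, -0.26356642]])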
>>> diff(x, axis=0)
array([[-0.09033996, 2.88926197, -2.75211449, -1.05498078],
[-0.20650526, -1.92846986, 0.9348067 , 0.28691637]])
>>> diff(x, 2, axis=0) # Double difference, column-by-column
array([[-0.11616531, -4.81773183, 3.68692119, 1.34189715]])
*** exp ***
exp returns the element-by-element exponential (e^x) for an array.
*** log ***
log returns the element-by-element natural logarithm (ln(x)) for an array.
*** log10 ***
log10 returns the element-by-element base-10 logarithm (log10(x)) for an array.
*** sqrt ***
sqrt returns the element-by-element square root (√x) for an array.
*** square ***
square returns the element-by-element square (x^2) for an array.
*** absolute ***
absolute returns the element-by-element absolute value for an array. For complex-valued inputs, |a + bi| = √(a^2 + b^2).
*** sign ***
sign returns the element-by-element sign function, which is defined as 0 if x = 0, and x/|x| otherwise.
*** real imag ***
x.real
x.imag
************************************************************************************************************
*** Set Functions ******************************************************************************************
************************************************************************************************************
*** Unique ***
return unique elements in array
>>> x = repeat(randn(3),(2))
array([ 0.11335982, 0.11335982, 0.26617443, 0.26617443, 1.34424621,
1.34424621])
>>> unique(x)
array([ 0.11335982, 0.26617443, 1.34424621])
*** in1d ***
returns a Boolean array with the same size as the first input array indicating the elements which are also in a second array
>>> x = arange(10.0)
>>> y = arange(5.0,15.0)
>>> in1d(x,y)
array([False, False, False, False, False, True, True, True, True, True], dtype=bool)
*** intersect1d ***
returns the elements instead of a Boolean array
>>> x = arange(10.0)
>>> y = arange(5.0,15.0)
>>> intersect1d(x,y)
array([ 5., 6., 7., 8., 9.])
*** union1d ***
returns unique set of elements from 2 arrays
>>> x = arange(10.0)
>>> y = arange(5.0,15.0)
>>> union1d(x,y)
array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.,
11., 12., 13., 14.])
*** setdiff1d ***
returns the elements of the 1st array which are not in the 2nd array
>>> x = arange(10.0)
>>> y = arange(5.0,15.0)
>>> setdiff1d(x,y)
array([ 0., 1., 2., 3., 4.])
*** setxor1d ***
returns the elements which are in exactly one of the two arrays
>>> x = arange(10.0)
>>> y = arange(5.0,15.0)
>>> setxor1d(x,y)
array([ 0., 1., 2., 3., 4., 10., 11., 12., 13., 14.])
*** sort ***
sort array, option 1 or 0 to define axis
>>> x = randn(4,2)
>>> x
array([[ 1.29185667, 0.28150618],
[ 0.15985346, -0.93551769],
[ 0.12670061, 0.6705467 ],
[ 2.77186969, -0.85239722]])
>>> sort(x) # sort(x,1)
array([[ 0.28150618, 1.29185667],
[-0.93551769, 0.15985346],
[ 0.12670061, 0.6705467 ],
[-0.85239722, 2.77186969]])
*** ndarray.sort, argsort ***
>>> x= randn(3)
>>> x
array([ 2.70362768, -0.80380223, -0.10376901])
>>> sort(x)
array([-0.80380223, -0.10376901, 2.70362768])
>>> x
array([ 2.70362768, -0.80380223, -0.10376901])
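The transcript above only shows that the function sort leaves x unchanged. A sketch of the other two pieces named in the heading, assuming NumPy imported as np and using rounded illustrative values: argsort returns the indices that would sort the array, and the method x.sort() sorts in place.

```python
import numpy as np

x = np.array([2.7, -0.8, -0.1])
x_before = x.copy()

sorted_copy = np.sort(x)    # returns a sorted copy, x is unchanged
order = np.argsort(x)       # indices that would sort x
x.sort()                    # method: sorts x itself, returns None

print(sorted_copy, order, x)
```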
*** max, amax, argmax, min, amin, argmin ***
self explanatory
*** minimum, maximum ***
element-by-element min, max of 2 arrays
>>> x = randn(4)
>>> x
array([-0.00672734, 0.16735647, 0.00154181, -0.98676201])
>>> y = randn(4)
array([-0.69137963, -2.03640622, 0.71255975, -0.60003157])
>>> maximum(x,y)
array([-0.00672734, 0.16735647, 0.71255975, -0.60003157])
*** ones ***
generates array of ones
M, N = 5, 5
# Produces a M by N array of 1s
x = ones((M,N))
# Produces a M by M by N 3D array of 1s
x = ones((M,M,N))
# Produces a M by N array of 1s using 32 bit integers
x = ones((M,N), dtype='int32')
*** zeros ***
generates an array of zeros
# Produces a M by N array of 0s
x = zeros((M,N))
# Produces a M by M by N 3D array of 0s
x = zeros((M,M,N))
# Produces a M by N array of 0s using 64 bit integers
x = zeros((M,N),dtype='int64')
*** empty ***
generates an empty/uninitialized array (the contents are arbitrary until assigned)
# Produces a M by N uninitialized array
x = empty((M,N))
# Produces a M by M by N 3D uninitialized array
x = empty((M,M,N))
# Produces a M by N uninitialized array using 64 bit integers
x = empty((M,N),dtype='int64')
*** eye, identity ***
In = eye(N)
************************************************************************************************************
*** Array and Matrix Functions *****************************************************************************
************************************************************************************************************
*** Views **************************************************************************************************
*** view ***
views generate a representation of an array, creating objects which behave like other objects without copying data
>>> x = arange(5)
>>> type(x)
numpy.ndarray
>>> x.view(np.matrix)
matrix([[0, 1, 2, 3, 4]])
>>> x.view(np.recarray)
rec.array([0, 1, 2, 3, 4])
*** asmatrix, mat ***
view an array as a matrix
>>> x = array([[1,2],[3,4]])
>>> x * x # element by element
array([[ 1, 4],
[ 9, 16]])
>>> mat(x) * mat(x) # matrix multiplication
matrix([[ 7, 10],
[15, 22]])
*** asarray ***
asarray works in a similar manner to asmatrix, except that the view produced is that of np.ndarray.
*** ravel ***
returns a flattened view (1-dimensional) of an array or matrix
>>> x = array([[1,2],[3,4]])
>>> x
array([[ 1, 2],
[ 3, 4]])
>>> x.ravel()
array([1, 2, 3, 4])
*** Shape Infos and Transformation *************************************************************************
*** shape ***
returns size of all dimensions
>>> x = randn(4,3)
>>> x.shape
(4L, 3L)
>>> shape(x)
(4L, 3L)
>>> M,N = shape(x)
>>> x.shape = 3,4
>>> x.shape
(3L, 4L)
*** reshape ***
changes the shape of an array; the number of elements must stay the same
>>> x = array([[1,2],[3,4]])
>>> y = reshape(x,(4,1))
>>> y
array([[1],
[2],
[3],
[4]])
>>> z=reshape(y,(1,4))
>>> z
array([[1, 2, 3, 4]])
>>> w = reshape(z,(2,2))
>>> w
array([[1, 2],
[3, 4]])
*** size ***
>>> x = randn(4,3)
>>> size(x)
12
>>> x.size
12
*** ndim ***
returns the number of dimensions of an array or matrix
>>> x = randn(4,3)
>>> ndim(x)
2
*** tile ***
tile replicates an array according to a specified size vector
tile provides an alternative to this complicated construct:
x = array([[1,2],[3,4]])
z = concatenate((x,x,x),axis=1)
y = concatenate((z,z),axis=0)
same as
y = tile(x,(2,3))
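Whether a manual construct really matches tile can be checked directly. The sketch below assumes NumPy imported as np and builds the 4 by 6 block by stacking three copies side by side and then two copies on top of each other, which is one way to reproduce tile(x,(2,3)).

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])

# Manual construct: 3 copies side by side, then 2 copies stacked vertically
z = np.concatenate((x, x, x), axis=1)
manual = np.concatenate((z, z), axis=0)

# tile with size vector (2, 3): 2 repetitions down, 3 repetitions across
tiled = np.tile(x, (2, 3))

print(tiled)
print(np.array_equal(manual, tiled))
```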
*** flatten ***
like ravel, but always returns a copy
*** flat ***
returns a numpy.flatiter object, an iterator over the array which also supports linear indexing and slicing
>>> x = array([[1,2],[3,4]])
>>> x.flat
<numpy.flatiter at 0x6f569d0>
>>> x.flat[2]
3
>>> x.flat[1:4] = -1
>>> x
array([[ 1, -1],
[-1, -1]])
*** broadcast, broadcast_arrays ***
broadcast iterates over two broadcastable arrays as if they had been broadcast, without actually copying any data
broadcast_arrays returns the broadcast arrays themselves
>>> x = array([[1,2,3,4]])
>>> y = reshape(x,(4,1))
>>> b = broadcast(x,y)
>>> b.shape
(4L, 4L)
>>> for u,v in b:
...     print('x: ', u, ' y: ',v)
x: 1 y: 1
x: 2 y: 1
x: 3 y: 1
x: 4 y: 1
x: 1 y: 2
... ... ...
>>> x = array([[1,2,3,4]])
>>> y = reshape(x,(4,1))
>>> b = broadcast_arrays(x,y)
>>> b[0]
array([[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4]])
>>> b[1]
array([[1, 1, 1, 1],
[2, 2, 2, 2],
[3, 3, 3, 3],
[4, 4, 4, 4]])
*** vstack, hstack ***
stacks compatible arrays
>>> x = reshape(arange(6),(2,3))
>>> y = x
>>> vstack((x,y))
array([[0, 1, 2],
[3, 4, 5],
[0, 1, 2],
[3, 4, 5]])
>>> hstack((x,y))
array([[0, 1, 2, 0, 1, 2],
[3, 4, 5, 3, 4, 5]])
*** concatenate ***
like vstack (axis=0) or hstack (axis=1)
*** split, vsplit, hsplit ***
split arrays and matrices vertically and horizontally
>>> x = reshape(arange(20),(4,5))
>>> y = vsplit(x,2)
>>> len(y)
2
>>> y[0]
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
>>> y = hsplit(x,[1,3])
>>> len(y)
3
>>> y[0]
array([[ 0],
[ 5],
[10],
[15]])
>>> y[1]
array([[ 1, 2],
[ 6, 7],
[11, 12],
[16, 17]])
*** delete ***
>>> x = reshape(arange(20),(4,5))
>>> delete(x,1,0) # Same as x[[0,2,3]]
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]])
>>> delete(x,[2,3],1) # Same as x[:,[0,1,4]]
array([[ 0, 1, 4],
[ 5, 6, 9],
[10, 11, 14],
[15, 16, 19]])
>>> delete(x,[2,3]) # Same as hstack((x.flat[:2],x.flat[4:]))
array([0, 1, 4, 5, 6, 7, 8, 9,10,11,12,13,14,15,16,17,18,19])
*** squeeze ***
removes singleton dimensions from an array
>>> x = ones((5,1,5,1))
>>> shape(x)
(5L, 1L, 5L, 1L)
>>> y = x.squeeze()
>>> shape(y)
(5L, 5L)
>>> y = squeeze(x)
*** fliplr, flipud ***
flips an array left-right (fliplr) or up-down (flipud)
>>> x = reshape(arange(4),(2,2))
>>> x
array([[0, 1],
[2, 3]])
>>> fliplr(x)
array([[1, 0],
[3, 2]])
>>> flipud(x)
array([[2, 3],
[0, 1]])
*** diag ***
extracts the diagonal of a 2-dimensional array or matrix, or builds a diagonal array from a 1-dimensional array
>>> x = matrix([[1,2],[3,4]])
>>> x
matrix([[1, 2],
[3, 4]])
>>> y = diag(x)
>>> y
array([1, 4])
>>> z = diag(y)
>>> z
array([[1, 0],
[0, 4]])
*** triu, tril ***
upper and lower triangular
>>> x = matrix([[1,2],[3,4]])
>>> triu(x)
matrix([[1, 2],
[0, 4]])
>>> tril(x)
matrix([[1, 0],
[3, 4]])
*** Linear Algebra Functions *******************************************************************************
*** matrix_power ***
matrix_power(x,n) raises a square matrix to the integer power n; equal to x**n when x is a matrix
*** svd ***
singular value decomposition
*** cond ***
computes the condition number of a matrix, which measures how close to singular a matrix is
>>> x = matrix([[1.0,0.5],[.5,1]])
>>> cond(x)
3
>>> x = matrix([[1.0,2.0],[1.0,2.0]]) # Singular
>>> cond(x)
inf
*** slogdet ***
computes the sign and log of the absolute value of the determinant
*** solve ***
solves the system X β = y when X is square and invertible so that the solution is exact
>>> X = array([[1.0,2.0,3.0],[3.0,3.0,4.0],[1.0,1.0,4.0]])
>>> y = array([[1.0],[2.0],[3.0]])
>>> solve(X,y)
array([[ 0.625],
[-1.125],
[ 0.875]])
*** lstsq ***
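solves the system X β = y when X is n by k, n > k, by finding the least squares solution. Returns the coefficients, the sum of squared residuals, the rank of X and the singular values of X.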
>>> X = randn(100,2)
>>> y = randn(100)
>>> lstsq(X,y)
(array([ 0.03414346, 0.02881763]),
array([ 3.59331858]),
2,
array([ 3.045516 , 1.99327863]))
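With purely random X and y the estimates are hard to interpret. A sketch under stated assumptions: NumPy imported as np, a fixed seed, and a y generated without noise from hypothetical coefficients beta_true, so the least squares solution should recover them. rcond=None is passed explicitly because recent NumPy versions warn when it is omitted.

```python
import numpy as np

rng = np.random.RandomState(0)         # fixed seed for reproducibility
X = rng.randn(100, 2)                  # n = 100 observations, k = 2 regressors
beta_true = np.array([1.0, -2.0])      # hypothetical coefficients
y = X.dot(beta_true)                   # no noise, so the fit is exact

# Four return values: coefficients, sum of squared residuals, rank, singular values
beta_hat, resid, rank, sing_vals = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat, rank, sing_vals)
```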
*** cholesky ***
computes the Cholesky factor of a positive definite matrix or array - http://en.wikipedia.org/wiki/Cholesky_decomposition
>>> x = matrix([[1,.5],[.5,1]])
>>> y = cholesky(x)
>>> y*y.T
matrix([[ 1. , 0.5],
[ 0.5, 1. ]])
*** det ***
computes determinant - http://en.wikipedia.org/wiki/Determinant
>>> x = matrix([[1,.5],[.5,1]])
>>> det(x)
0.75
*** eig ***
computes the eigenvalues and eigenvector of a square matrix - http://en.wikipedia.org/wiki/Eigenvector
>>> x = matrix([[1,.5],[.5,1]])
>>> val,vec = eig(x)
>>> vec*diag(val)*vec.T
matrix([[ 1. , 0.5],
[ 0.5, 1. ]])
*** eigh ***
computes the eigenvalues and eigenvector of a square, symmetric matrix - http://en.wikipedia.org/wiki/Eigenvector
*** inv ***
computes the inverse of a matrix
>>> x = matrix([[1,.5],[.5,1]])
>>> xInv = inv(x)
>>> x*xInv
matrix([[ 1., 0.],
[ 0., 1.]])
*** kron ***
computes the Kronecker product - http://en.wikipedia.org/wiki/Kronecker_product
*** trace ***
trace computes the trace of a square matrix (sum of diagonal elements) and so trace(x) equals sum(diag(x))
*** matrix_rank ***
computes rank of a matrix - http://en.wikipedia.org/wiki/Rank_(linear_algebra)
>>> x = matrix([[1,.5],[1,.5]])
>>> x
matrix([[ 1. , 0.5],
[ 1. , 0.5]])
>>> matrix_rank(x)
1
************************************************************************************************************
*** Importing & Exporting Data *****************************************************************************
************************************************************************************************************
*** loadtxt ***
loadtxt (numpy.lib.npyio.loadtxt) returns array
>>> data = loadtxt('FTSE_1984_2012.csv',delimiter=',') # Error
ValueError: could not convert string to float: Date
# Fails since csv has a header
>>> data = loadtxt('FTSE_1984_2012_numeric.csv',delimiter=',') # Error
ValueError: could not convert string to float: Date
>>> data = loadtxt('FTSE_1984_2012_numeric.csv',delimiter=',',skiprows=1)
>>> data[0]
array([ 4.09540000e+04, 5.89990000e+03, 5.92380000e+03, 5.88060000e+03, 5.89220000e+03,
8.01550000e+08, 5.89220000e+03])
*** genfromtxt ***
genfromtxt (numpy.lib.npyio.genfromtxt) returns array
>>> data = genfromtxt('FTSE_1984_2012.csv',delimiter=',')
>>> data[0]
array([ nan, nan, nan, nan, nan, nan, nan])
>>> data[1]
array([ nan, 5.89990000e+03, 5.92380000e+03, 5.88060000e+03, 5.89220000e+03, 8.01550000e+08,
5.89220000e+03])
>>> data = genfromtxt('FTSE_1984_2012_numeric_tab.txt',delimiter='\t') # import tab delimited
*** csv2rec ***
(matplotlib.mlab.csv2rec) returns recarray
>>> data = csv2rec('FTSE_1984_2012.csv',delimiter=',')
>>> data[0]
(datetime.date(2012, 2, 15), 5899.9, 5923.8, 5880.6, 5892.2, 801550000L, 5892.2)
# usually you need to create an array to store the data
>>> open = data['open']
>>> open
array([ 5899.9, 5905.7, 5852.4, ..., 1095.4, 1095.4, 1108.1])
*** Reading 97-2003 Excel Files ***
from __future__ import print_function
import xlrd
from numpy import empty
wb = xlrd.open_workbook('FTSE_1984_2012.xls')
sheetNames = wb.sheet_names()
# Assumes 1 sheet name
sheet = wb.sheet_by_name(sheetNames[0])
excelData = []
# Collect every row (including the header) as a list of cell values
for i in xrange(sheet.nrows):
    excelData.append(sheet.row_values(i))
# - 1 since excelData has the header row
open = empty(len(excelData) - 1)
# Copy column 1 into the array, skipping the header row
for i in xrange(len(excelData) - 1):
    open[i] = excelData[i+1][1]
*** Reading 2007 & 2010 Excel Files ***
from __future__ import print_function
import openpyxl
from numpy import empty
wb = openpyxl.load_workbook('FTSE_1984_2012.xlsx')
sheetNames = wb.get_sheet_names()
# Assumes 1 sheet name
sheet = wb.get_sheet_by_name(sheetNames[0])
# list() so that the rows can be counted and indexed
rows = list(sheet.rows)
# - 1 since rows includes the header row
open = empty(len(rows) - 1)
# Copy column 1 into the array, skipping the header row
for i in xrange(len(rows) - 1):
    open[i] = rows[i+1][1].value
*** Reading MATLAB Data Files (.mat) ***
from __future__ import print_function
import scipy.io as io
matData = io.loadmat('FTSE_1984_2012.mat')
open = matData['open']
*** Manually Reading Poorly Formatted Text ***
f = open('IBM_TAQ.txt', 'r')
line = f.readline()
# Burn the first line as a header
line = f.readline()
date = []
time = []
price = []
volume = []
while line:
    data = line.split(',')
    date.append(int(data[1]))
    price.append(float(data[3]))
    volume.append(int(data[4]))
    t = data[2]
    time.append(int(t.replace(':','')))
    line = f.readline()
# Convert to arrays, which are more useful than lists
# for numeric data
date = array(date)
price = array(price)
volume = array(volume)
time = array(time)
allData = array([date,price,volume,time])
f.close()
*** Stat Transfer ***
Handy for exporting data to files!
*** Saving and Exporting Data (NumPy and SciPy) ***
NumPy writes .npz and text files; SciPy (scipy.io) can export MATLAB files!
*** writing via NumPy ***
x = arange(10)
y = zeros((100,100))
savez('test',x,y)
data = load('test.npz')
# If no name is given, arrays get generic names arr_0, arr_1, etc
x = data['arr_0']
savez('test',x=x,otherData=y)
data = load('test.npz')
# x=x provides the name x for the data in x
x = data['x']
# otherData = y saves the data in y as otherData
y = data['otherData']
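A minimal round-trip sketch of the two naming schemes, assuming a recent NumPy. It writes to a temporary directory (an assumption made here so that no files are left behind) and uses the np. prefix instead of the bare pylab-style names used in these notes:

```python
import os
import tempfile
import numpy as np

x = np.arange(10)
y = np.zeros((100, 100))

# Write into a temporary directory so the sketch leaves no files behind
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'test.npz')

np.savez(path, x, y)                 # positional arrays: arr_0, arr_1, ...
unnamed = np.load(path)
unnamed_keys = sorted(unnamed.files)
unnamed.close()

np.savez(path, x=x, otherData=y)     # keyword arrays keep their names
named = np.load(path)
x_back = named['x']
y_back = named['otherData']
named.close()
```

The files attribute lists the stored names, which is handy when the names used at save time are not known.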
*** writing to Matlab ***
from __future__ import print_function
import scipy.io as io
x = array([1.0,2.0,3.0])
y = zeros((10,10))
# Set up the dictionary
saveData = {'x':x, 'y':y}
io.savemat('test',saveData,do_compression=True)
# Read the data back
matData = io.loadmat('test.mat')
*** writing to CSV ***
x = randn(10,10)
# Save using tabs (the default delimiter is a space)
savetxt('tabs.txt',x,delimiter='\t')
# Save to CSV
savetxt('commas.csv',x,delimiter=',')
# Reread the data
xData = loadtxt('commas.csv',delimiter=',')
************************************************************************************************************
*** Logical Operators and Find *****************************************************************************
************************************************************************************************************
*** core operators ***
> greater
>= greater_equal
< less
<= less_equal
== equal
!= not_equal
and logical_and
or logical_or
not logical_not
xor logical_xor
*** all and any ***
>>> x = matrix([[1,2],[3,4]])
>>> y = x <= 2
>>> y
matrix([[ True, True],
[False, False]], dtype=bool)
>>> any(y)
True
>>> any(y,0)
matrix([[ True, True]], dtype=bool)
>>> any(y,1)
matrix([[ True],
[False]], dtype=bool)
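all takes the same optional axis argument as any. A short sketch using a plain ndarray rather than a matrix, assuming NumPy is imported as np:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = x <= 2                 # [[True, True], [False, False]]

any_overall = np.any(y)    # True: at least one element is True
all_overall = np.all(y)    # False: not every element is True
all_cols = np.all(y, 0)    # down each column -> [False, False]
all_rows = np.all(y, 1)    # across each row  -> [True, False]
```

With an ndarray the axis results are 1-dimensional, while the matrix versions above stay 2-dimensional.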
*** allclose ***
Can be used to compare two arrays while allowing for a tolerance. This type of function is important when comparing floating point values which may be effectively the same, but not identical.
>>> eps = np.finfo(np.float64).eps
>>> eps
2.2204460492503131e-16
>>> x = randn(2)
>>> y = x + eps
>>> x == y
array([False, False], dtype=bool)
>>> allclose(x,y)
True
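allclose(a, b) tests |a - b| <= atol + rtol*|b| element by element, with defaults rtol=1e-05 and atol=1e-08. A small sketch of how the tolerances change the answer (the values are chosen only for illustration):

```python
import numpy as np

x = np.array([1.0, 100.0])
y = x + 1e-6

# Default tolerances: a difference of 1e-6 counts as "close"
default_close = np.allclose(x, y)

# Tighter absolute tolerance and no relative tolerance: no longer close
strict_close = np.allclose(x, y, rtol=0, atol=1e-9)
```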
*** array_equal ***
checks if two arrays have the same shape and elements
*** array_equiv ***
checks if two arrays are equivalent despite not having the same shape. Equivalence is defined as one array being broadcastable to produce the other.
>>> x = randn(10,1)
>>> y = tile(x,2)
>>> array_equal(x,y)
False
>>> array_equiv(x,y)
True
*** find ***
>>> x = matrix([[1,2],[3,4]])
>>> y = x <= 2
>>> indices = find(y)
>>> indices
array([0, 1], dtype=int64)
>>> x.flat[indices]
matrix([[1, 2]])
# Wrong output
>>> x[indices]
>>> x = matrix([[1,2],[3,4]])
>>> y = x <= 4
>>> indices = find(y)
>>> x.flat[indices]
matrix([[1, 2, 3, 4]])
# Produces an error since x has only 2 rows
>>> x[indices] # Error
IndexError: index (2) out of range (0<=index<1) in dimension 0
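find comes from pylab (matplotlib.mlab) and may not be available in recent matplotlib releases. A sketch of NumPy-only alternatives, assuming a plain ndarray: flatnonzero gives the same flat indices, and nonzero gives one index array per dimension, which avoids the indexing error shown above.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = x <= 2

flat_idx = np.flatnonzero(y)     # indices into the flattened array
values = x.flat[flat_idx]        # elements selected via flat indices
rows, cols = np.nonzero(y)       # one index array per dimension
same_values = x[rows, cols]      # safe to use directly on the 2-d array
```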
*** argwhere ***
returns an array of the indices where a logical condition is met (one row per match, one column per dimension)
>>> x = randn(3)
>>> x
array([-0.5910316 , 0.51475905, 0.68231135])
>>> argwhere(x<0)
array([[0]], dtype=int64)
>>> argwhere(x<-10.0) # Empty array
array([], shape=(0L, 1L), dtype=int64)
>>> x = randn(3,2)
>>> x
array([[ 0.72945913, 1.2135989 ],
[ 0.74005449, -1.60231553],
[ 0.16862077, 1.0589899 ]])
>>> argwhere(x<0)
array([[1, 1]], dtype=int64)
>>> x = randn(3,2,4)
>>> argwhere(x<0)
array([[0, 0, 1],
[0, 0, 2],
[0, 1, 2],
[0, 1, 3],
[1, 0, 2],
[1, 1, 0],
[2, 0, 1],
[2, 1, 0],
[2, 1, 1],
[2, 1, 3]], dtype=int64)
*** extract ***
similar to argwhere except that it returns the values where the condition is true rather than the indices
>>> x = randn(3)
>>> x
array([-0.5910316 , 0.51475905, 0.68231135])
>>> extract(x<0, x)
array([-0.5910316])
>>> extract(x<-10.0, x) # Empty array
array([], dtype=float64)
>>> x = randn(3,2)
>>> x
array([[ 0.72945913, 1.2135989 ],
[ 0.74005449, -1.60231553],
[ 0.16862077, 1.0589899 ]])
>>> extract(x<0,x)
array([-1.60231553])
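extract(condition, x) returns the same values as boolean indexing, x[condition]. A short sketch with fixed values (made up for illustration) so the result is repeatable:

```python
import numpy as np

x = np.array([[0.7, 1.2], [0.7, -1.6], [-0.2, 1.1]])

by_extract = np.extract(x < 0, x)   # values where the condition holds
by_mask = x[x < 0]                  # boolean indexing gives the same values
positions = np.argwhere(x < 0)      # the matching (row, column) indices
```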
*** is* ***
isnan => 1 if nan
isinf => 1 if inf
isfinite => 1 if not inf and not nan
isposinf,isneginf => 1 for positive or negative inf
isreal => 1 if not complex valued
iscomplex => 1 if complex valued
is_string_like => 1 if argument is a string
is_numlike => 1 if is a numeric type
isscalar => 1 if scalar
isvector => 1 if input is a vector
************************************************************************************************************
*** Flow Control, Loops and Exception Handling *************************************************************
************************************************************************************************************
*** if ... elif ... else *** WHITESPACE SENSITIVE ***
if logical_1:
    Code to run if logical_1
elif logical_2:
    Code to run if logical_2
elif logical_3:
    Code to run if logical_3
...
...
else:
    Code to run if all previous logicals are false
*** for (break & continue) *** WHITESPACE SENSITIVE ***
for item in iterable:
    Code to run
break can stop a loop
x = randn(1000)
for i in x:
    print(i)
    if i > 2:
        break
continue can skip an iteration of a loop
x = randn(10)
for i in x:
    if i < 0:
        print(i)
for i in x:
    if i >= 0:
        continue
    print(i)
*** while (break & continue)***
while logical:
    Code to run
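A small sketch of a while loop, first with the condition in the while statement and then with break. The variable names and the tolerance are made up for illustration.

```python
# Count how many halvings are needed to bring a value below a tolerance
value = 1.0
tol = 1e-3
count = 0
while value >= tol:
    value = value / 2
    count += 1

# The same loop written with while True and break
value2, count2 = 1.0, 0
while True:
    if value2 < tol:
        break
    value2 = value2 / 2
    count2 += 1
```

Both versions stop after the same number of iterations; the break form is useful when the exit test has to sit in the middle of the loop body.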
*** try . . . except ***
try:
    Dangerous Code
except ExceptionType1:
    Code to run if ExceptionType1 is raised
except ExceptionType2:
    Code to run if ExceptionType2 is raised
...
...
except:
    Code to run if an unlisted exception type is raised
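A concrete sketch of the pattern, using the same kind of failure that loadtxt reports for a header string. The helper name to_float is hypothetical.

```python
def to_float(text):
    """Convert text to a float, returning None if conversion fails."""
    try:
        return float(text)
    except ValueError:
        # Raised for strings that are not numbers, e.g. 'Date'
        return None
    except TypeError:
        # Raised for unsupported types, e.g. None or a list
        return None

values = [to_float(s) for s in ['5899.9', 'Date', None]]
```

Listing specific exception types is usually preferable to a bare except, which also hides unrelated errors.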
*** List Comprehension ***
can save lines in loops -> check
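A short sketch of what a list comprehension replaces: the loop and the comprehension below build the same list.

```python
# Loop version
squares_loop = []
for i in range(5):
    squares_loop.append(i ** 2)

# Equivalent list comprehension
squares = [i ** 2 for i in range(5)]

# A condition can filter items, much like 'continue' in a loop
even_squares = [i ** 2 for i in range(10) if i % 2 == 0]
```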
************************************************************************************************************
*** Custom Function and Modules ****************************************************************************
************************************************************************************************************
*** Functions - simple case ***
from __future__ import print_function
from __future__ import division
def square(x):
    return x**2
# Call the function
x=2
y = square(x)
print(x,y)
*** Functions - multiple inputs ***
from __future__ import print_function
from __future__ import division
def l2distance(x,y):
    return (x-y)**2
# Call the function
x=3
y = 10
z = l2distance(x,y)
print(x,y,z)
*** Functions - defined using NumPy arrays and matrices ***
from __future__ import print_function
from __future__ import division
import numpy as np
def l2_norm(x,y):
    d=x-y
    return np.sqrt(np.dot(d,d))
# Call the function
x = np.random.randn(10)
y = np.random.randn(10)
z = l2_norm(x,y)
print(x-y)
print("The L2 distance is ",z)
*** Functions - multiple outputs ***
from __future__ import print_function
from __future__ import division
import numpy as np
def l1_l2_norm(x,y):
    d=x-y
    return sum(np.abs(d)),np.sqrt(np.dot(d,d))
# Call the function
x = np.random.randn(10)
y = np.random.randn(10)
# Using 1 output returns a tuple
z = l1_l2_norm(x,y)
print(x-y)
print("The L1 distance is ",z[0])
print("The L2 distance is ",z[1])
# Using 2 output returns the values
l1,l2 = l1_l2_norm(x,y)
print("The L1 distance is ",l1)
print("The L2 distance is ",l2)
*** Functions - Keyword arguments ***
keyword = value
from __future__ import print_function
from __future__ import division
import numpy as np
def lp_norm(x,y,p):
    d=x-y
    return sum(abs(d)**p)**(1/p)
# Call the function
x = np.random.randn(10)
y = np.random.randn(10)
z1 = lp_norm(x,y,2)
z2 = lp_norm(p=2,x=x,y=y)
print("The Lp distances are ",z1,z2)
*** Functions - default value ***
from __future__ import print_function
from __future__ import division
import numpy as np
def lp_norm(x,y,p = 2):
    d=x-y
    return sum(abs(d)**p)**(1/p)
# Call the function
x = np.random.randn(10)
y = np.random.randn(10)
# Inputs with default values can be ignored
l2 = lp_norm(x,y)
l1 = lp_norm(x,y,1)
print("The l1 and l2 distances are ",l1,l2)
print("Is the default value overridden?", sum(abs(x-y))==l1)
*** Function - variable inputs ***
from __future__ import print_function
from __future__ import division
import numpy as np
def lp_norm(x,y,p = 2, *arguments):
    d=x-y
    print('The L' + str(p) + ' distance is :', sum(abs(d)**p)**(1/p))
    out = [sum(abs(d)**p)**(1/p)]
    for p in arguments:
        print('The L' + str(p) + ' distance is :', sum(abs(d)**p)**(1/p))
        out.append(sum(abs(d)**p)**(1/p))
    return tuple(out)
# Call the function
x = np.random.randn(10)
y = np.random.randn(10)
# Inputs with default values can be ignored
lp = lp_norm(x,y,1,2,3,4,1.5,2.5,0.5)
The alternative syntax, **keywords, generates a dictionary with all keyword inputs which are not in the function signature. One reason for using **keywords is to allow a long list of optional inputs without having to have an excessively long function definition
from __future__ import print_function
from __future__ import division
import numpy as np
def lp_norm(x,y,p = 2, **keywords):
    d=x-y
    for key in keywords:
        print('Key :', key, ' Value:', keywords[key])
    return sum(abs(d)**p)
# Call the function
x = np.random.randn(10)
y = np.random.randn(10)
# Inputs with default values can be ignored
lp = lp_norm(x,y,kword1=1,kword2=3.2)
# The p keyword is in the function def, so not in **keywords
lp = lp_norm(x,y,kword1=1,kword2=3.2,p=0)
*** Docstring - help - '' or "" ***
from __future__ import print_function
from __future__ import division
import numpy as np
def lp_norm(x,y,p = 2):
    ''' The docstring contains any available help for
    the function. A good docstring should explain the
    inputs and the outputs, provide an example and a list
    of any other related function.
    '''
    d=x-y
    return sum(abs(d)**p)
>>> help(lp_norm) # call help
*** variable scope ***
Python determines scope from two things: [0] where does the variable appear in the file? [1] is it inside a function?
from __future__ import print_function
from __future__ import division
import numpy as np
a, b, c = 1, 3.1415, 'Python'
def scope():
    print(a)
    print(b)
    print(c)
    # print(d) #Error, d has not been declared yet
scope()
d = np.array(1)
def scope2():
    print(a)
    print(b)
    print(c)
    print(d) # Ok now
scope2()
def scope3():
    a = 'Not a number' # Local variable
    print('Inside scope3, a is ', a)
print('a is ',a)
scope3()
print('a is now ',a)
*** Modules ***
use import module and then module.function syntax
from __future__ import division
from __future__ import print_function
import <samplefile>
y = -3
print(<samplefile>.square(y))
print(<samplefile>.cube(y))
*** __main__ ***
A module can be both directly importable and directly runnable. It is important that the directly runnable code is not executed when the module is imported by other code. Therefore, use -> if __name__=="__main__":
---somefile.py---
from __future__ import division
from __future__ import print_function
def square(x):
    return x**2
if __name__=="__main__":
    print('Program called directly.')
else:
    print('Program called indirectly using name: ', __name__)
---end somefile.py---
>>> %run somefile.py
Program called directly.
>>> import somefile
Program called indirectly using name: somefile
*** PYTHONPATH ***
check current path
>>> import sys
>>> sys.path
add additional directories
import sys
# New directory is first to be searched
sys.path.insert(0, 'c:\\path\\to\\add')
# New directory is last to be searched
sys.path.append('c:\\path\\to\\add')
*** Packages ***
************************************************************************************************************
*** Probability and Statistics Functions *******************************************************************
************************************************************************************************************
*** NumPy **************************************************************************************************
*** Simulating Random Variables ****************************************************************************
NumPy random number generators are all stored in the module numpy.random. These can be used by importing import numpy as np and then calling np.random.rand(), for example, or by importing import numpy.random as rnd and using rnd.rand().
*** rand, random_sample ***
>>> x = rand(3,4,5)
>>> y = random_sample((3,4,5))
*** randn, standard_normal ***
>>> x = randn(3,4,5)
>>> y = standard_normal((3,4,5))
*** randint, random_integers ***
3 inputs, low, high and size
>>> x = randint(0,10,(100))
>>> x.max() # Is 9 since range is [0,10)
9
>>> y = random_integers(0,10,(100))
>>> y.max() # Is 10 since range is [0,10]
10
*** shuffle ***
randomly orders elements of array
>>> x = arange(10)
>>> shuffle(x)
>>> x
array([4, 6, 3, 7, 9, 0, 2, 1, 8, 5])
*** permutation ***
>>> x = arange(10)
>>> permutation(x)
array([2, 5, 3, 0, 6, 1, 9, 8, 4, 7])
>>> x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
*** Bernoulli ***
http://en.wikipedia.org/wiki/Bernoulli_distribution
There is no Bernoulli generator. Instead use 1 - (rand()>p) to generate a single draw or 1 - (rand(10,10)>p) to generate an array.
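A quick sketch checking that this recipe gives ones with probability p, assuming a seeded RandomState so the draws are repeatable. The binomial(1, p) line is an equivalent alternative, since a Binomial with one trial is a Bernoulli.

```python
import numpy as np

rng = np.random.RandomState(0)   # seeded so the draws are repeatable
p = 0.3

single = 1 - (rng.rand() > p)            # one draw, 0 or 1
draws = 1 - (rng.rand(100000) > p)       # many draws
share_of_ones = draws.mean()             # should be close to p

alt = rng.binomial(1, p, size=(10, 10))  # Binomial(1, p) is also Bernoulli(p)
```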
*** beta ***
https://en.wikipedia.org/wiki/Beta_distribution
beta(a,b) generates a draw from the Beta(a, b) distribution. beta(a,b,(10,10)) generates a 10 by 10 array of draws from a Beta(a, b) distribution.
*** binomial ***
https://en.wikipedia.org/wiki/Binomial_distribution
binomial(n,p) generates a draw from the Binomial(n, p) distribution. binomial(n,p,(10,10)) generates a 10 by 10 array of draws from the Binomial(n, p) distribution.
*** chisquare ***
http://en.wikipedia.org/wiki/Chi-squared_distribution
chisquare(nu) generates a draw from the χ²(ν) distribution, where ν is the degree of freedom. chisquare(nu,(10,10)) generates a 10 by 10 array of draws from the χ²(ν) distribution.
*** exponential ***
http://en.wikipedia.org/wiki/Exponential_distribution
exponential() generates a draw from the Exponential distribution with scale parameter λ = 1. exponential(lambda, (10,10)) generates a 10 by 10 array of draws from the Exponential distribution with scale parameter λ.
*** f ***
f(v1,v2) generates a draw from the F(ν1, ν2) distribution where ν1 is the numerator degree of freedom and ν2 is the denominator degree of freedom. f(v1,v2,(10,10)) generates a 10 by 10 array of draws from the F(ν1, ν2) distribution.
*** gamma ***
http://math.wikia.com/wiki/Gamma_distribution
gamma(a) generates a draw from the Gamma(α, 1) distribution, where α is the shape parameter. gamma(a, theta, (10,10)) generates a 10 by 10 array of draws from the Gamma(α, θ) distribution where θ is the scale parameter.
*** multivariate_normal ***
http://en.wikipedia.org/wiki/Multivariate_normal
multivariate_normal(mu, Sigma) generates a draw from a multivariate Normal distribution with mean μ (k-element array) and covariance Σ (k by k array). multivariate_normal(mu, Sigma, (10,10)) generates a 10 by 10 by k array of draws from a multivariate Normal distribution with mean μ and covariance Σ.
*** negative_binomial ***
http://en.wikipedia.org/wiki/Negative_binomial
negative_binomial(n, p) generates a draw from the Negative Binomial distribution where n is the number of failures before stopping and p is the success rate. negative_binomial(n, p, (10, 10)) generates a 10 by 10 array of draws from the Negative Binomial distribution where n is the number of failures before stopping and p is the success rate.
*** normal ***
http://en.wikipedia.org/wiki/Normal_distribution
normal() generates draws from a standard Normal (Gaussian). normal(mu, sigma) generates draws from a Normal with mean μ and standard deviation σ. normal(mu, sigma, (10,10)) generates a 10 by 10 array of draws from a Normal with mean μ and standard deviation σ. normal(mu, sigma) is equivalent to mu + sigma * randn() or mu + sigma * standard_normal().
*** poisson ***
http://en.wikipedia.org/wiki/Poisson_distribution
poisson() generates a draw from a Poisson distribution with λ = 1. poisson(lambda) generates a draw from a Poisson distribution with expectation λ. poisson(lambda, (10,10)) generates a 10 by 10 array of draws from a Poisson distribution with expectation λ.
*** standard_t ***
http://en.wikipedia.org/wiki/Student%27s_t-distribution
standard_t(nu) generates a draw from a Student's t with shape parameter ν. standard_t(nu, (10,10)) generates a 10 by 10 array of draws from a Student's t with shape parameter ν.
*** uniform ***
uniform() generates a uniform random variable on (0, 1). uniform(low, high) generates a uniform on (l, h). uniform(low, high, (10,10)) generates a 10 by 10 array of uniforms on (l, h).
*** laplace ***
http://en.wikipedia.org/wiki/Laplace_distribution
laplace() generates a draw from the Laplace (Double Exponential) distribution centered at 0 and with unit scale. laplace(loc, scale, (10,10)) generates a 10 by 10 array of Laplace distributed data with location loc and scale scale. Using laplace(loc, scale) is equivalent to calling loc + scale*laplace().
*** lognormal ***
http://en.wikipedia.org/wiki/Log-normal_distribution
lognormal() generates a draw from a Log-Normal distribution with μ = 0 and σ = 1. lognormal(mu, sigma, (10,10)) generates a 10 by 10 array of Log-Normally distributed data where the underlying Normal distribution has mean parameter μ and scale parameter σ.
*** multinomial ***
http://en.wikipedia.org/wiki/Multinomial_distribution
multinomial(n, p) generates a draw from a multinomial distribution using n trials and where each outcome has probability p, a k-element array where Σ_{i=1}^{k} p_i = 1. The output is a k-element array containing the number of successes in each category. multinomial(n, p, (10,10)) generates a 10 by 10 by k array of multinomially distributed data with n trials and probabilities p.
*** NumPy **************************************************************************************************
*** Simulation and Random Number Generation ****************************************************************
*** RandomState ***
can initialize multiple generators
>>> gen1 = np.random.RandomState()
>>> gen2 = np.random.RandomState()
>>> gen1.uniform() # Generate a uniform
0.6767614077579269
>>> state1 = gen1.get_state()
>>> gen1.uniform()
0.6046087317893271
>>> gen2.uniform() # Different, since gen2 has different seed
0.04519705909244154
>>> gen2.set_state(state1)
>>> gen2.uniform() # Same uniform as gen1 produces after assigning state
0.6046087317893271
*** State ***
The state of NumPy's random number generator can be read using numpy.random.get_state and can be restored using numpy.random.set_state
>>> st = get_state()
>>> randn(4)
array([ 0.37283499, 0.63661908, -1.51588209, -1.36540624])
>>> set_state(st)
>>> randn(4)
array([ 0.37283499, 0.63661908, -1.51588209, -1.36540624])
*** get_state ***
self explanatory
*** set_state ***
self explanatory
*** seed ***
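seed() => (re)initializes the generator from a random source, so each run gives different numbers
seed(0) => fixed seed, so the same sequence of random numbers is produced each time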
>>> seed()
>>> randn(1)
array([ 0.62968838])
>>> seed()
>>> randn(1)
array([ 2.230155])
>>> seed(0)
>>> randn(1)
array([ 1.76405235])
>>> seed(0)
>>> randn(1)
array([ 1.76405235])
*** replicating simulation data - 2 methods ***
[0] Call seed() and then st = get_state(), and save st to a file which can then be loaded in the future when running the simulation study.
[1] Call seed(s) at the start of the program (where s is a constant).
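Both methods can be sketched with a RandomState object. The function simulate and the seed values are hypothetical, chosen only to show that the same seed or the same restored state reproduces the same draws.

```python
import numpy as np

def simulate(seed, n=5):
    """Hypothetical simulation: n standard normal draws from a seeded generator."""
    gen = np.random.RandomState(seed)
    return gen.standard_normal(n)

# Method [1]: a constant seed at the start of the program
run1 = simulate(12345)
run2 = simulate(12345)   # same seed -> identical draws
run3 = simulate(54321)   # different seed -> different draws

# Method [0]: capture the state and restore it later
gen = np.random.RandomState()
state = gen.get_state()
first = gen.standard_normal(3)
gen.set_state(state)
replay = gen.standard_normal(3)
```

In a real study the state from method [0] would be saved to a file rather than kept in memory.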
*** NumPy **************************************************************************************************
*** Statistics Functions ***********************************************************************************
*** mean ***
http://en.wikipedia.org/wiki/Mean
>>> x = arange(10.0)
>>> x.mean()
4.5
>>> mean(x)
4.5
>>> x= reshape(arange(20.0),(4,5))
>>> mean(x,0)
array([ 7.5, 8.5, 9.5, 10.5, 11.5])
>>> x.mean(1)
array([ 2., 7., 12., 17.])
*** median ***
http://en.wikipedia.org/wiki/Median
>>> x= randn(4,5)
>>> x
array([[-0.74448693, -0.63673031, -0.40608815, 0.40529852, -0.93803737],
[ 0.77746525, 0.33487689, 0.78147524, -0.5050722 , 0.58048329],
[-0.51451403, -0.79600763, 0.92590814, -0.53996231, -0.24834136],
[-0.83610656, 0.29678017, -0.66112691, 0.10792584, -1.23180865]])
>>> median(x)
-0.45558017286810903
>>> median(x, 0)
array([-0.62950048, -0.16997507, 0.18769355, -0.19857318, -0.59318936])
*** std ***
http://en.wikipedia.org/wiki/Standard_deviation
*** var ***
http://en.wikipedia.org/wiki/Variance
*** corrcoef ***
http://en.wikipedia.org/wiki/Goodness_of_fit
*** cov ***
http://en.wikipedia.org/wiki/Covariance
*** histogram ***
http://en.wikipedia.org/wiki/Histogram
*** histogram2d ***
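The functions listed above without examples (std, var, cov, corrcoef, histogram, histogram2d) can be sketched on one simulated data set. The data and the bin counts are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.standard_normal((1000, 2))

sd = np.std(x, 0)                  # column standard deviations (divides by n)
v = np.var(x, 0)                   # column variances (divides by n)
v_unbiased = np.var(x, 0, ddof=1)  # divides by n - 1 instead

c = np.cov(x, rowvar=False)        # 2 by 2 covariance, columns are variables
r = np.corrcoef(x, rowvar=False)   # 2 by 2 correlation matrix

counts, edges = np.histogram(x[:, 0], bins=10)   # 10 counts, 11 bin edges
counts2d, xedges, yedges = np.histogram2d(x[:, 0], x[:, 1], bins=5)
```

Note that cov divides by n - 1 by default while var divides by n, so the diagonal of c matches v_unbiased rather than v.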
*** SciPy **************************************************************************************************
*** Continuous Random Variables ****************************************************************************
*** dist.rvs ***
Pseudo-random number generation. Generically, rvs is called using dist.rvs(*args, loc=0, scale=1, size=size) where size is an n-element tuple containing the size of the array to be generated.
*** dist.pdf ***
Probability density function evaluation for an array of data (element-by-element). Generically, pdf is called using dist.pdf(x, *args, loc=0, scale=1) where x is an array that contains the values to use when evaluating the PDF.
*** dist.logpdf ***
Log probability density function evaluation for an array of data (element-by-element). Generically, logpdf is called using dist.logpdf(x, *args, loc=0, scale=1) where x is an array that contains the values to use when evaluating the log PDF.
*** dist.cdf ***
Cumulative distribution function evaluation for an array of data (element-by-element). Generically, cdf is called using dist.cdf(x, *args, loc=0, scale=1) where x is an array that contains the values to use when evaluating the CDF.
*** dist.ppf ***
Inverse CDF evaluation (also known as percent point function) for an array of values between 0 and 1. Generically, ppf is called using dist.ppf(p, *args, loc=0, scale=1) where p is an array with all elements between 0 and 1 that contains the values to use when evaluating the inverse CDF.
*** dist.fit ***
Estimate shape, location, and scale parameters from data by maximum likelihood using an array of data. Generically, fit is called using dist.fit(data, *args, floc=0, fscale=1) where data is a data array used to estimate the parameters. floc forces the location to a particular value (e.g. floc=0). fscale similarly forces the scale to a particular value (e.g. fscale=1). It is necessary to use floc and/or fscale when computing MLEs if the distribution does not have a location and/or scale. For example, the gamma distribution is defined using 2 parameters, often referred to as shape and scale. In order to use ML to estimate parameters from a gamma, floc=0 must be used.
*** dist.median ***
Returns the median of the distribution. Generically, median is called using dist.median(*args, loc=0, scale=1).
*** dist.mean ***
Returns the mean of the distribution. Generically, mean is called using dist.mean(*args, loc=0, scale=1).
*** dist.moment ***
rth non-central moment evaluation of the distribution. Generically, moment is called using dist.moment(r, *args, loc=0, scale=1) where r is the order of the moment to compute.
*** dist.var ***
Returns the variance of the distribution. Generically, var is called using dist.var(*args, loc=0, scale=1).
*** dist.std ***
Returns the standard deviation of the distribution. Generically, std is called using dist.std(*args, loc=0, scale=1).
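A sketch of the generic dist.* calls, assuming scipy.stats.norm and scipy.stats.gamma as the distributions and a SciPy version whose rvs accepts random_state. The parameter values and sample sizes are arbitrary.

```python
import numpy as np
from scipy import stats

dist = stats.norm                      # other continuous distributions work similarly

draws = dist.rvs(loc=1.0, scale=2.0, size=5000, random_state=0)
density_at_mean = dist.pdf(1.0, loc=1.0, scale=2.0)
prob_below_mean = dist.cdf(1.0, loc=1.0, scale=2.0)   # 0.5 by symmetry
q975 = dist.ppf(0.975)                                 # about 1.96
round_trip = dist.cdf(dist.ppf(0.3))                   # ppf inverts cdf

loc_hat, scale_hat = dist.fit(draws)                   # ML estimates of loc and scale

# Gamma has a shape parameter; floc=0 fixes the location as described above
g = stats.gamma.rvs(2.0, scale=3.0, size=5000, random_state=1)
shape_hat, loc_fixed, gscale_hat = stats.gamma.fit(g, floc=0)
```

fit returns the shape parameters first, followed by location and scale, so the gamma fit yields three values while the normal fit yields two.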
*** SciPy **************************************************************************************************
*** Select Statistics Functions ****************************************************************************
*** mode ***
http://en.wikipedia.org/wiki/Mode_(statistics)
computes mode of an array; option defines axis
>>> x=randint(1,11,1000)
>>> stats.mode(x)
(array([ 4.]), array([ 112.]))
*** moment ***
http://en.wikipedia.org/wiki/Moment_matrix
computes rth moment of array; option defines axis
>>> x = randn(1000)
>>> moment = stats.moment
>>> moment(x,2) - moment(x,1)**2
0.94668836546169166
>>> var(x)
0.94668836546169166
>>> x = randn(1000,2)
>>> moment(x,2,0) # axis 0
array([ 0.97029259, 1.03384203])
*** skew ***
http://en.wikipedia.org/wiki/Skew-symmetric_matrix
computes skewness of an array
>>> x = randn(1000)
>>> skew = stats.skew
>>> skew(x)
0.027187705042705772
>>> x = randn(1000,2)
>>> skew(x,0)
array([ 0.05790773, -0.00482564])
*** kurtosis ***
http://en.wikipedia.org/wiki/Kurtosis
computes the excess kurtosis (actual kurtosis minus 3) of an array
>>> x = randn(1000)
>>> kurtosis = stats.kurtosis
>>> kurtosis(x)
-0.2112381820194531
>>> kurtosis(x, fisher=False)
2.788761817980547
*** pearsonr ***
http://en.wikipedia.org/wiki/Pearson_correlation
Pearson correlation between two 1-dimensional arrays
>>> x = randn(10)
>>> y = x + randn(10)
>>> pearsonr = stats.pearsonr
>>> corr, pval = pearsonr(x, y)
>>> corr
0.40806165708698366
>>> pval
0.24174029858660467
*** spearmanr ***
http://en.wikipedia.org/wiki/Spearman's_rank_correlation_coefficient
computes the Spearman correlation
>>> x = randn(10,3)
>>> spearmanr = stats.spearmanr
>>> rho, pval = spearmanr(x)
>>> rho
array([[ 1. , -0.02087009, -0.05867387],
[-0.02087009, 1. , 0.21258926],
[-0.05867387, 0.21258926, 1. ]])
>>> pval
array([[ 0. , 0.83671325, 0.56200781],
[ 0.83671325, 0. , 0.03371181],
[ 0.56200781, 0.03371181, 0. ]])
>>> rho, pval = spearmanr(x[:,0],x[:,1])
>>> rho
-0.020870087008700869
>>> pval
0.83671325461864643
*** kendalltau ***
http://en.wikipedia.org/wiki/Kendall_tau
computes Kendall's τ between two 1-dimensional arrays
>>> x = randn(10)
>>> y = x + randn(10)
>>> kendalltau = stats.kendalltau
>>> tau, pval = kendalltau(x,y)
>>> tau
0.46666666666666673
>>> pval
0.06034053974834707
*** linregress ***
http://en.wikipedia.org/wiki/Linear_regression
estimates a linear regression between 2 1-dimensional arrays
>>> x = randn(10)
>>> y = x + randn(10)
>>> linregress = stats.linregress
>>> slope, intercept, rvalue, pvalue, stderr = linregress(x,y)
>>> slope
1.6976690163576993
>>> rsquare = rvalue**2
>>> rsquare
0.59144988449163494
>>> x.shape = 10,1
>>> y.shape = 10,1
>>> z = hstack((x,y))
>>> linregress(z) # Alternative form, [x y]
(1.6976690163576993,
-0.79983724584931648,
0.76905779008578734,
0.0093169560056056751,
0.4988520051409559)
*** SciPy **************************************************************************************************
*** Select Statistical tests *******************************************************************************
*** normaltest ***
http://en.wikipedia.org/wiki/Jarque-Bera_test
Tests for normality in an array of data. Returns the test statistic and the p-value of the test. The test is a small sample modified version of the Jarque-Bera test statistic.
*** kstest ***
http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test
>>> x = randn(100)
>>> kstest = stats.kstest
>>> stat, pval = kstest(x, 'norm')
>>> stat
0.11526423481470172
>>> pval
0.12963296757465059
>>> ncdf = stats.norm().cdf # No () on cdf to get the function
>>> kstest(x, ncdf)
(0.11526423481470172, 0.12963296757465059)
>>> gamma = stats.gamma
>>> x = gamma.rvs(2, size = 100)
>>> kstest(x, 'gamma', (2,)) # (2,) contains the shape parameter
(0.079237623453142447, 0.54096739528138205)
>>> gcdf = gamma(2).cdf
>>> kstest(x, gcdf)
(0.079237623453142447, 0.54096739528138205)
*** ks_2samp ***
http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test
2-sample version of the Kolmogorov-Smirnov test
*** shapiro ***
http://en.wikipedia.org/wiki/Shapiro%E2%80%93Wilk_test
Shapiro-Wilk test for normality on a 1-dimensional array of data
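The tests listed above without examples (normaltest, ks_2samp, shapiro) all return a statistic and a p-value. A sketch on simulated data, assuming scipy.stats is imported as stats; the exponential sample is used only because it is clearly non-normal.

```python
import numpy as np
from scipy import stats

rng = np.random.RandomState(0)
x = rng.standard_normal(500)             # normal data
y = rng.exponential(size=500)            # clearly non-normal data

nt_stat, nt_pval = stats.normaltest(y)   # small p-value: reject normality
sw_stat, sw_pval = stats.shapiro(y)      # Shapiro-Wilk on a 1-d array
ks_stat, ks_pval = stats.ks_2samp(x, y)  # do x and y share a distribution?
```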
************************************************************************************************************
*** Optimization *******************************************************************************************
************************************************************************************************************
import scipy.optimize as opt
*** Unconstrained Optimization *****************************************************************************
optimizer(f, x0)
fprime => Function returning derivative of f. Must take same inputs as f (1)
args => Tuple containing extra parameters to pass to f
gtol => Gradient norm for terminating optimization (1)
norm => Order of norm (e.g. inf or 2) (1)
epsilon => Step size to use when approximating f' (1)
maxiter => Integer containing the maximum number of iterations
disp => Boolean indicating whether to print convergence message
full_output => Boolean indicating whether to return additional output
retall => Boolean indicating whether to return results for each iteration.
callback => User supplied function to call after each iteration.
*** fmin_bfgs ***
http://en.wikipedia.org/wiki/Limited-memory_BFGS
uses information in the first derivative to estimate the second derivative
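The functions optim_target1, optim_target2 and optim_target3 used in this section are not defined in these notes. The definitions below are an assumption: they are chosen to agree with the gradient optim_target3_grad and with the minimizers and function values printed in the sessions in this section, and may differ from the originals.

```python
import numpy as np

def optim_target1(x):
    # Scalar quadratic with its minimum at x = 0
    return x ** 2

def optim_target1_grad(x):
    # Derivative of optim_target1
    return 2 * x

def optim_target2(params):
    # Bivariate quadratic with its minimum at (1, 1)
    x, y = params
    return x ** 2 - 3 * x + 3 + y * x - 3 * y + y ** 2

def optim_target3(params, hyperparams):
    # Quadratic whose coefficients are passed in through args
    x, y = params
    c1, c2, c3 = hyperparams
    return x ** 2 + c1 * x + c2 + y * x + c3 * y + y ** 2

hyperp = np.array([1.0, 2.0, 3.0])
value_at_min = optim_target3(np.array([1.0 / 3, -5.0 / 3]), hyperp)
```

Setting the gradient of optim_target3 to zero with these hyperparameters gives x = 1/3 and y = -5/3, where the function value is -1/3.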
# basic use case
>>> opt.fmin_bfgs(optim_target1, 2)
Optimization terminated successfully.
Current function value: 0.000000
Iterations: 2
Function evaluations: 12
Gradient evaluations: 4
array([ -7.45132576e-09])
# analytic derivatives improves performance
>>> opt.fmin_bfgs(optim_target1, 2, fprime = optim_target1_grad)
Optimization terminated successfully.
Current function value: 0.000000
Iterations: 2
Function evaluations: 4
Gradient evaluations: 4
array([ 2.71050543e-20])
# multivariate optimization
>>> opt.fmin_bfgs(optim_target2, array([1.0,2.0]))
Optimization terminated successfully.
Current function value: 0.000000
Iterations: 3
Function evaluations: 20
Gradient evaluations: 5
array([ 1. , 0.99999999])
# with arguments
>>> hyperp = array([1.0,2.0,3.0])
>>> opt.fmin_bfgs(optim_target3, array([1.0,2.0]), args=(hyperp ,))
Optimization terminated successfully.
Current function value: -0.333333
Iterations: 3
Function evaluations: 20
Gradient evaluations: 5
array([ 0.33333332, -1.66666667])
# Derivative functions
def optim_target3_grad(params,hyperparams):
    x, y = params
    c1, c2, c3=hyperparams
    return array([2*x+c1+y,x+c3+2*y])
# Analytical derivative
>>> opt.fmin_bfgs(optim_target3, array([1.0,2.0]), fprime=optim_target3_grad, args=(hyperp ,))
Optimization terminated successfully.
Current function value: -0.333333
Iterations: 3
Function evaluations: 5
Gradient evaluations: 5
array([ 0.33333333, -1.66666667])
*** fmin_cg ***
http://en.wikipedia.org/wiki/Nonlinear_conjugate_gradient_method
>>> opt.fmin_cg(optim_target3, array([1.0,2.0]), args=(hyperp ,))
Optimization terminated successfully.
Current function value: -0.333333
Iterations: 7
Function evaluations: 59
Gradient evaluations: 12
array([ 0.33333334, -1.66666666])
*** fmin_ncg ***
http://en.wikipedia.org/wiki/Nonlinear_conjugate_gradient_method
>>> opt.fmin_ncg(optim_target3, array([1.0,2.0]), optim_target3_grad, args=(hyperp,))
Optimization terminated successfully.
Current function value: -0.333333
Iterations: 5
Function evaluations: 6
Gradient evaluations: 21
Hessian evaluations: 0
array([ 0.33333333, -1.66666666])
************************************************************************************************************
*** Derivative-free Optimization ***************************************************************************
************************************************************************************************************
xtol => Change in x to terminate optimization
ftol => Change in function to terminate optimization
maxfun => Maximum number of function evaluations
direc => Initial direction set, same size as x0 by m
*** fmin ***
http://en.wikipedia.org/wiki/Simplex_algorithm
def tick_loss(quantile, data, alpha):
    e = data - quantile
    return dot((alpha - (e<0)),e)
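# with alpha = 0.5 the minimizer should be close to median(data)
# (the recorded output below has 2 elements and looks like it belongs to optim_target3, not tick_loss)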
>>> data = randn(1000)
>>> opt.fmin(tick_loss, 0, args=(data, 0.5))
Optimization terminated successfully.
Current function value: -0.333333
Iterations: 48
Function evaluations: 91
array([ 0.33332751, -1.66668794])
>>> median(data)
-0.0053901030307567602
*** fmin_powell ***
http://en.wikipedia.org/wiki/Powell%27s_method
>>> data = randn(1000)
>>> opt.fmin_powell(tick_loss, 0, args=(data, 0.5))
Optimization terminated successfully.
Current function value: 396.760642
Iterations: 1
Function evaluations: 17
array(-0.004659496638722056)
************************************************************************************************************
*** Constrained Optimization *******************************************************************************
************************************************************************************************************
Constrained optimization is frequently encountered in economic problems where parameters are only meaningful in some particular range; for example, a variance must be weakly positive.
*** fmin_slsqp ***
check if needed...
*** fmin_tnc ***
check if needed... supports only bounds constraints
*** fmin_l_bfgs_b ***
check if needed... supports only bounds constraints
*** fmin_cobyla ***
check if needed...
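A minimal sketch of a bounds-only problem, assuming scipy is installed. The objective and its gradient are hypothetical: the unconstrained minimum sits at -1, so a lower bound of 0 (think of a variance) should bind. fmin_l_bfgs_b takes one (lower, upper) pair per parameter and returns the solution, the function value and an information dictionary.

```python
import numpy as np
import scipy.optimize as opt

def objective(params):
    # Hypothetical objective whose unconstrained minimum is at sigma2 = -1
    sigma2 = params[0]
    return (sigma2 + 1.0) ** 2

def objective_grad(params):
    # Analytical gradient of the objective above
    return np.array([2.0 * (params[0] + 1.0)])

# One (lower, upper) pair per parameter; None means unbounded on that side
bounds = [(0.0, None)]
xopt, fval, info = opt.fmin_l_bfgs_b(objective, np.array([2.0]),
                                     fprime=objective_grad, bounds=bounds)
print(xopt, fval)
```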
************************************************************************************************************
*** Scalar Function Optimization ***************************************************************************
************************************************************************************************************
*** fminbound ***
fminbound finds the minimum of a scalar function between two bounds.
>>> hyperp = array([1.0, -2.0, 3])
>>> opt.fminbound(optim_target5, -10, 10, args=(hyperp,))
1.0000000000000002
>>> opt.fminbound(optim_target5, -10, 0, args=(hyperp,))
-5.3634455116374429e-06
*** golden ***
http://en.wikipedia.org/wiki/Golden_section_search
golden uses the golden section search algorithm to find the minimum of a scalar function. It can optionally be given bracketing information, which can speed up the solution.
>>> hyperp = array([1.0, -2.0, 3])
>>> opt.golden(optim_target5, args=(hyperp,))
0.999999992928981
>>> opt.golden(optim_target5, args=(hyperp,), brack=[-10.0,10.0])
0.9999999942734483
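To show what the golden section search does internally, here is a small pure-Python sketch. It is an illustration, not scipy's implementation; the function name golden_section_min, the tolerance and the quadratic test objective (minimum at x = 1) are assumptions made for the example.

```python
import math

def golden_section_min(f, a, b, tol=1e-6):
    # Shrink the bracket [a, b] by the factor 1/phi in every step,
    # reusing one interior function value per iteration.
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    fc, fd = f(c), f(d)
    while abs(b - a) > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
    return (a + b) / 2.0

# Hypothetical objective: x**2 - 2*x + 3 has its minimum at x = 1
xmin = golden_section_min(lambda x: x ** 2 - 2.0 * x + 3.0, -10.0, 10.0)
print(xmin)
```

The method needs the function to have a single minimum inside the bracket, which is why supplying a good bracket matters.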
*** brent ***
http://en.wikipedia.org/wiki/Brent%27s_method
brent uses Brent's method to find the minimum of a scalar function.
>>> opt.brent(optim_target5, args=(hyperp,))
0.99999998519
************************************************************************************************************
*** Nonlinear Least Squares ********************************************************************************
************************************************************************************************************
http://en.wikipedia.org/wiki/Non-linear_least_squares
def nlls_objective(beta, y, X):
    b0 = beta[0]
    b1 = beta[1]
    b2 = beta[2]
    return y - b0 - b1 * (X**b2)
# optional arguments of opt.leastsq
Dfun => Function to compute the Jacobian of the problem. Element i, j should be ∂e_i/∂β_j
col_deriv => Direction to use when computing Jacobian numerically
epsfcn => Step to use in numerical Jacobian calculation.
diag => Scalar factors for the parameters. Used to rescale if scale is very different.
factor => used to determine the initial step size.
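A minimal sketch of calling opt.leastsq with the residual function above, assuming scipy is installed. The data are simulated without noise from hypothetical parameters (1.0, 2.0, 1.5), and the starting values are chosen close to them; with real data the result depends on the starting values.

```python
import numpy as np
import scipy.optimize as opt

def nlls_objective(beta, y, X):
    # Residuals e = y - b0 - b1 * X**b2; leastsq minimises sum(e**2)
    b0, b1, b2 = beta
    return y - b0 - b1 * (X ** b2)

# Simulated, noise-free data from hypothetical true parameters
X = np.linspace(1.0, 5.0, 50)
true_beta = np.array([1.0, 2.0, 1.5])
y = true_beta[0] + true_beta[1] * X ** true_beta[2]

beta0 = np.array([0.5, 1.5, 1.2])      # starting values
beta_hat, ier = opt.leastsq(nlls_objective, beta0, args=(y, X))
print(beta_hat, ier)                    # ier between 1 and 4 signals success
```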
************************************************************************************************************
*** Dates and Times ****************************************************************************************
************************************************************************************************************
*** Creating Dates and Times *******************************************************************************
>>> import datetime as dt
>>> yr = 2012; mo = 12; dd = 21
>>> dt.date(yr, mo, dd)
datetime.date(2012, 12, 21)
>>> hr = 12; mm = 21; ss = 12; ms = 21
>>> dt.time(hr, mm, ss, ms)
datetime.time(12, 21, 12, 21)
# use this for timestamps
>>> dt.datetime(yr, mo, dd, hr, mm, ss, ms)
datetime.datetime(2012, 12, 21, 12, 21, 12, 21)
*** Dates Mathematics **************************************************************************************
*** timedelta ***
>>> d1 = dt.datetime(yr, mo, dd, hr, mm, ss, ms)
>>> d2 = dt.datetime(yr + 1, mo, dd, hr, mm, ss, ms)
>>> d2-d1
datetime.timedelta(365)
>>> d2 + dt.timedelta(30,0,0)
datetime.datetime(2014, 1, 20, 12, 21, 12, 21)
>>> dt.date(2012,12,21) + dt.timedelta(30,12,0)
datetime.date(2013, 1, 20)
# for improved accuracy use datetime and combine
>>> d3 = dt.date(2012,12,21)
>>> dt.datetime.combine(d3, dt.time(0))
datetime.datetime(2012, 12, 21, 0, 0)
# modify using replace
>>> d3 = dt.datetime(2012,12,21,12,21,12,21)
>>> d3.replace(month=11,day=10,hour=9,minute=8,second=7,microsecond=6)
datetime.datetime(2012, 11, 10, 9, 8, 7, 6)
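The positional arguments of timedelta are days, seconds and microseconds, in that order. The short sketch below spells them out with keywords and shows how the result of a subtraction can be inspected; the variable names are illustrative only.

```python
import datetime as dt

start = dt.datetime(2012, 12, 21, 12, 21, 12, 21)
step = dt.timedelta(days=30, seconds=12, microseconds=0)  # same as dt.timedelta(30, 12, 0)

later = start + step          # datetime keeps the 12 extra seconds
elapsed = later - start       # subtracting two datetimes gives a timedelta

# A plain date has no time part, so the 12 seconds are dropped
later_date = dt.date(2012, 12, 21) + step

print(later)
print(elapsed.days, elapsed.seconds, elapsed.total_seconds())
print(later_date)
```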
************************************************************************************************************
*** Graphics ***********************************************************************************************
************************************************************************************************************
ALWAYS import these modules:
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> import scipy.stats as stats
*** 2D Plotting ********************************************************************************************
*** line plots *********************************************************************************************
>>> y = np.random.randn(100)
# normal plot
>>> plt.plot(y)
# in color
>>> plt.plot(y,'g--')
*** Options Color ***
Blue => b
Green => g
Red => r
Cyan => c
Magenta => m
Yellow => y
Black => k
White => w
*** Options Marker ***
Point => .
Pixel => ,
Circle => o
Square => s
Diamond => D
Thin diamond => d
Cross => x
Plus => +
Star => *
Hexagon Alt. => H
Hexagon => h
Pentagon => p
Triangles => ^, v, <, >
Vertical Line => |
Horizontal Line => _
*** Options Line ***
Solid => -
Dashed => --
Dash-dot => -.
Dotted => :
*** define x, y plotting ***
>>> x = np.cumsum(np.random.rand(100))
>>> plt.plot(x,y,'r-')
*** multiple options ***
>>> plt.plot(x,y,alpha = 0.5, color = '#FF7F00', \
... label = 'Line Label', linestyle = '-.', \
... linewidth = 3, marker = 'o', markeredgecolor = '#000000', \
... markeredgewidth = 2, markerfacecolor = '#FF7F00', \
... markersize=30)
alpha => Alpha (transparency) of the plot. Default is 1 (no transparency)
color => Color description for the line.
label => Label for the line. Used when creating legends
linestyle => A line style symbol
linewidth => A positive integer indicating the width of the line
marker => A marker shape symbol or character
markeredgecolor => Color of the edge (a line) around the marker
markeredgewidth => Width of the edge (a line) around the marker
markerfacecolor => Face color of the marker
markersize => A positive integer indicating the size of the marker
# get all options when running these 2 cmds
>>> import matplotlib
>>> h = plt.plot(np.random.randn(10))
>>> matplotlib.artist.getp(h)
# set property with
matplotlib.artist.setp
*** 2D Plotting *******************************************************************************************
*** scatter plot ******************************************************************************************
*** basic ***
>>> z = np.random.randn(100,2)
>>> z[:,1] = 0.5*z[:,0] + np.sqrt(0.5)*z[:,1]
>>> x=z[:,0]
>>> y=z[:,1]
>>> plt.scatter(x,y)
*** with options ***
>>> plt.scatter(x,y, s = 60, c = '#FF7F00', marker='s', \
... alpha = .5, label = 'Scatter Data')
*** 3D scatter - the shit! ***
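>>> s = np.exp(np.exp(np.exp(np.random.rand(100))))
>>> s = 200 * s/np.max(s)
# marker size s carries the third dimension
>>> plt.scatter(x,y, s = s, c = '#FF7F00', marker='s', alpha = .5, label = 'Scatter Data')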
*** 2D Plotting *******************************************************************************************
*** bar charts ********************************************************************************************
*** basic ***
>>> y = np.random.rand(5)
>>> x = np.arange(5)
>>> plt.bar(x,y)
*** with option ***
>>> plt.bar(x,y, width = 0.5, color = '#ffffff', edgecolor = '#000000', linewidth = 5)
*** horizontal barchart ***
>>> colors = ['#FF0000', '#FFFF00', '#00FF00', '#00FFFF', '#0000FF']
>>> plt.barh(x,y, height = 0.5, color = colors, edgecolor = '#000000', linewidth = 5)
*** 2D Plotting *******************************************************************************************
*** Pie charts ********************************************************************************************
*** basic ***
>>> y = np.random.rand(5)
>>> y = y/sum(y)
>>> y[y<.05] = .05
>>> plt.pie(y)
*** 2D Plotting *******************************************************************************************
*** histograms ********************************************************************************************
*** basic ***
>>> x = np.random.rand(1000)
>>> plt.hist(x, bins = 30)
*** with option ***
>>> plt.hist(x, bins = 30, cumulative = True, color = '#FF7F00')
*** Advanced Plotting **************************************************************************************
*** Multiple Plots *****************************************************************************************
*** Figure with Subplots ***
>>> fig = plt.figure()
# Add the subplot to the figure
# Panel 1
>>> ax = fig.add_subplot(2,2,1)
>>> y = np.random.randn(100)
>>> plt.plot(y)
>>> ax.set_title('1')
# Panel 2
>>> y = np.random.rand(5)
>>> x = np.arange(5)
>>> ax = fig.add_subplot(2,2,2)
>>> plt.bar(x,y)
>>> ax.set_title('2')
*** Multiple Plots on the same Axes with hold(True) ***
>>> x = np.random.randn(100)
>>> fig = plt.figure()
>>> ax = fig.add_subplot(111)
>>> ax.hist(x, bins = 30, label = 'Empirical')
>>> xlim = ax.get_xlim()
>>> ylim = ax.get_ylim()
>>> pdfx = np.linspace(xlim[0],xlim[1],200)
>>> pdfy = stats.norm.pdf(pdfx)
# rescale the PDF so that its peak matches the top of the histogram
>>> pdfy = pdfy / pdfy.max() * ylim[1]
# plt.hold only exists in matplotlib < 3.0; newer versions always draw on the current axes
>>> plt.hold(True)
>>> plt.plot(pdfx,pdfy, '-r',label = "PDF")
>>> ax.set_ylim((ylim[0],1.2*ylim[1]))
>>> plt.legend()
>>> plt.hold(False)
*** Adding and Placing of Title and Legend ***
>>> x = np.cumsum(np.random.randn(100,3), axis = 0)
>>> plt.plot(x[:,0],'b-',label = 'Series 1')
>>> plt.hold(True)
>>> plt.plot(x[:,1],'g-.',label = 'Series 2')
>>> plt.plot(x[:,2],'r:',label = 'Series 3')
>>> plt.legend()
>>> plt.title('Basic Legend')
# placing of the legend
>>> plt.plot(x[:,0],'b-',label = 'Series 1')
>>> plt.hold(True)
>>> plt.plot(x[:,1],'g-.',label = 'Series 2')
>>> plt.plot(x[:,2],'r:',label = 'Series 3')
>>> plt.legend(loc = 0, frameon = False, title = 'Data')
>>> plt.title('Improved Legend')
*** Dates on Plots for Timeseries Analysis ***
import numpy as np
import numpy.random as rnd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import datetime as dt
# Simulate data = 2000 dates
>>> T = 2000
>>> x = []
>>> for i in xrange(T):
...     x.append(dt.datetime(2012,3,1)+dt.timedelta(i,0,0))
>>> y = np.cumsum(rnd.randn(T))
# Draw plot
>>> fig = plt.figure()
>>> ax = fig.add_subplot(111)
>>> ax.plot(x,y)
>>> plt.draw()
# format axis: rotate the date labels so that they do not overlap
>>> fig.autofmt_xdate()
>>> plt.draw()
# format dates
>>> months = mdates.MonthLocator()
>>> ax.xaxis.set_major_locator(months)
>>> fmt = mdates.DateFormatter('%b %Y')
>>> ax.xaxis.set_major_formatter(fmt)
>>> fig.autofmt_xdate()
>>> plt.draw()
*** 3D Plotting ********************************************************************************************
*** Line Plots *********************************************************************************************
*** 3 vector drawing ***
>>> from mpl_toolkits.mplot3d import Axes3D
>>> x = np.linspace(0,6*np.pi,600)
>>> z = x.copy()
>>> y = np.sin(x)
>>> x= np.cos(x)
>>> fig = plt.figure()
>>> ax = Axes3D(fig) # Different usage
>>> ax.plot(x, y, zs=z, label='Spiral')
>>> ax.view_init(15,45)
>>> plt.draw()
*** Surfaces and Mesh (Wireframe) Plots ***
x = np.linspace(-3,3,100)
y = np.linspace(-3,3,100)
x,y = np.meshgrid(x,y)
z = np.mat(np.zeros(2))
p = np.zeros(np.shape(x))
R = np.matrix([[1,.5],[.5,1]])
Rinv = np.linalg.inv(R)
for i in xrange(len(x)):
    for j in xrange(len(y)):
        z[0,0] = x[i,j]
        z[0,1] = y[i,j]
        # bivariate normal density: the normalising constant divides by sqrt(det(R))
        p[i,j] = 1.0/(2*np.pi*np.sqrt(np.linalg.det(R)))*np.exp(-(z*Rinv*z.T)[0,0]/2)
>>> fig = plt.figure()
# 3D surface (ax.plot_wireframe(x,y,p, rstride=5, cstride=5) draws the mesh version)
>>> import matplotlib.cm as cm
>>> fig = plt.figure()
>>> ax = fig.add_subplot(111, projection ='3d')
>>> ax.plot_surface(x,y,p, rstride=5, cstride=5, cmap=cm.jet)
>>> ax.view_init(29,80)
>>> plt.draw()
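The density grid p for the surface plot can also be computed without the double loop. The sketch below is one way to vectorise it with numpy (np.stack and np.einsum); it assumes the standard bivariate normal density with correlation matrix R and checks one grid point against the closed-form expression for correlation 0.5.

```python
import numpy as np

x = np.linspace(-3, 3, 100)
y = np.linspace(-3, 3, 100)
x, y = np.meshgrid(x, y)

R = np.array([[1.0, 0.5], [0.5, 1.0]])
Rinv = np.linalg.inv(R)

# Stack the grid into points z[i, j] = (x[i, j], y[i, j])
z = np.stack([x, y], axis=-1)                     # shape (100, 100, 2)

# Quadratic form z' Rinv z at every grid point in a single call
q = np.einsum('...i,ij,...j->...', z, Rinv, z)

# Bivariate normal density with correlation matrix R
p = np.exp(-q / 2.0) / (2.0 * np.pi * np.sqrt(np.linalg.det(R)))

# Closed-form check at one grid point (rho = 0.5)
rho = 0.5
xv, yv = x[10, 20], y[10, 20]
expected = np.exp(-(xv**2 - 2*rho*xv*yv + yv**2) / (2*(1 - rho**2))) \
           / (2*np.pi*np.sqrt(1 - rho**2))
print(p.shape, np.isclose(p[10, 20], expected))
```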
*** contour Plots ***
>>> fig = plt.figure()
>>> ax = fig.gca()
>>> ax.contour(x,y,p)
>>> plt.draw()
*** General Plotting Functions ********************************************************************************************
*** figure ***
open figure window
*** add_subplot ***
add plot to figure
*** close ***
closes figure
*** show ***
update figure or pause execution
*** draw ***
forces update of figure
*** Export ***
>>> plt.plot(np.random.randn(10,2))
>>> plt.savefig('figure.pdf') # PDF export
>>> plt.savefig('figure.png') # PNG export
>>> plt.savefig('figure.svg') # Scalable Vector Graphics export
************************************************************************************************************
*** String Manipulation ************************************************************************************
************************************************************************************************************
*** String Building *******************************************************************************
*** Adding Strings: + or join ***
>>> a = 'Python is'
>>> b = 'a rewarding language.'
# using + concatenation
>>> a + ' ' + b
'Python is a rewarding language.'
# using join
>>> ' '.join([a,b])
'Python is a rewarding language.'
# using empty string
>>> ''.join([a,' ',b])
'Python is a rewarding language.'
*** Multiplying Strings ***
>>> a = 'Python is'
>>> 2*a
'Python isPython is'
*** String Functions *******************************************************************************
*** split ***
>>> s = 'Python is a rewarding language.'
>>> s.split(' ')
['Python', 'is', 'a', 'rewarding', 'language.']
>>> s.split(' ',3)
['Python', 'is', 'a', 'rewarding language.']
>>> s.rsplit(' ',3)
['Python is', 'a', 'rewarding', 'language.']
*** join ***
>>> import string
>>> a = 'Python is'
>>> b = 'a rewarding language.'
>>> string.join((a,b))
'Python is a rewarding language.'
>>> string.join((a,b),':')
'Python is:a rewarding language.'
*** strip, lstrip, rstrip ***
>>> s = ' Python is a rewarding language. '
>>> s = s.strip()
>>> s
'Python is a rewarding language.'
>>> s.strip('P')
'ython is a rewarding language.'
*** find and rfind ***
>>> s = 'Python is a rewarding language.'
>>> s.find('i')
7
>>> s.find('i',10,20)
18
>>> s.rfind('i')
18
*** index and rindex ***
>>> s = 'Python is a rewarding language.'
>>> s.index('i')
7
>>> s.index('q') # Error
ValueError: substring not found
*** count ***
>>> s = 'Python is a rewarding language.'
>>> s.count('i')
2
>>> s.count('i', 10, 20)
1
*** lower and upper ***
>>> s = 'Python is a rewarding language.'
>>> s.upper()
'PYTHON IS A REWARDING LANGUAGE.'
>>> s.lower()
'python is a rewarding language.'
*** ljust, rjust and center ***
>>> s = 'Python is a rewarding language.'
>>> s.ljust(40)
'Python is a rewarding language.         '
>>> s.rjust(40)
'         Python is a rewarding language.'
>>> s.center(40)
'    Python is a rewarding language.     '
*** replace ***
>>> s = 'Python is a rewarding language.'
>>> s.replace('g','Q')
'Python is a rewardinQ lanQuaQe.'
>>> s.replace('is','Q')
'Python Q a rewarding language.'
>>> s.replace('g','Q',2)
'Python is a rewardinQ lanQuage.'
*** Formatting Numbers *******************************************************************************
*** format ***
>>> pi
3.141592653589793
>>> '{:12.5f}'.format(pi)
'     3.14159'
>>> '{:12.5g}'.format(pi)
'      3.1416'
>>> '{:12.5e}'.format(pi)
' 3.14159e+00'
e, E => Exponent notation, e produces e+ and E produces E+ notation
f, F => Display number using a fixed number of digits
g, G => General format, which uses f for smaller numbers, and e for larger. G is equivalent to switching between F and E. g is the default format if no presentation format is given
n => Similar to g, except that it uses locale specific information.
% => Multiplies numbers by 100, and inserts a % sign
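A short sketch of the presentation codes listed above, applied to two made-up numbers (value and share are illustrative). Only str.format from the standard library is used.

```python
value = 12345.678
share = 0.1234

exp_lower = '{:.3e}'.format(value)   # exponent notation with e+
exp_upper = '{:.3E}'.format(value)   # exponent notation with E+
fixed = '{:.1f}'.format(value)       # fixed number of decimals
general = '{:g}'.format(value)       # general format, 6 significant digits by default
percent = '{:.2%}'.format(share)     # multiplied by 100, % sign appended

for text in (exp_lower, exp_upper, fixed, general, percent):
    print(text)
```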
*** Formatting Strings *******************************************************************************
*** format ***
>>> s = 'Python'
>>> '{0:}'.format(s)
'Python'
>>> '{0: >20}'.format(s)
'              Python'
>>> '{0:!>20}'.format(s)
'!!!!!!!!!!!!!!Python'
>>> 'The formatted string is: {0:!<20}'.format(s)
'The formatted string is: Python!!!!!!!!!!!!!!'
*** RegEx *******************************************************************************
*** findall, finditer, sub. findall ***
>>> import re
>>> s = 'Find all numbers in this string: 32.43, 1234.98, and 123.8.'
>>> re.findall(r'[\s][0-9]+\.\d*',s)
[' 32.43', ' 1234.98', ' 123.8']
>>> matches = re.finditer(r'[\s][0-9]+\.\d*',s)
>>> for m in matches:
...     print(s[m.span()[0]:m.span()[1]])
 32.43
 1234.98
 123.8
# Compile and use RegexObject
>>> import re
>>> s = 'Find all numbers in this string: 32.43, 1234.98, and 123.8.'
>>> numbers = re.compile(r'[\s][0-9]+\.\d*')
>>> numbers.findall(s)
[' 32.43', ' 1234.98', ' 123.8']
*** Converting Strings *******************************************************************************
Strings may hold non-numeric data, so wrap conversions such as float(s) or int(s) in try...except ValueError.
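A minimal sketch of the try...except test for potentially non-numeric data. The helper name to_float and the choice of None as fallback are assumptions made for the example.

```python
def to_float(text, default=None):
    # Hypothetical helper: return float(text), or default if text is not numeric
    try:
        return float(text)
    except ValueError:
        return default

raw = ['3.14', ' 42 ', 'n/a', '1e-3', '']
converted = [to_float(item) for item in raw]
print(converted)
```

float() ignores surrounding whitespace, so only genuinely non-numeric entries fall back to the default.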
************************************************************************************************************
*** File System and Navigation *****************************************************************************
************************************************************************************************************
*** Changing the working directory ************************************************************************
import os
pwd = os.getcwd()
os.chdir('c:\\temp')
os.chdir('c:/temp') # Identical
os.chdir('..')
os.getcwd() # Now in 'c:\\'
*** Creating and Deleting Directories **********************************************************************
import shutil
os.mkdir('c:\\temp\\test')
os.makedirs('c:/temp/test/level2/level3') # os.mkdir would fail here, since level2 does not exist yet
os.rmdir('c:\\temp\\test\\level2\\level3')
shutil.rmtree('c:\\temp\\test') # os.rmdir would fail, since the directory is not empty
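The same calls can be tried safely on any platform in a scratch directory. The sketch below uses tempfile.mkdtemp in place of c:\temp (an assumption for portability) and shows that os.rmdir refuses to delete a non-empty directory while shutil.rmtree removes the whole tree.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()                      # scratch directory for the demo
test = os.path.join(base, 'test')
nested = os.path.join(test, 'level2', 'level3')

os.mkdir(test)                                 # parent exists, so mkdir works
os.makedirs(nested)                            # creates level2 and level3 in one call

try:
    os.rmdir(test)                             # fails: test still contains level2
    removed_with_rmdir = True
except OSError:
    removed_with_rmdir = False

shutil.rmtree(test)                            # removes test and everything below it
still_there = os.path.exists(test)
shutil.rmtree(base)                            # clean up the scratch directory
print(removed_with_rmdir, still_there)
```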
*** Listing the Content of a Directory *********************************************************************
# standard
os.chdir('c:\\temp')
files = os.listdir('.')
for f in files:
    if os.path.isdir(f):
        print(f, ' is a directory.')
    elif os.path.isfile(f):
        print(f, ' is a file.')
    else:
        print(f, ' is something else.')
# pattern matching with glob
import glob
files = glob.glob('c:\\temp\\*.txt')
for f in files:
    print(f)
*** Copying, Moving and Deleting Files *********************************************************************
shutil.copy accepts either a filename or a directory as dest. If a directory is given, a file with the same name as the original is created in that directory.
shutil.copyfile requires a filename for dest.
shutil.copy2 is identical to shutil.copy, except that metadata, such as access times, is also copied.
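A small sketch of the three copy functions, run inside a scratch directory created with tempfile.mkdtemp. All file names are made up for the example.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
src = os.path.join(base, 'source.txt')
dest_dir = os.path.join(base, 'backup')
meta_copy = os.path.join(base, 'copy_with_meta.txt')
os.mkdir(dest_dir)

with open(src, 'w') as f:
    f.write('some data\n')

shutil.copy(src, dest_dir)                              # dest is a directory: creates backup/source.txt
shutil.copyfile(src, os.path.join(base, 'copy.txt'))    # dest must be a file name
shutil.copy2(src, meta_copy)                            # also copies timestamps

copied = sorted(os.listdir(dest_dir))
same_mtime = abs(os.path.getmtime(src) - os.path.getmtime(meta_copy)) < 1.0
files = sorted(os.listdir(base))
shutil.rmtree(base)                                     # clean up
print(copied, same_mtime, files)
```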
*** Executing other Programs *******************************************************************************
import os
import subprocess
# Copy using xcopy
os.system('xcopy /S /I c:\\temp c:\\temp4')
subprocess.call('xcopy /S /I c:\\temp c:\\temp5',shell=True)
# Extract using 7-zip
subprocess.call('"C:\\Program Files\\7-Zip\\7z.exe" e -y c:\\temp\\zip.7z')
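The xcopy and 7-Zip calls only work on Windows. As a portable sketch, the example below runs the Python interpreter itself as the external program (sys.executable is used purely for illustration) and shows the return code and captured output.

```python
import subprocess
import sys

# List form: program and arguments are separate items, no shell involved
code = subprocess.call([sys.executable, '-c', 'import sys; sys.exit(0)'])

# A non-zero return code signals that the child process failed
bad = subprocess.call([sys.executable, '-c', 'import sys; sys.exit(3)'])

# check_output returns what the program wrote to stdout (as bytes)
output = subprocess.check_output([sys.executable, '-c', 'print(6 * 7)'])
result = output.decode().strip()
print(code, bad, result)
```

Checking the return code is the simplest way to detect that an external tool did not finish successfully.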
*** Creating and Opening Archives **************************************************************************
# Creates files.zip
shutil.make_archive('files','zip','c:\\temp\\folder_to_archive')
# Creates files.tar.gz
shutil.make_archive('files','gztar','c:\\temp\\folder_to_archive')
# gzip is more work: a single file has to be compressed by hand
import gzip
csvin = open('file.csv','rb')
gz = gzip.GzipFile('file.csv.gz','wb')
gz.write(csvin.read())
gz.close()
csvin.close()
# extracting
import zipfile
import gzip
import tarfile
# Extract zip
zip = zipfile.ZipFile('files.zip')
zip.extractall('c:\\temp\\zip\\')
zip.close()
# Extract gzip tar; 'r:gz' indicates read gzipped
gztar = tarfile.open('file.tar.gz', 'r:gz')
gztar.extractall('c:\\temp\\gztar\\')
gztar.close()
# Extract csv from gzipped csv
gz = gzip.GzipFile('file.csv.gz','rb')
csvout = open('file.csv','wb')
csvout.write(gz.read())
csvout.close()
gz.close()
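A compact sketch of a gzip round trip using with blocks, so the files are closed automatically. It works in a scratch directory from tempfile.mkdtemp; the csv content is made up for the example.

```python
import gzip
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
csv_path = os.path.join(base, 'file.csv')
gz_path = csv_path + '.gz'
payload = b'date,ret\n20120301,0.01\n20120302,-0.02\n'

with open(csv_path, 'wb') as csvout:
    csvout.write(payload)

# Compress: stream the raw bytes into the gzip file
with open(csv_path, 'rb') as csvin:
    with gzip.open(gz_path, 'wb') as gz:
        shutil.copyfileobj(csvin, gz)

# Decompress and keep the content for comparison
with gzip.open(gz_path, 'rb') as gz:
    restored = gz.read()

shutil.rmtree(base)                 # clean up
print(restored == payload)
```

shutil.copyfileobj copies in chunks, so the whole file never has to fit into memory at once.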
*** Reading and Writing Files **************************************************************************
# Read all lines using readlines()
f = open('file.csv','r')
lines = f.readlines()
for line in lines:
    print(line)
f.close()
# Using readline()
f = open('file.csv','r')
line = f.readline()
while line != '':
    print(line)
    line = f.readline()
f.close()
# Using readlines(n), where n is a size hint in bytes
f = open('file.csv','r')
lines = f.readlines(2)
while lines:
    for line in lines:
        print(line)
    lines = f.readlines(2)
f.close()
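The file object can also be iterated directly, which reads one line at a time and avoids the explicit readline loop. The sketch below writes a small made-up csv to a temporary file first so that it is self-contained.

```python
import os
import tempfile

# Create a small temporary csv file to read back
fd, path = tempfile.mkstemp(suffix='.csv')
with os.fdopen(fd, 'w') as f:
    f.write('a,b\n1,2\n3,4\n')

rows = []
with open(path, 'r') as f:          # closed automatically, even on error
    for line in f:                  # reads one line at a time
        rows.append(line.rstrip('\n').split(','))

os.remove(path)
print(rows)
```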
************************************************************************************************************
*** Structured Arrays **************************************************************************************
************************************************************************************************************
*** can be initialized with array or zeros
>>> x = zeros(4,[('date','int'),('ret','float')])
>>> x = zeros(4,{'names': ('date','ret'), 'formats': ('int', 'float')})
>>> x
array([(0, 0.0), (0, 0.0), (0, 0.0), (0, 0.0)],
      dtype=[('date', '<i4'), ('ret', '<f8')])
*** data types
Boolean => b
Integers => i1,i2,i4,i8
Unsigned Integers => u1,u2,u4,u8
Floating Point => f4,f8
Complex => c8,c16
Object => On
String => Sn,an
Unicode String => Un
# example
t = dtype([('var1','f8'), ('var2','i8'), ('var3','u8')])
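A short sketch of working with a structured array once it exists, assuming numpy is imported as np. The field names follow the date/ret example above; the values are made up.

```python
import numpy as np

t = np.dtype([('date', 'i8'), ('ret', 'f8')])
x = np.zeros(3, dtype=t)

# Assign a whole column by field name
x['date'] = [20120301, 20120302, 20120305]
x['ret'] = [0.01, -0.02, 0.005]

first = x[0]                        # a single record
positive = x[x['ret'] > 0]          # boolean selection on one field
print(x.dtype.names, first['date'], len(positive))
```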