@marinados
Last active December 10, 2015 10:18
Pandas, NumPy, SciPy - libraries for vectorized calculations on arrays and tabular data
Folium - geographic data
IPython - interactive work environment
List ~ Array (methods: append, insert; items are removed with the del statement)
Dictionary (dict) ~ Hash
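type(obj) ~ obj.class
or ~ || (| is the element-wise version for NumPy / pandas booleans)
and ~ && (& is the element-wise version for NumPy / pandas booleans)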
syntax -->
def method_name(argument):
    # body (indented; comments start with #)
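A minimal sketch of the Ruby-to-Python equivalences listed above. The names (numbers, ages, double) are made up for illustration only.

```python
# Ruby Array ~ Python list
numbers = [1, 2, 3]
numbers.append(4)       # add at the end
numbers.insert(0, 0)    # insert the value 0 at position 0
del numbers[1]          # remove the item at position 1

# Ruby Hash ~ Python dict
ages = {"alice": 30, "bob": 25}

# Ruby's obj.class ~ Python's type(obj)
kind_of_numbers = type(numbers)   # list
kind_of_ages = type(ages)         # dict

# Ruby's || and && ~ Python's "or" and "and" on plain booleans
either = True or False
both = True and False


# Method definition: the body is indented, comments start with "#"
def double(value):
    return value * 2


doubled = double(21)
```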
=======
NUMPY
import numpy as np
ndarray ~ matrix
np.array(list) - creates an ndarray from a list
np.arange(15) --> 0..14 (like Ruby's (0...15), the end is excluded)
array.dtype - type of the array's elements
array.astype(np.float64)
array[5:9]
array[0][1]
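The basic calls above, put together as a short sketch. The variable names and sample values are illustrative assumptions, not part of the original notes.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])   # ndarray built from nested lists
numbers = np.arange(15)                      # integers 0..14 (15 is excluded)

element_type = numbers.dtype                 # an integer dtype (width depends on the platform)
as_floats = numbers.astype(np.float64)       # converted copy; numbers itself is unchanged

middle = numbers[5:9]                        # positions 5, 6, 7, 8
cell = matrix[0][1]                          # row 0, column 1 (same as matrix[0, 1])
```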
1. vector comparison
ndarray = np.array([-10, 1, 7, -8])
ndarray > 0 --> [False, True, True, False]
2. vector comparison as condition
ndarray[ndarray > 0] --> [1, 7]
3. operations on each array element
array = np.arange(1,10,1)
np.sqrt(array) --> [sqrt(1), sqrt(2) etc.]
4. element-wise max and min of two arrays
np.maximum(array1, array2) --> [element-wise max of array1 and array2] (np.minimum for the min)
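The four points above as one runnable sketch. The sample arrays (values, first, second) are made up for illustration.

```python
import numpy as np

values = np.array([-10, 1, 7, -8])

# 1. the comparison is applied to every element
is_positive = values > 0            # array([False,  True,  True, False])

# 2. the boolean array is used as a filter
positives = values[is_positive]     # array([1, 7])

# 3. the same operation is applied to every element
steps = np.arange(1, 10, 1)         # 1..9
roots = np.sqrt(steps)

# 4. element-wise max / min of two arrays of the same shape
first = np.array([1, 8, 3])
second = np.array([4, 2, 6])
highest = np.maximum(first, second)   # array([4, 8, 6])
lowest = np.minimum(first, second)    # array([1, 2, 3])
```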
==========
PANDAS
import pandas as pd
1. Series - one-dimensional labeled array (elements can be of several types)
ser = pd.Series(elements)
ser.index --> [0,1,2 etc.] (not necessarily integers)
ser * 2
ser[ser > 0]
ser.values
When created from a dictionary, the dictionary's keys become the index labels
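A small sketch of the Series operations above. The sample values and the stock dictionary are assumptions chosen for illustration.

```python
import pandas as pd

ser = pd.Series([-3, 0, 4, 7])
labels = list(ser.index)          # default integer labels: [0, 1, 2, 3]
doubled = ser * 2                 # applied to every element
positives = ser[ser > 0]          # filter; the kept elements keep their labels (2 and 3)
raw = ser.values                  # the underlying NumPy array

# Built from a dictionary: the keys become the index labels
stock = pd.Series({"apples": 3, "pears": 5})
stock_labels = list(stock.index)  # ['apples', 'pears']
```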
2. DataFrames - tables with multiple columns, each column being a Series
pd.DataFrame(dictionary) --> columns = [dictionary keys], index = [0,1,2 etc.] by default
--> table with keys as headers,
concatenation of series (one per column)
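A minimal sketch of building a DataFrame from a dictionary. The data and the row labels "a", "b", "c" are hypothetical; columns= and index= are optional arguments of pd.DataFrame.

```python
import pandas as pd

data = {"name": ["x", "y", "z"], "score": [3, 1, 2]}

frame = pd.DataFrame(data)
headers = list(frame.columns)     # ['name', 'score'] (the dictionary keys)
rows = list(frame.index)          # [0, 1, 2] (default integer labels)
score_column = frame["score"]     # a single column is a Series

# Optional: choose the column order and the row labels explicitly
labeled = pd.DataFrame(data, columns=["score", "name"], index=["a", "b", "c"])
```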
ACCESS LINE / COL
Columns accessible with ['name'] or .name
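datafr.loc[2] --> selection of the row labeled 2 (datafr.iloc[2] for position 2; the older datafr.ix[2] no longer exists in current pandas)
datafr.reindex(list of new indexes) (optionally fill_value=0 / sth else)
If no row with this index exists --> filled with NaN (not a number)
datafr.fillna(0)
datafr.drop(2, axis=0) (either index label or column name)
axis=0 - rows
axis=1 - columns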
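A sketch of row / column access, reindexing and dropping. The frame contents are made up, and row selection uses .loc because .ix is not available in current pandas releases.

```python
import pandas as pd

datafr = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

col_by_key = datafr["a"]          # column by key
col_by_attr = datafr.a            # same column by attribute
row = datafr.loc[2]               # row labeled 2

# Reindexing with a label that does not exist yet (3) creates a NaN row
extended = datafr.reindex([0, 1, 2, 3])
zero_filled = datafr.reindex([0, 1, 2, 3], fill_value=0)
patched = extended.fillna(0)      # replace the NaN values afterwards

without_row = datafr.drop(2, axis=0)      # axis=0: drop a row by label
without_col = datafr.drop("b", axis=1)    # axis=1: drop a column by name
```

None of these calls modifies datafr; each returns a new object.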
ADDITION OF TWO DATAFRAMES (aligned on index and columns)
- datafr1 + datafr2 --> only keeps the values existing in both tables (the rest become NaN)
- datafr1.add(datafr2, fill_value = 0) --> keeps all data (a value missing from one table counts as 0)
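The difference between the two forms, as a sketch with two small made-up frames that only share the row label "b".

```python
import pandas as pd

datafr1 = pd.DataFrame({"x": [1, 2]}, index=["a", "b"])
datafr2 = pd.DataFrame({"x": [10, 20]}, index=["b", "c"])

# Plain +: only "b" exists in both frames, "a" and "c" become NaN
plain = datafr1 + datafr2

# add with fill_value=0: a value missing from one frame counts as 0
filled = datafr1.add(datafr2, fill_value=0)
```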
SORTING
.sort_index(axis=0)
.sort_values('col_name') - ascending by default (ascending=False for descending; replaces the older .sort)
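A short sorting sketch. The frame is hypothetical, and sort_values is used because it is the call available in current pandas for sorting by a column.

```python
import pandas as pd

datafr = pd.DataFrame({"score": [2, 9, 5]}, index=["c", "a", "b"])

by_label = datafr.sort_index(axis=0)                           # rows ordered a, b, c
by_score = datafr.sort_values("score")                         # ascending: 2, 5, 9
by_score_desc = datafr.sort_values("score", ascending=False)   # descending: 9, 5, 2
```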
.describe() --> classic stats like mean, max etc.
.mean() --> mean of every column (axis=1 for the mean of every row)
.sum()
.dropna() --> drops all rows with at least 1 NaN value
.dropna(axis=1, how='all') --> drops a column only if all its values are NaN
.fillna(0) --> returns a copy (the original is unchanged)
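The statistics and missing-value calls above, sketched on a made-up frame that contains one NaN and one entirely empty column (named "empty" here for illustration).

```python
import numpy as np
import pandas as pd

datafr = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [4.0, 5.0, 6.0],
    "empty": [np.nan, np.nan, np.nan],
})

col_means = datafr.mean()          # per column, NaN values are skipped
col_sums = datafr.sum()            # per column, NaN values are skipped

complete_rows = datafr.dropna()                     # every row has a NaN here, so nothing is left
no_empty_cols = datafr.dropna(axis=1, how="all")    # only the "empty" column is dropped
filled = datafr.fillna(0)                           # copy with NaN replaced by 0

summary = no_empty_cols.describe()  # count, mean, std, min, quartiles, max per column
```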