Last active
February 19, 2016 19:24
It's Python Time
#Python interactive data viz
http://walkerke.github.io/geog30323/slides/interactive/#/
https://s3.amazonaws.com/quandl-static-content/Documents/Quandl+-+Pandas,+SciPy,+NumPy+Cheat+Sheet.pdf
dataframe.shape
#it does not have to be a DataFrame (Series and NumPy arrays have .shape too); it gives the dimensions of your data set as (rows, columns)
dataframe.isnull().sum()
gives you the number of missing values in each column of your data set
dataframe.info()
gives you the data type of each variable, plus the non-null count per column
series.describe() (a series is one variable/column)
gives you a statistical summary of that series/variable/column
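A quick sketch of the four inspection calls above on a tiny made-up DataFrame. The column names and values are invented for illustration and are not from any real data set.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, only here to have something to inspect.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "city": ["Austin", "Dallas", None, "Austin"],
})

shape = df.shape                  # (rows, columns)
missing = df.isnull().sum()       # Series: missing-value count per column
df.info()                         # prints dtypes and non-null counts
age_stats = df["age"].describe()  # count, mean, std, min, quartiles, max

print(shape)
print(missing)
print(age_stats)
```

describe() skips missing values, so the count it reports for age is 3, not 4.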
dataframe.loc[] (square brackets, not parentheses) is pandas label-based indexing; plain slicing like dataframe[0:3] only slices rows and cannot pick rows and columns by label
there's also conditional slicing, see example:
dataframe.loc[dataframe.age > 95, 'age'] = np.nan (when age is larger than 95, the value is set to NaN)
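Here is a small sketch of label-based and conditional selection with .loc. The frame, its index labels, and the ages are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data with string row labels.
df = pd.DataFrame(
    {"name": ["a", "b", "c"], "age": [30.0, 120.0, 96.0]},
    index=["x", "y", "z"],
)

by_label = df.loc["x", "age"]  # one cell, selected by row and column label
rows_xy = df.loc["x":"y"]      # label slices include BOTH end points

# Conditional assignment: set implausible ages to NaN.
df.loc[df["age"] > 95, "age"] = np.nan

print(df)
```

Unlike ordinary Python slices, a .loc label slice includes the stop label, so "x":"y" returns two rows.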
For a long integer, you can get rid of the last n digits by doing
number // 10**n (note the ** for a power; 10*n would just multiply), this way the last n digits will be gone, with no decimal point
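A minimal sketch of the digit-dropping trick. The helper name drop_last_digits is made up for this note. The exponent has to be written with `**`, because `number // 10 * n` is evaluated left to right as `(number // 10) * n`.

```python
def drop_last_digits(number, n):
    """Remove the last n decimal digits of a non-negative integer."""
    return number // 10 ** n  # ** binds tighter than //, so this is number // (10**n)

trimmed = drop_last_digits(20160219, 4)  # keeps the leading digits only
wrong = 20160219 // 10 * 4               # evaluated as (20160219 // 10) * 4

print(trimmed)
print(wrong)
```

For negative numbers `//` rounds toward negative infinity, so the result is not a simple truncation there.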
dataframe.series.value_counts() gives you the frequency of each distinct value in that series (NaN is left out by default)
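dataframe.fillna(-1) #fills all the NaN with -1, could be other values too; with quotes, '-1', the fill value is a string, not a number. It returns a new DataFrame, so assign the result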
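A short sketch of value_counts() and fillna() together, on invented data. It assumes a numeric fill value of -1 rather than the string '-1'.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "color": ["red", "blue", "red", None, "red"],
    "score": [1.0, np.nan, 3.0, 4.0, np.nan],
})

counts = df["color"].value_counts()  # frequency per value, missing excluded
filled = df["score"].fillna(-1)      # new Series; df itself is unchanged

print(counts)
print(filled)
```

Pass dropna=False to value_counts() if the missing values should be counted as well.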
np.vstack(list_of_lists) #stacks the lists (or arrays) vertically, one per row; they all need the same length
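As a quick illustration of np.vstack, here is a sketch with three made-up lists of equal length.

```python
import numpy as np

# Hypothetical input: three lists of the same length.
rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

stacked = np.vstack(rows)  # each list becomes one row of a 2-D array

print(stacked)
print(stacked.shape)
```

np.vstack takes a single sequence of arrays, so the lists go in together as one argument rather than as separate arguments.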