@arogulin
Last active May 9, 2018 01:37
Machine Learning Cheat Sheet
Pandas methods:
- df.info() - prints a concise summary of the data: column dtypes, non-null counts per column, and memory usage.
- df['category_field'].value_counts() - shows all categories and how many rows share each category.
- df.describe() - shows summary statistics (count, mean, std, min, quartiles, max) of the numerical attributes.
- df.hist(bins=50, figsize=(20,15)) - plots a histogram for each numerical attribute.
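- df['age'] = df['age'].where(df['age'] < 100, 100) - keeps values where the condition is true and replaces the rest with the second argument (here, caps age at 100). Applied to the whole df, the same condition affects every column of the matching rows, not only age.
- df['age'] = df['age'].mask(df['age'] > 100, 100) - the opposite of where(): replaces values where the condition is true.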
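- df.corr() - returns the linear (Pearson) correlation matrix between every pair of numerical attributes. On pandas 2.0+ pass numeric_only=True if df has non-numeric columns. To rank attributes by their correlation with one of them:
  cm = df.corr()
  cm['some_attribute'].sort_values(ascending=False)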
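- pandas.plotting.scatter_matrix(df[categories], figsize=(12,8)) - plots a grid of scatter plots for every pair of the selected attributes (categories is a list of column names), with a histogram of each attribute on the diagonal.
- df.plot(kind='scatter', x='age', y='income', alpha=0.1) - a single scatter plot for one attribute pair; the low alpha makes dense regions stand out.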
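
A minimal sketch of the where()/mask() and corr() entries above, run on a small made-up DataFrame. The data values are hypothetical and only there to make the snippet runnable; the column names age and income are borrowed from the notes.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the notes above.
df = pd.DataFrame({
    "age": [25, 40, 130, 67, 101],
    "income": [30000, 52000, 61000, 45000, 38000],
})

# where(): keep values where the condition is True, replace the rest with 100.
capped_where = df["age"].where(df["age"] < 100, 100)

# mask(): replace values where the condition is True with 100.
capped_mask = df["age"].mask(df["age"] > 100, 100)

# clip() is a more direct way to express the same upper bound.
capped_clip = df["age"].clip(upper=100)

# Correlation matrix, then attributes ranked by correlation with income.
cm = df.corr()
ranking = cm["income"].sort_values(ascending=False)

print(capped_where.tolist())
print(ranking)
```

All three capping variants give the same column here. The first row of ranking is always the attribute itself, with a correlation of 1, so the interesting entries start from the second row.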