Machine Learning Cheat Sheet
Gist arogulin/43e9a6d4768a302a8029891981224665, last active May 9, 2018 01:37
Pandas methods:
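- df.info() - prints a concise summary of the DataFrame: column names and dtypes, the number of non-null values in each column, and memory usage.
- df['category_field'].value_counts() - lists every distinct value (category) in the column and how many rows have it, most frequent first.
- df.describe() - shows summary statistics (count, mean, std, min, quartiles, max) of the numerical attributes.
- df.hist(bins=50, figsize=(20,15)) - plots a histogram for each numerical attribute (requires matplotlib).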
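- df['age'] = df['age'].where(df['age'] < 100, 100) - keeps the values where the condition is true and replaces the others with the second argument (here, caps ages at 100). Calling df.where(df['age'] < 100, 100) on the whole DataFrame would instead overwrite every column of the matching rows.
- df['age'] = df['age'].mask(df['age'] > 100, 100) - the opposite of where(): replaces the values where the condition is true and keeps the others.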
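- df.corr(numeric_only=True) - returns the matrix of linear (Pearson) correlation coefficients between every pair of numerical attributes. To rank the attributes by their correlation with one of them:
    cm = df.corr(numeric_only=True)
    cm['some_attribute'].sort_values(ascending=False)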
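- pandas.plotting.scatter_matrix(df[attributes], figsize=(12,8)) - plots a grid of scatter plots, one for every pair of the listed numerical attributes (attributes is a list of column names), with a histogram of each attribute on the diagonal.
- df.plot(kind='scatter', x='age', y='income', alpha=0.1) - a single scatter plot for one pair of attributes; the low alpha makes high-density areas easier to see.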
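A minimal sketch of the first three inspection methods on a small hypothetical DataFrame. The column name `value` and all data are made up for illustration. `describe()` only summarizes the numerical column by default, and its count excludes the missing value.

```python
import pandas as pd

# Hypothetical toy data: one categorical column and one numerical
# column that contains a missing value.
df = pd.DataFrame({
    "category_field": ["a", "b", "a", "a", "c"],
    "value": [1.0, 2.0, None, 4.0, 5.0],
})

# Prints dtypes, the non-null count per column and memory usage.
df.info()

# Each distinct category with its row count, most frequent first.
counts = df["category_field"].value_counts()
print(counts)

# Summary statistics; only numeric columns are included by default.
stats = df.describe()
print(stats)
```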
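A sketch of the where()/mask() behaviour from the list above, on hypothetical data. Applied to the `age` column, both calls cap ages at 100. Applied to the whole DataFrame with a column-based condition, pandas broadcasts the row condition across all columns, so unrelated columns such as `income` get overwritten too.

```python
import pandas as pd

# Hypothetical toy data, for illustration only.
df = pd.DataFrame({"age": [25, 130, 42, 101], "income": [3.0, 5.5, 4.2, 2.8]})

# where(): keep values where the condition is True, replace the rest with 100.
capped_where = df["age"].where(df["age"] < 100, 100)

# mask(): replace values where the condition is True, keep the rest.
capped_mask = df["age"].mask(df["age"] > 100, 100)

print(capped_where.tolist())  # [25, 100, 42, 100]
print(capped_mask.tolist())   # [25, 100, 42, 100]

# Assign the result back to the column rather than using inplace=True.
df["age"] = capped_where

# Pitfall: on the whole DataFrame, the row condition applies to every
# column, so 'income' is also replaced in the row where age >= 100.
raw = pd.DataFrame({"age": [25, 130], "income": [3.0, 5.5]})
whole = raw.where(raw["age"] < 100, 100)
print(whole["income"].tolist())  # [3.0, 100.0]
```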
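A sketch of the df.corr() ranking workflow using made-up numbers, chosen so the result is easy to check: `income` grows linearly with `age` and `debt` falls linearly with it. `numeric_only=True` skips the text column. Since pandas 2.0, `corr()` raises an error on non-numeric columns instead of dropping them, and the parameter itself assumes pandas 1.5 or later.

```python
import pandas as pd

# Hypothetical data with exact linear relationships to 'age'.
df = pd.DataFrame({
    "age": [20, 30, 40, 50, 60],
    "income": [2.0, 3.0, 4.0, 5.0, 6.0],
    "debt": [9.0, 7.0, 5.0, 3.0, 1.0],
    "city": ["A", "B", "A", "C", "B"],  # non-numeric, excluded below
})

# Pearson (linear) correlation between every pair of numeric columns.
cm = df.corr(numeric_only=True)

# Rank all numeric attributes by their correlation with 'age'.
ranking = cm["age"].sort_values(ascending=False)
print(ranking)
```

Values close to 1 or -1 indicate a strong linear relationship. Values close to 0 indicate that there is no linear one, although a non-linear relationship may still exist.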