| BIM Info = | |
| VAR TableInfo = | |
| ADDCOLUMNS ( | |
| INFO.VIEW.TABLES (), | |
| "Component", "Tables" | |
| ) | |
| VAR ColumnInfo = | |
| ADDCOLUMNS ( | |
| INFO.VIEW.COLUMNS (), | |
| "Component", "Columns" |
| def impute_missing(df): | |
| ''' | |
| Impute categorical with mode | |
| Impute numeric with mean | |
| ''' | |
| categorical_cols = df.select_dtypes(include=['object','category']).columns | |
| numeric_cols = df.select_dtypes(include=['number']).columns | |
| for cat_col in categorical_cols: | |
| df[cat_col] = df[cat_col].fillna(df[cat_col].mode()[0]) | |
| for num_col in numeric_cols: |
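The snippet above is cut off before the numeric branch, so here is a minimal self-contained sketch of what the docstring describes: mode for categorical columns, mean for numeric ones. The name `impute_missing_sketch`, the decision to work on a copy, and the sample data are assumptions for illustration, not part of the original gist.

```python
import pandas as pd


def impute_missing_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with categorical NaNs filled by mode and numeric NaNs by mean."""
    out = df.copy()  # assumption: leave the caller's frame untouched
    categorical_cols = out.select_dtypes(include=["object", "category"]).columns
    numeric_cols = out.select_dtypes(include=["number"]).columns
    for col in categorical_cols:
        modes = out[col].mode(dropna=True)
        if not modes.empty:  # an all-NaN column has no mode, so it is left as-is
            out[col] = out[col].fillna(modes.iloc[0])
    for col in numeric_cols:
        out[col] = out[col].fillna(out[col].mean())
    return out


# Hypothetical sample data, with an explicit object dtype for the text column.
sample = pd.DataFrame(
    {
        "city": pd.Series(["Oslo", "Oslo", None, "Rome"], dtype="object"),
        "temp": [10.0, None, 14.0, 12.0],
    }
)
filled = impute_missing_sketch(sample)
```

Note that `value_counts()[0]` looks a value up by label (or returns a count), so it is not a reliable way to get the most frequent category; `mode()` returns the category itself.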
| def trim_all_columns(df): | |
| """ | |
| https://stackoverflow.com/questions/40950310/strip-trim-all-strings-of-a-dataframe | |
| Trim whitespace from ends of each value across all series in dataframe | |
| """ | |
| trim_strings = lambda x: x.strip() if isinstance(x, str) else x | |
| return df.applymap(trim_strings) |
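`DataFrame.applymap` was deprecated in pandas 2.1 in favor of `DataFrame.map`. A version-neutral sketch of the same idea can go through `Series.map`, which exists in both older and newer releases. The name `trim_string_values` and the sample frame are assumptions for illustration.

```python
import pandas as pd


def trim_string_values(df: pd.DataFrame) -> pd.DataFrame:
    """Strip leading/trailing whitespace from every string value; leave other values alone."""
    def strip_if_str(value):
        return value.strip() if isinstance(value, str) else value

    # Apply the element-wise function column by column via Series.map.
    return df.apply(lambda col: col.map(strip_if_str))


# Hypothetical sample data.
people = pd.DataFrame({"name": ["  Ann ", "Bob  ", None], "age": [31, 45, 28]})
cleaned = trim_string_values(people)
```

Non-string values, including missing ones, pass through unchanged, which matches the `isinstance` guard in the original lambda.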
| from ydata_profiling import ProfileReport # pandas_profiling was renamed to ydata-profiling | |
| profile = ProfileReport(training_data, title='Pandas Profiling Report', explorative=True) | |
| profile.to_file("training_data_profile.html") | |
| profile.to_notebook_iframe() |
| def categorical_summary(df): | |
| ''' | |
| Adapted from https://www.kaggle.com/nextbigwhat/eda-for-categorical-variables-part-2 | |
| Returns a dataframe containing information about categorical columns | |
| Column name is set as the index | |
| ''' | |
| categorical_cols = df.select_dtypes(include='object').columns | |
| summary_df = pd.DataFrame(columns= | |
| [ |
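The listing is truncated before the summary columns are named, so the following is only a sketch of the general shape such a function could take. The summary fields (`n_unique`, `n_missing`, `most_frequent`, `most_frequent_count`) and the function name are hypothetical choices, not the columns used in the original gist or the Kaggle notebook it adapts.

```python
import pandas as pd


def categorical_summary_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize object-dtype columns, one row per column, indexed by column name."""
    summary_fields = ["column", "n_unique", "n_missing", "most_frequent", "most_frequent_count"]
    rows = []
    for col in df.select_dtypes(include="object").columns:
        counts = df[col].value_counts(dropna=True)  # sorted by frequency, descending
        rows.append(
            {
                "column": col,
                "n_unique": int(df[col].nunique(dropna=True)),
                "n_missing": int(df[col].isna().sum()),
                "most_frequent": counts.index[0] if not counts.empty else None,
                "most_frequent_count": int(counts.iloc[0]) if not counts.empty else 0,
            }
        )
    return pd.DataFrame(rows, columns=summary_fields).set_index("column")


# Hypothetical sample data, with an explicit object dtype for the text column.
items = pd.DataFrame(
    {
        "color": pd.Series(["red", "blue", "red", None], dtype="object"),
        "size": [1, 2, 3, 4],
    }
)
summary = categorical_summary_sketch(items)
```

Building a list of row dictionaries and constructing the frame once avoids appending to an empty `DataFrame` inside the loop.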
| def plot_correlation_heatmap(df): | |
| numeric_cols = df.select_dtypes(include='number').columns | |
| plt.figure(figsize=(10,8)) | |
| sns.heatmap(df[numeric_cols].corr(), cmap='RdBu_r', annot=True) | |
| plt.show() | |
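A minimal sketch of the matrix the heatmap draws, leaving out the plotting step so it runs without matplotlib or seaborn. Selecting with `include="number"` keeps categorical and datetime columns out of `corr()`, which may otherwise fail on non-numeric data in recent pandas versions. The function name and sample frame are assumptions for illustration.

```python
import pandas as pd


def numeric_correlation(df: pd.DataFrame) -> pd.DataFrame:
    """Return the pairwise Pearson correlation of the numeric columns only."""
    numeric_cols = df.select_dtypes(include="number").columns
    return df[numeric_cols].corr()


# Hypothetical sample data: y rises with x, z falls with x, label is non-numeric.
measurements = pd.DataFrame(
    {
        "x": [1, 2, 3, 4],
        "y": [2, 4, 6, 8],
        "z": [4, 3, 2, 1],
        "label": pd.Series(list("abab"), dtype="category"),
    }
)
corr = numeric_correlation(measurements)
```

The resulting frame can be passed directly to `sns.heatmap` as in the function above.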
| import dtale | |
| import dtale.app as dtale_app | |
| dtale_app.USE_NGROK = True | |
| dtale.show(training_data, ignore_duplicate=True) |
| git config --global init.defaultBranch main |