Salil Athalye salilathalye

@salilathalye
salilathalye / gist:80bb089f40564cef9c74457e383e5e62
Created February 20, 2025 23:09 — forked from bjulius/gist:bbe97b5a954fc15fd58dea76e066e5b4
DAX Measure to Extract Data Model Info for Prompting AI
BIM Info =
VAR TableInfo =
    ADDCOLUMNS (
        INFO.VIEW.TABLES (),
        "Component", "Tables"
    )
VAR ColumnInfo =
    ADDCOLUMNS (
        INFO.VIEW.COLUMNS (),
        "Component", "Columns"
salilathalye / impute_missing_in_df.py
Created March 19, 2021 23:52
Imputes missing categorical with mode, numeric with mean
def impute_missing(df):
    '''
    Impute categorical with mode
    Impute numeric with mean
    '''
    categorical_cols = df.select_dtypes(include=['object','category']).columns
    numeric_cols = df.select_dtypes(include=['number']).columns
    for cat_col in categorical_cols:
        # value_counts() sorts by frequency, so index[0] is the mode; [0] alone returns its count
        df[cat_col] = df[cat_col].fillna(df[cat_col].value_counts().index[0])
    for num_col in numeric_cols:
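The preview above cuts off inside the numeric loop. A minimal sketch of how the whole function could look, assuming the missing loop body fills each numeric column with its mean as the docstring says. The name `impute_missing_sketch`, the copy-before-modify behavior, and the example frame are illustrative assumptions, not part of the original gist.

```python
import pandas as pd


def impute_missing_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with categorical NaNs set to the mode and numeric NaNs to the mean."""
    out = df.copy()  # assumption: leave the caller's frame untouched
    categorical_cols = out.select_dtypes(include=["object", "category"]).columns
    numeric_cols = out.select_dtypes(include=["number"]).columns
    for cat_col in categorical_cols:
        modes = out[cat_col].mode(dropna=True)
        if not modes.empty:  # an all-NaN column has no mode; leave it as-is
            out[cat_col] = out[cat_col].fillna(modes.iloc[0])
    for num_col in numeric_cols:
        # Assumed completion of the truncated loop, following the docstring.
        out[num_col] = out[num_col].fillna(out[num_col].mean())
    return out


# Hypothetical example data; dtype="object" is set explicitly so the
# column is selected the same way across pandas versions.
example = pd.DataFrame(
    {
        "color": pd.Series(["red", "blue", "red", None], dtype="object"),
        "size": [1.0, None, 3.0, 5.0],
    }
)
imputed = impute_missing_sketch(example)
```

Here the missing `color` becomes the most frequent value and the missing `size` becomes the mean of the observed values.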
salilathalye / trim_string_columns_in_dataframe.py
Created February 15, 2021 18:35
trim all string columns in a dataframe (from StackOverflow)
def trim_all_columns(df):
    """
    https://stackoverflow.com/questions/40950310/strip-trim-all-strings-of-a-dataframe
    Trim whitespace from ends of each value across all series in dataframe
    """
    # Non-string values (numbers, NaN) pass through unchanged
    trim_strings = lambda x: x.strip() if isinstance(x, str) else x
    # Note: pandas 2.1+ deprecates applymap in favor of DataFrame.map
    return df.applymap(trim_strings)
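Since `applymap` is deprecated in newer pandas releases, one version-agnostic alternative is to apply `Series.map` column by column. This is a sketch only: the name `trim_string_columns` is hypothetical, and it assumes strings live in `object` columns, so other columns are skipped entirely.

```python
import pandas as pd


def trim_string_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with leading/trailing whitespace stripped from string values."""
    out = df.copy()
    # Assumption: only object-dtype columns can hold Python strings here.
    for col in out.select_dtypes(include=["object"]).columns:
        out[col] = out[col].map(lambda x: x.strip() if isinstance(x, str) else x)
    return out


# Hypothetical example: a mixed object column plus a numeric column.
raw = pd.DataFrame(
    {
        "name": pd.Series(["  alice ", "bob  ", 3], dtype="object"),
        "n": [1, 2, 3],
    }
)
trimmed = trim_string_columns(raw)
```

Restricting the work to object columns avoids calling the function on every numeric cell, which can matter on wide frames.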
salilathalye / pathlib_cookiecutter_paths.py
Created February 15, 2021 12:38
Using pathlib to navigate cookiecutter directory structures
from pathlib import Path
# Uses cookiecutter datascience template
# This jupyter notebook is in the notebooks directory
notebook_path = Path('.').resolve()
project_path = notebook_path.parents[0]
data_raw_path = project_path / 'data' / 'raw'
data_interim_path = project_path / 'data' / 'interim'
data_processed_path = project_path / 'data' / 'processed'
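The same layout can be wrapped in a small helper so the paths do not depend on the current working directory. A minimal sketch, assuming the notebook directory sits directly under the project root as in the snippet above; `project_paths` and the temporary directory used for the demonstration are illustrative, not part of the original gist.

```python
from pathlib import Path
import tempfile


def project_paths(notebook_dir: Path) -> dict:
    """Derive the project root and data directories from the notebooks directory."""
    project = notebook_dir.resolve().parent  # same as .parents[0]
    data = project / "data"
    return {
        "project": project,
        "raw": data / "raw",
        "interim": data / "interim",
        "processed": data / "processed",
    }


# Demonstrate against a throwaway directory instead of a real project.
with tempfile.TemporaryDirectory() as tmp:
    notebooks = Path(tmp) / "notebooks"
    notebooks.mkdir()
    paths = project_paths(notebooks)
    for key in ("raw", "interim", "processed"):
        paths[key].mkdir(parents=True, exist_ok=True)
    created = sorted(p.name for p in (paths["project"] / "data").iterdir())
```

Passing the notebook directory explicitly keeps the helper usable from scripts as well as notebooks.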
salilathalye / pandas_profiling_eda_jupyter.py
Created February 13, 2021 15:23
Profile training data using pandas_profiling
from pandas_profiling import ProfileReport
profile = ProfileReport(training_data, title='Pandas Profiling Report', explorative=True)
profile.to_file("training_data_profile.html")
profile.to_notebook_iframe()
salilathalye / categorical_summary.py
Last active February 7, 2021 22:47
Create a dataframe summarizing categorical columns
import pandas as pd

def categorical_summary(df):
    '''
    Adapted from https://www.kaggle.com/nextbigwhat/eda-for-categorical-variables-part-2
    Returns a dataframe containing information about categorical columns
    Column name is set as the index
    '''
    categorical_cols = df.select_dtypes(include='object').columns
    summary_df = pd.DataFrame(columns=
    [
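The preview is truncated before the summary columns are listed, so the original column set is unknown. As a rough sketch of the idea only, the version below uses an assumed set of columns (`unique`, `missing`, `top`, `top_count`); these names and the function name `categorical_summary_sketch` are hypothetical and may differ from the gist and from the Kaggle notebook it adapts.

```python
import pandas as pd


def categorical_summary_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize object-dtype columns, one row per column, indexed by column name."""
    categorical_cols = df.select_dtypes(include="object").columns
    rows = []
    for col in categorical_cols:
        counts = df[col].value_counts()  # NaN excluded, most frequent first
        rows.append(
            {
                "column": col,
                "unique": int(df[col].nunique()),
                "missing": int(df[col].isna().sum()),
                "top": counts.index[0] if len(counts) else None,
                "top_count": int(counts.iloc[0]) if len(counts) else 0,
            }
        )
    summary = pd.DataFrame(
        rows, columns=["column", "unique", "missing", "top", "top_count"]
    )
    return summary.set_index("column")


# Hypothetical example data with one categorical and one numeric column.
sample = pd.DataFrame(
    {
        "city": pd.Series(["NYC", "LA", "NYC", None], dtype="object"),
        "n": [1, 2, 3, 4],
    }
)
summary = categorical_summary_sketch(sample)
```

Collecting rows in a list and building the frame once avoids repeatedly appending to an empty DataFrame.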
salilathalye / seaborn_correlation_heatmap.py
Created February 7, 2021 22:07
Plot a correlation heatmap
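import matplotlib.pyplot as plt
import seaborn as sns

def plot_correlation_heatmap(df):
    # Keep numeric columns only; exclude='object' would also let datetime and category columns through
    numeric_cols = df.select_dtypes(include='number').columns
    plt.figure(figsize=(10,8))
    sns.heatmap(df[numeric_cols].corr(), cmap='RdBu_r', annot=True)
    # plt.show() returns None, so there is nothing to print
    plt.show()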
salilathalye / dtale_colab
Created February 6, 2021 19:42
setup for dtale on colab
import dtale
import dtale.app as dtale_app
dtale_app.USE_NGROK = True
dtale.show(training_data, ignore_duplicate=True)
salilathalye / cookiecutter-datascience-template
Created January 23, 2021 15:09
create a data science project using the cookiecutter template
cookiecutter https://github.com/drivendata/cookiecutter-data-science
salilathalye / git_init_main_branch
Created January 23, 2021 14:46
Change git to use main instead of master with git init
git config --global init.defaultBranch main