If category_col is only a single column, agg_func should be changed from a dict to a list.
Also, agg_df.columns should then be set to [category_col + '_' + col for col in agg_df.columns.values].
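The original aggregation code is not shown here, so the following is only a minimal sketch of what the comment above seems to describe. The DataFrame, the grouping key `user`, and the column `price` are made-up assumptions; only `category_col`, `agg_func`, and `agg_df` come from the comment.

```python
import pandas as pd

# Hypothetical data: "user" is the grouping key, "price" the single column to aggregate.
df = pd.DataFrame({"user": ["a", "a", "b"], "price": [10, 30, 5]})
category_col = "price"
agg_func = ["mean", "max"]  # a list rather than a dict, since only one column is aggregated

agg_df = df.groupby("user")[category_col].agg(agg_func)
# The result has flat column names ("mean", "max"), so prefix them with the column name.
agg_df.columns = [category_col + "_" + col for col in agg_df.columns.values]
print(agg_df)
```

Selecting a single column before `agg` gives a flat column index, which is why plain string concatenation is enough to rename the columns in this case.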
Function to expand a column whose cells hold multiple values joined by a separator character into one-hot-encoded columns.
import pandas as pd

def expand_multiple_choice_columns(df, multiple_cols, symbol=';'):
    for c in multiple_cols:
        # Split each cell on the separator to find the multiple entries in this column
        temp = df[c].str.split(symbol, expand=True)
        # Get all the possible values in this column
        new_columns = pd.unique(temp.values.ravel())
        for new_c in new_columns:
            # Skip missing values (None/NaN) and empty strings
            if pd.notna(new_c) and new_c:
                # Create a new column for each unique value
                idx = df[c].str.contains(new_c, regex=False).fillna(False)
                df.loc[idx, f"{c}_{new_c}"] = 1
        print(f">> Multiple entries in {c}. Added {len(new_columns)} one-hot-encoding columns")
        # Drop the original column
        df.drop(c, axis=1, inplace=True)
    return df
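For comparison, pandas ships `Series.str.get_dummies`, which performs a similar expansion in one call. The sketch below is an alternative, not the gist author's method, and the column name `lang` and its values are invented for illustration.

```python
import pandas as pd

# Hypothetical survey data; the column name and values are made up for illustration.
df = pd.DataFrame({"lang": ["Python;R", "R", None, "Python;SQL"]})

# Split each cell on ';' and build one indicator column per distinct value.
dummies = df["lang"].str.get_dummies(sep=";").add_prefix("lang_")

# Replace the original column with the indicator columns.
encoded = pd.concat([df.drop(columns="lang"), dummies], axis=1)
print(encoded)
```

Two differences from the function above are worth keeping in mind: absent values are filled with 0 rather than left as NaN, and matching is done on whole tokens, so a value such as "R" is not matched inside a longer value that merely contains that letter.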
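import matplotlib.pyplot as plt

def plot_each_features(df, nrow=4, ncol=5, figsize=(20, 8), res=100):
    ''' Plot the index vs. value for each column, using every res-th row '''
    # squeeze=False keeps axes a 2-D array even when nrow or ncol is 1
    fig, axes = plt.subplots(nrow, ncol, figsize=figsize, squeeze=False)
    axes = axes.flatten()
    # Columns beyond nrow * ncol are not plotted
    for col, ax in zip(df.columns, axes):
        ax.plot(df[col][0::res])
        ax.set_title(col)
    plt.tight_layout()
    plt.show()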
Verify that the DataFrame does not contain any odd values (nulls or duplicate column names).
import numpy as np

def valid_dataframe(df):
    # Count every null cell in the DataFrame
    nulls = df.isnull().sum().sum()
    assert nulls == 0, f'df includes null values in rows {df.index[df.isnull().any(axis=1)].tolist()}'
    assert len(np.unique(df.columns)) == len(df.columns), 'df includes duplicate column names'
    return True
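Because `assert` statements are skipped when Python runs with the `-O` flag, the same two checks can also be written with explicit exceptions. The following is a minimal sketch under that assumption; the name `check_dataframe` is hypothetical and not part of the original gist.

```python
import pandas as pd

def check_dataframe(df: pd.DataFrame) -> bool:
    """Raise ValueError if df has null cells or duplicate column names."""
    # Collect the index labels of rows that contain at least one null.
    null_rows = df.index[df.isnull().any(axis=1)].tolist()
    if null_rows:
        raise ValueError(f"df includes null values in rows {null_rows}")
    # Column labels must be unique.
    if not df.columns.is_unique:
        dupes = df.columns[df.columns.duplicated()].unique().tolist()
        raise ValueError(f"df includes duplicate column names: {dupes}")
    return True

clean = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
print(check_dataframe(clean))
```

Reporting the offending row labels and column names, rather than a boolean mask, tends to make the error message easier to act on.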