conda create --name ds python=3.11 numpy pandas scikit-learn matplotlib seaborn jupyter plotly ipykernel pyodbc
conda activate ds
def compare_df_columns(df1, df2):
    """
    Compare the columns of two dataframes (including their types)
    """
    matched = True
    # Compare number of rows
    if df1.shape[0] != df2.shape[0]:
        print(f'Row numbers do not match {df1.shape[0]:,} vs {df2.shape[0]:,}')
        matched = False
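The preview above stops after the row-count check. As a rough sketch of how the column and type comparison described in the docstring might continue, the function below is an assumption, not the original code: the name `compare_columns_sketch` is hypothetical, and it assumes column names are unique within each dataframe.

```python
import pandas as pd


def compare_columns_sketch(df1: pd.DataFrame, df2: pd.DataFrame) -> bool:
    """Return True when both frames share row count, column names and dtypes."""
    matched = True

    # Row counts, as in the snippet above
    if df1.shape[0] != df2.shape[0]:
        print(f'Row numbers do not match {df1.shape[0]:,} vs {df2.shape[0]:,}')
        matched = False

    # Columns present in only one of the two frames
    only_in_df1 = [c for c in df1.columns if c not in df2.columns]
    only_in_df2 = [c for c in df2.columns if c not in df1.columns]
    if only_in_df1:
        print(f'Columns only in df1: {only_in_df1}')
        matched = False
    if only_in_df2:
        print(f'Columns only in df2: {only_in_df2}')
        matched = False

    # Dtypes of the columns the two frames share
    for col in df1.columns:
        if col in df2.columns and df1[col].dtype != df2[col].dtype:
            print(f'Type mismatch for {col}: {df1[col].dtype} vs {df2[col].dtype}')
            matched = False

    return matched
```

Returning a single boolean keeps the helper usable in assertions, while the printed messages say which check failed.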
# SHAP's force plot does not label all the important features.
# We usually need the top (20) features that affect a decision for a particular instance.
# In addition to their names, the features' values and their Shapley values are also required.
# The snippet below:
# 1. creates a dataframe containing all the features, their Shapley values and their actual values
# 2. exports the dataframe to a CSV file
# 3. displays the force plot
import shap
shap.initjs()
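The dataframe step described in the comments can be sketched without calling SHAP at all, given one instance's Shapley values as a plain array. This is an illustration under assumptions: the helper name `top_shap_features`, its parameters and the toy numbers are hypothetical, and the Shapley values are assumed to have been computed already by whatever explainer is in use.

```python
import numpy as np
import pandas as pd


def top_shap_features(feature_names, shap_row, value_row, top_n=20):
    """Build a table of one instance's features ranked by absolute Shapley value."""
    table = pd.DataFrame({
        'feature': list(feature_names),
        'shap_value': np.asarray(shap_row, dtype=float),
        'feature_value': list(value_row),
    })
    # Rank by magnitude so strong negative contributions are kept as well
    order = table['shap_value'].abs().sort_values(ascending=False).index
    return table.loc[order].head(top_n).reset_index(drop=True)


# Toy numbers standing in for one row of an explainer's output
names = ['age', 'income', 'tenure']
top = top_shap_features(names, [0.10, -0.75, 0.30], [42, 18000, 3], top_n=2)

# With no path, to_csv returns the CSV text; pass a file path to write to disk
csv_text = top.to_csv(index=False)
print(csv_text)
```

Sorting by absolute value is a design choice here: features that push the prediction down strongly are usually as relevant to an explanation as those that push it up.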
import pandas as pd

# =======================================================================
# Print a summary of a pandas dataframe and its columns
# =======================================================================
def df_summary(df):
    print(f'Dataframe has {df.shape[0]:,} rows and {df.shape[1]:,} columns')
    if len(df) > 1:
        summary = pd.DataFrame(df.dtypes, columns=['dtype']).reset_index()
        summary.rename(columns={'index': 'feature'}, inplace=True)
        summary['missing'] = df.isnull().sum().values
        summary['uniques'] = df.nunique().values
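The preview ends before the summary is used. One way such a helper could be rounded off is sketched below. It is an assumption rather than the original ending: `df_summary_sketch` is a hypothetical name, the `missing_pct` column is an addition for illustration, the `len(df) > 1` guard is dropped for brevity, and the table is returned rather than printed.

```python
import pandas as pd


def df_summary_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its dtype, missing count and unique count."""
    print(f'Dataframe has {df.shape[0]:,} rows and {df.shape[1]:,} columns')
    summary = pd.DataFrame({
        'feature': df.columns,
        'dtype': df.dtypes.astype(str).values,
        'missing': df.isnull().sum().values,
        'uniques': df.nunique().values,
    })
    # Hypothetical extra column: share of missing values per feature, in percent
    summary['missing_pct'] = (summary['missing'] / max(len(df), 1) * 100).round(1)
    return summary


demo = pd.DataFrame({'a': [1, 2, None], 'b': ['x', 'x', 'y']})
result = df_summary_sketch(demo)
print(result)
```

Returning the dataframe lets the caller sort or filter it, for example to list only the columns that have missing values.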
To start the docker container, run the following command:
docker run -d -p 8787:8787 -v C:\source:/home/rstudio -e ROOT=TRUE -e PASSWORD=rstudio rocker/rstudio
In the above command, I'm mounting "C:\source" to "/home/rstudio", thus providing the container access to all the contents of "C:\source".
If you want to mount multiple paths, use -v multiple times. Example:
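As an illustration only, the repeated -v flag can be shown by assembling the command in Python. The helper `docker_run_args` and the second mount (`C:\data`) are hypothetical and not taken from the original setup. The remaining flags mirror the command above.

```python
def docker_run_args(image, mounts, port='8787:8787', env=None):
    """Assemble a `docker run` argument list with one -v flag per mount."""
    args = ['docker', 'run', '-d', '-p', port]
    for host_path, container_path in mounts:
        args += ['-v', f'{host_path}:{container_path}']
    for key, value in (env or {}).items():
        args += ['-e', f'{key}={value}']
    args.append(image)
    return args


mounts = [
    (r'C:\source', '/home/rstudio'),
    (r'C:\data', '/home/rstudio/data'),  # hypothetical second mount
]
cmd = docker_run_args('rocker/rstudio', mounts,
                      env={'ROOT': 'TRUE', 'PASSWORD': 'rstudio'})
print(' '.join(cmd))
```

The printed line is the command to run in a terminal. Each host path gets its own -v pair, so adding a mount only means adding a tuple to the list.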
The extensions to install:
For R LSP, you need to have R language server installed.