Colin Goyette goyetc

goyetc / tru_explainer_feature_value_influence_relationships_ISPs
Created August 2, 2023 16:34
TruEra Explainer - Feature Influence Sensitivity Plots
#get a single influence-sensitivity plot (ISP)
RF_explainer.plot_isp("Age")
#get all ISPs
RF_explainer.plot_isps()
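An ISP places each row's value for one feature on the x-axis and that feature's influence on the model score on the y-axis. Here is a minimal sketch of the points such a plot scatters. It assumes a hypothetical linear model with an assumed `Age` weight, where each influence is measured against the column mean (for a linear model, Shapley values against the mean take exactly this form). It is not TruEra's implementation.

```python
# Hypothetical data and linear model weight, for illustration only.
ages = [22.0, 38.0, 26.0, 35.0, 54.0]
AGE_WEIGHT = -0.02  # assumed model coefficient for Age


def linear_influences(column, weight):
    """Per-row influence of one feature in a linear model, relative to the column mean."""
    mean = sum(column) / len(column)
    return [weight * (x - mean) for x in column]


def isp_points(column, influences):
    """(feature value, influence) pairs sorted by value: the points an ISP scatters."""
    return sorted(zip(column, influences))


for value, influence in isp_points(ages, linear_influences(ages, AGE_WEIGHT)):
    print(f"Age={value:5.1f}  influence={influence:+.3f}")
```

In this toy case every point lies on a line with slope `AGE_WEIGHT`. For a real model, a curved or fanned-out ISP suggests nonlinearity or interactions with other features.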
goyetc / tru_explainer_segmentation
Created August 2, 2023 16:32
TruEra Explainer - Segmentation Overlays
#retrieve segment groups that already exist in the project
#Note: segment groups can also be created programmatically, see Python SDK reference
tru.get_segment_groups()
#create a new explainer for the data split and segment of interest
segment_explainer = tru.get_explainer('train')
segment_explainer.set_comparison_data_splits(tru.get_data_splits())
segment_explainer.set_segment('Gender', 'Male')
#observe performance for the male segment
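Setting a segment presumably restricts the explainer's subsequent metrics to the rows in that segment. The following plain-Python sketch shows the idea without the SDK. It assumes hypothetical rows that each hold a `Gender` value, a true label and a binary prediction. It filters to one segment and compares that segment's accuracy with accuracy on the full split.

```python
# Hypothetical rows: (gender, true_label, predicted_label)
rows = [
    ("Male", 1, 0), ("Male", 0, 0), ("Male", 1, 0), ("Male", 1, 1),
    ("Female", 1, 1), ("Female", 1, 1), ("Female", 0, 1), ("Female", 1, 1),
]


def accuracy(subset):
    """Fraction of rows whose prediction matches the label."""
    return sum(label == pred for _, label, pred in subset) / len(subset)


def segment(rows, gender):
    """Rows belonging to one segment of a (hypothetical) Gender segment group."""
    return [r for r in rows if r[0] == gender]


print(f"all rows:       {accuracy(rows):.3f}")
print(f"Male segment:   {accuracy(segment(rows, 'Male')):.3f}")
print(f"Female segment: {accuracy(segment(rows, 'Female')):.3f}")
```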
goyetc / tru_explainer_drift_and_feat_contributions
Created August 2, 2023 16:28
TruEra Explainer - Drift Measurement & Feature Contributions
#drift metrics: select the metric of interest with an optional parameter (defaults to the project setting)
RF_explainer.compute_model_score_instability()
#feature contributions to score drift: driven by shifts in the density of each feature's influences, not of its values
RF_explainer.compute_feature_contributors_to_instability().transpose().sort_values(by='test', ascending=False)
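Model score instability compares the distribution of model scores on the base split with the distribution on a comparison split. The exact metric depends on the project setting. One common choice is the one-dimensional Wasserstein (earth mover's) distance. The sketch below computes it in plain Python from hypothetical score lists. It illustrates the metric only and is not TruEra's implementation.

```python
import bisect


def wasserstein_1d(base_scores, comparison_scores):
    """Earth mover's distance between two 1-D empirical score distributions.

    Integrates |F_base(t) - F_comparison(t)| over t. Both empirical CDFs are
    step functions, so the integral reduces to a finite sum over breakpoints.
    """
    u, v = sorted(base_scores), sorted(comparison_scores)
    breakpoints = sorted(set(u) | set(v))
    total = 0.0
    for left, right in zip(breakpoints, breakpoints[1:]):
        cdf_u = bisect.bisect_right(u, left) / len(u)
        cdf_v = bisect.bisect_right(v, left) / len(v)
        total += abs(cdf_u - cdf_v) * (right - left)
    return total


# Hypothetical model scores on two splits; every test score is 0.1 higher.
train_scores = [0.1, 0.2, 0.4, 0.8]
test_scores = [0.2, 0.3, 0.5, 0.9]
print(wasserstein_1d(train_scores, test_scores))
```

A uniform shift of the scores by 0.1 yields a distance of about 0.1. Identical distributions yield 0.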
goyetc / tru_explainer_global_explanations
Created August 2, 2023 16:24
TruEra Explainer - Global Explanations / Average Feature Importance
#get avg(abs(feature influences)), per feature, and sort highest to lowest
RF_explainer.get_global_feature_importances().transpose().sort_values(by=0, ascending=False)
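The global importance here is the mean absolute influence per feature across rows. This pandas sketch reproduces the aggregation and the transpose-and-sort step. It assumes a row-level influence table with one row per record and one column per feature. It also assumes the importances come back as a single row with one column per feature, which is inferred from the `transpose()` above. The influence values are made up for illustration.

```python
import pandas as pd

# Hypothetical row-level influences: one row per record, one column per feature.
influences = pd.DataFrame(
    {
        "Age": [0.10, -0.30, 0.05],
        "Fare": [0.02, 0.01, -0.03],
        "Sex": [-0.40, 0.35, -0.38],
    }
)

# Mean absolute influence per feature, reshaped into one row with one column per
# feature (assumed to resemble the explainer's output shape).
global_importance = influences.abs().mean().to_frame().transpose()

# Transpose to one row per feature, then sort on column 0, the single value column.
ranked = global_importance.transpose().sort_values(by=0, ascending=False)
print(ranked)
```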
goyetc / tru_explainer_get_feature_influences
Created August 2, 2023 16:22
TruEra Explainer - Row Level Explanations
#feature influences, row level
#returns results for the current TruEra workspace context (tru) as a DataFrame
RF_explainer.get_feature_influences()
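Row-level influences of this kind are Shapley-style attributions. For each row, they split the gap between the model's score and a baseline score among the features. The sketch below computes exact Shapley values by enumerating feature coalitions. It assumes a hypothetical toy model and sets absent features to a single baseline point. That is one of several conventions, and TruEra's own estimator and baseline may differ. Enumeration is exponential in the number of features, so it only suits a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, row, baseline):
    """Exact Shapley values for one row; absent features take the baseline value."""
    n = len(row)

    def value(coalition):
        # Features in the coalition keep the row's value; the rest use the baseline.
        x = [row[i] if i in coalition else baseline[i] for i in range(n)]
        return model(x)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S without feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def model(x):
    """Hypothetical toy model with an interaction between features 0 and 1."""
    return 0.5 * x[0] + 0.2 * x[1] + 0.1 * x[0] * x[1] - 0.3 * x[2]


row = [2.0, 1.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phis = shapley_values(model, row, baseline)
print(phis)
# Efficiency property: the influences add up to model(row) - model(baseline).
print(sum(phis), model(row) - model(baseline))
```

The interaction term's contribution (0.2) is split evenly between features 0 and 1, giving influences of about 1.1, 0.3 and -0.9.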
goyetc / tru_explainer_find_hotspots
Created August 2, 2023 16:14
TruEra Explainer - Performance Hotspots (High Error Segments)
#Hotspots: error analysis / performance debugging
#note: the performance metric of interest can be selected
RF_explainer.find_hotspots(metric_of_interest='RECALL',
                           max_num_responses=3)
#alternatively, also show what-if performance, i.e., how the metric would change if the hotspot were eliminated
RF_explainer.find_hotspots(metric_of_interest='CLASSIFICATION_ACCURACY',
                           show_what_if_performance=True,
                           max_num_responses=3)
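Conceptually, a hotspot search looks for an interpretable segment where the metric of interest is much worse than on the split as a whole. The what-if view then asks how the overall metric would look without that segment. The sketch below is deliberately simplified and does not reproduce TruEra's actual search procedure. It assumes hypothetical rows, considers only single-feature equality segments, and ranks segments by recall. The names `find_hotspots` and `what_if_accuracy` are illustrative stand-ins, not the SDK's methods.

```python
# Hypothetical rows: (features, true_label, predicted_label)
rows = [
    ({"Pclass": 1, "Embarked": "S"}, 1, 1),
    ({"Pclass": 1, "Embarked": "C"}, 1, 1),
    ({"Pclass": 3, "Embarked": "S"}, 1, 0),
    ({"Pclass": 3, "Embarked": "Q"}, 1, 0),
    ({"Pclass": 2, "Embarked": "S"}, 1, 1),
    ({"Pclass": 3, "Embarked": "S"}, 0, 0),
    ({"Pclass": 2, "Embarked": "C"}, 0, 1),
]


def recall(subset):
    """True positives / actual positives; None if the subset has no positives."""
    preds_on_positives = [pred for _, label, pred in subset if label == 1]
    if not preds_on_positives:
        return None
    return sum(preds_on_positives) / len(preds_on_positives)


def accuracy(subset):
    return sum(label == pred for _, label, pred in subset) / len(subset)


def find_hotspots(rows, max_num_responses=3):
    """Rank single-feature segments (feature == value) from lowest recall upward."""
    candidates = []
    for feature in rows[0][0]:
        for value in {r[0][feature] for r in rows}:
            seg_recall = recall([r for r in rows if r[0][feature] == value])
            if seg_recall is not None:
                candidates.append((seg_recall, feature, value))
    # Tie-break on feature name and value so the ranking is deterministic.
    candidates.sort(key=lambda c: (c[0], c[1], str(c[2])))
    return candidates[:max_num_responses]


def what_if_accuracy(rows, feature, value):
    """Accuracy on the split if the rows in the segment were removed."""
    return accuracy([r for r in rows if r[0][feature] != value])


print(f"overall accuracy: {accuracy(rows):.2f}")
for seg_recall, feature, value in find_hotspots(rows):
    print(f"{feature} == {value!r}: recall={seg_recall:.2f}, "
          f"accuracy without segment={what_if_accuracy(rows, feature, value):.2f}")
```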
goyetc / tru_explainer_get_aucs
Created August 2, 2023 16:12
TruEra Explainer - Get Performance Metrics
#for a quick performance comparison, set a base split and a comparison split
RF_explainer.set_base_data_split("train")
#optional: check which splits are available
tru.get_data_splits()
#TruEra will automatically ignore any splits already set as the base data, e.g., training data
RF_explainer.set_comparison_data_splits(tru.get_data_splits())
#optional: list available performance metrics
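AUC is one of the metrics a comparison like this typically reports, and it has a simple rank interpretation. It is the probability that a randomly chosen positive row scores higher than a randomly chosen negative row, with ties counted as one half. The plain-Python sketch below computes it for a hypothetical base split and comparison split. The labels and scores are made up for illustration.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical (labels, scores) for a base split and a comparison split.
splits = {
    "train": ([1, 1, 0, 0, 1], [0.9, 0.8, 0.3, 0.2, 0.7]),
    "test": ([1, 0, 1, 0, 0], [0.6, 0.7, 0.4, 0.4, 0.1]),
}
for name, (labels, scores) in splits.items():
    print(f"{name}: AUC = {roc_auc(labels, scores):.3f}")
```

The pairwise loop is O(positives × negatives), which keeps the definition visible. For large splits, a rank-based or sort-based computation is the usual choice.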
goyetc / create_truera_explainer_trushap
Last active August 2, 2023 16:08
TruEra Explainers with TruSHAP
#instantiate truera workspace, if not already active in your current python kernel/environment
tru = explainer.get_truera_workspace()
tru.set_environment("remote") #retrieve project information from TruEra Web App
#set truera workspace context
tru.set_project("Titanic Survival")
tru.set_data_collection("Titanic Passenger Data")
tru.set_model("Random Forest")
tru.set_data_split("Train")
goyetc / Approach2_TruSHAP_generate_shapley_values
Last active August 2, 2023 15:42
TruSHAP Generate Shapley Values
#Note: TruEra's use of an ID column / unique identifier enables proper type handling, higher data volume limits, and delayed addition of data such as labels or extra data for segmentation
#Note 2: by specifying a "split name", we automatically add this data, and corresponding Shapley values, to the project created in the TruSHAP explainer step
GBM_shap_values = GBM_explainer(X_train,
                                y=y_df,
                                id_col_name='index',
                                data_split_name="train")
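The first note above says an ID column allows data such as labels to be added later. The underlying idea is a key join: late-arriving labels are matched to rows by ID rather than by position, so their order and partial coverage do not matter. The pandas sketch below assumes hypothetical frames keyed by an `index` column, mirroring `id_col_name='index'` above. It is not how TruEra stores or ingests the data.

```python
import pandas as pd

# Hypothetical feature rows already ingested, keyed by the ID column "index".
features = pd.DataFrame({"index": [101, 102, 103, 104], "Age": [22, 38, 26, 35]})

# Labels that arrive later, in a different order and for only some of the rows.
labels = pd.DataFrame({"index": [103, 101, 104], "Survived": [1, 0, 1]})

# Left join on the ID: each label lands on its own row; unlabeled rows get NaN.
# validate="one_to_one" raises if either side has duplicate IDs.
joined = features.merge(labels, on="index", how="left", validate="one_to_one")
print(joined)
```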
#import os (to read connection details from environment variables) and trushap as shap
import os
from truera.client.experimental.trushap import trushap as shap
#TruEra Web App - connection details
CONNECTION_STRING = os.getenv('url')
TOKEN = os.getenv('token')
#use connection string and token as arguments in shap.Explainer method
#define project resource names, as desired
GBM_explainer = shap.Explainer(GBM,