@bgalvao
Created November 9, 2022 07:20
Deepchecks suite result

Train Test Validation Suite

The suite is composed of various checks, such as Train Test Label Drift, Train Test Feature Drift, and Date Train Test Leakage Overlap.
Each check may contain conditions (which result in pass, fail, warning, or error) as well as other outputs such as plots or tables.
Suites, checks, and conditions can all be modified. Read more about custom suites.

Conditions Summary

Check | Condition | More Info
Datasets Size Comparison | Test-Train size ratio is greater than 0.01 | Test-Train size ratio is 0.5
New Label Train Test | Number of new label values is less than or equal to 0 | No new labels found
Category Mismatch Train Test | Ratio of samples with a new category is less than or equal to 0% | Passed for 8 relevant columns
String Mismatch Comparison | No new variants allowed in test data | Passed for 9 relevant columns
Train Test Samples Mix | Percentage of test data samples that appear in train data is less than or equal to 10% | Percent of test data samples that appear in train data: 0.14%
Feature Label Correlation Change | Train-Test features' Predictive Power Score difference is less than 0.2 | Passed for 14 relevant columns
Feature Label Correlation Change | Train features' Predictive Power Score is less than 0.7 | Passed for 14 relevant columns
Train Test Feature Drift | Categorical drift score < 0.2 and numerical drift score < 0.1 | Passed for 14 of 14 columns. Highest categorical drift score: "relationship" (4.25E-3). Highest numerical drift score: "hours-per-week" (4.24E-3).
Train Test Label Drift | Categorical drift score < 0.2 and numerical drift score < 0.1 for label drift | Label's drift score (Cramer's V) is 2.16E-3
Multivariate Drift | Drift value is less than 0.25 | Found drift value of 4.21E-3, corresponding to a domain classifier AUC of 0.5
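
The thresholds in this table can be re-checked by hand. The sketch below does not call Deepchecks. It copies each reported value and threshold from the rows whose More Info gives a number, then evaluates the condition. The helper names are illustrative. The mapping from domain-classifier AUC to multivariate drift value, max(2 * AUC - 1, 0), is an assumption. It is consistent with the reported pair: a drift value of 4.21E-3 implies an AUC of about 0.502, which rounds to 0.5.

```python
# Reported values and thresholds copied from the Conditions Summary table.
# Helper names are illustrative; this does not call deepchecks.


def multivariate_drift_from_auc(auc: float) -> float:
    """Assumed mapping: rescale a domain-classifier AUC to a drift value."""
    # AUC 0.5 means the classifier cannot tell train from test -> drift 0.
    return max(2.0 * auc - 1.0, 0.0)


def auc_from_drift(drift: float) -> float:
    """Inverse of the assumed mapping, valid for drift > 0."""
    return (drift + 1.0) / 2.0


# (check, reported value, condition expressed as a predicate)
conditions = [
    ("Datasets Size Comparison", 0.5, lambda v: v > 0.01),
    ("New Label Train Test", 0, lambda v: v <= 0),
    ("Category Mismatch Train Test", 0.0, lambda v: v <= 0.0),
    ("Train Test Samples Mix", 23 / 16281, lambda v: v <= 0.10),
    ("Train Test Feature Drift (categorical)", 4.25e-3, lambda v: v < 0.2),
    ("Train Test Feature Drift (numerical)", 4.24e-3, lambda v: v < 0.1),
    ("Train Test Label Drift", 2.16e-3, lambda v: v < 0.2),
    ("Multivariate Drift", 4.21e-3, lambda v: v < 0.25),
]

results = {name: check(value) for name, value, check in conditions}
for name, passed in results.items():
    print(f"{name}: {'pass' if passed else 'fail'}")
print(f"Implied domain classifier AUC: {auc_from_drift(4.21e-3):.3f}")
```

Every predicate evaluates to True, which is consistent with the More Info column.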

Check With Conditions Output

Datasets Size Comparison

Verify the test dataset size by comparing it to the train dataset size.

Conditions Summary
Condition | More Info
Test-Train size ratio is greater than 0.01 | Test-Train size ratio is 0.5
Additional Outputs
Dataset | Size
Train | 32561
Test | 16281
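
The condition reduces to a single division. The sketch below recomputes the ratio from the two sizes in the table. The helper name compute_size_ratio is hypothetical.

```python
def compute_size_ratio(n_train: int, n_test: int) -> float:
    """Ratio of test samples to train samples (hypothetical helper)."""
    if n_train <= 0:
        raise ValueError("the train dataset must not be empty")
    return n_test / n_train


# Sizes reported in the table above.
ratio = compute_size_ratio(n_train=32561, n_test=16281)
passed = ratio > 0.01  # condition threshold used in this report
print(f"Test-Train size ratio: {ratio:.2f} (condition passed: {passed})")
```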

Category Mismatch Train Test

Find new categories in the test set.

Conditions Summary
Condition | More Info
Ratio of samples with a new category is less than or equal to 0% | Passed for 8 relevant columns
Additional Outputs
Column | Number of new categories | Percent of new categories in sample | Feature importance | New categories examples
workclass | 0 | 0% | 0.00 | []
marital-status | 0 | 0% | 0.14 | []
native-country | 0 | 0% | 0.00 | []
relationship | 0 | 0% | 0.11 | []
education | 0 | 0% | -0.00 | []
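
The core idea of this check can be sketched in plain Python. It finds test categories never seen in train and measures the share of test samples that carry one. The helper name and the toy values are illustrative and do not come from the report or from the Deepchecks API. The condition in this report requires the resulting share to be 0%.

```python
from collections import Counter


def new_category_stats(train_values, test_values):
    """Return (new categories, fraction of test samples holding one)."""
    seen = set(train_values)
    test_counts = Counter(test_values)
    # Categories present in test but never observed in train.
    new = sorted(cat for cat in test_counts if cat not in seen)
    n_new_samples = sum(test_counts[cat] for cat in new)
    fraction = n_new_samples / len(test_values) if test_values else 0.0
    return new, fraction


# Toy, made-up values for a "workclass"-like column.
train = ["Private", "Self-emp", "Private", "State-gov"]
test = ["Private", "Local-gov", "Private", "Local-gov", "State-gov"]
new, fraction = new_category_stats(train, test)
print(new, f"{fraction:.0%}")
```

Here "Local-gov" appears only in test, so 2 of 5 test samples carry a new category. A 0% condition would fail.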

Train Test Samples Mix

Detect samples in the test data that also appear in the training data.

Conditions Summary
Condition | More Info
Percentage of test data samples that appear in train data is less than or equal to 10% | Percent of test data samples that appear in train data: 0.14%
Additional Outputs
0.14% (23 of 16281) of test data samples appear in train data
Train indices | Test indices | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | income
24667 | 4152 | 17.00 | Private | 153021.00 | 12th | 8.00 | Never-married | Sales | Own-child | White | Female | 0.00 | 0.00 | 20.00 | United-States | <=50K
30345 | 10826 | 23.00 | Private | 250630.00 | Bachelors | 13.00 | Never-married | Sales | Not-in-family | White | Female | 0.00 | 0.00 | 40.00 | United-States | <=50K
17867 | 13504 | 45.00 | Private | 82797.00 | Bachelors | 13.00 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0.00 | 0.00 | 45.00 | United-States | >50K
20486 | 14838 | 43.00 | Private | 195258.00 | HS-grad | 9.00 | Married-civ-spouse | Craft-repair | Husband | White | Male | 0.00 | 0.00 | 40.00 | United-States | >50K
3445 | 5907 | 41.00 | Private | 116391.00 | Bachelors | 13.00 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0.00 | 0.00 | 40.00 | United-States | >50K
2195 | 12488 | 39.00 | Private | 184659.00 | HS-grad | 9.00 | Married-civ-spouse | Machine-op-inspct | Husband | White | Male | 0.00 | 0.00 | 40.00 | United-States | <=50K
14581 | 14487 | 31.00 | Private | 228873.00 | HS-grad | 9.00 | Married-civ-spouse | Craft-repair | Husband | White | Male | 0.00 | 0.00 | 40.00 | United-States | <=50K
21974 | 7350 | 30.00 | Private | 111567.00 | HS-grad | 9.00 | Never-married | Craft-repair | Own-child | White | Male | 0.00 | 0.00 | 48.00 | United-States | <=50K
8908 | 5078 | 29.00 | ? | 41281.00 | Bachelors | 13.00 | Married-spouse-absent | ? | Not-in-family | White | Male | 0.00 | 0.00 | 50.00 | United-States | <=50K
4325, 4881 | 14308 | 25.00 | Private | 308144.00 | Bachelors | 13.00 | Never-married | Craft-repair | Not-in-family | White | Male | 0.00 | 0.00 | 40.00 | Mexico | <=50K
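
Detecting these overlaps amounts to indexing train rows by their full contents and looking up each test row. Because the table above includes the income column, this sketch assumes whole rows are compared with the label included. The helper name and toy rows are hypothetical.

```python
from collections import defaultdict


def samples_mix(train_rows, test_rows):
    """Find test rows that also occur in train (hypothetical helper).

    Rows are compared as whole tuples, label included. Returns a mapping
    {test index: [train indices]} and the share of test rows that match.
    """
    train_lookup = defaultdict(list)
    for i, row in enumerate(train_rows):
        train_lookup[tuple(row)].append(i)
    matches = {}
    for j, row in enumerate(test_rows):
        key = tuple(row)
        if key in train_lookup:  # membership test does not insert a key
            matches[j] = train_lookup[key]
    share = len(matches) / len(test_rows) if test_rows else 0.0
    return matches, share


# Toy rows (age, workclass, income); train rows 1 and 2 are duplicates,
# similar to the row above that lists two train indices.
train = [(17, "Private", "<=50K"), (25, "Private", "<=50K"),
         (25, "Private", "<=50K"), (40, "State-gov", ">50K")]
test = [(25, "Private", "<=50K"), (33, "Private", ">50K")]
matches, share = samples_mix(train, test)
print(matches, f"{share:.0%}")

# The report's figure: 23 of 16281 test rows also appear in train.
print(f"{23 / 16281:.2%}")
```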

Feature Label Correlation Change

Return the Predictive Power Score (PPS) of all features in order to estimate each feature's ability to predict the label.

Conditions Summary
Condition | More Info
Train-Test features' Predictive Power Score difference is less than 0.2 | Passed for 14 relevant columns
Train features' Predictive Power Score is less than 0.7 | Passed for 14 relevant columns
Additional Outputs
The Predictive Power Score (PPS) estimates the ability of a feature to predict the label by itself.
In the PPS graph, we should suspect problems in our data if:
1. Train dataset PPS values are high:
This can indicate that the feature's success in predicting the label is actually due to data leakage,
meaning that the feature holds information that is derived from the label to begin with.
2. There is a large difference between train and test PPS (train PPS is larger):
This is an even stronger indication of data leakage, since a feature that was powerful in train but not in test
can be explained by leakage in train that does not carry over to a new dataset.
3. There is a large difference between test and train PPS (test PPS is larger):
This is an anomalous value; it could indicate drift in the test dataset that caused a coincidental correlation with the target label.
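
The PPS idea can be shown with a deliberately simplified sketch that uses only the standard library. It is not the ppscore or Deepchecks implementation, and it swaps in simpler parts:

- The "model" predicts the most common label for each feature value.
- The metric is accuracy on the same data rather than a cross-validated score.
- The baseline always predicts the most common label.

The normalization follows the usual PPS form: (model score minus baseline score) divided by (1 minus baseline score), clipped at 0. The flags mirror this report's two conditions (train PPS below 0.7, train/test difference below 0.2). Using the absolute difference, so that cases 2 and 3 above are both caught, is an assumption.

```python
from collections import Counter, defaultdict


def normalized_score(model_score: float, naive_score: float) -> float:
    """PPS-style normalization: 0 = no better than naive, 1 = perfect."""
    if naive_score >= 1.0:
        return 0.0
    return max((model_score - naive_score) / (1.0 - naive_score), 0.0)


def simple_pps(feature, label) -> float:
    """Simplified, accuracy-based PPS for a categorical feature and label."""
    n = len(label)
    # Naive baseline: always predict the most common label.
    naive_acc = Counter(label).most_common(1)[0][1] / n
    # Stand-in "model": most common label for each feature value.
    by_value = defaultdict(Counter)
    for x, y in zip(feature, label):
        by_value[x][y] += 1
    model_acc = sum(c.most_common(1)[0][1] for c in by_value.values()) / n
    return normalized_score(model_acc, naive_acc)


def pps_flags(train_pps: float, test_pps: float,
              max_train: float = 0.7, max_diff: float = 0.2) -> list:
    """List which of the report's two PPS conditions a feature fails."""
    flags = []
    if train_pps >= max_train:
        flags.append("high train PPS")
    if abs(train_pps - test_pps) >= max_diff:
        flags.append("large train/test PPS difference")
    return flags


# Toy data: the feature fully determines the label in train...
train_pps = simple_pps(["a", "a", "b", "b"], [0, 0, 1, 1])
# ...but carries no information about it in test.
test_pps = simple_pps(["a", "b", "a", "b"], [0, 0, 1, 1])
print(train_pps, test_pps, pps_flags(train_pps, test_pps))
```

This toy feature trips both conditions, which matches the leakage pattern described in case 2.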

Train Test Feature Drift

Calculate drift between the train and test datasets per feature, using statistical measures.

Conditions Summary
Condition | More Info
Categorical drift score < 0.2 and numerical drift score < 0.1 | Passed for 14 of 14 columns. Highest categorical drift score: "relationship" (4.25E-3). Highest numerical drift score: "hours-per-week" (4.24E-3).
Additional Outputs
The drift score measures the difference between two distributions; in this check, the train and test distributions.
The check shows the drift score and distributions for the top 5 features, ranked by the sum of the drift score and the feature importance.
Discrete distribution plots show the top 10 categories with the largest difference between train and test.
If available, the plot titles also show the feature importance (FI) rank.
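
For numerical features, this sketch assumes Deepchecks' default drift measure is Earth Mover's Distance (EMD), which is the 1-D Wasserstein distance. The code computes EMD between two empirical samples by integrating the absolute gap between their CDFs. It then scales the result by the combined value range so the score falls in [0, 1]. That scaling is an assumption of this sketch, and the library's own normalization may differ (for example, by trimming extreme quantiles first). Both samples must be non-empty.

```python
def wasserstein_1d(a, b) -> float:
    """Earth Mover's Distance between two non-empty 1-D samples."""
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    total = 0.0
    i = j = 0
    for left, right in zip(points, points[1:]):
        # Count samples <= left, i.e. each empirical CDF on [left, right).
        while i < len(a) and a[i] <= left:
            i += 1
        while j < len(b) and b[j] <= left:
            j += 1
        cdf_gap = abs(i / len(a) - j / len(b))
        total += cdf_gap * (right - left)
    return total


def normalized_emd(a, b) -> float:
    """EMD divided by the combined value range (assumed normalization)."""
    lo, hi = min(min(a), min(b)), max(max(a), max(b))
    if hi == lo:
        return 0.0
    return wasserstein_1d(a, b) / (hi - lo)


# Toy samples: test is train shifted by +1.
score = normalized_emd([0, 0, 1, 1], [1, 1, 2, 2])
print(score, "passes" if score < 0.1 else "fails", "the < 0.1 condition")
```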

Train Test Label Drift

Calculate label drift between the train and test datasets, using statistical measures.

Conditions Summary
Condition | More Info
Categorical drift score < 0.2 and numerical drift score < 0.1 for label drift | Label's drift score (Cramer's V) is 2.16E-3
Additional Outputs
The drift score measures the difference between two distributions; in this check, the train and test label distributions.
The check shows the drift score and distributions for the label.
Discrete distribution plots show the top 10 categories with the largest difference between train and test.
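
The label's drift score here is reported as Cramer's V. A plain, uncorrected version can be computed from a 2 × k contingency table. Its rows are the datasets (train, test) and its columns are the label values. The helper name and toy labels are illustrative. Deepchecks' own implementation may differ in details such as bias correction.

```python
import math
from collections import Counter


def cramers_v_drift(train_labels, test_labels) -> float:
    """Uncorrected Cramer's V between dataset membership and label value."""
    train_counts, test_counts = Counter(train_labels), Counter(test_labels)
    categories = sorted(set(train_counts) | set(test_counts))
    if len(categories) < 2:
        return 0.0
    n_train, n_test = len(train_labels), len(test_labels)
    n = n_train + n_test
    chi2 = 0.0
    for cat in categories:
        col_total = train_counts[cat] + test_counts[cat]
        for observed, row_total in ((train_counts[cat], n_train),
                                    (test_counts[cat], n_test)):
            expected = row_total * col_total / n
            chi2 += (observed - expected) ** 2 / expected
    # Two rows (train, test), so min(rows, cols) - 1 == 1.
    return math.sqrt(chi2 / n)


same = cramers_v_drift(["<=50K"] * 3 + [">50K"], ["<=50K"] * 3 + [">50K"])
shifted = cramers_v_drift(["<=50K"] * 3 + [">50K"], ["<=50K"] + [">50K"] * 3)
print(same, round(shifted, 3))
```

Identical label distributions give 0, and the flipped class balance gives 0.5. By the same measure, the report's 2.16E-3 indicates almost no label drift.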

Check Without Conditions Output

Other Checks That Weren't Displayed

Check | Reason
Date Train Test Leakage Duplicates | DatasetValidationError: Dataset does not contain a datetime. See Dataset docs.
Date Train Test Leakage Overlap | DatasetValidationError: Dataset does not contain a datetime. See Dataset docs.
Index Train Test Leakage | DatasetValidationError: Dataset does not contain an index. See Dataset docs.
New Label Train Test | Nothing found
String Mismatch Comparison | Nothing found
Multivariate Drift | Nothing found
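
The three leakage checks were skipped because the datasets declared no datetime or index column. In Deepchecks such columns are declared when the Dataset is constructed (through its datetime_name and index_name arguments). The sketch below shows what each check would measure:

- Date Train Test Leakage Overlap: test samples dated no later than the latest train date.
- Date Train Test Leakage Duplicates: test dates that also appear in train.
- Index Train Test Leakage: test index values that also appear in train.

The helpers and toy dates are hypothetical. Treating a test date equal to the latest train date as overlap is an assumption.

```python
from datetime import date


def date_overlap_ratio(train_dates, test_dates) -> float:
    """Share of test samples dated on or before the latest train date."""
    latest_train = max(train_dates)
    return sum(1 for d in test_dates if d <= latest_train) / len(test_dates)


def date_duplicates_ratio(train_dates, test_dates) -> float:
    """Share of test samples whose date also appears in train."""
    seen = set(train_dates)
    return sum(1 for d in test_dates if d in seen) / len(test_dates)


def index_leakage_ratio(train_index, test_index) -> float:
    """Share of test index values that also appear in the train index."""
    seen = set(train_index)
    return sum(1 for i in test_index if i in seen) / len(test_index)


train_dates = [date(2022, 1, 1), date(2022, 1, 5), date(2022, 1, 10)]
test_dates = [date(2022, 1, 5), date(2022, 1, 11),
              date(2022, 1, 12), date(2022, 1, 20)]
print(date_overlap_ratio(train_dates, test_dates))
print(date_duplicates_ratio(train_dates, test_dates))
print(index_leakage_ratio([0, 1, 2, 3], [3, 4, 5]))
```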