bgalvao/suite_result.md

## suite_result.md

      
            
          
## Train Test Validation Suite

The suite is composed of various checks, such as Train Test Label Drift, Train Test Feature Drift, and Date Train Test Leakage Overlap.

Each check may contain conditions (which result in pass / fail / warning / error, represented by ✓ / ✖ / ! / ⁈) as well as other outputs such as plots or tables.

Suites, checks and conditions can all be modified. Read more about custom suites.
 
 
### Conditions Summary

| Status | Check | Condition | More Info |
| --- | --- | --- | --- |
| ✓ | Datasets Size Comparison | Test-Train size ratio is greater than 0.01 | Test-Train size ratio is 0.5 |
| ✓ | New Label Train Test | Number of new label values is less or equal to 0 | No new labels found |
| ✓ | Category Mismatch Train Test | Ratio of samples with a new category is less or equal to 0% | Passed for 8 relevant columns |
| ✓ | String Mismatch Comparison | No new variants allowed in test data | Passed for 9 relevant columns |
| ✓ | Train Test Samples Mix | Percentage of test data samples that appear in train data is less or equal to 10% | Percent of test data samples that appear in train data: 0.14% |
| ✓ | Feature Label Correlation Change | Train-Test features' Predictive Power Score difference is less than 0.2 | Passed for 14 relevant columns |
| ✓ | Feature Label Correlation Change | Train features' Predictive Power Score is less than 0.7 | Passed for 14 relevant columns |
| ✓ | Train Test Feature Drift | categorical drift score < 0.2 and numerical drift score < 0.1 | Passed for 14 columns out of 14 columns.<br>Found column "relationship" has the highest categorical drift score: 4.25E-3<br>Found column "hours-per-week" has the highest numerical drift score: 4.24E-3 |
| ✓ | Train Test Label Drift | categorical drift score < 0.2 and numerical drift score < 0.1 for label drift | Label's drift score Cramer's V is 2.16E-3 |
| ✓ | Multivariate Drift | Drift value is less than 0.25 | Found drift value of: 4.21E-3, corresponding to a domain classifier AUC of: 0.5 |
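
The Multivariate Drift row reports both a drift value and a domain classifier AUC. The report does not spell out how the two relate; a common convention, assumed here, is to rescale the AUC so that 0.5 (a classifier that cannot tell train from test) maps to 0 and 1.0 maps to 1. The function name and the example AUC of 0.502105 are illustrative, chosen only because they reproduce the reported 4.21E-3 under that assumption.

```python
def drift_value_from_auc(auc: float) -> float:
    """Map a domain classifier AUC to a drift value in [0, 1].

    Assumed convention (not stated in the report): max(2 * AUC - 1, 0).
    """
    return max(2.0 * auc - 1.0, 0.0)


# Hypothetical unrounded AUC that is consistent with the reported numbers.
example_auc = 0.502105
example_drift = drift_value_from_auc(example_auc)

# Condition from the summary: drift value is less than 0.25.
condition_passed = example_drift < 0.25
```

Under this convention an AUC that rounds to 0.5 gives a drift value close to zero, which is consistent with the condition passing.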
     
    
### Check With Conditions Output

#### Datasets Size Comparison

Verify the test dataset size by comparing it to the train dataset size.

##### Conditions Summary

| Status | Condition | More Info |
| --- | --- | --- |
| ✓ | Test-Train size ratio is greater than 0.01 | Test-Train size ratio is 0.5 |

##### Additional Outputs

| | Train | Test |
| --- | --- | --- |
| Size | 32561 | 16281 |
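
The reported ratio can be reproduced from the two sizes above. This is a minimal sketch of the arithmetic, not the library's implementation; `size_ratio` is a hypothetical helper name.

```python
def size_ratio(train_size: int, test_size: int) -> float:
    """Return the test-to-train size ratio."""
    if train_size <= 0:
        raise ValueError("train_size must be positive")
    return test_size / train_size


# Sizes taken from the Additional Outputs table.
ratio = size_ratio(train_size=32561, test_size=16281)

# Condition from the summary: ratio is greater than 0.01.
condition_passed = ratio > 0.01
```

Rounded to two decimals, the ratio is 0.5, matching the More Info column.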
     
    
#### Category Mismatch Train Test

Find new categories in the test set.

##### Conditions Summary

| Status | Condition | More Info |
| --- | --- | --- |
| ✓ | Ratio of samples with a new category is less or equal to 0% | Passed for 8 relevant columns |

##### Additional Outputs

| Column | Number of new categories | Percent of new categories in sample | Feature importance | New categories examples |
| --- | --- | --- | --- | --- |
| workclass | 0 | 0% | 0.00 | [] |
| marital-status | 0 | 0% | 0.14 | [] |
| native-country | 0 | 0% | 0.00 | [] |
| relationship | 0 | 0% | 0.11 | [] |
| education | 0 | 0% | -0.00 | [] |
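
The idea behind this check can be sketched for a single column as follows. This is an illustration under simple assumptions (string-valued categories, plain Python lists), not the library's code; `new_categories` and the toy values are hypothetical.

```python
def new_categories(train_values, test_values):
    """Return (sorted new categories, share of test samples carrying one)."""
    known = set(train_values)
    unseen = [value for value in test_values if value not in known]
    share = len(unseen) / len(test_values) if test_values else 0.0
    return sorted(set(unseen)), share


# Toy column values, invented for illustration only.
train_col = ["Private", "State-gov", "Private"]
test_col = ["Private", "Self-emp", "Private", "Self-emp"]

found, share = new_categories(train_col, test_col)

# Condition from the summary: the share must be less or equal to 0%.
condition_passed = share <= 0.0
```

In the toy data the condition fails because half of the test samples carry an unseen category; in the report above every listed column has zero new categories, so the condition passes.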
     
    
#### Train Test Samples Mix

Detect samples in the test data that appear also in training data.

##### Conditions Summary

| Status | Condition | More Info |
| --- | --- | --- |
| ✓ | Percentage of test data samples that appear in train data is less or equal to 10% | Percent of test data samples that appear in train data: 0.14% |

##### Additional Outputs

0.14% (23 / 16281) of test data samples appear in train data

| Train indices | Test indices | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | income |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 24667 | 4152 | 17.00 | Private | 153021.00 | 12th | 8.00 | Never-married | Sales | Own-child | White | Female | 0.00 | 0.00 | 20.00 | United-States | <=50K |
| 30345 | 10826 | 23.00 | Private | 250630.00 | Bachelors | 13.00 | Never-married | Sales | Not-in-family | White | Female | 0.00 | 0.00 | 40.00 | United-States | <=50K |
| 17867 | 13504 | 45.00 | Private | 82797.00 | Bachelors | 13.00 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0.00 | 0.00 | 45.00 | United-States | >50K |
| 20486 | 14838 | 43.00 | Private | 195258.00 | HS-grad | 9.00 | Married-civ-spouse | Craft-repair | Husband | White | Male | 0.00 | 0.00 | 40.00 | United-States | >50K |
| 3445 | 5907 | 41.00 | Private | 116391.00 | Bachelors | 13.00 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0.00 | 0.00 | 40.00 | United-States | >50K |
| 2195 | 12488 | 39.00 | Private | 184659.00 | HS-grad | 9.00 | Married-civ-spouse | Machine-op-inspct | Husband | White | Male | 0.00 | 0.00 | 40.00 | United-States | <=50K |
| 14581 | 14487 | 31.00 | Private | 228873.00 | HS-grad | 9.00 | Married-civ-spouse | Craft-repair | Husband | White | Male | 0.00 | 0.00 | 40.00 | United-States | <=50K |
| 21974 | 7350 | 30.00 | Private | 111567.00 | HS-grad | 9.00 | Never-married | Craft-repair | Own-child | White | Male | 0.00 | 0.00 | 48.00 | United-States | <=50K |
| 8908 | 5078 | 29.00 | ? | 41281.00 | Bachelors | 13.00 | Married-spouse-absent | ? | Not-in-family | White | Male | 0.00 | 0.00 | 50.00 | United-States | <=50K |
| 4325, 4881 | 14308 | 25.00 | Private | 308144.00 | Bachelors | 13.00 | Never-married | Craft-repair | Not-in-family | White | Male | 0.00 | 0.00 | 40.00 | Mexico | <=50K |
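
A minimal sketch of how such overlaps can be found, assuming each row is represented as a hashable tuple of its column values. The helper name `find_test_rows_in_train` and the toy rows are hypothetical; the final lines only re-derive the reported percentage from the counts given above.

```python
def find_test_rows_in_train(train_rows, test_rows):
    """Return (indices of test rows also present in train, their share of test)."""
    train_set = set(train_rows)
    duplicated = [i for i, row in enumerate(test_rows) if row in train_set]
    share = len(duplicated) / len(test_rows) if test_rows else 0.0
    return duplicated, share


# Toy rows (age, workclass, income), invented for illustration only.
toy_train = [(17, "Private", "<=50K"), (45, "Private", ">50K")]
toy_test = [(23, "Private", "<=50K"), (45, "Private", ">50K")]
toy_indices, toy_share = find_test_rows_in_train(toy_train, toy_test)

# Counts reported above: 23 of 16281 test samples also appear in train.
reported_share = 23 / 16281
reported_text = f"{reported_share:.2%}"

# Condition from the summary: the share is less or equal to 10%.
condition_passed = reported_share <= 0.10
```

Exact tuple equality treats two rows as the same sample only when every column matches, which is the sense in which the table above lists one test row against one or more train indices.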
     
    
#### Feature Label Correlation Change

Return the Predictive Power Score of all features, in order to estimate each feature's ability to predict the label.

##### Conditions Summary

| Status | Condition | More Info |
| --- | --- | --- |
| ✓ | Train-Test features' Predictive Power Score difference is less than 0.2 | Passed for 14 relevant columns |
| ✓ | Train features' Predictive Power Score is less than 0.7 | Passed for 14 relevant columns |

##### Additional Outputs

The Predictive Power Score (PPS) is used to estimate the ability of a feature to predict the label by itself. (Read more about Predictive Power Score)

In the graph above, we should suspect we have problems in our data if:

1. Train dataset PPS values are high: This can indicate that the feature's success in predicting the label is actually due to data leakage, meaning that the feature holds information that is based on the label to begin with.
2. Large difference between train and test PPS (train PPS is larger): This is an even more powerful indication of data leakage, as a feature that was powerful in train but not in test can be explained by leakage in train that is not relevant to a new dataset.
3. Large difference between test and train PPS (test PPS is larger): This is an anomalous value that could indicate drift in the test dataset that caused a coincidental correlation to the target label.
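
The three warning signs above can be expressed as simple threshold checks once per-feature PPS values are available. The sketch below does not compute PPS itself; it only applies the thresholds from the Conditions Summary (0.7 and 0.2) to given scores. The function name `flag_pps` and all PPS values are hypothetical, since the report does not list per-feature scores.

```python
def flag_pps(train_pps, test_pps, max_train=0.7, max_diff=0.2):
    """Return {feature: [reasons]} for features that look suspicious."""
    flags = {}
    for feature, train_score in train_pps.items():
        test_score = test_pps.get(feature, 0.0)
        reasons = []
        if train_score >= max_train:
            reasons.append("high train PPS")
        if train_score - test_score >= max_diff:
            reasons.append("train PPS much larger than test PPS")
        if test_score - train_score >= max_diff:
            reasons.append("test PPS much larger than train PPS")
        if reasons:
            flags[feature] = reasons
    return flags


# Invented scores for three placeholder features.
example_train = {"f1": 0.9, "f2": 0.1, "f3": 0.05}
example_test = {"f1": 0.3, "f2": 0.1, "f3": 0.5}
example_flags = flag_pps(example_train, example_test)
```

In the report above no feature is flagged, because both conditions passed for all 14 relevant columns.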
  
  
#### Train Test Feature Drift

Calculate drift between train dataset and test dataset per feature, using statistical measures.

##### Conditions Summary

| Status | Condition | More Info |
| --- | --- | --- |
| ✓ | categorical drift score < 0.2 and numerical drift score < 0.1 | Passed for 14 columns out of 14 columns.<br>Found column "relationship" has the highest categorical drift score: 4.25E-3<br>Found column "hours-per-week" has the highest numerical drift score: 4.24E-3 |

##### Additional Outputs

The drift score measures the difference between two distributions; in this check, the test and train distributions.

The check shows the drift score and distributions for the features, sorted by the sum of the drift score and the feature importance, and shows only the top 5 features by that sum.

Discrete distribution plots show the top 10 categories with the largest difference between train and test.

If available, the plot titles also show the feature importance (FI) rank.
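
The display ordering described above (sort by drift score plus feature importance, keep the top 5) can be sketched as below. The helper name `top_features` and all scores are hypothetical, since the report lists drift scores only for the two highest-scoring columns.

```python
def top_features(drift, importance, n=5):
    """Return the n features with the largest drift + importance sum."""
    return sorted(
        drift,
        key=lambda feature: drift[feature] + importance.get(feature, 0.0),
        reverse=True,
    )[:n]


# Invented scores for three placeholder features.
example_drift = {"a": 0.01, "b": 0.30, "c": 0.05}
example_importance = {"a": 0.50, "b": 0.00, "c": 0.10}
shown = top_features(example_drift, example_importance, n=2)
```

Ranking by the sum means a feature with little drift can still be shown first when the model relies on it heavily, as feature "a" is in the toy data.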
  
  
#### Train Test Label Drift

Calculate label drift between train dataset and test dataset, using statistical measures.

##### Conditions Summary

| Status | Condition | More Info |
| --- | --- | --- |
| ✓ | categorical drift score < 0.2 and numerical drift score < 0.1 for label drift | Label's drift score Cramer's V is 2.16E-3 |

##### Additional Outputs

The drift score measures the difference between two distributions; in this check, the test and train distributions.

The check shows the drift score and distributions for the label.

Discrete distribution plots show the top 10 categories with the largest difference between train and test.
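
The label drift score above is reported as Cramer's V. As an illustration, the sketch below computes the plain (uncorrected) Cramér's V between dataset membership (train or test) and label category. Whether the tool applies a bias correction or other adjustments is not stated in the report, so treat this as an assumption; the function name and toy labels are hypothetical.

```python
import math
from collections import Counter


def cramers_v(train_labels, test_labels):
    """Uncorrected Cramér's V for a 2 x k table (train/test vs. category)."""
    train_counts = Counter(train_labels)
    test_counts = Counter(test_labels)
    categories = set(train_counts) | set(test_counts)
    n_train, n_test = len(train_labels), len(test_labels)
    n = n_train + n_test
    if len(categories) < 2 or n_train == 0 or n_test == 0:
        return 0.0
    chi2 = 0.0
    for category in categories:
        column_total = train_counts[category] + test_counts[category]
        for observed, row_total in (
            (train_counts[category], n_train),
            (test_counts[category], n_test),
        ):
            expected = row_total * column_total / n
            chi2 += (observed - expected) ** 2 / expected
    # With two rows, min(rows - 1, columns - 1) is 1, so V = sqrt(chi2 / n).
    return math.sqrt(chi2 / n)


# Toy labels: identical proportions give 0, disjoint labels give 1.
same = cramers_v(["<=50K", ">50K"] * 50, ["<=50K", ">50K"] * 25)
disjoint = cramers_v(["<=50K"] * 10, [">50K"] * 10)
```

A value as small as the reported 2.16E-3 sits very close to the "identical proportions" end of this scale.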
  
  
### Check Without Conditions Output

### Other Checks That Weren't Displayed

| Check | Reason |
| --- | --- |
| Date Train Test Leakage Duplicates | DatasetValidationError: Dataset does not contain a datetime. See Dataset docs |
| Date Train Test Leakage Overlap | DatasetValidationError: Dataset does not contain a datetime. See Dataset docs |
| Index Train Test Leakage | DatasetValidationError: Dataset does not contain an index. See Dataset docs |
| New Label Train Test | Nothing found |
| String Mismatch Comparison | Nothing found |
| Multivariate Drift | Nothing found |
