
@fabclmnt
Created September 23, 2021 01:09
Warnings:
TOTAL: 5 warning(s)
Priority 1: 1 warning(s)
Priority 2: 4 warning(s)
Priority 1 - heavy impact expected:
* [DUPLICATES - DUPLICATE COLUMNS] Found 1 column with exactly the same feature values as another column.
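A duplicate column carries no extra information, so one copy can usually be dropped. The following is a minimal sketch of how such a check can work. It is not the report tool's own implementation. It groups columns by their full sequence of values. The table and the column names `price`, `price_eur` and `qty` are hypothetical, and the sketch assumes all values are hashable.

```python
from collections import defaultdict


def find_duplicate_columns(columns):
    """Group column names whose value sequences are identical.

    `columns` maps a column name to its list of values (one per row).
    Returns a list of groups. Each group holds two or more names that
    share exactly the same values, in insertion order.
    """
    groups = defaultdict(list)
    for name, values in columns.items():
        # Tuples are hashable, so identical value sequences land on the same key.
        groups[tuple(values)].append(name)
    return [names for names in groups.values() if len(names) > 1]


# Hypothetical toy table: "price_eur" is an exact copy of "price".
table = {
    "price": [10.0, 12.5, 9.9],
    "price_eur": [10.0, 12.5, 9.9],
    "qty": [1, 3, 2],
}
print(find_duplicate_columns(table))  # [['price', 'price_eur']]
```

Keeping the first name in each group and dropping the rest removes the redundancy without losing any information.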
Priority 2 - usage allowed, limited human intelligibility:
* [DATA RELATIONS - HIGH COLLINEARITY - NUMERICAL] Found 3 numerical variables with a high Variance Inflation Factor (VIF > 5.0). The variables listed in the results are highly collinear with other variables in the dataset. They make the model harder to explain and can lead to issues such as overfitting. Depending on your end goal, you might want to remove the variables with the highest VIF.
* [ERRONEOUS DATA - PREDEFINED ERRONEOUS DATA] Found 1960 predefined erroneous data (ED) values in the dataset.
* [DATA RELATIONS - HIGH COLLINEARITY - CATEGORICAL] Found 10 categorical variables with significant collinearity (p-value < 0.05). The variables listed in the results are highly collinear with other variables in the dataset and are sorted in descending order of propensity. They make the model harder to explain and can lead to issues such as overfitting. Depending on your end goal, you might want to remove variables in the order provided.
* [DUPLICATES - EXACT DUPLICATES] Found 3 instances with exact duplicate feature values.
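The numerical collinearity warning uses a VIF > 5.0 threshold. For variable j, VIF_j = 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination from regressing variable j on all the other variables. An equivalent formulation is the j-th diagonal entry of the inverse of the predictors' correlation matrix.

The following is a minimal standard-library sketch of that second formulation. It is not the tool's own code. It assumes a small dict of equal-length numeric columns, each with non-zero variance. The columns `x1`, `x2` and `x3` are made up for illustration.

```python
import math


def correlation_matrix(data):
    """Pearson correlation matrix for a dict of equal-length numeric columns."""
    names = list(data)
    n = len(next(iter(data.values())))
    means = {k: sum(v) / n for k, v in data.items()}
    # Centre each column, then take its Euclidean norm (assumed non-zero).
    centred = {k: [x - means[k] for x in v] for k, v in data.items()}
    norms = {k: math.sqrt(sum(x * x for x in c)) for k, c in centred.items()}
    corr = [
        [sum(a * b for a, b in zip(centred[i], centred[j])) / (norms[i] * norms[j])
         for j in names]
        for i in names
    ]
    return names, corr


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    size = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [row[:] + [float(i == r) for i in range(size)] for r, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def vif(data):
    """VIF of each column: the diagonal of the inverse correlation matrix."""
    names, corr = correlation_matrix(data)
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}


# Hypothetical data: x2 is almost exactly 2 * x1; x3 is uncorrelated with both.
data = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2, 4, 6, 8, 10, 13],
    "x3": [3, 1, 1, 3, 2, 2],
}
scores = vif(data)
flagged = [name for name, value in scores.items() if value > 5.0]
print(flagged)  # ['x1', 'x2']
```

An exactly collinear pair, such as the duplicate column above, makes the correlation matrix singular. In that case the sketch raises an error instead of returning an infinite VIF. This is one reason to drop duplicates before checking collinearity.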