thistleknot / random_feature_selection.py
Last active January 5, 2023 14:27
quick FEATURE SELECTION
"""
Based on the 'quick Feature Selection' method by Damien Benveniste, PhD
original post: https://lnkd.in/gCDSEJcF
quick FEATURE SELECTION
train a Supervised Learning algorithm with a Feature Importance measure
This method also works for highly non-linear data, unlike LASSO (for example), which tends to capture only linear relationships in the data. The random feature is a "Random Bar" because it is the minimum bar a feature needs to beat to be part of the potentially useful feature set. That does not mean every feature that beats the bar is worth keeping: removing further features may still help to optimize your model.
This is a technique I like to use for quick FEATURE SELECTION in Machine Learning applications. I tend to call it the "Random Bar" method! Let's assume you have a feature set X and a target Y. Let's create a random vector V with one value per sample in X (for example np.random.normal(size=(100, 1)) for 100 samples) and append that vector as a new feature to X: