@adimyth
Created January 19, 2021 11:30
StratifiedKFold Split for Regression Task
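import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold

df = pd.read_csv(path_to_data)  # path_to_data: path to a CSV with a numeric "target" column

# Sturges' rule; pd.cut needs an integer bin count, so round up
n_bins = int(np.ceil(1 + np.log2(df.shape[0])))
# Discretize the continuous target into equal-width bins (integer bin codes)
df["bins"] = pd.cut(df["target"], bins=n_bins, labels=False)

n_folds = 5
skf = StratifiedKFold(n_splits=n_folds)
df["fold"] = -1
# Stratify on the bin codes; split() yields positional indices, which match
# the labels of the default RangeIndex that read_csv creates
for fold, (train_idx, valid_idx) in enumerate(skf.split(df["bins"], df["bins"])):
    df.loc[valid_idx, "fold"] = fold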
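To try the same idea without a CSV on disk, here is a minimal sketch on synthetic data. The helper name `add_stratified_folds`, the column names, the `seed` argument, and the use of `shuffle=True` are assumptions for illustration and are not part of the original snippet.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold


def add_stratified_folds(df, target_col="target", n_folds=5, seed=0):
    """Return a copy of df with a 'fold' column stratified on a binned target.

    Hypothetical helper: wraps the binning + StratifiedKFold steps in one call.
    """
    # Reset the index so positional indices from split() match .loc labels
    out = df.reset_index(drop=True).copy()
    # Sturges' rule, rounded up to an integer bin count
    n_bins = int(np.ceil(1 + np.log2(len(out))))
    # Equal-width bins over the target range, returned as integer codes
    bins = pd.cut(out[target_col], bins=n_bins, labels=False)
    # shuffle=True is an assumption here; the original uses the default (no shuffle)
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    out["fold"] = -1
    for fold, (_, valid_idx) in enumerate(skf.split(out, bins)):
        out.loc[valid_idx, "fold"] = fold
    return out


# Synthetic data standing in for the CSV
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    {"feature": rng.normal(size=1000), "target": rng.normal(size=1000)}
)
folded = add_stratified_folds(demo)

# Per-fold size and target mean; the means should be close to each other
print(folded.groupby("fold")["target"].agg(["size", "mean"]))
```

With equal-width bins, the tails of the target distribution can end up in bins with fewer than `n_folds` rows, in which case scikit-learn emits a warning about the least populated class. Quantile bins via `pd.qcut` are one possible alternative if that becomes a problem.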