@neomatrix369
Last active June 18, 2020 15:55
Example of how to improve confidence in CV scores: repeat stratified k-fold cross-validation over several bagging rounds, using a unique seed per bagging round and the same seed for every fold within a round
from sklearn.model_selection import StratifiedKFold

bagging_count = 5
folds = 5
SEEDS = [1234, 4567, 8910, 1112, 1314]  # arbitrary seeds: any non-repeating series of integers, one per bagging round

for bagging_index in range(bagging_count):  # 5 bagging rounds
    # unique seed per bagging round; every fold within the round shares it
    skf = StratifiedKFold(n_splits=folds, shuffle=True, random_state=SEEDS[bagging_index])
    # enumerate() supplies the fold number; split() itself only yields (train, validation) index pairs
    for fold, (training_index, validation_index) in enumerate(skf.split(train_df, train_df.sentiment)):  # 5 folds
        ...  # <rest of the training code>
# seeds used for each bagging round: 1234, 4567, 8910, 1112, 1314

# or

bagging_count = 5
folds = 5
SEED = 1234

for bagging_index in range(bagging_count):  # 5 bagging rounds
    # derive the per-round seed from a single base seed
    skf = StratifiedKFold(n_splits=folds, shuffle=True, random_state=SEED + bagging_index)
    for fold, (training_index, validation_index) in enumerate(skf.split(train_df, train_df.sentiment)):  # 5 folds
        ...  # <rest of the training code>
# seeds used for each bagging round: 1234, 1235, 1236, 1237, 1238
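
The snippets above leave the training code elided. As a rough, self-contained sketch of how the 5 x 5 = 25 validation scores might be collected and summarised, the example below fills the gap with stand-ins that are not part of the original gist: synthetic data from `make_classification` in place of `train_df`, a `LogisticRegression` model, accuracy as the metric, and a hypothetical helper named `bagged_cv_scores`. It uses the `SEED + bagging_index` variant of the seed schedule.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold


def bagged_cv_scores(X, y, bagging_count=5, folds=5, seed=1234):
    """Return one validation score per (bagging round, fold) pair.

    Hypothetical helper for illustration only.
    """
    scores = []
    for bagging_index in range(bagging_count):
        # A different seed per round reshuffles the fold assignment.
        skf = StratifiedKFold(n_splits=folds, shuffle=True,
                              random_state=seed + bagging_index)
        for training_index, validation_index in skf.split(X, y):
            # Stand-in model; the original training code is not shown.
            model = LogisticRegression(max_iter=1000)
            model.fit(X[training_index], y[training_index])
            predictions = model.predict(X[validation_index])
            scores.append(accuracy_score(y[validation_index], predictions))
    return scores


# Synthetic data standing in for train_df / train_df.sentiment (assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
scores = bagged_cv_scores(X, y)

# Report the mean and spread over all rounds and folds.
print(f"runs: {len(scores)}")
print(f"mean accuracy: {np.mean(scores):.4f} +/- {np.std(scores):.4f}")
```

Reporting the standard deviation alongside the mean is one way to judge how much the CV score depends on how the folds happened to be shuffled; a small spread across rounds suggests the score is not an artefact of a single split.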