@sengstacken
Last active September 23, 2021 19:27
Code to perform a train/validation/test split on a pandas DataFrame holding a time sequence
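import numpy as np

# chronological split: df is assumed to exist and to be sorted by time already
trainpct = 0.7  # first 70% of rows -> train
valpct = 0.2    # next 20% of rows -> validation; the remainder -> test
trainidx = int(np.round(len(df) * trainpct))
validx = int(np.round(len(df) * (trainpct + valpct)))
train_df = df.iloc[:trainidx, :]
val_df = df.iloc[trainidx:validx, :]
test_df = df.iloc[validx:, :]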
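The index arithmetic above can be wrapped in a small helper so the fractions get checked before slicing. This is only a sketch: `split_bounds` and its parameter names are hypothetical (not part of the gist), and it uses Python's built-in `round` rather than `np.round` so that it runs without NumPy.

```python
def split_bounds(n_rows, train_pct=0.7, val_pct=0.2):
    """Return (train_end, val_end) row positions for a chronological split.

    Hypothetical helper: rows [0, train_end) are train, [train_end, val_end)
    are validation, and [val_end, n_rows) are test.
    """
    if train_pct < 0 or val_pct < 0 or train_pct + val_pct > 1:
        raise ValueError("train_pct and val_pct must be >= 0 and sum to at most 1")
    train_end = int(round(n_rows * train_pct))
    val_end = int(round(n_rows * (train_pct + val_pct)))
    return train_end, val_end


# Usage with a time-sorted DataFrame (df is assumed to exist):
# train_end, val_end = split_bounds(len(df))
# train_df = df.iloc[:train_end]
# val_df = df.iloc[train_end:val_end]
# test_df = df.iloc[val_end:]
```

Because the slices share their boundaries, every row lands in exactly one of the three sets and the original time order is preserved.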
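from sklearn.model_selection import train_test_split

# stratified train/validation/test split on the 'Cover_Type' label
# (df and args are assumed to exist; args.val_split and args.test_split
# are taken to be fractions of the full dataset)
# step 1: hold out validation + test together
train_df, temp_df = train_test_split(
    df,
    test_size=(args.val_split + args.test_split),
    random_state=4321,
    shuffle=True,
    stratify=df['Cover_Type'],
)
# step 2: divide the held-out rows into validation and test
val_df, test_df = train_test_split(
    temp_df,
    test_size=(args.test_split / (args.val_split + args.test_split)),
    random_state=4321,
    shuffle=True,
    stratify=temp_df['Cover_Type'],
)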
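The second call does not pass `test_split` directly because `temp_df` holds only the held-out rows, so the fraction has to be rescaled relative to that subset. A minimal sketch of the arithmetic follows, with a hypothetical helper name and example fractions of 0.2 and 0.1 that are assumptions rather than values from the gist:

```python
def second_stage_test_size(val_split, test_split):
    """Fraction of the held-out rows that should go to the test set.

    Hypothetical helper mirroring the test_size expression in the second
    train_test_split call; both arguments are fractions of the full dataset.
    """
    held_out = val_split + test_split
    if held_out <= 0 or held_out >= 1:
        raise ValueError("val_split + test_split must be between 0 and 1")
    return test_split / held_out


# Assumed example: 20% validation and 10% test of the full dataset.
val_split, test_split = 0.2, 0.1
stage_two = second_stage_test_size(val_split, test_split)  # one third of temp_df
# Scaling back by the held-out share recovers the requested overall fraction.
overall_test = (val_split + test_split) * stage_two
overall_val = (val_split + test_split) * (1 - stage_two)
```

With these inputs a third of the held-out rows become the test set, which corresponds to 10% of the full dataset, and the remaining two thirds give the 20% validation set.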