Gist by @alejio, last active July 6, 2019 20:05
Simple train/test splitting in pandas
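import numpy as np
import pandas as pd


def split_train_test(df_in, test_ratio, seed=42):
    """Randomly split a DataFrame into (train, test) by row position."""
    # Reserve a held-out sample for final evaluation.
    # sklearn.model_selection.train_test_split could have been used instead.
    if not 0 <= test_ratio <= 1:
        raise ValueError("test_ratio must be between 0 and 1")
    df = df_in.copy(deep=True)
    # A local RandomState leaves the global NumPy RNG untouched, unlike
    # np.random.seed(), and yields the same shuffle for the same seed.
    rng = np.random.RandomState(seed)
    shuffled_indices = rng.permutation(len(df))
    # int() truncates, so the test set never exceeds test_ratio of the rows.
    test_set_size = int(len(df) * test_ratio)
    test_indices = shuffled_indices[:test_set_size]
    train_indices = shuffled_indices[test_set_size:]
    return df.iloc[train_indices], df.iloc[test_indices]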
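
A pandas-only alternative to shuffling positions by hand is to draw the test rows with DataFrame.sample and build the train set with DataFrame.drop. The sketch below shows that approach. It is an assumption, not the author's method, and the function name split_train_test_sample is hypothetical. It also assumes the frame has a unique index, because the train set is built by dropping the sampled index labels. The train part keeps the original row order rather than being shuffled, so it is not a drop-in replacement for split_train_test.

```python
import pandas as pd


def split_train_test_sample(df_in, test_ratio, seed=42):
    """Hypothetical variant: sample the test rows, keep the rest as train.

    Assumes df_in has a unique index, since train is built by dropping
    the index labels that were sampled into test.
    """
    if not df_in.index.is_unique:
        raise ValueError("df_in must have a unique index")
    # Draw the test rows without replacement; random_state makes it reproducible.
    test = df_in.sample(frac=test_ratio, random_state=seed)
    # Every row not sampled goes to train, in its original order.
    train = df_in.drop(test.index)
    return train, test


# Usage: a toy frame with 10 rows and a 20% test ratio.
df = pd.DataFrame({"x": range(10), "y": [i % 2 for i in range(10)]})
train, test = split_train_test_sample(df, test_ratio=0.2)
print(len(train), len(test))  # 8 2
```

Because the split relies on index labels rather than positions, a frame with duplicate labels would lose rows from train. The explicit is_unique check guards against that.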