@mohammedri
Created October 25, 2018 01:53
A Python script that splits a pandas DataFrame into a training sample and a test sample given a test ratio
import numpy as np
import pandas as pd


def load_data(path):
    # Read the CSV file at `path` into a DataFrame.
    return pd.read_csv(path)


def split_dataset_into_train_test(data, test_ratio):
    # Shuffle the row positions, then take the first `test_ratio` share as the test set.
    shuffler = np.random.permutation(len(data))
    test_set_size = int(len(data) * test_ratio)
    test_indices = shuffler[:test_set_size]
    train_indices = shuffler[test_set_size:]
    return data.iloc[train_indices], data.iloc[test_indices]


train_set, test_set = split_dataset_into_train_test(load_data("sample.csv"), 0.2)
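
The split above draws from NumPy's global random state, so each run yields a different partition. If a repeatable split is wanted, one option is to pass a seed. The sketch below is a hypothetical variant, not part of the original gist: the name `split_with_seed`, the `seed` parameter, the range check on `test_ratio`, and the small demo DataFrame are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def split_with_seed(data, test_ratio, seed=42):
    """Hypothetical seeded variant of the split above (not in the original gist)."""
    if not 0.0 <= test_ratio <= 1.0:
        raise ValueError("test_ratio must be between 0 and 1")
    # A local Generator keeps the shuffle repeatable without touching global state.
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(len(data))
    test_size = int(len(data) * test_ratio)
    # Same return order as the original function: (train, test).
    return data.iloc[shuffled[test_size:]], data.iloc[shuffled[:test_size]]


# Made-up data, used only to exercise the function.
demo = pd.DataFrame({"x": range(10), "y": range(10, 20)})
train_demo, test_demo = split_with_seed(demo, 0.2)
print(len(train_demo), len(test_demo))  # 8 2
```

Calling `split_with_seed` again with the same `seed` should return the same rows in each set, which can help when comparing models across runs.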