@kadnan
Created September 23, 2016 05:59
Transform the data and split it into training and test sets
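import pandas as pd

file_name = 'data_england_test.csv'
# Load only the columns needed: the label (Result) and the two features
fields = ['Result', 'Toss', 'Bat']
df = pd.read_csv(file_name, skipinitialspace=True, usecols=fields)
# Save the filtered columns before encoding them
df.to_csv('data_england_test_filter.csv', index=False)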
# Convert features and labels into numeric codes
df_replace = df.replace(['lost', 'draw', 'won', '1st', '2nd'], [-1, 0, 1, -1, 1])
dataset_length = len(df_replace)
# Use the first 67% of the rows for training
ratio = 0.67
split_index = int(round(dataset_length * ratio))
train_data_df = df_replace[:split_index]  # first 67% of the data
test_data_df = df_replace[split_index:]  # remaining rows for testing
# Write the respective CSV files
train_data_df.to_csv('train.csv',index=False)
test_data_df.to_csv('test.csv',index=False)
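The split step is easy to get wrong by one row, so here is a minimal sketch that checks it on a small in-memory frame. The helper name `split_by_ratio` and the sample rows are hypothetical and not part of the original script; the only assumption is that the data has already been encoded numerically as above.

```python
import pandas as pd


def split_by_ratio(df, ratio=0.67):
    """Return (train, test): the first `ratio` share of rows, then the rest."""
    split_index = int(round(len(df) * ratio))
    # Both slices use the same index, so no row lands in both parts
    return df.iloc[:split_index], df.iloc[split_index:]


# Made-up rows using the same numeric encoding as the script
sample = pd.DataFrame({
    'Result': [1, -1, 0, 1, -1, 1],
    'Toss': [1, -1, 1, -1, 1, -1],
    'Bat': [-1, 1, 1, -1, -1, 1],
})

train, test = split_by_ratio(sample)
print(len(train), len(test))
```

Slicing both parts from one shared `split_index` keeps the training and test rows disjoint, and together they cover every row. Note that this is a positional split: if the CSV is ordered by date, the test set holds the most recent matches.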