@vidit0210
Created March 30, 2020 08:05
Video link: https://www.youtube.com/watch?v=qMEtqJPhqpA
Julien Simon
-----
import pandas as pd
data = pd.read_csv('Your CSV File')
pd.set_option('display.max_columns', 500) # Make sure we can see all of the columns
pd.set_option('display.max_rows', 50) # Keep the output on one page
data[:10]
-----
Split the Dataset
-----
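# Shuffle the rows reproducibly, then split 95% / 5% into training and test sets
shuffled = data.sample(frac=1, random_state=123)
split_at = int(0.95 * len(shuffled))
train_data, test_data = shuffled.iloc[:split_at], shuffled.iloc[split_at:]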
# Save to CSV files
train_data.to_csv('automl-train.csv', index=False, header=True, sep=',') # Need to keep column names
test_data.to_csv('automl-test.csv', index=False, header=True, sep=',')
-----
%%sh
ls -l *.csv
-----
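A quick sanity check of the split, as a minimal sketch. It runs on a small synthetic DataFrame (an assumption, standing in for the real CSV), and the helper name shuffle_split and the column names are hypothetical. The idea is only to confirm that every row ends up in exactly one of the two sets.

```python
import pandas as pd

def shuffle_split(df, train_frac=0.95, seed=123):
    """Shuffle the rows, then cut at train_frac (hypothetical helper)."""
    shuffled = df.sample(frac=1, random_state=seed)
    cut = int(train_frac * len(shuffled))
    return shuffled.iloc[:cut], shuffled.iloc[cut:]

# Synthetic stand-in for the real dataset (hypothetical columns)
demo = pd.DataFrame({"feature": range(200), "label": [i % 2 for i in range(200)]})
train_demo, test_demo = shuffle_split(demo)

# Every row lands in exactly one of the two sets
assert len(train_demo) + len(test_demo) == len(demo)
assert set(train_demo.index).isdisjoint(set(test_demo.index))
print(len(train_demo), len(test_demo))
```

The same two assertions can be run against train_data and test_data before writing the CSV files.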
Upload Data to S3
-----
import sagemaker
prefix = 'sagemaker/DEMO-automl-dm/input'
sess = sagemaker.Session()
# No bucket is given, so the file goes to the session's default bucket under the prefix
uri = sess.upload_data(path="automl-train.csv", key_prefix=prefix)
print(uri)
-----
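The upload step prints the S3 URI of the uploaded file. Later steps often need the bucket and key separately, so here is a minimal sketch of pulling them apart with the standard library. It assumes the URI has the usual s3://bucket/key shape. The helper name split_s3_uri and the bucket name example-bucket are hypothetical; the real bucket name comes from your own SageMaker session.

```python
from urllib.parse import urlparse

def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key). Hypothetical helper."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")

# Hypothetical URI for illustration only
example_uri = "s3://example-bucket/sagemaker/DEMO-automl-dm/input/automl-train.csv"
bucket, key = split_s3_uri(example_uri)
print(bucket)
print(key)
```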