sf_crime_3.py
import numpy as np
import pandas as pd

random_state = 42
train = pd.read_csv("/datasets/s3-data-bucket/train.csv")
train.drop_duplicates(inplace=True)
train.reset_index(inplace=True, drop=True)
print(f"Loaded the dataset of {train.shape[1]}-D features")
test = pd.read_csv("/datasets/s3-data-bucket/test.csv", index_col='Id')
print(f"# train examples: {len(train)}\n# test examples: {len(test)}")
del test
# Replace clear outliers (X = -120.5, Y = 90.0) with NaN so that fast.ai can impute them via `FillMissing` later on
train.replace({'X': -120.5, 'Y': 90.0}, np.nan, inplace=True)
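A minimal sketch of how the per-column `replace` call above behaves, using a small made-up frame in place of the real training data. The `toy` values and the names `cleaned` and `missing_per_column` are illustrative assumptions, not part of the original script.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the training data (hypothetical values, not the real set).
toy = pd.DataFrame({
    "X": [-122.42, -120.5, -122.40],
    "Y": [37.77, 90.0, 37.78],
})

# A dict passed as `to_replace` matches each value only in the column it is keyed by:
# -120.5 is looked up in "X" and 90.0 in "Y"; matches become NaN.
cleaned = toy.replace({"X": -120.5, "Y": 90.0}, np.nan)

# Count how many entries per column were turned into missing values.
missing_per_column = cleaned.isna().sum()
print(missing_per_column)
```

Running the same `isna().sum()` check on `train` after the replacement is a quick way to see how many coordinates were flagged before imputation.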