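# Draw a ~5% sample without replacement; the fraction is approximate, and the seed makes it repeatable.
sample_df = con_df.sample(fraction=0.05, seed=12345)
# Split the sample roughly 80/10/10 into train, test, and validation sets.
train_df, test_df, validate_df = sample_df.randomSplit([0.8, 0.1, 0.1], seed=12345)
# Write each split to Parquet, replacing the output of any earlier run.
train_df.write.mode('overwrite').parquet(AWS_BUCKET_NAME+"/taxi_fare_feature_eng_train_sample1")
validate_df.write.mode('overwrite').parquet(AWS_BUCKET_NAME+"/taxi_fare_feature_eng_validate_sample1")
test_df.write.mode('overwrite').parquet(AWS_BUCKET_NAME+"/taxi_fare_feature_eng_test_sample1")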
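
The weights passed to `randomSplit` are normalized and rows are assigned to splits at random, so the resulting sizes are only approximately 80/10/10. The following is a minimal pure-Python sketch of that idea, not Spark's implementation; `random_split` is a hypothetical helper introduced only for illustration, and the 10,000-row input is made up.

```python
import random


def random_split(rows, weights, seed):
    """Illustrative weighted random split (hypothetical helper, not Spark's code)."""
    total = sum(weights)
    # Cumulative upper bounds in [0, 1], e.g. about [0.8, 0.9, 1.0] for weights 0.8/0.1/0.1.
    bounds = []
    running = 0.0
    for w in weights:
        running += w / total
        bounds.append(running)

    rng = random.Random(seed)
    splits = [[] for _ in weights]
    for row in rows:
        draw = rng.random()
        # Put the row in the first bucket whose upper bound exceeds the draw;
        # the last bucket also catches any floating-point shortfall in the final bound.
        for i, upper in enumerate(bounds):
            if draw < upper or i == len(bounds) - 1:
                splits[i].append(row)
                break
    return splits


# Assumed toy input: 10,000 integer "rows".
train, test, validate = random_split(range(10_000), [0.8, 0.1, 0.1], seed=12345)
print(len(train), len(test), len(validate))
```

Each row lands in exactly one split, and the same seed reproduces the same assignment, but the counts vary around the requested proportions rather than matching them exactly.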