@priyanlc
Created April 30, 2020 21:02
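# con_df and AWS_BUCKET_NAME are assumed to be defined earlier in the notebook.
# Draw a roughly 5% random sample of the rows (no seed is set, so it differs between runs).
sample_df = con_df.sample(fraction=0.05)
# Split the sample into roughly 80% train, 10% test, and 10% validate; seeded for repeatability.
train_df, test_df, validate_df = sample_df.randomSplit([0.8, 0.1, 0.1], seed=12345)
# Write each split to its own Parquet dataset, replacing any previous output.
train_df.write.mode('overwrite').parquet(AWS_BUCKET_NAME+"/taxi_fare_feature_eng_train_sample1")
validate_df.write.mode('overwrite').parquet(AWS_BUCKET_NAME+"/taxi_fare_feature_eng_validate_sample1")
test_df.write.mode('overwrite').parquet(AWS_BUCKET_NAME+"/taxi_fare_feature_eng_test_sample1")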
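The snippet above relies on a weighted random split producing non-overlapping pieces of approximate size. A plain-Python sketch of that idea follows. The `random_split` helper is hypothetical and is not Spark's implementation. It only illustrates the general approach of normalizing the weights into cumulative bounds and assigning each row by a single uniform draw.

```python
import random
from itertools import accumulate


def random_split(rows, weights, seed):
    """Assign every row to exactly one split, in proportion to the weights."""
    total = sum(weights)
    # Cumulative upper bounds, roughly [0.8, 0.9, 1.0] for weights [0.8, 0.1, 0.1].
    bounds = list(accumulate(w / total for w in weights))
    bounds[-1] = 1.0  # guard against floating-point drift in the last bound
    rng = random.Random(seed)
    splits = [[] for _ in weights]
    for row in rows:
        draw = rng.random()  # uniform in [0, 1)
        for i, upper in enumerate(bounds):
            if draw < upper:
                splits[i].append(row)
                break
    return splits


rows = list(range(10_000))
train, test, validate = random_split(rows, [0.8, 0.1, 0.1], seed=12345)
```

Because each row gets one draw that falls into exactly one interval, the splits are disjoint and together cover the input. The sizes are only approximately 80/10/10, which is also why row counts from a weighted random split should be treated as approximate.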