@kovid-r
Created June 11, 2020 10:21
Writing Files PySpark Cheatsheet
# Write the DataFrame to disk in Parquet format, partitioned by year - overwrite any data already at the path
df.write.partitionBy('year').format('parquet').mode('overwrite').save(parquet_file_path)
# Write the DataFrame to disk in Parquet format, partitioned by year - append to any data already at the path
df.write.partitionBy('year').format('parquet').mode('append').save(parquet_file_path)
# Write the DataFrame as a Hive table, split into 10 buckets by year and sorted by avg_ratings within each bucket
df.write.bucketBy(10, "year").sortBy("avg_ratings").saveAsTable("films_bucketed")
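The partitioned writes above produce a directory, not a single file: Spark creates one `year=<value>` subdirectory per distinct value and stores the partition column in the directory name instead of in the data files. The following is a rough, standard-library-only sketch of that layout and of the two save modes. It does not use Spark; `write_partitioned` is a hypothetical helper, CSV stands in for Parquet, and the `overwrite` branch assumes Spark's default behaviour of clearing everything already at the target path.

```python
import csv
import shutil
import uuid
from collections import defaultdict
from pathlib import Path


def write_partitioned(rows, base_path, partition_col, mode="overwrite"):
    """Hypothetical helper (not a Spark API): mimic the on-disk layout of
    df.write.partitionBy(partition_col).mode(mode).save(base_path).

    rows is a list of dicts; CSV is used here as a stand-in for Parquet.
    """
    if mode not in ("overwrite", "append"):
        raise ValueError(f"unsupported mode: {mode}")

    base = Path(base_path)
    if mode == "overwrite" and base.exists():
        # Assumption: overwrite discards all data already at the path.
        shutil.rmtree(base)

    # Group the rows by the value of the partition column.
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_col]].append(row)

    written = []
    for value, group in groups.items():
        # Hive-style partition directory, e.g. films/year=2019
        part_dir = base / f"{partition_col}={value}"
        part_dir.mkdir(parents=True, exist_ok=True)

        # The partition value lives in the directory name, so the column
        # is left out of the data file itself.
        fields = [key for key in group[0] if key != partition_col]

        # A unique file name means append adds files and never clobbers one.
        part_file = part_dir / f"part-{uuid.uuid4().hex}.csv"
        with part_file.open("w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=fields, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(group)
        written.append(part_file)
    return written
```

Calling `write_partitioned(rows, "films", "year")` with rows for 2019 and 2020 should leave `films/year=2019/` and `films/year=2020/` on disk. A second call with `mode="append"` adds another part file to each directory, while `mode="overwrite"` starts again from an empty path.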