Spark custom FileOutputCommitter for concurrent jobs writing to the same destination

The following custom file committer enables concurrent Spark processes to save data to the same destination.
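The committer itself is not reproduced in this excerpt, so below is a minimal sketch of the idea. Assumptions to flag: the package and class name are taken from the config below, `mapreduce.fileoutputcommitter.pending.dir` is this gist's custom key (stock Hadoop does not read it), and stock `FileOutputCommitter` hardcodes `_temporary` partly in static and private helpers, which is why a faithful implementation clones the class rather than subclassing it. This sketch only overrides what is overridable:

```java
package io.debezium.server.batch.spark;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.MRJobConfig;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.parquet.hadoop.ParquetOutputCommitter;

public class ParquetOutputCommitterV2 extends ParquetOutputCommitter {

  // custom key introduced by this gist; stock Hadoop ignores it
  public static final String PENDING_DIR_KEY = "mapreduce.fileoutputcommitter.pending.dir";
  public static final String PENDING_DIR_DEFAULT = "_temporary";

  private final Path outputDir;
  private final Path workDir; // task work dir under the configured pending dir

  public ParquetOutputCommitterV2(Path outputPath, TaskAttemptContext context) throws IOException {
    super(outputPath, context);
    this.outputDir = outputPath;
    this.workDir = getTaskAttemptPath(context);
  }

  private static String pendingDirName(Configuration conf) {
    return conf.get(PENDING_DIR_KEY, PENDING_DIR_DEFAULT);
  }

  // <out>/<pending.dir>/<appAttemptId> instead of <out>/_temporary/<appAttemptId>
  @Override
  public Path getJobAttemptPath(JobContext context) {
    Configuration conf = context.getConfiguration();
    int appAttemptId = conf.getInt(MRJobConfig.APPLICATION_ATTEMPT_ID, 0);
    return new Path(new Path(outputDir, pendingDirName(conf)), String.valueOf(appAttemptId));
  }

  // <jobAttempt>/<pending.dir>/<taskAttemptId>, mirroring the stock layout
  @Override
  public Path getTaskAttemptPath(TaskAttemptContext context) {
    Configuration conf = context.getConfiguration();
    return new Path(new Path(getJobAttemptPath(context), pendingDirName(conf)),
        String.valueOf(context.getTaskAttemptID()));
  }

  // committed tasks land in <jobAttempt>/<taskId>
  @Override
  public Path getCommittedTaskPath(TaskAttemptContext context) {
    return new Path(getJobAttemptPath(context),
        String.valueOf(context.getTaskAttemptID().getTaskID()));
  }

  // tasks must write under the configured pending dir, not the hardcoded one
  @Override
  public Path getWorkPath() {
    return workDir;
  }

  // stock cleanup only deletes <out>/_temporary via a private helper, so also
  // remove this job's own pending dir; other jobs' pending dirs stay untouched
  @Override
  @SuppressWarnings("deprecation")
  public void cleanupJob(JobContext context) throws IOException {
    super.cleanupJob(context);
    Path pending = new Path(outputDir, pendingDirName(context.getConfiguration()));
    pending.getFileSystem(context.getConfiguration()).delete(pending, true);
  }
}
```

Because every process writes, commits, and cleans up only under its own pending directory, one job's commit no longer deletes another job's in-flight task files, which is exactly what breaks concurrent writes with the stock shared `_temporary` layout.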

Provide a different pending.dir for each Spark execution/process:

```properties
# enable the custom committer
spark.sql.parquet.output.committer.class=io.debezium.server.batch.spark.ParquetOutputCommitterV2
# give each concurrent process its own pending dir, e.g.:
mapreduce.fileoutputcommitter.pending.dir=_temporary
mapreduce.fileoutputcommitter.pending.dir=_temporary2
mapreduce.fileoutputcommitter.pending.dir=_temporary3
```
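A usage sketch, with made-up application names and destination path: each concurrent writer sets the committer class plus its own pending dir before writing. Setting the mapreduce key on the SparkContext's Hadoop configuration is one way to get it into the Hadoop conf that Spark hands to the committer.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class ConcurrentWriter {
  public static void main(String[] args) {
    String id = args[0]; // e.g. "2" or "3"; one distinct id per concurrent process
    SparkSession spark = SparkSession.builder().appName("writer-" + id).getOrCreate();

    // the custom committer from this gist
    spark.conf().set("spark.sql.parquet.output.committer.class",
        "io.debezium.server.batch.spark.ParquetOutputCommitterV2");
    // a pending dir unique to this process, so commit/cleanup of one job
    // never touches another job's in-flight files
    spark.sparkContext().hadoopConfiguration()
        .set("mapreduce.fileoutputcommitter.pending.dir", "_temporary" + id);

    Dataset<Row> df = spark.range(1000).toDF();
    df.write().mode(SaveMode.Append).parquet("s3a://bucket/shared/destination");
  }
}
```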
@steveloughran

FWIW, the committer factory in the cloud storage module lets you change to a new committer back end without having to patch FileOutputCommitter... just clone it and reference it in the relevant configs.
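For context, the factory mentioned here is the `PathOutputCommitterFactory` plumbing that ships with Hadoop 3.1+ alongside the S3A committers. A sketch under that assumption; the factory class name is hypothetical, `ParquetOutputCommitterV2` is the committer from this gist, and `FileOutputCommitter` extends `PathOutputCommitter` from Hadoop 3.1 on, so the clone plugs in directly:

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.PathOutputCommitter;
import org.apache.hadoop.mapreduce.lib.output.PathOutputCommitterFactory;

// hypothetical factory; wired in either globally via
//   mapreduce.outputcommitter.factory.class=PendingDirCommitterFactory
// or per filesystem scheme, e.g.
//   mapreduce.outputcommitter.factory.scheme.s3a=PendingDirCommitterFactory
public class PendingDirCommitterFactory extends PathOutputCommitterFactory {
  @Override
  public PathOutputCommitter createOutputCommitter(Path outputPath,
      TaskAttemptContext context) throws IOException {
    // hand back the cloned committer without patching FileOutputCommitter itself
    return new io.debezium.server.batch.spark.ParquetOutputCommitterV2(outputPath, context);
  }
}
```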
