
@garystafford
Last active February 28, 2023 00:37
DATA_LAKE_BUCKET="<your_data_lake_s3_bucket>"
TARGET_TABLE="tickit.ecomm.sale"
spark-submit \
--name ${TARGET_TABLE} \
--jars /usr/lib/spark/jars/spark-avro.jar,/usr/lib/hudi/hudi-utilities-bundle.jar \
--conf spark.sql.catalogImplementation=hive \
--conf spark.yarn.submit.waitAppCompletion=false \
--class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer `ls /usr/lib/hudi/hudi-utilities-bundle.jar` \
--props file://${PWD}/${TARGET_TABLE}.properties \
--table-type COPY_ON_WRITE \
--source-ordering-field __source_ts_ms \
--source-class org.apache.hudi.utilities.sources.AvroDFSSource \
--schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
--target-table ${TARGET_TABLE} \
--target-base-path s3://${DATA_LAKE_BUCKET}/cdc_hudi_data_lake/silver/${TARGET_TABLE}/ \
--enable-sync \
--continuous \
--op UPSERT \
> ${TARGET_TABLE}.log 2>&1 &
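
The command above runs in the background with all output redirected to `${TARGET_TABLE}.log`, so a missing properties file or an unreplaced bucket placeholder only surfaces in that log. Below is a minimal sketch of a pre-flight check that could run before `spark-submit`. The function name `preflight_check` and its two checks are assumptions for illustration and are not part of the original script.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of the original gist.
# Returns 0 when the inputs the spark-submit command depends on look usable.
preflight_check() {
  local bucket="$1" table="$2"
  # The <...> placeholder must be replaced with a real bucket name.
  case "$bucket" in
    ""|"<"*">") echo "DATA_LAKE_BUCKET is not set" >&2; return 1 ;;
  esac
  # --props points at ${PWD}/${TARGET_TABLE}.properties, so that file must exist.
  if [ ! -f "${PWD}/${table}.properties" ]; then
    echo "missing ${PWD}/${table}.properties" >&2
    return 1
  fi
  return 0
}
```

One way to use it would be to call `preflight_check "$DATA_LAKE_BUCKET" "$TARGET_TABLE" || exit 1` right after the two variable assignments, so the job is only submitted when both checks pass.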