--------------------DELTASTREAMER----------------
~/spark/spark-3.2.3-bin-hadoop3.2/bin/spark-submit \
--class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
~/spark/code/hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.12-0.13.0.jar \
--table-type MERGE_ON_READ \
--source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
--source-ordering-field ts \
--target-base-path /tmp/stock_ticks_cow \
--target-table stock_ticks_cow \
--props /var/demo/config/kafka-source.properties \
--schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
--continuous \
--min-sync-interval-seconds 5
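
For reference, a minimal sketch of what the kafka-source.properties file passed via --props typically contains. The key names are standard Hudi DeltaStreamer configs; the record key, partition path, topic, broker, and schema-file values below are illustrative assumptions and depend on your environment:

# illustrative kafka-source.properties (values are environment-specific)
hoodie.datasource.write.recordkey.field=key
hoodie.datasource.write.partitionpath.field=date
# file-based schema provider (used by the FilebasedSchemaProvider above)
hoodie.deltastreamer.schemaprovider.source.schema.file=/var/demo/config/schema.avsc
hoodie.deltastreamer.schemaprovider.target.schema.file=/var/demo/config/schema.avsc
# Kafka source
hoodie.deltastreamer.source.kafka.topic=stock_ticks
bootstrap.servers=kafkabroker:9092
auto.offset.reset=earliest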
--------------------CLEANER----------------
~/spark/spark-3.2.3-bin-hadoop3.2/bin/spark-submit \
--master local \
--class org.apache.hudi.utilities.HoodieCleaner \
~/spark/code/hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.12-0.13.0.jar \
--target-base-path "/tmp/stock_ticks_cow" \
--hoodie-conf hoodie.cleaner.policy=KEEP_LATEST_COMMITS \
--hoodie-conf hoodie.cleaner.commits.retained=2 \
--hoodie-conf hoodie.cleaner.parallelism=200
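
To check that the cleaner is actually trimming the table (KEEP_LATEST_COMMITS with hoodie.cleaner.commits.retained=2 retains the file versions needed by roughly the latest 2 commits), the Hudi timeline under .hoodie can be listed. The following is only an illustrative Scala sketch against the base path used above, meant to be run from a spark-shell such as the one launched in the next section:

// Illustrative: list completed actions on the Hudi timeline for /tmp/stock_ticks_cow.
// ".clean" entries indicate the cleaner has run.
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
fs.listStatus(new Path("/tmp/stock_ticks_cow/.hoodie"))
  .map(_.getPath.getName)
  .filter(n => n.endsWith(".commit") || n.endsWith(".deltacommit") || n.endsWith(".clean"))
  .sorted
  .foreach(println)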
--------------------SPARK SHELL----------------
~/spark/spark-3.2.3-bin-hadoop3.2/bin/spark-shell \
--jars packaging/hudi-spark-bundle/target/hudi-spark3.2-bundle_2.12-0.13.0.jar \
--conf 'spark.sql.filesourceTableRelationCacheSize=0' \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
--conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
--conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
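
Once the shell is up, the table written by the DeltaStreamer job can be read back with the standard Hudi datasource. A minimal sketch, assuming the same base path; ts is the source ordering field used above, and the _hoodie_* columns are Hudi metadata columns:

// Minimal read sketch for the table at /tmp/stock_ticks_cow.
val df = spark.read.format("hudi").load("/tmp/stock_ticks_cow")

// _hoodie_commit_time / _hoodie_record_key are standard Hudi metadata columns.
df.select("_hoodie_commit_time", "_hoodie_record_key", "ts")
  .orderBy(df("ts").desc)
  .show(20, false)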