# Apply patch to add DFS properties
https://github.com/bvaradar/hudi/commit/a4f79a7ab6955503e3cca0a36876305a544991ee
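## One way to bring that commit into a local hudi checkout (the remote name below is arbitrary; adjust to your setup):
git remote add bvaradar https://github.com/bvaradar/hudi.git
git fetch bvaradar
git cherry-pick a4f79a7ab6955503e3cca0a36876305a544991ee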
Instead of Step (1) in the demo, run the following:
varadarb-C02SH0P1G8WL:hudi varadarb$ docker exec -it adhoc-2 /bin/bash
# Create the DFS root directory
root@adhoc-2:/opt# hadoop fs -mkdir -p /var/data/input_batch/
## Copy batch_1 to DFS
root@adhoc-2:/opt# hadoop fs -copyFromLocal /var/hoodie/ws/docker/demo/data/batch_1.json /var/data/input_batch/.
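## Optional: confirm the file landed in DFS
root@adhoc-2:/opt# hadoop fs -ls /var/data/input_batch/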
## For the delta-streamer invocation, change --source-class and --props as shown below
spark-submit --class com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE --storage-type COPY_ON_WRITE --source-class com.uber.hoodie.utilities.sources.JsonDFSSource --source-ordering-field ts --target-base-path /user/hive/warehouse/stock_ticks_cow --target-table stock_ticks_cow --props /var/demo/config/dfs-source.properties --schemaprovider-class com.uber.hoodie.utilities.schema.FilebasedSchemaProvider
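## For reference, a minimal sketch of what /var/demo/config/dfs-source.properties (added by the patch linked above)
## might contain. The values below are assumptions modeled on the demo's kafka-source.properties; check the linked
## commit for the actual file:
hoodie.datasource.write.recordkey.field=key
hoodie.datasource.write.partitionpath.field=date
hoodie.deltastreamer.schemaprovider.source.schema.file=/var/demo/config/schema.avsc
hoodie.deltastreamer.schemaprovider.target.schema.file=/var/demo/config/schema.avsc
hoodie.deltastreamer.source.dfs.root=/var/data/input_batch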
# Make similar changes in place of Step 5: http://hudi.incubator.apache.org/docker_demo.html#step-5-upload-second-batch-to-kafka-and-run-deltastreamer-to-ingest
## Copy batch_2 to DFS
root@adhoc-2:/opt# hadoop fs -copyFromLocal /var/hoodie/ws/docker/demo/data/batch_2.json /var/data/input_batch/.
## Make similar changes to the delta-streamer invocations (a sketch follows below)
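## Re-running the COPY_ON_WRITE command above will pick up batch_2.json from the same input directory.
## A sketch of the MERGE_ON_READ variant from Step 5 of the demo (table name/path and --disable-compaction taken
## from the demo's MOR step; other flags match the COW command above):
spark-submit --class com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE --storage-type MERGE_ON_READ --source-class com.uber.hoodie.utilities.sources.JsonDFSSource --source-ordering-field ts --target-base-path /user/hive/warehouse/stock_ticks_mor --target-table stock_ticks_mor --props /var/demo/config/dfs-source.properties --schemaprovider-class com.uber.hoodie.utilities.schema.FilebasedSchemaProvider --disable-compaction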