@ottomata
Last active May 8, 2019 13:57
# Test run of the Refine job on raw EventLogging data, with the driver in YARN client mode.
/usr/bin/spark2-submit \
--name otto_test_refine_eventlogging_0 \
--class org.wikimedia.analytics.refinery.job.refine.Refine \
--master yarn \
--deploy-mode client \
--conf spark.driver.extraClassPath=/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-common.jar:/srv/deployment/analytics/refinery/artifacts/hive-jdbc-1.1.0-cdh5.10.0.jar:/srv/deployment/analytics/refinery/artifacts/hive-service-1.1.0-cdh5.10.0.jar \
--driver-java-options='-Drefine.log.level=DEBUG -Drefinery.log.level=DEBUG -Dhttp.proxyHost=webproxy.eqiad.wmnet -Dhttp.proxyPort=8080 -Dhttps.proxyHost=webproxy.eqiad.wmnet -Dhttps.proxyPort=8080' \
/home/otto/refinery-source/refinery-job/target/refinery-job-0.0.89-SNAPSHOT.jar \
--database=otto_json_refine_test \
--hive_server_url=an-coord1001.eqiad.wmnet:10000 \
--input_path=/wmf/data/raw/eventlogging \
--input_path_regex='eventlogging_(.+)/hourly/(\d+)/(\d+)/(\d+)/(\d+)' \
--input_path_regex_capture_groups='table,year,month,day,hour' \
--output_path=/user/otto/external/eventlogging13 \
--table_blacklist_regex='^(Edit|ChangesListHighlights|InputDeviceDynamics)$' \
--transform_functions=org.wikimedia.analytics.refinery.job.refine.deduplicate_eventlogging,org.wikimedia.analytics.refinery.job.refine.geocode_ip \
--schema_base_uri=eventlogging \
--since=12 --until=11
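# Same job in YARN cluster mode. The Hive jars are shipped with --files,
# so the driver classpath refers to them by bare filename.
/usr/bin/spark2-submit \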
--name otto_test_refine_0 \
--class org.wikimedia.analytics.refinery.job.refine.Refine \
--files /etc/hive/conf/hive-site.xml,/srv/deployment/analytics/refinery/artifacts/hive-jdbc-1.1.0-cdh5.10.0.jar,/srv/deployment/analytics/refinery/artifacts/hive-service-1.1.0-cdh5.10.0.jar \
--master yarn \
--deploy-mode cluster \
--conf spark.driver.extraClassPath=/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-common.jar:hive-jdbc-1.1.0-cdh5.10.0.jar:hive-service-1.1.0-cdh5.10.0.jar \
--driver-java-options='-Drefine.log.level=DEBUG -Drefinery.log.level=DEBUG -Dhttp.proxyHost=webproxy.eqiad.wmnet -Dhttp.proxyPort=8080 -Dhttps.proxyHost=webproxy.eqiad.wmnet -Dhttps.proxyPort=8080' \
~/refinery-source/refinery-job/target/refinery-job-0.0.86-SNAPSHOT.jar \
--database=otto \
--hive_server_url=an-coord1001.eqiad.wmnet:10000 \
--input_path=/wmf/data/raw/eventlogging \
--input_path_regex='eventlogging_(.+)/hourly/(\d+)/(\d+)/(\d+)/(\d+)' \
--input_path_regex_capture_groups='table,year,month,day,hour' \
--output_path=/user/otto/external/eventlogging8 \
--since=10 \
--table_whitelist_regex='^NavigationTiming$' \
--transform_functions=org.wikimedia.analytics.refinery.job.refine.deduplicate_eventlogging,org.wikimedia.analytics.refinery.job.refine.geocode_ip \
--ignore_failure_flag=true \
--schema_base_uri=eventlogging --until=5
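
Both commands pair --input_path_regex with --input_path_regex_capture_groups, so each capture group in the regex is named in order (table, year, month, day, hour). The following is a minimal sketch of that mapping, not the Refine job's own code. The sample path (including its date and hour) is hypothetical, and `\d` is rewritten as `[0-9]` because sed's extended regular expressions do not support `\d`.

```shell
#!/bin/sh
# Hypothetical sample directory under --input_path; the date and hour are made up.
path='/wmf/data/raw/eventlogging/eventlogging_NavigationTiming/hourly/2019/05/08/13'

# Same pattern as --input_path_regex, with \d replaced by [0-9] for sed -E.
regex='eventlogging_(.+)/hourly/([0-9]+)/([0-9]+)/([0-9]+)/([0-9]+)'

# Extract the five capture groups as one space-separated line.
parts=$(printf '%s\n' "$path" | sed -nE "s#.*${regex}\$#\1 \2 \3 \4 \5#p")

# Assign the groups to the names listed in --input_path_regex_capture_groups.
set -- $parts
table=$1 year=$2 month=$3 day=$4 hour=$5

echo "table=$table year=$year month=$month day=$day hour=$hour"
```

For the sample path this prints `table=NavigationTiming year=2019 month=05 day=08 hour=13`. A path that does not match the pattern leaves all five variables empty.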