@anhtranbk
Last active October 24, 2018 04:45
Spark submit examples
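#!/bin/bash
# YARN example (HDP 2.6.3, Spark 2): environment and output location.
export SPARK_HOME=/usr/hdp/2.6.3.0-235/spark2
export OUTPUT_PATH=/user/anhtn/sampling/friends2

# Remove any previous output so the job starts from an empty path.
hdfs dfs -rm -r -f "${OUTPUT_PATH}"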
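# Submit to YARN in client mode. Everything after the jar is passed to the main class unchanged.
# Note: Spark documents spark.yarn.driver.memoryOverhead as a cluster-mode setting,
# so it probably has no effect with --deploy-mode client.
"$SPARK_HOME/bin/spark-submit" \
  --master yarn \
  --deploy-mode client \
  --driver-memory 1g \
  --executor-cores 4 \
  --num-executors 3 \
  --executor-memory 10g \
  --conf "spark.yarn.driver.memoryOverhead=512m" \
  --conf "spark.yarn.executor.memoryOverhead=2g" \
  --conf "spark.network.timeout=1800s" \
  --class vn.five9.bigdata.sampling.SubUserFriendSampling \
  /data/spark/spark-common*.jar \
  /user/anhtn/friends/parquet parquet \
  "${OUTPUT_PATH}" 400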
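
# Standalone cluster example (Spark 2.3.0), default client deploy mode.
# Note: spark-submit's help lists --num-executors as YARN-only and --supervise as
# applying to cluster deploy mode, so both are likely ignored here.
# The *.jar glob must match exactly one file; any extra match becomes an application argument.
/data1/spark/spark-2.3.0-bin-hadoop2.7/bin/spark-submit \
  --class com.vcc.bigdata.hbasespark.SyncProfilesv8 \
  --master spark://10.5.36.185:7077 \
  --executor-memory 25G \
  --executor-cores 10 \
  --supervise \
  --num-executors 7 \
  --conf spark.driver.memory=20G \
  --conf spark.local.dir=/data1/datacollection/syncprofile/tmp \
  --conf "spark.driver.extraJavaOptions=-Dapp.configuration=/data1/datacollection/syncprofile/config/server.properties" \
  /data1/datacollection/syncprofile/*.jar t3 500 datacollection-profiles-20180925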
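
Both commands pass the application jar as a shell glob (`spark-common*.jar`, `*.jar`). If the glob ever matches more than one file, the shell expands it to several arguments and spark-submit treats the extra paths as application arguments. One possible guard is sketched below. The helper name `resolve_single_jar` is hypothetical and not part of the original scripts; it only illustrates failing fast when the pattern does not resolve to exactly one file.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not from the original gist):
# print the single regular file matching a glob pattern, or fail
# when zero or several files match.
resolve_single_jar() {
  local pattern=$1
  local count=0 match='' f
  # Empty IFS disables word splitting, so the unquoted expansion below
  # performs glob expansion only and paths with spaces stay intact.
  local IFS=''
  for f in $pattern; do
    if [ -f "$f" ]; then
      count=$((count + 1))
      match=$f
    fi
  done
  if [ "$count" -ne 1 ]; then
    echo "expected exactly one jar matching '$pattern', found $count" >&2
    return 1
  fi
  printf '%s\n' "$match"
}

# Usage sketch (commented out so this block has no side effects):
#   APP_JAR=$(resolve_single_jar '/data/spark/spark-common*.jar') || exit 1
#   "$SPARK_HOME/bin/spark-submit" --master yarn ... "$APP_JAR" ...
```

Resolving the jar once into a variable and passing the quoted variable to spark-submit keeps the argument positions stable even if an older build is left in the same directory.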