@shilad
Last active November 26, 2017 20:03
#!/usr/bin/env bash
# Run SessionPagesJobRunner once per day, from START_DATE (inclusive)
# to END_DATE (exclusive). Requires GNU date for `date -I -d`.
START_DATE="2017-09-01"
END_DATE="2017-10-01"

beg="${START_DATE}"
# ISO-8601 dates (YYYY-MM-DD) compare correctly as strings. Using < rather
# than != keeps the loop from running forever if START_DATE > END_DATE.
while [[ "$beg" < "$END_DATE" ]]; do
    end=$(date -I -d "$beg + 1 day")
    echo "LOADING SESSIONS FOR $beg to $end"
    spark-submit \
        --master yarn \
        --driver-memory 12G \
        --executor-memory 8G \
        --driver-cores 2 \
        --executor-cores 2 \
        --queue nice \
        --conf spark.dynamicAllocation.maxExecutors=32 \
        --conf spark.kryoserializer.buffer.max=256m \
        --class org.wikimedia.analytics.refinery.job.pagesessions.SessionPagesJobRunner \
        refinery-job/target/refinery-job-0.0.55-SNAPSHOT.jar \
        --snapshot 2017-09 \
        --sitelinks-table shilad.sitelinks \
        --output-table shilad.sessions \
        --begin "${beg}T00:00:00" \
        --end "${end}T00:00:00"
    beg="${end}"
done
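Before submitting one YARN job per day, it can help to check the date windows the loop will actually produce. The sketch below is a hypothetical dry-run helper (`day_windows` is an illustrative name, not part of the refinery code) that prints the `--begin`/`--end` pair for each day without calling spark-submit. Like the loader above, it assumes GNU `date`.

```shell
#!/usr/bin/env bash
# Dry run: print the [begin, end) day windows the loader would submit,
# without calling spark-submit. Assumes GNU date (for `date -I -d`).
set -euo pipefail

# day_windows START END (hypothetical helper): prints one "BEGIN END" line
# per day, START inclusive, END exclusive.
day_windows() {
    local beg="$1" end_date="$2" next
    # ISO-8601 dates sort as strings, so this also stops immediately
    # if START is after END instead of looping forever.
    while [[ "$beg" < "$end_date" ]]; do
        next=$(date -I -d "$beg + 1 day")
        echo "${beg}T00:00:00 ${next}T00:00:00"
        beg="$next"
    done
}

# Defaults mirror START_DATE/END_DATE in the loader script.
day_windows "${1:-2017-09-01}" "${2:-2017-10-01}"
```

If saved as, say, `dry_run.sh`, then `bash dry_run.sh 2017-09-01 2017-09-03` should print two lines, one for each day.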