@acmiyaguchi
Last active September 18, 2017 20:14
1 Day Retention v2
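#!/bin/bash
# Requires the output location in the environment variable `bucket`
# (used below as "$bucket/retention_dev"). The script takes no positional arguments.
if [[ -z "$bucket" ]]; then
    echo "Missing required environment variable: bucket" 1>&2
    exit 1
fi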
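# Build the telemetry-batch-view assembly jar from source; stop if a step fails.
cd /tmp || exit 1
git clone https://github.com/mozilla/telemetry-batch-view.git
cd telemetry-batch-view || exit 1
sbt assembly || exit 1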
# Comma-separated list of columns to group by.
group="subsession_start,cohort_date,day_number,channel,current_version,country,distribution_id,is_funnelcake"
echo "$group"
# Submit the GenericCountView job against the cleaned v2 retention data
# and write the results under "$bucket/retention_dev".
spark-submit --master yarn \
    --deploy-mode client \
    --class com.mozilla.telemetry.views.GenericCountView \
    target/scala-2.11/telemetry-batch-view-1.1.jar \
    --files "s3://net-mozaws-prod-us-west-2-pipeline-analysis/amiyaguchi/retention_intermediate/cleaned/v2" \
    --submission-date-col "submission_date" \
    --count-column "client_id" \
    --select "*, date_format(subsession_start, 'yyyyMMdd') as submission_date" \
    --grouping-columns "$group" \
    --where "client_id IS NOT NULL" \
    --output "$bucket/retention_dev" \
    --version "v2"