Skip to content

Instantly share code, notes, and snippets.

@vitillo

vitillo/push.sql Secret

Created October 11, 2016 13:00
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save vitillo/73851c25ef33987e24bbc175808fc3d4 to your computer and use it in GitHub Desktop.
Save vitillo/73851c25ef33987e24bbc175808fc3d4 to your computer and use it in GitHub Desktop.
-- RV: if we are using only 1% of users could we use the longitudinal dataset for this query?
WITH sample AS (
SELECT client_id,
date_parse(submission_date_s3, '%Y%m%d') as normalized_submission_date,
popup_notification_stats['web-notifications'].action_1 AS push_acceptance,
web_notification_shown
FROM main_summary
WHERE submission_date_s3 > date_format(CURRENT_DATE - interval '6' MONTH, '%Y%m%d')
AND submission_date_s3 < date_format(CURRENT_DATE, '%Y%m%d')
AND sample_id='7'-- RV: please use sample_id 42 which is the id we are using for the longitudinal dataset
),
push_accepters AS (
SELECT DISTINCT client_id,
normalized_submission_date AS eligibility_start,
normalized_submission_date + interval '30' DAY AS eligibility_end
FROM sample
WHERE push_acceptance > 0 -- RV: why do we need to check explicitly for push acceptance? I.e. can’t we infer push acceptance from a push notification and treat push notifications and acceptance more generally as push events?
),
combined_push_users AS (
SELECT DISTINCT p.client_id,
eligibility_start,
eligibility_end
FROM sample s
INNER JOIN push_accepters p ON s.client_id=p.client_id
WHERE normalized_submission_date BETWEEN eligibility_start AND eligibility_end
AND web_notification_shown > 0 -- RV: it seems that a user who accepted the push permission dialog 31 days ago and had the first push notification today is not considered a Push user, which doesn't look quite right to me
),
all_dau AS (
SELECT DISTINCT client_id,
normalized_submission_date AS activity_date -- RV: submission_date != activity_date; activity date is the moment in time when an activity took place while submission date is the moment in time when our servers received a telemetry submission
FROM sample
),
push_dau AS (
-- Presto can't do RANGE window functions nor can it do count distinct in a window, so the array_agg here and in push_mau is a workaround hack.
-- I feel dirty.
SELECT count(DISTINCT p.client_id) AS dau,
array_agg(p.client_id) AS client_ids,
activity_date
FROM all_dau d
INNER JOIN combined_push_users p ON d.client_id = p.client_id
WHERE activity_date BETWEEN eligibility_start AND eligibility_end
GROUP BY 3
),
push_mau AS (
-- I'm sorry.
SELECT cardinality(array_distinct(flatten(array_agg(client_ids) over (ORDER BY activity_date ROWS BETWEEN 27 PRECEDING AND 0 FOLLOWING)))) AS mau,
activity_date
FROM push_dau
),
smoothed_dau AS (
SELECT activity_date,
avg(dau) OVER (ORDER BY activity_date ROWS BETWEEN 6 PRECEDING AND 0 FOLLOWING) AS smoothed_dau
FROM push_dau
)
SELECT mau * 100 AS mau,
dau * 100 AS dau,
smoothed_dau * 100 AS smoothed_dau,
m.activity_date,
smoothed_dau/mau AS ER
FROM push_mau m
JOIN push_dau d ON m.activity_date=d.activity_date
JOIN smoothed_dau s ON m.activity_date=s.activity_date
WHERE m.activity_date > (CURRENT_DATE - interval '6' MONTH + interval '30' DAY)
ORDER BY m.activity_date
@sunahsuh
Copy link

Re: submission_date vs activity_date, is there a standard that's usually used for ER? Wasn't sure what best practice was for this given clients' clock skew. I figured if I used Submission Date it'd just even out (i.e. the number of clients submitting on a different date than their activity date would be consistent over time.)

@sunahsuh
Copy link

Re: using longitudinal, not sure if you caught the discussion in IRC but my understanding of the longitudinal dataset is that it's not partitioned by date, and doing any date-based manipulation looks like it'd be difficult (even looking up the date that someone had a non-zero web_notification_shown looks like it'd require jumping through a lot of hoops)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment