-
-
Save vitillo/73851c25ef33987e24bbc175808fc3d4 to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
-- RV: if we are using only 1% of users could we use the longitudinal dataset for this query? | |
WITH sample AS ( | |
SELECT client_id, | |
date_parse(submission_date_s3, '%Y%m%d') as normalized_submission_date, | |
popup_notification_stats['web-notifications'].action_1 AS push_acceptance, | |
web_notification_shown | |
FROM main_summary | |
WHERE submission_date_s3 > date_format(CURRENT_DATE - interval '6' MONTH, '%Y%m%d') | |
AND submission_date_s3 < date_format(CURRENT_DATE, '%Y%m%d') | |
AND sample_id='7'-- RV: please use sample_id 42 which is the id we are using for the longitudinal dataset | |
), | |
push_accepters AS ( | |
SELECT DISTINCT client_id, | |
normalized_submission_date AS eligibility_start, | |
normalized_submission_date + interval '30' DAY AS eligibility_end | |
FROM sample | |
WHERE push_acceptance > 0 -- RV: why do we need to check explicitly for push acceptance? I.e. can’t we infer push acceptance from a push notification and treat push notifications and acceptance more generally as push events? | |
), | |
combined_push_users AS ( | |
SELECT DISTINCT p.client_id, | |
eligibility_start, | |
eligibility_end | |
FROM sample s | |
INNER JOIN push_accepters p ON s.client_id=p.client_id | |
WHERE normalized_submission_date BETWEEN eligibility_start AND eligibility_end | |
AND web_notification_shown > 0 -- RV: it seems that a user who accepted the push permission dialog 31 days ago and had the first push notification today is not considered a Push user, which doesn't look quite right to me | |
), | |
all_dau AS ( | |
SELECT DISTINCT client_id, | |
normalized_submission_date AS activity_date -- RV: submission_date != activity_date; activity date is the moment in time when an activity took place while submission date is the moment in time when our servers received a telemetry submission | |
FROM sample | |
), | |
push_dau AS ( | |
-- Presto can't do RANGE window functions nor can it do count distinct in a window, so the array_agg here and in push_mau is a workaround hack. | |
-- I feel dirty. | |
SELECT count(DISTINCT p.client_id) AS dau, | |
array_agg(p.client_id) AS client_ids, | |
activity_date | |
FROM all_dau d | |
INNER JOIN combined_push_users p ON d.client_id = p.client_id | |
WHERE activity_date BETWEEN eligibility_start AND eligibility_end | |
GROUP BY 3 | |
), | |
push_mau AS ( | |
-- I'm sorry. | |
SELECT cardinality(array_distinct(flatten(array_agg(client_ids) over (ORDER BY activity_date ROWS BETWEEN 27 PRECEDING AND 0 FOLLOWING)))) AS mau, | |
activity_date | |
FROM push_dau | |
), | |
smoothed_dau AS ( | |
SELECT activity_date, | |
avg(dau) OVER (ORDER BY activity_date ROWS BETWEEN 6 PRECEDING AND 0 FOLLOWING) AS smoothed_dau | |
FROM push_dau | |
) | |
SELECT mau * 100 AS mau, | |
dau * 100 AS dau, | |
smoothed_dau * 100 AS smoothed_dau, | |
m.activity_date, | |
smoothed_dau/mau AS ER | |
FROM push_mau m | |
JOIN push_dau d ON m.activity_date=d.activity_date | |
JOIN smoothed_dau s ON m.activity_date=s.activity_date | |
WHERE m.activity_date > (CURRENT_DATE - interval '6' MONTH + interval '30' DAY) | |
ORDER BY m.activity_date |
Re: using longitudinal, not sure if you caught the discussion in IRC but my understanding of the longitudinal dataset is that it's not partitioned by date, and doing any date-based manipulation looks like it'd be difficult (even looking up the date that someone had a non-zero web_notification_shown looks like it'd require jumping through a lot of hoops)
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Re: submission_date vs activity_date, is there a standard that's usually used for ER? Wasn't sure what best practice was for this given clients' clock skew. I figured if I used Submission Date it'd just even out (i.e. the number of clients submitting on a different date than their activity date would be consistent over time.)