-
-
Save nvquanghuy/bd0fda7b88e5b2fd8e46e047e391d25f to your computer and use it in GitHub Desktop.
-- Demonstration on how Cohort Analysis is done. In order to do this we just need 2 tables: | |
-- users (id, timestamp) | |
-- activities (user_id, timestamp) | |
-- The below example use the following table names and fields, feel free to replace them with yours. | |
-- - users (id, created_at) | |
-- - activities (user_id, created_at) | |
Demo: https://demo.holistics.io/queries/11118-buyer-cohort-retention-analysis | |
-- MONTH_DIFF is a custom Postgres function that calculate the number difference between 2 months. | |
-- (user_id, cohort_month), each | |
with cohort_items as ( | |
select | |
date_trunc('month', U.created_at)::date as cohort_month, | |
id as user_id | |
from public.users U | |
order by 1, 2 | |
), | |
-- (user_id, month_number): user X has activity in month number X | |
user_activities as ( | |
select | |
A.user_id, | |
MONTH_DIFF( | |
date_trunc('month', A.created_at)::date, | |
C.cohort_month | |
) as month_number | |
from public.activities A | |
left join cohort_items C ON A.user_id = C.user_id | |
group by 1, 2 | |
), | |
-- (cohort_month, size) | |
cohort_size as ( | |
select cohort_month, count(1) as num_users | |
from cohort_items | |
group by 1 | |
order by 1 | |
), | |
-- (cohort_month, month_number, cnt) | |
B as ( | |
select | |
C.cohort_month, | |
A.month_number, | |
count(1) as num_users | |
from user_activities A | |
left join cohort_items C ON A.user_id = C.user_id | |
group by 1, 2 | |
) | |
-- our final value: (cohort_month, size, month_number, percentage) | |
select | |
B.cohort_month, | |
S.num_users as total_users, | |
B.month_number, | |
B.num_users::float * 100 / S.num_users as percentage | |
from B | |
left join cohort_size S ON B.cohort_month = S.cohort_month | |
order by 1, 3 | |
where B.cohort_month IS NOT NULL |
+1 on the MONTH_DIFF
CREATE FUNCTION MONTH_DIFF (t_start timestamp, t_end timestamp)
RETURNS integer
AS $$
SELECT
(12 * extract('years' FROM a.i) + extract('months' FROM a.i))::integer
FROM (
values(justify_interval($2 - $1))) AS a (i)
$$
LANGUAGE SQL
IMMUTABLE
RETURNS NULL ON NULL INPUT;
In Postgre, I think we can use below built-in function as well, instead of MONTH_DIFF. Please correct me if I'm wrong.
date_part('month', age(date_trunc('month', A.transaction_date)::date, C.cohort_month)) as month_number
+1 on one of the best explanation about Cohort out there.
+1 on the best explanation about Cohort using SQL. I was looking to implement one for our project next week - will follow these guidelines. Amazing find !!
+1 for the great explanation of cohort analysis using SQL. I am not able to create this custom function in Metabase, can you please help me with an alternate solution to the function the postgre one is not giving the correct output.Thanks!
Great explanation on cohort analysis using SQL!
But this query will not add those users in total_users result who only just signed_up but not performed any activity after signed up.
great job, thank you! maybe some hints on building a "rolling" version ?
what is in the activities table?
Can you post
MONTH_DIFF
function as well?