Skip to content

Instantly share code, notes, and snippets.

@nvquanghuy
Last active May 18, 2023 03:38
Show Gist options
  • Star 39 You must be signed in to star a gist
  • Fork 8 You must be signed in to fork a gist
  • Save nvquanghuy/bd0fda7b88e5b2fd8e46e047e391d25f to your computer and use it in GitHub Desktop.
Save nvquanghuy/bd0fda7b88e5b2fd8e46e047e391d25f to your computer and use it in GitHub Desktop.
-- Demonstration on how Cohort Analysis is done. In order to do this we just need 2 tables:
-- users (id, timestamp)
-- activities (user_id, timestamp)
-- The below example use the following table names and fields, feel free to replace them with yours.
-- - users (id, created_at)
-- - activities (user_id, created_at)
Demo: https://demo.holistics.io/queries/11118-buyer-cohort-retention-analysis
-- MONTH_DIFF is a custom Postgres function that calculate the number difference between 2 months.
-- (user_id, cohort_month), each
with cohort_items as (
select
date_trunc('month', U.created_at)::date as cohort_month,
id as user_id
from public.users U
order by 1, 2
),
-- (user_id, month_number): user X has activity in month number X
user_activities as (
select
A.user_id,
MONTH_DIFF(
date_trunc('month', A.created_at)::date,
C.cohort_month
) as month_number
from public.activities A
left join cohort_items C ON A.user_id = C.user_id
group by 1, 2
),
-- (cohort_month, size)
cohort_size as (
select cohort_month, count(1) as num_users
from cohort_items
group by 1
order by 1
),
-- (cohort_month, month_number, cnt)
B as (
select
C.cohort_month,
A.month_number,
count(1) as num_users
from user_activities A
left join cohort_items C ON A.user_id = C.user_id
group by 1, 2
)
-- our final value: (cohort_month, size, month_number, percentage)
select
B.cohort_month,
S.num_users as total_users,
B.month_number,
B.num_users::float * 100 / S.num_users as percentage
from B
left join cohort_size S ON B.cohort_month = S.cohort_month
order by 1, 3
where B.cohort_month IS NOT NULL
@chabior
Copy link

chabior commented Mar 4, 2019

Can you post MONTH_DIFF function as well?

@awhyit
Copy link

awhyit commented Jun 12, 2019

+1 on the MONTH_DIFF

@maxime
Copy link

maxime commented Nov 11, 2019

CREATE FUNCTION MONTH_DIFF (t_start timestamp, t_end timestamp)
	RETURNS integer
	AS $$
	SELECT
		(12 * extract('years' FROM a.i) + extract('months' FROM a.i))::integer
	FROM (
		values(justify_interval($2 - $1))) AS a (i)
$$
LANGUAGE SQL
IMMUTABLE
	RETURNS NULL ON NULL INPUT;

@ngtridung97
Copy link

In Postgre, I think we can use below built-in function as well, instead of MONTH_DIFF. Please correct me if I'm wrong.

date_part('month', age(date_trunc('month', A.transaction_date)::date, C.cohort_month)) as month_number

+1 on one of the best explanation about Cohort out there.

@RKaurGIT
Copy link

+1 on the best explanation about Cohort using SQL. I was looking to implement one for our project next week - will follow these guidelines. Amazing find !!

@limitless1991
Copy link

+1 for the great explanation of cohort analysis using SQL. I am not able to create this custom function in Metabase, can you please help me with an alternate solution to the function the postgre one is not giving the correct output.Thanks!

@holmes0078
Copy link

Great explanation on cohort analysis using SQL!

@rochitaggarwal54
Copy link

rochitaggarwal54 commented Jul 30, 2021

But this query will not add those users in total_users result who only just signed_up but not performed any activity after signed up.

@rmchrkv
Copy link

rmchrkv commented Sep 2, 2022

great job, thank you! maybe some hints on building a "rolling" version ?

@dunctho
Copy link

dunctho commented Oct 7, 2022

what is in the activities table?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment