Skip to content

Instantly share code, notes, and snippets.

@nvquanghuy

nvquanghuy/cohort.sql Secret

Last active Jul 30, 2021
Embed
What would you like to do?
-- Demonstration on how Cohort Analysis is done. In order to do this we just need 2 tables:
-- users (id, timestamp)
-- activities (user_id, timestamp)
-- The below example use the following table names and fields, feel free to replace them with yours.
-- - users (id, created_at)
-- - activities (user_id, created_at)
Demo: https://demo.holistics.io/queries/11118-buyer-cohort-retention-analysis
-- MONTH_DIFF is a custom Postgres function that calculate the number difference between 2 months.
-- (user_id, cohort_month), each
with cohort_items as (
select
date_trunc('month', U.created_at)::date as cohort_month,
id as user_id
from public.users U
order by 1, 2
),
-- (user_id, month_number): user X has activity in month number X
user_activities as (
select
A.user_id,
MONTH_DIFF(
date_trunc('month', A.created_at)::date,
C.cohort_month
) as month_number
from public.activities A
left join cohort_items C ON A.user_id = C.user_id
group by 1, 2
),
-- (cohort_month, size)
cohort_size as (
select cohort_month, count(1) as num_users
from cohort_items
group by 1
order by 1
),
-- (cohort_month, month_number, cnt)
B as (
select
C.cohort_month,
A.month_number,
count(1) as num_users
from user_activities A
left join cohort_items C ON A.user_id = C.user_id
group by 1, 2
)
-- our final value: (cohort_month, size, month_number, percentage)
select
B.cohort_month,
S.num_users as total_users,
B.month_number,
B.num_users::float * 100 / S.num_users as percentage
from B
left join cohort_size S ON B.cohort_month = S.cohort_month
order by 1, 3
where B.cohort_month IS NOT NULL
@chabior

This comment has been minimized.

Copy link

@chabior chabior commented Mar 4, 2019

Can you post MONTH_DIFF function as well?

@awhyit

This comment has been minimized.

Copy link

@awhyit awhyit commented Jun 12, 2019

+1 on the MONTH_DIFF

@maxime

This comment has been minimized.

Copy link

@maxime maxime commented Nov 11, 2019

CREATE FUNCTION MONTH_DIFF (t_start timestamp, t_end timestamp)
	RETURNS integer
	AS $$
	SELECT
		(12 * extract('years' FROM a.i) + extract('months' FROM a.i))::integer
	FROM (
		values(justify_interval($2 - $1))) AS a (i)
$$
LANGUAGE SQL
IMMUTABLE
	RETURNS NULL ON NULL INPUT;
@ngtridung97

This comment has been minimized.

Copy link

@ngtridung97 ngtridung97 commented May 31, 2020

In Postgre, I think we can use below built-in function as well, instead of MONTH_DIFF. Please correct me if I'm wrong.

date_part('month', age(date_trunc('month', A.transaction_date)::date, C.cohort_month)) as month_number

+1 on one of the best explanation about Cohort out there.

@RKaurGIT

This comment has been minimized.

Copy link

@RKaurGIT RKaurGIT commented Jun 13, 2020

+1 on the best explanation about Cohort using SQL. I was looking to implement one for our project next week - will follow these guidelines. Amazing find !!

@limitless1991

This comment has been minimized.

Copy link

@limitless1991 limitless1991 commented Jul 18, 2020

+1 for the great explanation of cohort analysis using SQL. I am not able to create this custom function in Metabase, can you please help me with an alternate solution to the function the postgre one is not giving the correct output.Thanks!

@holmes0078

This comment has been minimized.

Copy link

@holmes0078 holmes0078 commented Sep 26, 2020

Great explanation on cohort analysis using SQL!

@rochitaggarwal54

This comment has been minimized.

Copy link

@rochitaggarwal54 rochitaggarwal54 commented Jul 30, 2021

But this query will not add those users in total_users result who only just signed_up but not performed any activity after signed up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment