Skip to content

Instantly share code, notes, and snippets.

@nvquanghuy

nvquanghuy/cohort.sql Secret

Last active Dec 28, 2020
Embed
What would you like to do?
-- Demonstration on how Cohort Analysis is done. In order to do this we just need 2 tables:
-- users (id, timestamp)
-- activities (user_id, timestamp)
-- The below example use the following table names and fields, feel free to replace them with yours.
-- - users (id, created_at)
-- - activities (user_id, created_at)
Demo: https://demo.holistics.io/queries/11118-buyer-cohort-retention-analysis
-- MONTH_DIFF is a custom Postgres function that calculate the number difference between 2 months.
-- (user_id, cohort_month), each
with cohort_items as (
select
date_trunc('month', U.created_at)::date as cohort_month,
id as user_id
from public.users U
order by 1, 2
),
-- (user_id, month_number): user X has activity in month number X
user_activities as (
select
A.user_id,
MONTH_DIFF(
date_trunc('month', A.created_at)::date,
C.cohort_month
) as month_number
from public.activities A
left join cohort_items C ON A.user_id = C.user_id
group by 1, 2
),
-- (cohort_month, size)
cohort_size as (
select cohort_month, count(1) as num_users
from cohort_items
group by 1
order by 1
),
-- (cohort_month, month_number, cnt)
B as (
select
C.cohort_month,
A.month_number,
count(1) as num_users
from user_activities A
left join cohort_items C ON A.user_id = C.user_id
group by 1, 2
)
-- our final value: (cohort_month, size, month_number, percentage)
select
B.cohort_month,
S.num_users as total_users,
B.month_number,
B.num_users::float * 100 / S.num_users as percentage
from B
left join cohort_size S ON B.cohort_month = S.cohort_month
order by 1, 3
where B.cohort_month IS NOT NULL
@chabior

This comment has been minimized.

Copy link

@chabior chabior commented Mar 4, 2019

Can you post MONTH_DIFF function as well?

@awhyit

This comment has been minimized.

Copy link

@awhyit awhyit commented Jun 12, 2019

+1 on the MONTH_DIFF

@maxime

This comment has been minimized.

Copy link

@maxime maxime commented Nov 11, 2019

CREATE FUNCTION MONTH_DIFF (t_start timestamp, t_end timestamp)
	RETURNS integer
	AS $$
	SELECT
		(12 * extract('years' FROM a.i) + extract('months' FROM a.i))::integer
	FROM (
		values(justify_interval($2 - $1))) AS a (i)
$$
LANGUAGE SQL
IMMUTABLE
	RETURNS NULL ON NULL INPUT;
@ngtridung97

This comment has been minimized.

Copy link

@ngtridung97 ngtridung97 commented May 31, 2020

In Postgre, I think we can use below built-in function as well, instead of MONTH_DIFF. Please correct me if I'm wrong.

date_part('month', age(date_trunc('month', A.transaction_date)::date, C.cohort_month)) as month_number

+1 on one of the best explanation about Cohort out there.

@RKaurGIT

This comment has been minimized.

Copy link

@RKaurGIT RKaurGIT commented Jun 13, 2020

+1 on the best explanation about Cohort using SQL. I was looking to implement one for our project next week - will follow these guidelines. Amazing find !!

@limitless1991

This comment has been minimized.

Copy link

@limitless1991 limitless1991 commented Jul 18, 2020

+1 for the great explanation of cohort analysis using SQL. I am not able to create this custom function in Metabase, can you please help me with an alternate solution to the function the postgre one is not giving the correct output.Thanks!

@holmes0078

This comment has been minimized.

Copy link

@holmes0078 holmes0078 commented Sep 26, 2020

Great explanation on cohort analysis using SQL!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment