Skip to content

Instantly share code, notes, and snippets.

@codecademydev
Created September 7, 2020 14:12
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save codecademydev/4b94d407df6dfa582f2893c1aa158c30 to your computer and use it in GitHub Desktop.
Save codecademydev/4b94d407df6dfa582f2893c1aa158c30 to your computer and use it in GitHub Desktop.
Codecademy export
/* Codecademy Project: Churn Rates at Codeflix */
/* Get familiar with the data ----------------------------------------------------------------------------- */
/*
STEP1:
Take a look at the first 100 rows of data in the subscriptions table. How many different segments do you see?
*/
SELECT *
FROM subscriptions
LIMIT 100;
/* Here, we can see 2 segments: 30 and 87 */
/*
STEP 2:
Determine the range of months of data provided. Which months will you be able to calculate churn for?
*/
SELECT MIN(subscription_start) AS 'Early Subscription',
MAX(subscription_start) AS 'Last Subscription'
FROM subscriptions;
/* We will be able to calculate churn for December (2016), January, Febreuary and March (2017) */
/* Calculate churn rate for each segment----------------------------------------------------------------------------- */
/*
STEP 3:
You’ll be calculating the churn rate for both segments (87 and 30) over the first 3 months of 2017 (you can’t calculate it for December, since there are no subscription_end values yet). To get started, create a temporary table of months.
*/
/*Temporary table of months*/
WITH months AS
(
SELECT
'2017-01-01' AS first_day,
'2017-01-31' AS last_day
UNION
SELECT
'2017-02-01' AS first_day,
'2017-02-28' AS last_day
UNION
SELECT
'2017-03-01' AS first_day,
'2017-03-31' AS last_day
),
/*
STEP 4:
Create a temporary table, cross_join, from subscriptions and your months. Be sure to SELECT every column.
*/
/*Temporary table joined months + subscriptions*/
cross_join AS
(
SELECT *
FROM subscriptions
CROSS JOIN months
),
/*
STEP 5:
Create a temporary table, status, from the cross_join table you created. This table should contain:
id selected from cross_join
month as an alias of first_day
is_active_87 created using a CASE WHEN to find any users from segment 87 who existed prior to the beginning of the month. This is 1 if true and 0 otherwise.
is_active_30 created using a CASE WHEN to find any users from segment 30 who existed prior to the beginning of the month. This is 1 if true and 0 otherwise.
*/
/*
STEP 6:
Add an is_canceled_87 and an is_canceled_30 column to the status temporary table. This should be 1 if the subscription is canceled during the month and 0 otherwise.
*/
/*Temprary table for the subscription status for each segment*/
status AS
(
SELECT id,
first_day AS month,
CASE
WHEN segment = 87 AND (subscription_start < first_day)
AND (
(subscription_end > first_day) OR (subscription_end IS NULL)
)
THEN 1
ELSE 0
END AS 'is_active_87',
CASE
WHEN segment = 87 AND (subscription_end BETWEEN first_day AND last_day)
THEN 1
ELSE 0
END AS 'is_canceled_87',
CASE
WHEN segment = 30 AND (subscription_start < first_day)
AND (
(subscription_end > first_day) OR (subscription_end IS NULL)
)
THEN 1
ELSE 0
END AS 'is_active_30',
CASE
WHEN segment = 30 AND (subscription_end BETWEEN first_day AND last_day)
THEN 1
ELSE 0
END AS 'is_canceled_30'
FROM cross_join
),
/*
STEP 7:
ACreate a status_aggregate temporary table that is a SUM of the active and canceled subscriptions for each segment, for each month.
The resulting columns should be:
sum_active_87
sum_active_30
sum_canceled_87
sum_canceled_30
*/
/*Temporary table of the sum of active and canceled subscriptions for each segment*/
status_aggregate AS
(
SELECT month, SUM(is_active_87) AS 'sum_active_87',
SUM(is_active_30) AS 'sum_active_30',
SUM(is_canceled_87) AS 'sum_canceled_87',
SUM(is_canceled_30) AS 'sum_canceled_30'
FROM status
GROUP BY month
)
/*
STEP 8:
Calculate the churn rates for the two segments over the three month period. Which segment has a lower churn rate?
*/
/*Churn Rate Calculation*/
SELECT month,
ROUND(100.0 * (status_aggregate.sum_canceled_87) / (status_aggregate.sum_active_87)) AS 'Churn Rate 87',
ROUND(100.0 * (status_aggregate.sum_canceled_30) / (status_aggregate.sum_active_30)) AS 'Churn Rate 30'
FROM status_aggregate;
/*The segment 30 has the lower churn rate*/
/*---------------------------------------------------------------------------------*/
/* BONUS */
/*
STEP 9:
How would you modify this code to support a large number of segments?
*/
/*-------------------------------------------------------------------------*/
WITH months AS
(
SELECT
'2017-01-01' AS first_day,
'2017-01-31' AS last_day
UNION
SELECT
'2017-02-01' AS first_day,
'2017-02-28' AS last_day
UNION
SELECT
'2017-03-01' AS first_day,
'2017-03-31' AS last_day
),
cross_join AS
(
SELECT *
FROM subscriptions
CROSS JOIN months
),
status AS
(
SELECT id, segment,
first_day AS month,
CASE
WHEN (subscription_start < first_day)
AND ((subscription_end > first_day) OR (subscription_end IS NULL))
THEN 1
ELSE 0
END AS 'is_active',
CASE
WHEN (subscription_end BETWEEN first_day AND last_day)
THEN 1
ELSE 0
END AS 'is_canceled'
FROM cross_join
),
status_aggregate AS
(
SELECT segment, month, SUM(is_active) AS 'sum_active',
SUM(is_canceled) AS 'sum_canceled'
FROM status
GROUP BY segment, month
)
SELECT month, segment,
ROUND(100.0 * (status_aggregate.sum_canceled) / (status_aggregate.sum_active)) AS 'Churn Rate'
FROM status_aggregate;
/*---------------------------------------------------------------------------------*/
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment