Subscriptions can change their product ID over time, as captured in a subscription_change_log
table.
subscription_id | new_product_id | changed_at |
---|---|---|
1 | 52 | 2019-01-10 |
1 | 57 | 2019-01-15 |
2 | 53 | 2019-01-12 |
Here, subscription 1 changes multiple times, but the change only takes effect once the customer gets rebilled.
Fortunately, I also have a table of rebillings
, that represent a recurring charge for a customer.
rebilling_id | subscription_id | rebilled_at |
---|---|---|
1 | 1 | 2019-02-01 |
2 | 1 | 2019-03-01 |
3 | 3 | 2019-02-01 |
Note that rebills that happen without an associated change are not of interest to us in this case.
I want to end up with something like this:
subscription_id | new_product_id | effective_at |
---|---|---|
1 | 57 | 2019-02-01 |
Here's how I did it in a dbt model:
with subscription_changes as (
select * from {{ ref('subscription_changes') }}
),
rebillings as (
select * from {{ ref('rebillings') }}
),
-- fan out the rebillings
subscription_changes_with_all_subsequent_rebillings as (
select
subscription_changes.*,
rebillings.rebill_id,
rebillings.rebilled_at,
-- use a rank function to rank the rebillings that happened after a
-- change occurred
dense_rank() over (
partition by subscription_changes.subscription_change_id
order by rebillings.rebilled_at
) as subsequent_rebilling_rank,
-- if the rank = 1, then this is the next rebilling
subsequent_rebilling_rank = 1 as is_next_rebilling
from subscription_changes
-- this join uses an inequality to find the rebillings that happen _after_
-- a subscription change
inner join rebillings
on rebillings.subscription_id = subscription_changes.subscription_id
and rebillings.rebilled_at >= subscription_changes.changed_at
),
-- filter to only find the next rebilling
subscription_changes_with_next_rebilling as (
select
subscription_change_id,
subscription_id,
new_product_id,
changed_at,
rebilled_at as effective_at,
-- use a row_number function to rank the subscription changes that happened
-- before a rebill
row_number() over (
partition by rebill_id
order by changed_at desc
) as most_recent_rank,
-- if the rank = 1, then this is the change that was in play when the
-- rebill occurred
most_recent_rank = 1 as is_effective_change
from subscription_changes_with_all_subsequent_rebillings
where is_next_rebilling
),
effective_subscription_changes as (
select
*
from subscription_changes_with_next_rebilling
where is_effective_change
)
select * from effective_subscription_changes