Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save markomitranic/703675fb27e8c1e2b5195c05a0324f3b to your computer and use it in GitHub Desktop.
Save markomitranic/703675fb27e8c1e2b5195c05a0324f3b to your computer and use it in GitHub Desktop.
Sales forecasting model written for GOOGLE BIGQUERY
--------**********---------
/*
AUTHOR: CHRISTIAN SASSANO/DARYL PAUL : WELD TECHNOLOGIES ApS
PUBLISHED: MAY 2022
https://weld.app
Join us in slack: https://join.slack.com/t/weldcommunity/shared_invite/zt-19b372339-1m9ftp0YW2AU8ToZpBViAA
Sales forecasting model written for GOOGLE BIGQUERY using the following inputs
CUSTOMER_ID|FIRST_ENTERED_SALES_FUNNEL_DATE|FIRST_CONVERTED_DATE|DAYS_TO_WON|ENTERED_SALES_FUNNEL|CONVERTED
Explain each CTE
variables: Set Forecast Start/End Date, Perc of Sales Funnel to Exclude when calculating win rate, number of most recent days to take average of to estimate future qualified leads
NOTE: whatever FORECAST_START_DATE you set, the model will ignore all data that would not have been known by that date. (no cheating)
current companies: inputs from your business. Table with rows: CUSTOMER_ID|FIRST_ENTERED_SALES_FUNNEL_DATE|FIRST_CONVERTED_DATE|DAYS_TO_WON|ENTERED_SALES_FUNNEL|CONVERTED
date_dim: bq specific date series
daily deals created: establish how many additions to top funnel happen each day - add zeroes for days that do not have data
-----intermediate steps to establishing smoothed cumulative win curve-----
win_curve
current_smoothed_curve_prep
current_smoothed_curve
--------------------------------------------------------------------
current_forecast_periods: creating cumulative win curve from intermediate steps
days_to_exclude: takes % (from VARIABLE: perc_last_salesfunnel_exclude) to calculate the # of days to exclude from recent data when calculating Win Rate Assumption
current_win_rate: outputs ONE number -- overall Win Rate
daily_top_funnel_avg: outputs ONE number: average of most recent X days (depending on VARIABLE: num_days_to_est_top_funnel)
daily_top_funnel_est: takes number from daily_top_funnel_avg and assigns it to all days in forecast period
current_known_company_forecast: forecasting probability for each customer that exists in pipeline (as of beginning of forecast period)
current_unknown_company_forecast: forecasting probability for future pipeline customers (as of beginning of forecast period)
current_known_company_daily_forecast: aggregating known pipeline into rows per day
current_unknown_company_daily_forecast: aggregating future pipeline deals into row per day
current_daily_forecast: combining known and future pipeline into value per day
current_known_company_monthly_forecast: aggregating known pipeline into rows per month
current_unknown_company_monthly_forecastaggregating future pipeline deals into row per month
current_monthly_forecast: combining known and future pipeline into value per day
final_monthly: expected leads to have been won broken out by the month they are expected to be won in, and the month in which they were added to the sales funnel
final_daily_forecast_vs_actual: daily forecast vs actuals
*/
--------**********---------
with
variables as (
--enter your custom variables here
select
cast('2021-06-01' as date) as forecast_start_date ,
cast('2021-12-30' as date) as forecast_end_date ,
.95 as perc_last_salesfunnel_exclude, -- usually this would be set to something like .99, but as we've only had about 1 full sales funnel, i bumped it
92 as num_days_to_est_top_funnel -- this X day period will calculate the avg top funnel activities per day and use that to estimate the rest
),
--select * from variables
current_companies as (
select
customer_id as customer_id,
entered_sales_funnel as deal_created,
first_entered_sales_funnel_date as first_entered_sales_funnel_date,
case when first_converted_date < forecast_start_date then converted else 0 end as converted,
--we do not know that these companies will win if they are before the forecast_start_date
first_converted_date as first_converted_date,
days_to_won,
variables.*
from {{analytics.sales_forecast.company_simple_funnel_demo}}
join variables on 1=1
where
first_entered_sales_funnel_date < forecast_start_date
-- ^ only taking companies created before beginning of forecast period
),
date_dim as (
--BIGQUERY SYNTAX FOR GENERATING DATE DIMENSION TABLE
SELECT
day as date
, date_trunc(day, month) as month
, date_trunc(day, year) as year
, EXTRACT(YEAR FROM day) as year_int
, EXTRACT(MONTH FROM day) as month_int
, EXTRACT(day FROM day) as day_int
, EXTRACT(DAYOFWEEK FROM day) as day_of_week_int,
CONCAT(CAST(EXTRACT(YEAR FROM day) AS STRING), "-Q", CEIL(EXTRACT(MONTH FROM day) / 3)) AS quarter,
CONCAT(CAST(EXTRACT(YEAR FROM day) AS STRING), "-T", CEIL(EXTRACT(MONTH FROM day) / 4)) AS tertial
FROM UNNEST(
GENERATE_DATE_ARRAY(DATE('2000-01-01'), '2099-12-31')) AS day
),
actuals_conversions as (
-- will use this to join to forecast at the end
select date, sum(coalesce(converted,0)) as num_conversions_actual
from {{analytics.sales_forecast.company_simple_funnel_demo}}
right join date_dim on first_converted_date = date_dim.date
group by 1
),
actuals_top_funnel as (
-- will use this to join to forecast at the end
select date, sum(coalesce(entered_sales_funnel,0)) as num_deals_created_actual
from `analytics.sales_forecast__company_simple_funnel_demo`
right join date_dim on first_entered_sales_funnel_date = date_dim.date
group by 1
),
daily_deals_created as (
select dim_day, sum(coalesce(deal_created,0)) deals_created
from current_companies
right join
(select
date as dim_day
from date_dim
where date between (select min(first_entered_sales_funnel_date) from current_companies) and current_date + interval 1 year)
on first_entered_sales_funnel_date = dim_day
group by 1
order by 1
),
win_curve as (
select
days_to_won,
(sum(converted) over (order by days_to_won) / sum(converted) over ()) as perc_won_by
from
current_companies
where
(case when first_converted_date >= forecast_start_date then 0 else converted end) = 1
-- removing deals won after forecast start from numerator but not denominator
order by
days_to_won
),
-- creating smooth win curve to create value for every day
current_smoothed_curve_prep as (
SELECT
*,
min(days_to_won) over (order by days desc) as next_known_day,
max(days_to_won) over (order by days) as last_known_day,
min(perc_won_by) over (order by days desc) as next_known_perc,
max(perc_won_by) over (order by days) as last_known_perc,
min(days_to_won) over (order by days desc) - coalesce(max(days_to_won) over (order by days),0) known_day_diff
FROM
UNNEST(GENERATE_ARRAY(1, (select max(days_to_won) from win_curve))) days -- BQ specific method for generating column of integers
left join win_curve on days = days_to_won
order by
days
),
current_smoothed_curve as
--lots of window functions, this fills in values for every day by doing some arithmetic on window functions from _prep step
(
select
*,
coalesce(
perc_won_by,
(((coalesce(days - last_known_day, days) * (next_known_perc - coalesce(last_known_perc, 0))) / known_day_diff) + coalesce(last_known_perc, 0))
) as smoothed_perc_won_by
from
current_smoothed_curve_prep
),
current_smoothed_win_curve as (
select distinct days, smoothed_perc_won_by
from current_smoothed_curve
--outputs for each day, an estimated % of total deals that have been won by that day since deal creation
),
current_forecast_periods as (
SELECT
days,
smoothed_perc_won_by,
(coalesce(smoothed_perc_won_by - lag(smoothed_perc_won_by) over (order by days),smoothed_perc_won_by)) as perc_won_discrete_forecast
from current_smoothed_win_curve
order by days
),
days_to_exclude as (
--this function returns one number, which will give us a bound on calculating the win rate on our upper funnel numbers
select min(days) as exclude_range
from current_smoothed_win_curve
join variables on 1=1
where smoothed_perc_won_by > perc_last_salesfunnel_exclude
),
--select * from days_to_exclude
current_win_rate as (
select
sum(converted) / sum(deal_created) as win_rate -- could easily overwrite this line to some static number
from current_companies
join days_to_exclude on 1=1
where
first_entered_sales_funnel_date < least(forecast_start_date, current_date) - exclude_range
and coalesce(first_converted_date, forecast_start_date-1) < forecast_start_date
--need to calculate win rate such that you don't count days in the excluded period against the overall win %
),
daily_top_funnel_avg as (
select avg(deals_created) as avg_deals_created_per_day
from daily_deals_created
left join variables on 1=1
where dim_day between least(forecast_start_date,current_date) - num_days_to_est_top_funnel and least(forecast_start_date,current_date) - 1
-- ^ only taking relevant dates
),
daily_top_funnel_est as (
select dim_day,
avg_deals_created_per_day as deals_created_est
from daily_deals_created
left join daily_top_funnel_avg on 1=1
left join variables on 1=1
where dim_day between least(forecast_start_date,current_date) and forecast_end_date
-- using the estimates from before to forecast future dates in forecast period
),
current_known_company_forecast as (
select
customer_id,
first_entered_sales_funnel_date,
date_add(first_entered_sales_funnel_date, interval fp.days day) as forecast_period_date,
--adding the number of days to each period
perc_won_discrete_forecast * wr.win_rate as perc_won_discrete_forecast
--^multiplying by win rate
from current_companies
join current_forecast_periods fp on 1=1
join current_win_rate wr on 1=1
where converted = 0
--don't want to forecast more wins from already won companies
),
current_unknown_company_forecast as (
select
dim_day,
date_add(dim_day, interval fp.days day) as forecast_period_date,
deals_created_est * (perc_won_discrete_forecast * wr.win_rate) as perc_won_discrete_forecast
--^multiplying by win rate AND future deal created est
from daily_top_funnel_est
join current_forecast_periods fp on 1=1
join current_win_rate wr on 1=1
order by 1,2
),
current_known_company_daily_forecast as (
-- for each month, the forecast for each company
select
date_trunc(forecast_period_date, day) forecast_day,
date_trunc(first_entered_sales_funnel_date, day) cohort_day,
round(sum(perc_won_discrete_forecast),3) expected_current_deals_won
from
current_known_company_forecast
join variables on 1=1
where
forecast_period_date between forecast_start_date and forecast_end_date
group by
1 ,2
order by
1 desc, 2
),
current_unknown_company_daily_forecast as (
-- forecast for each estimated additional future deal
select
date_trunc(forecast_period_date, day) forecast_day,
date_trunc(dim_day, day) cohort_day,
round(sum(perc_won_discrete_forecast),3) expected_future_deals_won
from
current_unknown_company_forecast
join variables on 1=1
where
forecast_period_date between forecast_start_date and forecast_end_date
group by
1 ,2
order by
1 desc, 2
),
current_daily_forecast as
(
select
coalesce(fp.forecast_day, cp.forecast_day) forecast_day,
-- coalesce(fp.cohort_day, cp.cohort_day) cohort_day, -- do not add cohort months to daily forecast
sum(cp.expected_current_deals_won) as expected_current_deals_won,
sum(fp.expected_future_deals_won) as expected_future_deals_won,
sum(coalesce(cp.expected_current_deals_won,0) + coalesce(fp.expected_future_deals_won,0)) as total_expected_deals_won
from current_known_company_daily_forecast cp
full outer join current_unknown_company_daily_forecast fp
on fp.forecast_day = cp.forecast_day and fp.cohort_day=cp.cohort_day
group by 1
order by 1,2
),
current_known_company_monthly_forecast as (
-- for each month, the forecast for each company
select
date_trunc(forecast_period_date, month) forecast_month,
date_trunc(first_entered_sales_funnel_date, month) cohort_month,
(sum(perc_won_discrete_forecast)) expected_current_deals_won
from
current_known_company_forecast
join variables on 1=1
where
forecast_period_date between forecast_start_date and forecast_end_date
group by
1 ,2
order by
1 desc, 2
),
current_unknown_company_monthly_forecast as (
-- forecast for each estimated additional future deal
select
date_trunc(forecast_period_date, month) forecast_month,
date_trunc(dim_day, month) cohort_month,
(sum(perc_won_discrete_forecast)) expected_future_deals_won
from
current_unknown_company_forecast
join variables on 1=1
where
forecast_period_date between forecast_start_date and forecast_end_date
group by
1 ,2
--order by
-- 1 desc, 2
),
-- select *
-- from current_unknown_company_monthly_forecast
current_monthly_forecast as
(
select
coalesce(fp.forecast_month, cp.forecast_month) forecast_month,
coalesce(fp.cohort_month, cp.cohort_month) cohort_month,
cp.expected_current_deals_won,
--cp.expected_current_deals_won_cum,
fp.expected_future_deals_won,
--fp.expected_future_deals_won_cum,
coalesce(cp.expected_current_deals_won,0) + coalesce(fp.expected_future_deals_won,0) as total_expected_deals_won
from current_known_company_monthly_forecast cp
full outer join current_unknown_company_monthly_forecast fp
on fp.forecast_month = cp.forecast_month and fp.cohort_month=cp.cohort_month
order by 1,2
),
final_monthly as (
select
cmf.*,
round(sum(total_expected_deals_won) over (order by forecast_month, cohort_month),3) as total_expected_deals_won_cum,
round(win_rate,3) as estimated_win_rate,
round(avg_deals_created_per_day,3) as estimated_avg_deals_created_per_day,
exclude_range
from current_monthly_forecast cmf
join current_win_rate wr on 1=1
join daily_top_funnel_avg tf on 1=1
join days_to_exclude de on 1=1
order by 1,2
),
final_daily_forecast_vs_actual as
(
select actuals_conversions.date,
round(sum(total_expected_deals_won) over (order by forecast_day),3) as total_expected_deals_won_cum,
case when forecast_day >= CURRENT_DATE() then null else round(sum(num_conversions_actual) over (order by forecast_day),3) end as total_actual_deals_won_cum
from current_daily_forecast
left join variables on 1=1
left join actuals_conversions on forecast_day = date
)
-------------**********--------------
/*
End of CTEs for modelling. Expain how to call the model with their data using one of the select *'s
select * from final_monthly --OUTPUTS MONTHLY FORECAST WITH ASSUMPTIONS INCLUDED AS COLUMNS
select * from final_daily_forecast_vs_actual --OUTPUTS DAILY FORECAST WITH ESTIMATES VS ACTUALS
##
select *,
round(sum(total_expected_deals_won) over (order by forecast_day),3) as total_expected_deals_won_cum
from current_daily_forecast
## OUTPUTS DAILY FORECAST
select * from current_forecast_periods -- Outputs Cumulative Distribution Function
##
select date,
case when date < forecast_start_date then num_deals_created_actual else null end as deals_created_previous,
case when date >= forecast_start_date and date <= CURRENT_DATE() then num_deals_created_actual else null end as deals_created_forecast_period,
case when date >= forecast_start_date then avg_deals_created_per_day else null end as est_avg_deals_created_per_day
from top_funnel_actuals
left join daily_top_funnel_avg on 1=1
left join variables on 1=1
where date between forecast_start_date - num_days_to_est_top_funnel and forecast_end_date
## OUTPUTS TOP FUNNEL ASSUMPTIONS / ACTUALS over forecast period (and pre-forecast period)
Join us in slack: https://join.slack.com/t/weldcommunity/shared_invite/zt-19b372339-1m9ftp0YW2AU8ToZpBViAA
*/
select * from final_monthly
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment