Skip to content

Instantly share code, notes, and snippets.

@augescens
Last active July 5, 2016 18:54
Show Gist options
  • Star 2 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save augescens/01234cb80bc17712ebfb to your computer and use it in GitHub Desktop.
Save augescens/01234cb80bc17712ebfb to your computer and use it in GitHub Desktop.
Example frequency report for "Writing Analytics SQL with Common Table Expressions" blog post
/*
Copyright 2014 Neustar, Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
/* Frequency Report:
* For each impression frequency class for each campaign, calculate the total
* number of users, impressions, clicks, and conversions for the month of March
* 2014. */
WITH
/* ======== Filter input tables by date range. ======== */
filtered_impressions AS (
SELECT record_date,
user_id,
campaign_id
FROM impressions
WHERE record_date >= '2014-03-01' AND
record_date < '2014-04-01'
),
filtered_clicks AS (
SELECT record_date,
user_id,
campaign_id
FROM clicks
WHERE record_date >= '2014-03-01' AND
record_date < '2014-04-01'
),
/* Create a single filtered_conversions CTE by appending filtered
* click_attributed_conversions to filtered
* impression_attributed_conversions. */
filtered_conversions AS (
(
SELECT record_date,
user_id,
campaign_id
FROM impression_attributed_conversions
WHERE record_date >= '2014-03-01' AND
record_date < '2014-04-01'
)
UNION ALL
(
SELECT record_date,
user_id,
campaign_id
FROM click_attributed_conversions
WHERE record_date >= '2014-03-01' AND
record_date < '2014-04-01'
)
),
/* ======== Calculate user-level counts. ======== */
/* For each (user_id, campaign_id) tuple, calculate the sum total of impression,
* click, and conversion events. */
impression_counts AS (
SELECT user_id,
campaign_id,
SUM(1) AS impression_count
FROM filtered_impressions
GROUP BY 1, 2
),
click_counts AS (
SELECT user_id,
campaign_id,
SUM(1) AS click_count
FROM filtered_clicks
GROUP BY 1, 2
),
conversion_counts AS (
SELECT user_id,
campaign_id,
SUM(1) AS conversion_count
FROM filtered_conversions
GROUP BY 1, 2
),
/* ======== Collate user-level counts. ========= */
/* JOIN to combine impression, click, and conversion counts by (user_id,
* campaign_id) tuples. */
collated_counts AS (
SELECT imp.user_id AS user_id,
imp.campaign_id AS campaign_id,
imp.impression_count AS impression_count,
cl.click_count AS click_count,
conv.conversion_count AS conversion_count
FROM impression_counts imp
LEFT OUTER JOIN click_counts cl ON
imp.user_id = cl.user_id AND
imp.campaign_id = cl.campaign_id
LEFT OUTER JOIN conversion_counts conv ON
imp.user_id = conv.user_id AND
imp.campaign_id = conv.campaign_id
)
/* ======== Pivot to produce report by frequency classes. ======== */
/* For each (impression_count, campaign_id) tuple, calculate the total number of
* users, impressions, clicks, and conversions. Use COALESCE to replace any NULL
* values with 0s. */
SELECT impression_count AS frequency_class,
campaign_id AS campaign_id,
SUM(1) AS total_users,
SUM(impression_count) AS total_impressions,
SUM(COALESCE(click_count, 0)) AS total_clicks,
SUM(COALESCE(conversion_count, 0)) AS total_conversions
FROM collated_counts
GROUP BY 1, 2
;
@augescens
Copy link
Author

This is an example for the blog post "Writing Analytics SQL with Common Table Expressions."

Compare this version to the same query written with nested subqueries.

@hasokeric
Copy link

This Article has been very helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment