Skip to content

Instantly share code, notes, and snippets.

View robert8138's full-sized avatar

Robert Chang robert8138

  • Airbnb
  • San Francisco
View GitHub Profile
@robert8138
robert8138 / view_count_youtube_api.py
Last active June 10, 2020 18:50
Use YouTube API to get view trends of popular MOOC course
import os
import pandas as pd
import google_auth_oauthlib.flow
import googleapiclient.discovery
import googleapiclient.errors
import matplotlib
scopes = ["https://www.googleapis.com/auth/youtube.readonly"]
# Define the CREATE TABLE statement here
{%- macro create_table() %}
...
{%- endmacro %}
# Main ETL logic, insert the results into a STAGING table
{%- macro main() %}
...
{%- endmacro %}
CREATE TABLE IF NOT EXISTS fct_bookings (
id_listing BIGINT COMMENT 'Unique ID of the listing'
, id_host BIGINT COMMENT 'Unique ID of the host who owns the listing'
, m_bookings BIGINT COMMENT 'Denoted 1 if a booking transaction occurred'
)
PARTITION BY ( -- this is how we define partition keys
ds STRING
);
@robert8138
robert8138 / paremterized_workflow.sql
Last active June 8, 2020 23:44
Parameterize workflow to incorporate backfill logic
{%- if backfill %}
INSERT OVERWRITE TABLE bookings_summary PARTITION (ds)
{%- else %}
INSERT OVERWRITE TABLE bookings_summary PARTITION (ds = '{{ ds }}')
{%- endif %}
SELECT
dim_market
, SUM(m_bookings) AS m_bookings
{%- if backfill %}
, ds
@robert8138
robert8138 / incremental_load.sql
Last active July 11, 2022 12:08
Demonstrate how to do incremental load
-- Not Recommended Approach: Scan the entire table and rebuild everyday
INSERT OVERWRITE TABLE dim_total_bookings PARTITION (ds = '{{ ds }}')
SELECT
dim_market
, SUM(m_bookings) AS m_bookings
FROM
fct_bookings
WHERE
ds <= '{{ ds }}' -- this is expensive, and can quickly run into scale issue
GROUP BY
@robert8138
robert8138 / partition_by_ds.sql
Created February 6, 2018 06:39
A table partitioned by datestamp
Column | Type | Extra | Comment
--------------------+---------+-----------------+-------------------------------------------
id_listing | bigint | | Unique id of the listing.
id_host | bigint | | Unique id of the host for the listing
dim_market | varchar | | The market in which the listing is located
ds | varchar | partition key |
(4 rows)
@robert8138
robert8138 / dynamic_partition.sql
Last active February 6, 2018 16:59
An example of dynamic partitions
INSERT OVERWRITE TABLE bookings_summary PARTITION (ds)
SELECT
dim_market
, SUM(m_bookings) AS m_bookings
, ds # For Hive to know we are using dynamic partitions
FROM
fct_bookings
WHERE
ds BETWEEN '{{ earliest_ds }}' AND '{{ latest_ds }}'
GROUP BY
@robert8138
robert8138 / non_dynamic_partition_end.sql
Created February 6, 2018 06:13
Dynamic Partition Example 2
INSERT OVERWRITE TABLE fct_bookings PARTITION (ds='{{ latest_ds }}')
SELECT
id_listing
, m_bookings
FROM
some_up_stream_booking_tables
WHERE
ds = '{{ latest_ds }}'
;
@robert8138
robert8138 / non_dynamic_partition.sql
Last active February 6, 2018 16:56
Dynamic Partitions Start
INSERT OVERWRITE TABLE bookings_summary PARTITION (ds= '{{ earliest_ds }}')
SELECT
dim_market
, SUM(m_bookings) AS m_bookings
FROM
fct_bookings
WHERE
ds = '{{ earliest_ds }}'
GROUP BY
dim_market
@robert8138
robert8138 / fct_and_dim_tables_in_action.sql
Last active February 24, 2020 13:31
Fact and Dimension Tables in Actions
SELECT
b.dim_market
, SUM(a.m_bookings) AS m_bookings
FROM (
SELECT
id_listing
, 1 AS m_bookings
, m_a # not used (for illustration only)
, m_b # not used (for illustration only)
, m_c # not used (for illustration only)