
BigQuery Course: Resource & Reference

This page contains the resources and references for the BigQuery course.

Table of Contents

Contents are grouped by lecture title.
They follow the same order as the course, although not every lecture has a resource.


Technology In This Course

Start with BigQuery

Functions

Common Data Types

Converting Data Types

Exploring Common Data Types

Clean & Transform With Dataprep

Essential BigQuery

Load Data Into BigQuery (Part 1)

The Basics

Load Data Into BigQuery (Part 2)

Handling Errors

DDL script for creating `sample_employee`
CREATE TABLE IF NOT EXISTS
`course_dataset.sample_employee` (
  employee_id STRING NOT NULL,
  first_name STRING NOT NULL,
  last_name STRING,
  email STRING NOT NULL OPTIONS(description = 'Email can be any domain, free or corporate domain'),
  gender STRING OPTIONS(description = 'F for Female, or M for Male'),
  birth_date DATE,
  salary INT64 NOT NULL
) OPTIONS (
  description = 'Table for saving sample employee',
  labels = [
    ("course_series", "google"),
    ("created_by", "timotius_pamungkas")
  ]
)
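Since this lecture covers handling errors, here is a minimal load sketch that tolerates a few malformed rows; the bucket and CSV object name are hypothetical placeholders, and --max_bad_records sets how many bad rows BigQuery may skip before the job fails:
bq load --source_format=CSV --skip_leading_rows=1 --max_bad_records=10 course_dataset.sample_employee "gs://your-bucket/sample_employee.csv"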
Script to generate sample_employee schema
bq show --schema --format=prettyjson course_dataset.sample_employee > sample_employee_schema.json
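The exported schema file can be reused elsewhere; for example, a sketch (with a hypothetical table name sample_employee_copy) that creates an empty table with the same structure:
bq mk --table course_dataset.sample_employee_copy sample_employee_schema.json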

Load Data Into BigQuery (Part 3)

Efficient Load

  • Download sample data
  • Sample Avro data from Google Cloud Storage. Use one of these object paths to load into BigQuery:
    • without compression : course_bigquery_sample_data/load_data_into_bigquery_3/sample_employee_06.avro
    • deflate compression : course_bigquery_sample_data/load_data_into_bigquery_3/sample_employee_06_deflate.avro
    • snappy compression : course_bigquery_sample_data/load_data_into_bigquery_3/sample_purchase_06_snappy.avro
  • Avro specification
Loading Avro using Cloud Shell script
bq load --source_format=AVRO --use_avro_logical_types course_dataset.sample_employee "gs://course_bigquery_sample_data/load_data_into_bigquery_3/sample_employee_06.avro"
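The compressed objects listed above load with the same command, since BigQuery reads DEFLATE- and Snappy-compressed data blocks inside Avro files without any extra flag; for example, the deflate variant:
bq load --source_format=AVRO --use_avro_logical_types course_dataset.sample_employee "gs://course_bigquery_sample_data/load_data_into_bigquery_3/sample_employee_06_deflate.avro"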

Load Data Into BigQuery (Part 4)

From Your Data to BigQuery

Sample Contractor

Load Data Into BigQuery (Part 5)

In Microservice Architecture

Further Reference for Modern Architecture (Microservice & Message Broker)

Load Data Into BigQuery (Part 6)

Recurring Load

BigQuery View

Query for Stack Overflow questions per user
SELECT
  user.display_name,
  user.location,
  COUNT(question.id) total_question
FROM
  `bigquery-public-data.stackoverflow.posts_questions` question
INNER JOIN
  `bigquery-public-data.stackoverflow.users` user
ON
  question.owner_user_id = user.id
GROUP BY
  user.display_name,
  user.location
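To keep this query as a BigQuery view, it can be wrapped in CREATE VIEW DDL; a sketch assuming a hypothetical view name question_per_user in project-id.course_dataset:
CREATE VIEW IF NOT EXISTS
  `project-id.course_dataset.question_per_user` AS
SELECT
  user.display_name,
  user.location,
  COUNT(question.id) AS total_question
FROM
  `bigquery-public-data.stackoverflow.posts_questions` question
INNER JOIN
  `bigquery-public-data.stackoverflow.users` user
ON
  question.owner_user_id = user.id
GROUP BY
  user.display_name,
  user.location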

Using Join

SQL for Cross Join
SELECT
  profile.*,
  merchandise.*
FROM
  `project-id.course_dataset.contractor_profile` profile,
  `project-id.course_dataset.contractor_merchandise` merchandise
SQL for Inner Join
SELECT
  profile.*,
  project.* EXCEPT(contractor_id)
FROM
  `project-id.course_dataset.contractor_profile` profile
JOIN
  `project-id.course_dataset.contractor_project` project
USING
  (contractor_id)
SQL for Left Outer Join
SELECT
  profile.*,
  award.* EXCEPT(contractor_id)
FROM
  `project-id.course_dataset.contractor_profile` profile
LEFT OUTER JOIN
  `project-id.course_dataset.contractor_award` award
USING
  (contractor_id)
ORDER BY
  award_name NULLS FIRST,
  contractor_id
SQL for Right Outer Join
SELECT
  profile.*,
  award.* EXCEPT(contractor_id)
FROM
  `project-id.course_dataset.contractor_profile` profile
RIGHT OUTER JOIN
  `project-id.course_dataset.contractor_award` award
USING
  (contractor_id)
ORDER BY
  award_name NULLS FIRST,
  contractor_id
SQL for Full Outer Join
SELECT
  profile.*,
  award.* EXCEPT(contractor_id)
FROM
  `project-id.course_dataset.contractor_profile` profile
FULL OUTER JOIN
  `project-id.course_dataset.contractor_award` award
USING
  (contractor_id)
ORDER BY
  award_name NULLS FIRST,
  contractor_id

Union and Intersect

SQL for Union & Intersect

Use the keywords UNION ALL, UNION DISTINCT, or INTERSECT DISTINCT.

SELECT
  company_name,
  contact_email,
  tax_identification_number
FROM
  `project-id.course_dataset.contractor_profile`
UNION ALL
SELECT
  company_name,
  contact_email,
  tax_identification_number
FROM
  `project-id.course_dataset.consultant_profile`
ORDER BY
  company_name
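A sketch of the same two profile tables combined with INTERSECT DISTINCT instead, which keeps only rows present in both:
SELECT
  company_name,
  contact_email,
  tax_identification_number
FROM
  `project-id.course_dataset.contractor_profile`
INTERSECT DISTINCT
SELECT
  company_name,
  contact_email,
  tax_identification_number
FROM
  `project-id.course_dataset.consultant_profile`
ORDER BY
  company_name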

Basic Statistical Functions

Solution for basic statistical functions in loan_default
SELECT
  AVG(default_amount) AS mean
FROM
  `project-id.course_dataset.loan_default`
LIMIT
  1;

SELECT
  PERCENTILE_CONT(default_amount, 0.5) OVER() AS median_cont,
  PERCENTILE_DISC(default_amount, 0.5) OVER() AS median_disc
FROM
  `project-id.course_dataset.loan_default`
LIMIT
  1;

SELECT
  STDDEV_SAMP(default_amount) AS std_dev_sample,
  STDDEV(default_amount) AS std_dev_sample_alias,
  STDDEV_POP(default_amount) AS std_dev_pop
FROM
  `project-id.course_dataset.loan_default`
LIMIT
  1;
  
SELECT
  CORR(default_amount, loan_amount)
FROM
  `course-project-timpamungkas.course_dataset.loan_default`
LIMIT
  1;
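As a closing sketch, the plain aggregate functions above (unlike the windowed percentiles) can also be combined in a single statement:
SELECT
  AVG(default_amount) AS mean,
  STDDEV_SAMP(default_amount) AS std_dev_sample,
  CORR(default_amount, loan_amount) AS correlation
FROM
  `project-id.course_dataset.loan_default`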