Skip to content

Instantly share code, notes, and snippets.

@jster1357
jster1357 / optimized-tpch.sql
Created June 30, 2025 16:52
Clustering DDL for TPCH for BigQuery
CREATE OR REPLACE TABLE tpch_optimized.region (
r_regionkey INT64,
r_name STRING,
r_comment STRING)
CLUSTER BY r_regionkey;
CREATE OR REPLACE TABLE tpch_optimized.nation (
n_nationkey INT64,
n_name STRING,
n_regionkey INT64,
@jster1357
jster1357 / tpch-ddl.sql
Created June 26, 2025 15:35
TPCH DDL for BigQuery
CREATE TABLE tpch.supplier
(
s_suppkey INT64,
s_name STRING,
s_address STRING,
s_nationkey INT64,
s_phone STRING,
s_acctbal FLOAT64,
s_comment STRING,
@jster1357
jster1357 / tpch.sql
Last active June 26, 2025 15:25
TPCH SQL for BigQuery
SELECT l_returnflag, l_linestatus, SUM(l_quantity) AS SUM_qty, SUM(l_extENDedprice) AS SUM_base_price, SUM(l_extENDedprice * (1-l_discount)) AS SUM_disc_price, SUM(l_extENDedprice * (1-l_discount) * (1 + l_tax)) AS SUM_charge, avg(l_quantity) AS avg_qty, avg(l_extENDedprice) AS avg_price, avg(l_discount) AS avg_disc, count(*) AS count_order FROM tpch_optimized.lineitem WHERE l_shipdate <= DATE_ADD(DATE "1996-12-01", INTERVAL 60 DAY) GROUP BY l_returnflag, l_linestatus ORDER BY l_returnflag, l_linestatus;
SELECT s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment FROM tpch_optimized.part, tpch_optimized.supplier, tpch_optimized.partsupp, tpch_optimized.nation, tpch_optimized.region WHERE p_partkey = ps_partkey AND s_suppkey = ps_suppkey AND p_size > 15 AND p_type like '%BRASS' AND s_nationkey = n_nationkey AND n_regionkey = r_regionkey AND r_name = 'EUROPE' AND ps_supplycost = ( SELECT MIN(ps_supplycost) FROM tpch_optimized.partsupp, tpch_optimized.supplier, tpch_optimized.nation, tpch_
We can make this file beautiful and searchable if this error is corrected: No commas found in this CSV file in line 0.
5994000001|53532433|O|176627.30|1997-03-30|1-URGENT|Clerk#000632485|0|ly bold sentiments integrate doggedly? furious
5994000002|9601419|F|292586.12|1993-10-16|3-MEDIUM|Clerk#000171252|0|special pinto beans; furiously even ideas sle
5994000003|41037313|F|218489.85|1994-07-28|2-HIGH|Clerk#000218085|0|accounts may cajole. final dinos wake f
5994000004|14091148|O|52208.87|1998-02-22|2-HIGH|Clerk#000348944|0|ct furiously around the care
5994000005|129915457|F|201126.75|1993-07-18|1-URGENT|Clerk#000687363|0|ainst the slyly special courts. quickly ironi
@jster1357
jster1357 / bq_schema_extractor_to_json.py
Created March 29, 2024 13:32
Terraform requires BQ table details to be in JSON format. This code extracts the schema, partitioning, and clustering details in JSON format so it can be used in Terraform BQ object creation.
import google.auth.transport.requests
from google.oauth2 import id_token
from google.oauth2 import service_account
from google.auth.transport.requests import AuthorizedSession
import google.auth
import requests
import json
import os
##set variables
@jster1357
jster1357 / PipelineMetadataExtractionPipeline.json
Created October 31, 2022 19:27
PipelineMetadataExtractionPipeline.json
{
"artifact": {
"name": "cdap-data-pipeline",
"version": "6.7.1",
"scope": "SYSTEM"
},
"description": "Data Pipeline Application",
"name": "getRunIDMetrics_v4",
"config": {
"resources": {
@jster1357
jster1357 / gist:9dcbf8346d170c3423bec036555ee009
Last active November 2, 2022 15:16
getCDFMetadataDAGComposer
import datetime
import re
from airflow import models
from airflow.providers.google.cloud.operators.datafusion import CloudDataFusionStartPipelineOperator
from airflow.providers.google.cloud.sensors.datafusion import CloudDataFusionPipelineStateSensor
from airflow.providers.google.cloud.hooks.datafusion import PipelineStates
from airflow.utils.dates import days_ago
PROJECT_ID=""
REGION=""
@jster1357
jster1357 / variables.tf
Created June 2, 2022 17:59
Data Fusion Terraform - no shared vpc w/ pipeline load
variable "project_id" {
description = "The project id of the cdf deployment"
type = string
}
variable "instance_name" {
description = "The instance name."
type = string
}
@jster1357
jster1357 / resource.tf
Created June 2, 2022 17:40
Data Fusion Terraform - no shared vpc w/ pipeline load
terraform {
required_providers {
google = {
source = "hashicorp/google"
version = "4.22.0"
}
cdap = {
source = "GoogleCloudPlatform/cdap"
# Pin to a specific version as 0.x releases are not guaranteed to be backwards compatible.
version = "0.9.0"
@jster1357
jster1357 / main.tf
Created June 2, 2022 17:35
Data Fusion Terraform - no shared vpc w/ loaded pipelines
resource "google_data_fusion_instance" "create_instance" {
name = var.instance_name
description = var.description
region = var.region
type = var.cdf_version
enable_stackdriver_logging = true
enable_stackdriver_monitoring = true
labels = {
instance_owner = var.instance_owner
}