
Matthew Porter (doit-mattporter)

  • DoiT International
  • Eugene, OR
@doit-mattporter
doit-mattporter / table_access_patterns.py
Last active July 14, 2023 23:41
This Python script uses the Google Cloud BigQuery API to query a specified BigQuery dataset's job history and reports, for each table, the total number of queries made against it, the users who made those queries, how many queries each user made, each user's most recent query, and the timestamp of that query, sorted by total query count.
#!/usr/bin/env python3
from google.cloud import bigquery
import pandas as pd
# Specify your dataset
project_id = "your_project_here"
dataset_id = "your_dataset_here"
# Specify how far back you want to look through your job history, up to 180 days
job_history_lookback_days = 2
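# The preview ends here. What follows is only a rough sketch of how such a report
# could be produced from the region-level INFORMATION_SCHEMA.JOBS view; the region
# ("region-us"), the view, and the column choices are assumptions, not the gist's code.
client = bigquery.Client(project=project_id)
query = f"""
SELECT
  ref.table_id AS table_name,
  user_email,
  COUNT(*) AS user_query_count,
  MAX(creation_time) AS last_query_time
FROM `{project_id}.region-us.INFORMATION_SCHEMA.JOBS`,
  UNNEST(referenced_tables) AS ref
WHERE job_type = 'QUERY'
  AND ref.dataset_id = '{dataset_id}'
  AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {job_history_lookback_days} DAY)
GROUP BY table_name, user_email
ORDER BY user_query_count DESC
"""
# pandas is used only to render the per-table, per-user access report
df = client.query(query).to_dataframe()
print(df.to_string(index=False))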
@doit-mattporter
doit-mattporter / dataset_tables_metadata.py
Last active July 14, 2023 23:37
Uses the Google Cloud BigQuery API to retrieve and display metadata for all tables in a specified BigQuery dataset, including table size, row count, creation time, last-modified time, partitioning and clustering information, and an estimate of monthly storage cost
#!/usr/bin/env python3
from google.cloud import bigquery
import pandas as pd
client = bigquery.Client()
# Specify your dataset
project_id = "your_project_here"
dataset_id = "your_dataset_here"
@doit-mattporter
doit-mattporter / import_temperature_csv.sh
Created February 9, 2021 21:50
Import Kaggle-provided temperature dataset into BigQuery
bq load \
--replace \
--skip_leading_rows 1 \
--source_format=CSV \
${PROJECT_ID}:sensordata.temperature \
gs://${BUCKET_NAME}/temperature.csv.gz \
timestamp_utc:DATETIME,timestamp_epoch:INTEGER,temp_f:FLOAT,temp_c:FLOAT,device_id:INTEGER
@doit-mattporter
doit-mattporter / open_window_alert_requirements.txt
Created February 7, 2021 04:17
Requirements for the open_window_alert.py gist
google-cloud-bigquery-storage
pandas
pyarrow
@doit-mattporter
doit-mattporter / open_window_alert.py
Created February 7, 2021 04:15
Cloud Function that sends an e-mail alert when a window has been open for too long
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from google.cloud import bigquery
from google.cloud import bigquery_storage
def open_window_alert(_):
project_id = "YOUR_PROJECT_ID"
from_email = "YOUR_EMAIL"
@doit-mattporter
doit-mattporter / bigquery_ml_predict.sh
Created February 6, 2021 00:48
# Predict whether window was open in the last 600 seconds
# Predict whether window was open in the last 30 seconds
cat >predict.sql << EOL
DECLARE now_time DATETIME;
DECLARE lookback_time DATETIME;
DECLARE monitor_window INT64;
DECLARE lookback_time_monitor_window DATETIME;
SET now_time = CURRENT_DATETIME();
SET lookback_time = DATETIME_SUB(now_time, INTERVAL 2400 SECOND);
SET monitor_window = 600; --The amount of time a window should be allowed to be open. Max value of 1800s.
SET lookback_time_monitor_window = DATETIME_SUB(now_time, INTERVAL monitor_window SECOND);
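-- The preview ends here. What follows is only a sketch of how the prediction might
-- be finished; the model, source table, and 'window_open' label value are assumed
-- from the other gists, and the closing EOL / bq invocation are likewise assumed.
SELECT COUNTIF(predicted_object_code = 'window_open') AS num_open_predictions
FROM ML.PREDICT(
  MODEL ${PROJECT_ID}.sensordata.temperature_automl_model,
  (SELECT * EXCEPT (timestamp_utc, timestamp_epoch, object_code)
   FROM ${PROJECT_ID}.sensordata.temperature_pivoted_timediff
   WHERE timestamp_utc BETWEEN lookback_time_monitor_window AND now_time));
EOL
bq query --nouse_legacy_sql < predict.sql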
@doit-mattporter
doit-mattporter / temperature_automl_model.sh
Created February 6, 2021 00:12
Construct an AutoML model
# Construct AutoML model
bq query --nouse_legacy_sql \
"CREATE OR REPLACE MODEL ${PROJECT_ID}.sensordata.temperature_automl_model
OPTIONS(MODEL_TYPE=\"AUTOML_CLASSIFIER\",
BUDGET_HOURS=24.0,
INPUT_LABEL_COLS=[\"object_code\"])
AS SELECT * EXCEPT (timestamp_utc, timestamp_epoch) FROM ${PROJECT_ID}.sensordata.temperature_pivoted_timediff"
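# Not part of the gist: once training completes, the model could be sanity-checked
# with ML.EVALUATE (a hedged follow-up sketch reusing the table and model names above)
bq query --nouse_legacy_sql \
"SELECT * FROM ML.EVALUATE(MODEL ${PROJECT_ID}.sensordata.temperature_automl_model,
  (SELECT * EXCEPT (timestamp_utc, timestamp_epoch) FROM ${PROJECT_ID}.sensordata.temperature_pivoted_timediff))"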
@doit-mattporter
doit-mattporter / temperature_pivoted_timediff.sh
Created February 5, 2021 23:50
Create a table with each device's change in temperature between its current reading and its reading x seconds earlier, for x = 5, 15, 30, 60, 180, 300, and 600 seconds
# Create a table with each device's change in temperature between its current reading and its reading x seconds earlier, for x = 5, 15, 30, 60, 180, 300, and 600 seconds
bq query --nouse_legacy_sql \
"CREATE OR REPLACE TABLE ${PROJECT_ID}.sensordata.temperature_pivoted_timediff PARTITION BY DATE(timestamp_utc) CLUSTER BY timestamp_epoch AS (
SELECT
timestamp_utc,
timestamp_epoch,
object_code,
device_FrontOfLivingRoom,
device_LivingRoomCycleWindow,
device_LivingRoomKitchenCorner,
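-- The preview ends here. What follows is only a sketch of how the per-offset deltas
-- might be computed (a LAG over one-row-per-second data, shown for the 5s and 600s
-- offsets only); the gist's actual SQL may differ, and object_code is assumed to be
-- available in temperature_pivoted (e.g. via a prior join to window_opened_closed).
device_FrontOfLivingRoom
  - LAG(device_FrontOfLivingRoom, 5) OVER (ORDER BY timestamp_epoch) AS FrontOfLivingRoom_diff_5s,
device_FrontOfLivingRoom
  - LAG(device_FrontOfLivingRoom, 600) OVER (ORDER BY timestamp_epoch) AS FrontOfLivingRoom_diff_600s
FROM ${PROJECT_ID}.sensordata.temperature_pivoted)"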
@doit-mattporter
doit-mattporter / temperature_pivoted.sh
Created February 5, 2021 23:33
Create a streaming temperature data pivot table so that all temperatures for any given second are recorded in one row, with one column per device
# Create a streaming temperature data pivot table so that all temperatures for any given second are recorded in one row, with one column per device
# Also, exclude timepoints where not all three devices were powered on
bq query --nouse_legacy_sql \
"CREATE OR REPLACE TABLE ${PROJECT_ID}.sensordata.temperature_pivoted PARTITION BY DATE(timestamp_utc) CLUSTER BY timestamp_epoch AS (
SELECT * FROM (
SELECT
t.timestamp_utc,
ANY_VALUE(t.timestamp_epoch) AS timestamp_epoch,
ANY_VALUE(IF(device_id = 2753085432003927, temp_f, null)) AS device_FrontOfLivingRoom,
ANY_VALUE(IF(device_id = 2582277131565505, temp_f, null)) AS device_LivingRoomCycleWindow,
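-- The preview ends here. A rough sketch of how the query could finish; the third
-- device's ID is a placeholder, and the GROUP BY and not-null filter are assumptions.
ANY_VALUE(IF(device_id = THIRD_DEVICE_ID, temp_f, null)) AS device_LivingRoomKitchenCorner
FROM ${PROJECT_ID}.sensordata.temperature t
GROUP BY t.timestamp_utc)
WHERE device_FrontOfLivingRoom IS NOT NULL
  AND device_LivingRoomCycleWindow IS NOT NULL
  AND device_LivingRoomKitchenCorner IS NOT NULL)"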
@doit-mattporter
doit-mattporter / window_opened_closed.sh
Created February 5, 2021 23:27
Create a BQ table of open & closed window time frames from a CSV on GCS
#!/usr/bin/env bash
PROJECT_ID=iottempstreaming
DATA_BUCKET=iot-window-open-close-times
bq load \
--autodetect \
--source_format=CSV \
$PROJECT_ID:sensordata.window_opened_closed \
gs://$DATA_BUCKET/OpeningAndClosingTimes.csv