Hai-Yuan Cao caohy1988

caohy1988 / 07_information_schema_ci_cost.sql
Created April 24, 2026 21:04
Medium post #2 — section 6 INFORMATION_SCHEMA cost-per-feature pivot for SDK-labeled BigQuery jobs
-- Section 6 INFORMATION_SCHEMA cost pivot from Medium post #2:
-- "Your Agent Events Table Is Also a Test Suite"
--
-- The BigQuery Agent Analytics SDK labels every query it issues
-- with the feature that triggered it. This pivot groups BQ jobs
-- from the last 24 hours by `sdk_feature`, so you can see what the
-- CI gate cost in BQ compute and what the developer trace-reads
-- after a failing run cost on top of that.
--
-- Swap `region-us` for the region your dataset lives in
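
The preview cuts off before the query body. A minimal sketch of such a pivot, assuming the SDK attaches its feature name under a job label with key `sdk_feature` (that key, the `JOBS_BY_PROJECT` view choice, and the ~$6.25/TiB on-demand rate are illustrative assumptions, not the gist's actual query):

```sql
-- Sketch: BQ cost per sdk_feature label over the last 24 hours.
-- Assumes on-demand pricing (~$6.25 per TiB billed); adjust for your edition.
SELECT
  (SELECT l.value
   FROM UNNEST(labels) AS l
   WHERE l.key = 'sdk_feature') AS sdk_feature,
  COUNT(*) AS job_count,
  SUM(total_bytes_billed) AS bytes_billed,
  ROUND(SUM(total_bytes_billed) / POW(2, 40) * 6.25, 4) AS approx_usd
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
GROUP BY sdk_feature
ORDER BY approx_usd DESC;
```

Unlabeled jobs fall into a NULL `sdk_feature` bucket, which is itself useful: it shows how much spend the SDK is not attributing.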
caohy1988 / 06_categorical_eval_metrics_and_gate.md
Created April 24, 2026 21:04
Medium post #2 — section 5 categorical CI gate: metrics.json + one command

Categorical CI gate — metrics.json + one command

Companion to Medium post #2, section 5 ("Going deeper — add a categorical gate").

Requires `bigquery-agent-analytics >= 0.2.2`, which ships `categorical-eval --exit-code --pass-category --min-pass-rate`.

1. Metrics file
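
The preview stops at the metrics-file heading. As a hypothetical illustration only — the JSON layout, category name, and threshold below are assumptions, not the SDK's documented schema; only the subcommand and flag names come from this gist's description:

```json
{
  "sessions": [
    {"session_id": "s-001", "category": "pass"},
    {"session_id": "s-002", "category": "fail"}
  ]
}
```

And the one command, assuming the package installs a console script of the same name:

```
bigquery-agent-analytics categorical-eval \
  --exit-code \
  --pass-category pass \
  --min-pass-rate 0.95
```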

caohy1988 / 05_evaluate_thresholds_workflow.yml
Last active April 24, 2026 21:47
Medium post #2 — section 4 reference GitHub Actions workflow: four deterministic agent quality gates
# .github/workflows/evaluate_thresholds.yml
#
# Section 4 reference workflow from Medium post #2:
# "Your Agent Events Table Is Also a Test Suite"
#
# Four deterministic gates run against the last 24 hours of
# production traces every time a PR touches agent code or prompts.
# Each gate is its own step so a red status tells you which budget
# regressed.
#
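# The preview ends at the header comment. A skeleton of such a workflow,
# under stated assumptions: the trigger paths, the pip install step, and
# everything beyond `evaluate --exit-code` (which the companion bash gist
# names) are illustrative, not the reference workflow itself.
#
# name: evaluate_thresholds
# on:
#   pull_request:
#     paths: ["agent/**", "prompts/**"]
# jobs:
#   quality-gates:
#     runs-on: ubuntu-latest
#     steps:
#       - uses: actions/checkout@v4
#       - run: pip install "bigquery-agent-analytics>=0.2.2"
#       # One step per gate, so a red check names the budget that regressed.
#       - name: Latency gate
#         run: bigquery-agent-analytics evaluate --exit-code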
caohy1988 / 04_evaluate_exit_code_one_liner.sh
Created April 24, 2026 21:04
Medium post #2 — section 3 hero command: evaluate --exit-code one-liner
#!/usr/bin/env bash
# Section 3 hero command from Medium post #2:
# "Your Agent Events Table Is Also a Test Suite"
#
# Runs the deterministic latency gate against the last 24 hours of
# production traces. Exit 0 = all sessions within budget; exit 1 =
# at least one session regressed; exit 2 = configuration error.
#
# The SDK's `evaluate --exit-code` path also prints one readable
# FAIL session=... observed=... budget=... line on stderr per failing
# session.
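
Because the gate communicates through its exit status, a CI wrapper can branch on `$?` without parsing output. A sketch under stated assumptions — only `evaluate --exit-code` and the three exit-code meanings come from the comments above; the console-script name is assumed from the package name:

```bash
#!/usr/bin/env bash
bigquery-agent-analytics evaluate --exit-code
status=$?
case "$status" in
  0) echo "all sessions within budget" ;;
  1) echo "at least one session regressed" >&2 ;;
  2) echo "configuration error" >&2 ;;
esac
exit "$status"
```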
caohy1988 / example_demo.html
Created March 5, 2026 08:35
Example demo HTML
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>BigQuery Agent Analytics - Real Demo with Gemini 3 Flash</title>
<script src="https://cdnjs.cloudflare.com/ajax/libs/react/18.2.0/umd/react.production.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/react-dom/18.2.0/umd/react-dom.production.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/babel-standalone/7.23.5/babel.min.js"></script>
<link href="https://fonts.googleapis.com/css2?family=Google+Sans:wght@400;500;600;700&family=Roboto+Mono:wght@400;500&display=swap" rel="stylesheet">
  1. feature engineering (most important by far!)
  2. simple models
  3. overfitting leaderboard
  4. ensembling
  • predict the right thing!
  • build pipeline and put something on the leaderboard
  • allocate time to play with data, explore
  • make heavy use of forums
  • understand subtleties of algos, know what tool to use when
caohy1988 / apache-logs-hive.sql
Created April 20, 2016 03:50 — forked from emk/apache-logs-hive.sql
Apache log analysis with Hadoop, Hive and HBase
-- This is a Hive program. Hive is an SQL-like language that compiles
-- into Hadoop Map/Reduce jobs. It's very popular among analysts at
-- Facebook, because it allows them to query enormous Hadoop data
-- stores using a language much like SQL.
-- Our logs are stored on the Hadoop Distributed File System, in the
-- directory /logs/randomhacks.net/access. They're ordinary Apache
-- logs in *.gz format.
--
-- We want to pretend that these gzipped log files are a database table,
-- so we can query them with SQL.
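
-- The truncated comment above sets up Hive's external-table trick; a
-- minimal sketch of that step follows. The single-column schema is
-- illustrative — the full gist likely parses log fields with a SerDe.
-- Hive reads *.gz input transparently, so no decompression step is needed.
--
-- CREATE EXTERNAL TABLE access_log_lines (line STRING)
-- LOCATION '/logs/randomhacks.net/access';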
caohy1988 / read_db.R
Created November 26, 2015 07:13 — forked from ashutoshnanda/read_db.R
This code (in both Python and R) will read from a SQLite database of Hillary Clinton's emails and display basic information about the database.
library(dplyr)
library(RSQLite)
# Set up connection to the SQLite database
connection <- dbConnect(RSQLite::SQLite(), dbname = "clinton.sqlite")
# Print all tables
print("Tables")
all_tables <- dbListTables(connection)
print(all_tables)
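
The description mentions a Python counterpart that the preview omits. A minimal sketch of the same table-listing step using only the standard library's `sqlite3` module (the `clinton.sqlite` filename is taken from the R code above; if the file is absent, `connect` creates an empty database and the list comes back empty):

```python
import sqlite3

# Open the SQLite database of emails.
connection = sqlite3.connect("clinton.sqlite")

# sqlite_master is SQLite's built-in catalog; filter it for ordinary tables.
print("Tables")
all_tables = [row[0] for row in connection.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(all_tables)
connection.close()
```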
caohy1988 / Crash Course v0.5.ipynb.json
Last active August 29, 2015 14:27 — forked from rpmuller/Crash Course v0.5.ipynb.json
Crash Course in Python for Scientists
{
"metadata": {
"name": "",
"signature": "sha256:a04c38d9604adb7eb9ca89860dfa1ef72db66037cc2c07c391ef8e67a31f9254"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
  1. Feature Learning
  1. Deep Learning