Companion to Medium post #2, section 5 ("Going deeper — add a categorical gate").
Requires bigquery-agent-analytics >= 0.2.2, which ships the `categorical-eval`
command with the `--exit-code`, `--pass-category`, and `--min-pass-rate` flags.
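
A minimal invocation sketch of the categorical gate. The command and its three
flags are the ones named above; the category value and pass-rate threshold are
illustrative placeholders, not documented defaults:

    categorical-eval \
      --exit-code \
      --pass-category resolved \
      --min-pass-rate 0.95

Presumably mirroring the latency gate's exit-code contract in section 3,
`--exit-code` lets CI fail the job when fewer than 95% of sessions in the
evaluation window land in the `resolved` category.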
-- Section 6 INFORMATION_SCHEMA cost pivot from Medium post #2:
-- "Your Agent Events Table Is Also a Test Suite"
--
-- The BigQuery Agent Analytics SDK labels every query it issues
-- with the feature that triggered it. This pivot groups BQ jobs
-- from the last 24 hours by `sdk_feature`, so you can see what the
-- CI gate cost in BQ compute and what the developer trace-reads
-- after a failing run cost on top of that.
--
-- Swap `region-us` for the region your dataset lives in.
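--
-- A sketch of the pivot the header describes. The `sdk_feature` label
-- key is the only part the comments above confirm; the on-demand price
-- ($6.25/TiB billed) and the column choices are assumptions to adjust
-- for your edition and region.
SELECT
  (SELECT l.value
   FROM UNNEST(labels) AS l
   WHERE l.key = 'sdk_feature') AS sdk_feature,
  COUNT(*) AS job_count,
  -- Bytes billed converted to TiB, then priced at the assumed rate.
  ROUND(SUM(total_bytes_billed) / POW(1024, 4) * 6.25, 2) AS est_on_demand_usd,
  ROUND(SUM(total_slot_ms) / 1000, 0) AS slot_seconds
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
  AND job_type = 'QUERY'
GROUP BY sdk_feature
ORDER BY est_on_demand_usd DESC;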
# .github/workflows/evaluate_thresholds.yml
#
# Section 4 reference workflow from Medium post #2:
# "Your Agent Events Table Is Also a Test Suite"
#
# Four deterministic gates run against the last 24 hours of
# production traces every time a PR touches agent code or prompts.
# Each gate is its own step so a red status tells you which budget
# regressed.
#
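# A sketch of the workflow body, which this fragment omits. The latency
# gate (section 3) and categorical gate (section 5) are named by the
# post; the other two gate names, the path filters, and the script
# locations are placeholders. Cloud credentials setup is elided.
name: evaluate-thresholds
on:
  pull_request:
    paths:
      - "agent/**"     # placeholder: your agent code
      - "prompts/**"   # placeholder: your prompt files
jobs:
  gates:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install SDK
        # Assumes the SDK installs from PyPI under this name.
        run: pip install "bigquery-agent-analytics>=0.2.2"
      - name: Latency gate
        run: ./scripts/latency_gate.sh   # the section 3 hero command below
      - name: Categorical gate
        run: |
          categorical-eval --exit-code \
            --pass-category resolved --min-pass-rate 0.95
      - name: Cost gate          # placeholder name for the post's third gate
        run: ./scripts/cost_gate.sh
      - name: Error-rate gate    # placeholder name for the post's fourth gate
        run: ./scripts/error_rate_gate.sh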
#!/usr/bin/env bash
# Section 3 hero command from Medium post #2:
# "Your Agent Events Table Is Also a Test Suite"
#
# Runs the deterministic latency gate against the last 24 hours of
# production traces. Exit 0 = all sessions within budget; exit 1 =
# at least one session regressed; exit 2 = configuration error.
#
# The SDK's `evaluate --exit-code` path also prints one readable
# FAIL session=... observed=... budget=... line on stderr per failing
# session.
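#
# A sketch of the command body, which this fragment truncates. The
# header above only confirms `evaluate --exit-code`; the window, table,
# and budget flags below are illustrative assumptions, not documented
# CLI options.
set -euo pipefail

evaluate \
  --exit-code \
  --window 24h \
  --events-table "my_project.agent_ops.events" \
  --latency-budget-ms 2000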
| { | |
| "metadata": { | |
| "name": "", | |
| "signature": "sha256:a04c38d9604adb7eb9ca89860dfa1ef72db66037cc2c07c391ef8e67a31f9254" | |
| }, | |
| "nbformat": 3, | |
| "nbformat_minor": 0, | |
| "worksheets": [ | |
| { |