Rohit 0x726f
@0x726f
0x726f / h11_gist_report.md
Last active May 16, 2026 21:25
Enriched KW Search: Simplified EVAL_SYSTEM_PROMPT (H11) - Hill-Climbing Session Report 2026-05-16

Enriched KW Search: Hill-Climbing Session Report (2026-05-16)

Improvement Per Eval Surface

| Surface | Cases | Baseline (eca0ea20) | H11 (Simplified Prompt) | Delta | Status |
|---|---|---|---|---|---|
| dc_evals | 156 | 54.26% | 57.80% | +3.54pp | ✓ Best so far |
| gong | 26 | 84.62% | 90.40% | +5.78pp | ✓✓ Exceeds +5pp |
| cursor_usage | 50 | 97.33% | 98.00% | +0.67pp | ~ neutral |
| github | 48 | 98.61% | 99.00% | +0.39pp | ~ neutral |
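The deltas above are differences in percentage points (variant minus baseline), not relative changes. A minimal sketch of that arithmetic follows; the names `RESULTS`, `delta_pp`, and `exceeds_target` are hypothetical, and treating +5pp as a target is an assumption read off the "Exceeds +5pp" status rather than a documented rule.

```python
# Assumption: +5pp is the improvement target implied by the "Exceeds +5pp" status.
TARGET_PP = 5.0

# (baseline %, H11 %) per eval surface, copied from the table above.
RESULTS = {
    "dc_evals": (54.26, 57.80),
    "gong": (84.62, 90.40),
    "cursor_usage": (97.33, 98.00),
    "github": (98.61, 99.00),
}


def delta_pp(baseline: float, variant: float) -> float:
    """Difference in percentage points, rounded to two decimals."""
    return round(variant - baseline, 2)


def exceeds_target(baseline: float, variant: float, target_pp: float = TARGET_PP) -> bool:
    """True when the percentage-point lift is strictly above the target."""
    return delta_pp(baseline, variant) > target_pp


if __name__ == "__main__":
    for surface, (baseline, variant) in RESULTS.items():
        flag = "exceeds target" if exceeds_target(baseline, variant) else "below target"
        print(f"{surface}: {delta_pp(baseline, variant):+.2f}pp ({flag})")
```

Under these assumptions only gong clears the +5pp bar, which matches the Status column.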
0x726f / dc_evals_per_case_diff.md
Created May 16, 2026 02:35
Per-case DC eval diff for Rippling/rippling-bots#12699 — Snowflake usage signal + bucket fix

DC Evals — Per-Case Improvements: Snowflake Usage Signal + Bucket Fix

PR: Rippling/rippling-bots#12699

Setup: 156 DC eval cases, 3 replicates per arm at concurrency 5.
Baseline: production main, no SF usage signal, original bucket logic.
Variant: three changes. (1) SF usage ships as a [queried_in_warehouse:N] metadata tag exposed to the LLM reranker. (2) A one-bullet prompt change explains that the two signals are complementary. (3) A bucket-logic fix lets dbt-output tables (derived_dataset_mart_/int_/rpt_/stg_*) consume the analyst-bucket cap, so that warehouse + connectors get a fair share of the source-bucket cap.
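The tag and the bucket fix in the variant can be pictured with a small sketch. Everything here is an assumption for illustration, not the code in the PR: the function names, the cap values, the bucket names "analyst" and "source", and the reading of `derived_dataset_mart_/int_/rpt_/stg_*` as four `derived_dataset_<layer>_` prefixes are all hypothetical.

```python
from collections import defaultdict

# Assumption: the shorthand in the text expands to these four name prefixes.
DBT_PREFIXES = tuple(
    f"derived_dataset_{layer}_" for layer in ("mart", "int", "rpt", "stg")
)


def usage_tag(query_count: int) -> str:
    """Render the Snowflake usage signal as the metadata tag shown to the reranker."""
    return f"[queried_in_warehouse:{query_count}]"


def bucket_for(table_name: str) -> str:
    """dbt-output tables count against the analyst bucket; everything else is source."""
    return "analyst" if table_name.startswith(DBT_PREFIXES) else "source"


def apply_bucket_caps(ranked_tables: list[str], caps: dict[str, int]) -> list[str]:
    """Walk candidates in rank order, keeping each one only while its bucket has room."""
    kept: list[str] = []
    used: dict[str, int] = defaultdict(int)
    for name in ranked_tables:
        bucket = bucket_for(name)
        if used[bucket] < caps[bucket]:
            kept.append(name)
            used[bucket] += 1
    return kept
```

With hypothetical caps of `{"analyst": 2, "source": 2}`, a third `derived_dataset_*` candidate is dropped instead of taking a source slot, so lower-ranked warehouse and connector tables still make the cut.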

Headline

0x726f / report_full.md
Last active May 15, 2026 21:56
Snowflake usage data → Enriched keyword search: empirical impact on __e eval recall

Snowflake Usage Data → Enriched Keyword Search: Empirical Impact on __e Eval Recall

TL;DR

Adding our newly built Snowflake usage data as a separate, lower-weighted multiplier in the enriched keyword-search ranker (alongside the existing Rippling-internal usage_count) produces a large recall lift on eval cases that reference __e tables, with targets in the top 20 rising from 3 to 29 of 75:

| Metric | Current ranker | + Snowflake (α=0.25) |
|---|---|---|
| Targets in top 20 | 3 / 75 | 29 / 75 |
| Targets in top 50 | 13 / 75 | 31 / 75 |
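To make the idea of a "separate, lower-weighted multiplier" concrete, here is a minimal sketch. The exact scoring formula is not given in this excerpt, so the log-damped boosts, the function names, and the parameter names other than `usage_count` and α are assumptions; only the structure (two independent multipliers, with the Snowflake one scaled by α=0.25) follows the text.

```python
import math


def ranker_score(
    text_score: float,
    usage_count: int,
    snowflake_queries: int,
    alpha: float = 0.25,
) -> float:
    """Keyword score scaled by two independent usage multipliers.

    Assumption: both boosts use 1 + log1p(count) so that zero usage leaves
    the score unchanged; alpha down-weights the Snowflake signal.
    """
    internal_boost = 1.0 + math.log1p(usage_count)
    snowflake_boost = 1.0 + alpha * math.log1p(snowflake_queries)
    return text_score * internal_boost * snowflake_boost


def targets_in_top_k(ranked: list[str], targets: set[str], k: int) -> int:
    """Count how many expected tables appear in the first k ranked results."""
    return sum(1 for name in ranked[:k] if name in targets)
```

In this form, setting `alpha=0` recovers the current ranker, and a table with no internal `usage_count` but heavy warehouse querying still receives a boost, which is one plausible reason a multiplier of this kind would help such tables reach the top 20.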