| Surface | Cases | Baseline (eca0ea20) | H11 (Simplified Prompt) | Delta | Status |
|---|---|---|---|---|---|
| dc_evals | 156 | 54.26% | 57.80% | +3.54pp | ✓ Best so far |
| gong | 26 | 84.62% | 90.40% | +5.78pp | ✓✓ Exceeds +5pp |
| cursor_usage | 50 | 97.33% | 98.00% | +0.67pp | ~ neutral |
| github | 48 | 98.61% | 99.00% | +0.39pp | ~ neutral |
PR: Rippling/rippling-bots#12699
Setup: 156 DC eval cases, 3 replicates per arm at concurrency 5.
Baseline: production main, with no SF (Snowflake) usage signal and the original bucket logic.
Variant: ships SF usage as a [queried_in_warehouse:N] metadata tag exposed to the LLM reranker; adds a one-bullet prompt change explaining that the two usage signals are complementary; and fixes the bucket logic so dbt-output tables (derived_dataset_mart_/int_/rpt_/stg_*) count against the analyst-bucket cap, leaving warehouse and connector tables a fair share of the source-bucket cap.
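A minimal sketch of the variant's tag format and bucket-cap behaviour, assuming a Python reranker pipeline. The helper names (`format_usage_tag`, `assign_bucket`, `fill_buckets`), the `Candidate` fields, the cap values, the example table names, and the reading of `derived_dataset_mart_/int_/rpt_/stg_*` as four `derived_dataset_<layer>_` prefixes are all hypothetical illustrations, not the PR's code. Real bucket assignment likely uses more than a name prefix.

```python
from dataclasses import dataclass

# Hypothetical reading of "derived_dataset_mart_/int_/rpt_/stg_*" as four
# derived_dataset_<layer>_ prefixes that mark dbt-output tables.
DBT_OUTPUT_PREFIXES = tuple(
    f"derived_dataset_{layer}_" for layer in ("mart", "int", "rpt", "stg")
)


@dataclass
class Candidate:
    table: str
    score: float  # ranker score; lists below are assumed pre-sorted by it
    sf_query_count: int  # hypothetical field holding the SF usage count


def format_usage_tag(cand: Candidate) -> str:
    """Render the SF usage signal as the metadata tag the LLM reranker sees."""
    return f"{cand.table} [queried_in_warehouse:{cand.sf_query_count}]"


def assign_bucket(table: str) -> str:
    """Simplified: dbt-output tables go to the analyst bucket, all others to source."""
    return "analyst" if table.startswith(DBT_OUTPUT_PREFIXES) else "source"


def fill_buckets(ranked: list[Candidate], caps: dict[str, int]) -> list[Candidate]:
    """Walk candidates in rank order, dropping any whose bucket is already full."""
    used = {bucket: 0 for bucket in caps}
    kept = []
    for cand in ranked:
        bucket = assign_bucket(cand.table)
        if used[bucket] < caps[bucket]:
            used[bucket] += 1
            kept.append(cand)
    return kept


# Hypothetical candidates, already sorted by score.
ranked = [
    Candidate("derived_dataset_mart_payroll", 0.9, 40),
    Candidate("derived_dataset_stg_payroll", 0.8, 12),
    Candidate("warehouse_payroll_runs", 0.7, 95),
    Candidate("connector_payroll_sync", 0.6, 3),
]
kept = fill_buckets(ranked, caps={"analyst": 1, "source": 2})
print([c.table for c in kept])
print(format_usage_tag(kept[1]))
```

In this toy run the second dbt table is dropped once the analyst cap is spent, and both source slots go to the warehouse and connector tables. Had the two derived_dataset_ tables been counted against the source cap instead, they would have filled both source slots and pushed the warehouse and connector tables out, which is the crowding the fix is described as addressing.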
Adding our newly built Snowflake usage data as a separate, lower-weighted multiplier (α=0.25) in the enriched keyword-search ranker, alongside the existing Rippling-internal usage_count, produces a large recall lift on eval cases that reference __e tables (targets in the top 20 rise from 3 to 29 of 75):
| Metric | Current ranker | + Snowflake (α=0.25) |
|---|---|---|
| Targets in top 20 | 3 / 75 | 29 / 75 |
| Targets in top 50 | 13 / 75 | 31 / 75 |
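A minimal sketch of how a separate, lower-weighted Snowflake multiplier could sit alongside the internal usage_count boost, and how a "Targets in top K" count is read off a ranking. Only α=0.25 comes from the table above; the `log1p` damping, the multiplicative form, the function names (`enriched_score`, `rank`, `targets_in_top_k`), and the example tables and values are assumptions for illustration.

```python
import math

ALPHA_SF = 0.25  # weight on the Snowflake term, per the "+ Snowflake (α=0.25)" column


def enriched_score(keyword_score: float, usage_count: int, sf_query_count: int,
                   alpha: float = ALPHA_SF) -> float:
    """Keyword score scaled by two independent usage multipliers.

    Assumed form: each usage signal is log-damped, and the Snowflake signal is
    down-weighted by alpha so it cannot dominate the internal usage_count.
    """
    internal_boost = 1.0 + math.log1p(usage_count)
    sf_boost = 1.0 + alpha * math.log1p(sf_query_count)
    return keyword_score * internal_boost * sf_boost


def rank(candidates: dict[str, tuple[float, int, int]], alpha: float) -> list[str]:
    """Sort table names by enriched score; ties keep insertion order (stable sort)."""
    scored = {name: enriched_score(*feats, alpha=alpha) for name, feats in candidates.items()}
    return sorted(scored, key=scored.get, reverse=True)


def targets_in_top_k(ranked_tables: list[str], targets: set[str], k: int) -> int:
    """Count how many target tables appear in the first k ranked results."""
    return len(targets.intersection(ranked_tables[:k]))


# Hypothetical tables: (keyword_score, usage_count, sf_query_count).
candidates = {
    "table_a": (1.0, 5, 0),    # used internally, never queried in Snowflake
    "table_c": (1.0, 0, 0),    # no usage signal at all
    "table_b": (1.0, 0, 500),  # only the Snowflake signal
}
print(targets_in_top_k(rank(candidates, alpha=0.0), {"table_b"}, k=2))
print(targets_in_top_k(rank(candidates, alpha=ALPHA_SF), {"table_b"}, k=2))
```

With α set to 0 the Snowflake-only table ties with the unused one and loses on insertion order, so it misses the top 2; with α=0.25 it moves into the top 2 while the table backed by internal usage_count still ranks first. This is one plausible mechanism for the lift in the table, not a reproduction of it.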