Rohit 0x726f

## h11_gist_report.md

      
Last active May 16, 2026 21:25

Enriched KW Search: Simplified EVAL_SYSTEM_PROMPT (H11) - Hill-Climbing Session Report 2026-05-16

### Enriched KW Search: Hill-Climbing Session Report (2026-05-16)

#### Improvement Per Eval Surface

| Surface | Cases | Baseline (eca0ea20) | H11 (Simplified Prompt) | Delta | Status |
|---|---|---|---|---|---|
| dc_evals | 156 | 54.26% | 57.80% | +3.54pp | ✓ Best so far |
| gong | 26 | 84.62% | 90.40% | +5.78pp | ✓✓ Exceeds +5pp |
| cursor_usage | 50 | 97.33% | 98.00% | +0.67pp | ~ neutral |
| github | 48 | 98.61% | 99.00% | +0.39pp | ~ neutral |
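The Delta column is the H11 score minus the baseline score, in percentage points. The sketch below recomputes it from the table and attaches a status label. It makes two assumptions: a +5pp target, taken from the "Exceeds +5pp" label, and a hypothetical 1pp neutral band. The "Best so far" label depends on earlier hill-climbing iterations and cannot be derived from these four rows, so the sketch just calls that case "improved".

```python
# Per-surface scores (percent) copied from the table above:
# surface -> (cases, baseline eca0ea20, H11)
RESULTS = {
    "dc_evals": (156, 54.26, 57.80),
    "gong": (26, 84.62, 90.40),
    "cursor_usage": (50, 97.33, 98.00),
    "github": (48, 98.61, 99.00),
}

# Assumed thresholds inferred from the status labels, not a documented rule.
TARGET_PP = 5.0   # "Exceeds +5pp"
NEUTRAL_PP = 1.0  # hypothetical cutoff for "~ neutral"


def delta_pp(baseline: float, variant: float) -> float:
    # Difference in percentage points, rounded to the table's precision.
    return round(variant - baseline, 2)


def status(delta: float) -> str:
    if delta > TARGET_PP:
        return "exceeds target"
    if delta < NEUTRAL_PP:
        return "neutral"
    return "improved"


for surface, (cases, base, h11) in RESULTS.items():
    d = delta_pp(base, h11)
    print(f"{surface:<13} n={cases:<4} {d:+.2f}pp  {status(d)}")
```

Rounding to two decimals matters here. A raw float subtraction such as `57.80 - 54.26` does not land exactly on 3.54.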


## dc_evals_per_case_diff.md

      
Created May 16, 2026 02:35

Per-case DC eval diff for Rippling/rippling-bots#12699: Snowflake usage signal + bucket fix

### DC Evals Per-Case Improvements: Snowflake Usage Signal + Bucket Fix

PR: Rippling/rippling-bots#12699

Setup: 156 DC eval cases, 3 replicates per arm at concurrency 5.

Baseline: production main, no SF usage signal, original bucket logic.

Variant: the baseline plus three changes:

- Ships Snowflake (SF) usage to the LLM reranker as a `[queried_in_warehouse:N]` metadata tag.
- Adds a one-bullet prompt change explaining that the two signals are complementary.
- Fixes the bucket logic so that dbt-output tables (`derived_dataset_mart_/int_/rpt_/stg_*`) consume the analyst-bucket cap, leaving warehouse and connector tables a fair share of the source-bucket cap.
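The bucket fix and the metadata tag can be sketched as follows. This is a hypothetical illustration, not the code in the PR. It assumes the following:

- Candidates arrive with a default bucket and a relevance score.
- Both buckets are filled greedily by score.
- `derived_dataset_mart_/int_/rpt_/stg_*` means names that start with `derived_dataset_` followed by one of those four prefixes.
- `analyst_cap`, `source_cap`, `sf_query_count`, and the example table names are illustrative.

```python
import re
from dataclasses import dataclass

# Assumed reading of derived_dataset_mart_/int_/rpt_/stg_*.
DBT_OUTPUT = re.compile(r"^derived_dataset_(mart|int|rpt|stg)_")


@dataclass
class Candidate:
    name: str
    default_bucket: str      # "analyst" or "source" (hypothetical labels)
    score: float
    sf_query_count: int = 0  # Snowflake usage (hypothetical field)


def bucket_for(c: Candidate, fixed: bool = True) -> str:
    # The fix: dbt-output tables count against the analyst cap instead of
    # crowding warehouse and connector tables out of the source cap.
    if fixed and DBT_OUTPUT.match(c.name):
        return "analyst"
    return c.default_bucket


def select(cands, analyst_cap, source_cap, fixed=True):
    caps = {"analyst": analyst_cap, "source": source_cap}
    used = {"analyst": 0, "source": 0}
    picked = []
    # Greedy fill by score until each bucket reaches its cap.
    for c in sorted(cands, key=lambda c: c.score, reverse=True):
        b = bucket_for(c, fixed)
        if used[b] < caps[b]:
            used[b] += 1
            picked.append(c)
    return picked


def render(c: Candidate) -> str:
    # Snowflake usage exposed to the LLM reranker as a metadata tag.
    return f"{c.name} [queried_in_warehouse:{c.sf_query_count}]"


cands = [
    Candidate("derived_dataset_mart_orders", "source", 0.9, 120),
    Candidate("derived_dataset_stg_users", "source", 0.8, 40),
    Candidate("snowflake_raw_orders", "source", 0.7, 300),
    Candidate("salesforce_accounts", "source", 0.6, 0),
]
for c in select(cands, analyst_cap=1, source_cap=2):
    print(render(c))
```

With `analyst_cap=1, source_cap=2`, the old logic (`fixed=False`) lets the two dbt outputs fill the source bucket. With the fix, the second dbt output is dropped against the analyst cap, and the two non-dbt source tables are kept.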

#### Headline


## report_full.md

      
Last active May 15, 2026 21:56

Snowflake usage data → Enriched keyword search: empirical impact on `__e` eval recall

### Snowflake Usage Data → Enriched Keyword Search: Empirical Impact on `__e` Eval Recall

#### TL;DR

Adding our newly built Snowflake usage data to the enriched keyword-search ranker produces a large recall lift on eval cases that reference `__e` tables. The data enters as a separate, lower-weighted multiplier alongside the existing Rippling-internal `usage_count`:


| Metric | Current ranker | + Snowflake (α=0.25) |
|---|---|---|
| Targets in top 20 | 3 / 75 | 29 / 75 |
| Targets in top 50 | 13 / 75 | 31 / 75 |
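The report does not spell out the multiplier's functional form. The minimal sketch below assumes a hypothetical log-damped form in which the Snowflake multiplier is scaled by α, so α=0 reproduces the current ranker. It counts targets found in the top k the way the table above does; a case may have several targets, since the denominator is 75 targets. The `base_score` and `sf_query_count` fields and the two example tables are illustrative assumptions.

```python
import math
from dataclasses import dataclass

ALPHA = 0.25  # weight on the Snowflake signal (the α in the table above)


@dataclass(frozen=True)
class Table:
    name: str
    base_score: float    # keyword-match score (hypothetical field)
    usage_count: int     # existing Rippling-internal usage signal
    sf_query_count: int  # Snowflake usage signal (hypothetical field)


def score(t: Table, alpha: float = ALPHA) -> float:
    # Hypothetical form: each usage signal is a log-damped multiplier;
    # the Snowflake one is down-weighted by alpha (alpha=0 disables it).
    internal = 1.0 + math.log1p(t.usage_count)
    snowflake = 1.0 + alpha * math.log1p(t.sf_query_count)
    return t.base_score * internal * snowflake


def targets_in_top_k(cases, k, alpha=ALPHA):
    """Return (targets found in the top k, total targets) over all cases.

    Each case is (candidate tables, set of target table names).
    """
    found = total = 0
    for candidates, targets in cases:
        ranked = sorted(candidates, key=lambda t: score(t, alpha), reverse=True)
        top_names = {t.name for t in ranked[:k]}
        found += len(targets & top_names)
        total += len(targets)
    return found, total


a = Table("a", 1.0, usage_count=3, sf_query_count=0)
b = Table("b", 1.0, usage_count=0, sf_query_count=10_000)
cases = [([a, b], {"b"})]
print(targets_in_top_k(cases, k=1, alpha=0.0))   # current ranker: (0, 1)
print(targets_in_top_k(cases, k=1, alpha=0.25))  # + Snowflake: (1, 1)
```

Keeping the Snowflake term as its own multiplier, rather than adding its counts into `usage_count`, is what lets α tune its influence independently of the internal signal.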