Classify every prompt locally with Gemma 3, then route to the cheapest model that can handle it. Cuts Claude API spend by ~79% on a typical workload.
Built for the AIFirewire video: "I Cut My Claude API Bill 79% With 60 Lines of Python"
Every Claude tutorial pushes you to hit Sonnet for everything. Sonnet 4.6 is $15 per million output tokens. Haiku 4.5 is $5 per million output tokens ($1 per million input tokens). Local Gemma 3 is $0 per token.
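The classify-then-route idea can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the difficulty labels, the tier names in `ROUTES`, and the `keyword_classifier` stand-in are all hypothetical. In a real setup the classifier callable would wrap a call to a locally hosted Gemma 3, and the returned tier name would select which client to call.

```python
from typing import Callable

# Hypothetical difficulty labels mapped to hypothetical tier names.
# The real project may use different labels and model identifiers.
ROUTES = {
    "simple": "gemma3-local",
    "moderate": "claude-haiku",
    "complex": "claude-sonnet",
}

# If the classifier returns something unexpected, fall back to the most
# capable tier rather than risk a poor answer from a cheaper one.
FALLBACK = "claude-sonnet"


def route(prompt: str, classify: Callable[[str], str]) -> str:
    """Return the tier name for a prompt, given a classifier callable."""
    label = classify(prompt).strip().lower()
    return ROUTES.get(label, FALLBACK)


def keyword_classifier(prompt: str) -> str:
    """Stand-in for the local Gemma 3 call, for demonstration only.

    It uses word count as a crude difficulty proxy so the sketch runs
    without any model or network access.
    """
    words = len(prompt.split())
    if words < 8:
        return "simple"
    if words < 40:
        return "moderate"
    return "complex"


if __name__ == "__main__":
    print(route("What is 2 + 2?", keyword_classifier))
```

Passing the classifier in as a callable keeps the routing table testable on its own and makes it easy to swap the stand-in for a real local model call. Falling back to the top tier on an unrecognized label trades some savings for safety; whether that is the right default depends on the workload.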