@haproxytechblog
Last active July 11, 2024 14:30
Creating an HAProxy AI Gateway to Control LLM Costs, Security, and Privacy
# hash the bearer token; lower-case the digest so it matches the lower-case keys in denylist.acl and rate-limits.map
http-request set-var(txn.openai_key_hash) http_auth_bearer(Authorization),sha2(256),hex,lower
log-format-sd "%{+Q,+E}o[request@58750 host=%[var(txn.host)] referer=%[var(txn.referer)] user_agent=%[var(txn.user_agent)]][custom@58750 openai_key_hash=%[var(txn.openai_key_hash)]]"
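The denylist and rate-limit files are keyed on the SHA-256 digest of the client's bearer token rather than on the key itself. A minimal Python sketch for producing such an entry offline, assuming the Authorization header has the form `Bearer <token>`; `openai_key_hash` is a hypothetical helper name, not part of HAProxy. Note that `hashlib` emits lower-case hex, whereas HAProxy's `hex` converter emits upper-case digits.

```python
import hashlib


def openai_key_hash(authorization: str) -> str:
    """Return the SHA-256 hex digest of the token in a 'Bearer <token>' value.

    Hypothetical helper: approximates http_auth_bearer(Authorization),sha2(256),hex
    but returns lower-case hex, as hashlib does.
    """
    scheme, _, token = authorization.strip().partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        raise ValueError("expected an 'Authorization: Bearer <token>' value")
    return hashlib.sha256(token.strip().encode("utf-8")).hexdigest()
```

For example, `openai_key_hash("Bearer abc")` returns `ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad`, the well-known SHA-256 digest of `abc`.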
# example denylist.acl
5fd924625a10e0baacdb8
acl blocked_key var(txn.openai_key_hash) -m str -i -f /denylist.acl
http-request deny deny_status 403 if blocked_key
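The `blocked_key` ACL is an exact, case-insensitive string match against the patterns in `denylist.acl`. A rough Python approximation of that check, assuming one hash per line with `#` comments and blank lines ignored; `load_acl_patterns` and `is_blocked` are hypothetical names, and HAProxy's own pattern-file parser may differ in details such as whitespace handling.

```python
def load_acl_patterns(text: str) -> set[str]:
    """Parse a pattern file: one pattern per line, '#' comments and blank lines skipped.

    Patterns are lower-cased because the ACL uses -i (case-insensitive matching).
    """
    patterns = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        patterns.add(line.lower())
    return patterns


def is_blocked(key_hash: str, patterns: set[str]) -> bool:
    """Exact, case-insensitive string match, like 'acl ... -m str -i -f'."""
    return key_hash.lower() in patterns
```

Lower-casing both sides is what makes an upper-case digest from HAProxy's `hex` converter match a lower-case entry in the file.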
# example rate-limits.map
# <key hash> <per-minute prompt limit>:<per-day prompt limit>:<per-minute completion limit>:<per-day completion limit>
5fd924625a10e0baacdb8 100:200:1000:50000
813490e4ba67813490e4 300:600:2000:30000
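Each map value packs four limits separated by colons, which the frontend later splits with `field(1,:)` through `field(4,:)`. A small Python sketch that parses the same format, handy for validating the file before reloading HAProxy; `RateLimits` and `parse_rate_limits` are hypothetical names, and the field order is taken from the format line above.

```python
from typing import NamedTuple


class RateLimits(NamedTuple):
    prompt_per_minute: int
    prompt_per_day: int
    completion_per_minute: int
    completion_per_day: int


def parse_rate_limits(text: str) -> dict[str, RateLimits]:
    """Parse '<key hash> <a>:<b>:<c>:<d>' lines, skipping blanks and '#' comments."""
    limits = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, value = line.split(None, 1)
        fields = value.strip().split(":")
        if len(fields) != 4:
            raise ValueError(f"expected 4 ':'-separated limits for {key!r}")
        limits[key] = RateLimits(*(int(f) for f in fields))
    return limits
```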
# "rates" is the peers section referenced as table rates/<name> by the track-sc rules;
# its bind/server lines are omitted here
peers rates
table rates_prompt_minute.local type string len 128 size 50k expire 1m store gpc(1),gpc_rate(1,60s)
table rates_prompt_minute.aggregate type string len 128 size 50k expire 1m store gpc(1),gpc_rate(1,60s)
table rates_prompt_day.local type string len 128 size 50k expire 24h store gpc(1),gpc_rate(1,1d)
table rates_prompt_day.aggregate type string len 128 size 50k expire 24h store gpc(1),gpc_rate(1,1d)
table rates_completion_minute.local type string len 128 size 50k expire 1m store gpc(1),gpc_rate(1,60s)
table rates_completion_minute.aggregate type string len 128 size 50k expire 1m store gpc(1),gpc_rate(1,60s)
table rates_completion_day.local type string len 128 size 50k expire 24h store gpc(1),gpc_rate(1,1d)
table rates_completion_day.aggregate type string len 128 size 50k expire 24h store gpc(1),gpc_rate(1,1d)
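Each limit has a `.local` table that this instance writes to and an `.aggregate` table it reads from. The sketch below illustrates why the deny rules read the aggregate: it assumes, as a simplification, that whatever fills the `.aggregate` tables (not shown in this configuration) sums each key's `.local` counters across all load balancers, and it uses plain per-window counts instead of HAProxy's `gpc_rate` computation. `aggregate` is a hypothetical name.

```python
from collections import Counter


def aggregate(local_tables: list[dict[str, int]]) -> dict[str, int]:
    """Sum per-key counters from every instance's .local table.

    Assumption: the aggregate view is the per-key sum of all instances' local counts.
    """
    total = Counter()
    for table in local_tables:
        total.update(table)  # Counter.update adds counts key by key
    return dict(total)
```

For instance, two instances that have each seen a key use 40 and 70 prompt tokens in the current minute are both under a per-minute limit of 100 on their own, but `aggregate([{"k1": 40}, {"k1": 70}])` gives `{"k1": 110}`, which is over it.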
frontend mysite
http-request set-var(txn.maxrate_min_prompt) var(txn.openai_key_hash),map(/rate-limits.map,0),field(1,:)
http-request set-var(txn.maxrate_day_prompt) var(txn.openai_key_hash),map(/rate-limits.map,0),field(2,:)
http-request set-var(txn.maxrate_min_completion) var(txn.openai_key_hash),map(/rate-limits.map,0),field(3,:)
http-request set-var(txn.maxrate_day_completion) var(txn.openai_key_hash),map(/rate-limits.map,0),field(4,:)
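# track-sc3 through track-sc7 need more than the default 3 stick counters: raise "tune.stick-counters" (HAProxy 2.8+) in the global section
http-request track-sc0 var(txn.openai_key_hash) table rates/rates_prompt_minute.local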
http-request track-sc1 var(txn.openai_key_hash) table rates/rates_prompt_minute.aggregate
http-request track-sc2 var(txn.openai_key_hash) table rates/rates_prompt_day.local
http-request track-sc3 var(txn.openai_key_hash) table rates/rates_prompt_day.aggregate
http-request track-sc4 var(txn.openai_key_hash) table rates/rates_completion_minute.local
http-request track-sc5 var(txn.openai_key_hash) table rates/rates_completion_minute.aggregate
http-request track-sc6 var(txn.openai_key_hash) table rates/rates_completion_day.local
http-request track-sc7 var(txn.openai_key_hash) table rates/rates_completion_day.aggregate
http-request set-var(txn.rate_prompt_minute) sc_gpc_rate(0,1)
http-request set-var(txn.rate_prompt_day) sc_gpc_rate(0,3)
http-request set-var(txn.rate_completion_minute) sc_gpc_rate(0,5)
http-request set-var(txn.rate_completion_day) sc_gpc_rate(0,7)
# If (current rate - maximum rate) > 0, the key is over its limit and the request is rejected with 429
http-request deny status 429 if { var(txn.rate_prompt_minute),sub(txn.maxrate_min_prompt) gt 0 }
http-request deny status 429 if { var(txn.rate_prompt_day),sub(txn.maxrate_day_prompt) gt 0 }
http-request deny status 429 if { var(txn.rate_completion_minute),sub(txn.maxrate_min_completion) gt 0 }
http-request deny status 429 if { var(txn.rate_completion_day),sub(txn.maxrate_day_completion) gt 0 }
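The four deny rules share one test: subtract the configured maximum from the current aggregate rate and reject when the result is positive, so a rate exactly equal to the limit is still allowed. A Python sketch of that decision; `over_limit`, `should_deny`, and the window names in the dictionaries are hypothetical labels for the four `txn.rate_*` / `txn.maxrate_*` variable pairs.

```python
def over_limit(current_rate: int, max_rate: int) -> bool:
    """Same test as '{ var(current),sub(max) gt 0 }'."""
    return current_rate - max_rate > 0


def should_deny(rates: dict[str, int], limits: dict[str, int]) -> bool:
    """True (respond 429) if any window's current rate exceeds its limit."""
    return any(over_limit(rates[name], limits[name]) for name in limits)
```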
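# res.body only sees the buffered part of the response; "http-response wait-for-body" can be used first to wait for the whole body
http-response set-var(txn.prompt_tokens) res.body,json_query('$.usage.prompt_tokens','int')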
http-response set-var(txn.completion_tokens) res.body,json_query('$.usage.completion_tokens','int')
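Token counts come from the `usage` object in the JSON response body. A Python sketch of the same extraction, assuming an OpenAI-style non-streaming body with `usage.prompt_tokens` and `usage.completion_tokens`; `usage_tokens` is a hypothetical name, and returning 0 for a missing or malformed body is a choice made for this sketch, not HAProxy's behaviour.

```python
import json


def usage_tokens(body: bytes) -> tuple[int, int]:
    """Return (prompt_tokens, completion_tokens) from an OpenAI-style JSON body.

    Roughly what json_query('$.usage.prompt_tokens','int') and
    json_query('$.usage.completion_tokens','int') extract; anything missing
    or malformed yields 0 here.
    """
    try:
        usage = json.loads(body).get("usage") or {}
        return int(usage.get("prompt_tokens", 0)), int(usage.get("completion_tokens", 0))
    except (ValueError, AttributeError, TypeError):
        return 0, 0
```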
http-response sc-add-gpc(0,0) var(txn.prompt_tokens) if { var(txn.rate_prompt_minute),sub(txn.maxrate_min_prompt) le 0 }
http-response sc-add-gpc(0,2) var(txn.prompt_tokens) if { var(txn.rate_prompt_day),sub(txn.maxrate_day_prompt) le 0 }
http-response sc-add-gpc(0,4) var(txn.completion_tokens) if { var(txn.rate_completion_minute),sub(txn.maxrate_min_completion) le 0 }
http-response sc-add-gpc(0,6) var(txn.completion_tokens) if { var(txn.rate_completion_day),sub(txn.maxrate_day_completion) le 0 }
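After the response arrives, the prompt tokens are added to both prompt counters and the completion tokens to both completion counters, each only while that window is still within its limit. A Python sketch of this accounting step, using a plain dictionary in place of the `.local` stick tables; `record_usage` and the window names are hypothetical.

```python
def record_usage(local: dict[str, int], rates: dict[str, int], limits: dict[str, int],
                 prompt_tokens: int, completion_tokens: int) -> None:
    """Add this response's tokens to every local counter whose window is within its limit."""
    tokens = {
        "prompt_minute": prompt_tokens,
        "prompt_day": prompt_tokens,
        "completion_minute": completion_tokens,
        "completion_day": completion_tokens,
    }
    for name, amount in tokens.items():
        # same "sub(max) le 0" guard as the sc-add-gpc rules
        if rates[name] - limits[name] <= 0:
            local[name] = local.get(name, 0) + amount
```

With a per-day prompt limit of 200 and a current per-day prompt rate of 250, a response with 12 prompt and 34 completion tokens adds 12 to the per-minute prompt counter and 34 to both completion counters, but nothing to the per-day prompt counter.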