A reverse proxy daisy chain config that buffers fediverse traffic

Here I provide some fediverse config suggestions that will:

  • prioritise user traffic
  • deprioritise fediverse traffic

This is important, because fediverse traffic is VERY bursty: it will idle at 20 req/s, then suddenly peak to 100 or 150 when a post or user needs to be resolved, for example when someone with tons of followers boosts a post.

Fediverse traffic is resilient and will back off and retry, but during these stampedes, user request latency suffers: depending on how your server is set up, it will either try to process all of these requests at once, or put user requests in a queue among the hundreds of fediverse ones.

As such, there should be a way to deprioritise fediverse traffic and "put it in timeout", while prioritising user traffic, right?

Well, the simplest thing would be to tag and rate-limit federation traffic, which is fairly easy in nginx:

Lots of Nginx code
map $http_accept $has_ap_accept {
    default                         0;

    ~*application/activity\+json    1;
    ~*application/ld\+json          1;
    ~*application/json\+activitypub 1;
}

map $http_content_type $has_ap_ct {
    default                         0;

    ~*application/activity\+json    1;
    ~*application/ld\+json          1;
    ~*application/json\+activitypub 1;
}

map $request_uri $is_wellknown {
    default                     0;

    "~^/.well-known/"  1;
}

# The fediverse has a number of ways to identify its own traffic
map "$has_ap_accept:$has_ap_ct:$is_wellknown" $is_ap {
  # reverse, to not have to list out 3 cases
  default   1;

  "0:0:0"   0;
}

map $is_ap $ap_limit_key {
    0 "";
    1 "AP";
}

# Adjust the "rate" number as needed.
limit_req_zone $ap_limit_key zone=activitypub:10m rate=50r/s;

server {

  # ...
  
  location / {
  
    # bit of arcane magic here, but simply:
    # - "burst" (here) is the number of requests nginx will delay to the rate limit (positioning them evenly-spaced in time)
    #   before nginx starts responding with 503s. In this case, it's 500 (10 seconds' worth of requests at 50 r/s).
    # - "delay" (here) is the threshold of requests nginx will *not* delay and space evenly,
    #   passing them through immediately.
    # 
    # Phrased otherwise: when starting from an empty rate-limit bucket,
    # nginx will send 25 requests through immediately before spacing the rest evenly in time,
    # until it hits 500 outstanding requests, then it'll respond with 503s (configurable).
    
    # 50 (rps) * 10 (sec) = 500
    limit_req zone=activitypub burst=500 delay=25;
  
    # ...
    
   }
}
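
The 503s mentioned above are nginx's default rejection response; both the status code and the log level are configurable. As an optional addition of my own (using the stock limit_req_status and limit_req_log_level directives), you can return 429 instead, which well-behaved fediverse software treats as an explicit "back off and retry" signal:

# optional, in the same server or location block:

# respond with 429 ("Too Many Requests") instead of 503 when rejecting
limit_req_status 429;

# log rejections at "warn" instead of "error" (delayed requests are logged
# one level lower), so bursts don't flood the error log
limit_req_log_level warn;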
  

However, the rate limit of federation requests you're able to handle before falling over is hard to find and tune, especially since user traffic is itself variable.

As such - while much more complicated - it would be much more stable to have a queue mechanism that would:

  • know how many outstanding requests can be sent to the backend server, and not exceed that
  • put the remaining requests in a queue
  • sort that queue based on what kind of request it is (user, federation, or other)

It turns out that such a mechanism exists: haproxy's queueing mechanism, combined with http-request set-priority-class.

Unfortunately, haproxy currently does not seem to have a proper certbot plugin, which means that auto-renewal would be a pain in the ass with it.

Fortunately, we still have nginx. And I just so happen to have become proficient at making and manipulating variables in nginx by abusing the map directive.

So, currently the best solution I have on hand is:

  • A way to tag and categorise traffic in nginx
  • A way to queue that traffic in haproxy

The request flow becomes a daisy chain: client → nginx (TLS termination and traffic tagging) → haproxy (prioritised queueing) → your fediverse server. The rest of the config snippets are all about that.

# Place this after your default configuration

frontend fedi_fe
    mode http
    # this is the address that haproxy will listen on;
    # when doing this local loopback, it's recommended to bind to localhost (127.0.0.1)
    bind 127.0.0.1:28080
    default_backend fedi_fe

backend fedi_fe
    mode http
    balance roundrobin

    # make sure that requests queued for longer than this get rejected and removed from the queue
    timeout queue 60s

    # here we assign each variable (traffic_*) with a boilerplate header match
    acl traffic_ap_g req.hdr(x-traffic-class) -m str AP_G
    acl traffic_ap_p req.hdr(x-traffic-class) -m str AP_P
    acl traffic_ua_g req.hdr(x-traffic-class) -m str UA_G
    acl traffic_ua_p req.hdr(x-traffic-class) -m str UA_P
    acl traffic_anon req.hdr(x-traffic-class) -m str ANON
    acl traffic_admi req.hdr(x-traffic-class) -m str ADMI

    # here we prioritise the traffic.
    # the lower the number, the higher the priority.
    #
    # here, we prioritise traffic so that, when there's a queue of requests,
    # the user API POSTs (favs, boosts, post submissions) get processed FIRST,
    # before user API GETs (timeline, notifs, posts themselves),
    # before anonymous requests (HTML, unauthenticated API, media),
    # before federation POSTs (sending activitypub activities to inboxes, such as new posts, deletes, updates, edits, reports, etc.),
    # before federation GETs (getting data about an activity, post, user, etc.)
    #
    # as such, when the queue gets backed up, the federation GETs are the FIRST to be delayed
    # in favour of the other requests getting processed, then the federation POSTs, then the anonymous requests, etc.
    # in practice, it should never reach the federation POSTs, as federation GETs are by far the
    # burstiest traffic, so those get put into timeout first (and will always gracefully retry).
    http-request set-priority-class int(2) if traffic_ap_g
    http-request set-priority-class int(1) if traffic_ap_p
    http-request set-priority-class int(0) if traffic_anon
    http-request set-priority-class int(-1) if traffic_ua_g
    http-request set-priority-class int(-2) if traffic_ua_p
    http-request set-priority-class int(-10) if traffic_admi

    # change "maxconn" to the maximum number of requests you'd want processed by your fediverse software at any time.
    #
    # some hints:
    # - match it 1:1 with the number of web workers you've configured the server with;
    #   usually each worker spawns a thread and processes one connection at a time
    # - for mastodon, use WEB_CONCURRENCY (default 2) * MAX_THREADS (default 5) for the number
    # - for akkoma, use 40 * cpu count for a good ballpark, tweak when you get extreme slowdowns
    server local_fedi_server 127.0.0.1:4000 maxconn 75 check
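
To check that the queueing actually works, you can optionally expose haproxy's built-in stats page and watch the queue columns (qcur / qmax) for this backend. A minimal sketch, assuming port 8404 is free on your machine:

# optional: a local-only stats page to observe queueing in action
listen stats
    bind 127.0.0.1:8404
    mode http
    stats enable
    stats uri /
    stats refresh 10s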
map $http_accept $has_ap_accept {
    default                         0;

    ~*application/activity\+json    1;
    ~*application/ld\+json          1;
    ~*application/json\+activitypub 1;
}

map $http_content_type $has_ap_ct {
    default                         0;

    ~*application/activity\+json    1;
    ~*application/ld\+json          1;
    ~*application/json\+activitypub 1;
}

map $request_uri $is_wellknown {
    default            0;

    "~^/\.well-known/" 1;
}

map "$has_ap_accept:$has_ap_ct:$is_wellknown" $is_ap {
  # reverse, to not have to list out 3 cases
  default   1;

  "0:0:0"   0;
}

geo $admin_trusted {
    default 0;

    # insert your IP address here with a "1" to get high priority access, like so:
    # <ip> 1;
    # you can also add a whole range:
    # <ip>/<range> 1;
}

map "$admin_trusted" $admin_with_enabled {
    # toggle this to "1 1;" when you want to enable admin priority access,
    # keep this off regularly so you can *cough* experience server slowdowns 'like everyone else' :)
    1       0;
    default 0;
}

map $http_authorization $has_auth {
    default 1;
    ''      0;
}

map $request_method $is_send_method {
    PUT     1;
    POST    1;
    PATCH   1;
    default 0;
}

map "$admin_with_enabled:$is_ap:$is_send_method:$has_auth" $traffic_class {
    ~^1       "ADMI";
    ~^0:0:0:1 "UA_G";
    ~^0:0:1:1 "UA_P";
    ~^0:1:0   "AP_G";
    ~^0:1:1   "AP_P";
    # if you categorise any other traffic class, put them here.
    # ~^0:.:.:1 "WHAT"
    default   "ANON"; # HTML and unclassified
}

server {

  # to signal to haproxy what kind of traffic it is
  proxy_set_header X-Traffic-Class $traffic_class;

  # ...

  location / {

    # ...

    # point this at your local haproxy instance, with the port from the "frontend" section of that config
    proxy_pass http://127.0.0.1:28080;
  }
}
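
To verify the tagging before trusting it with your traffic, a trick I'd suggest (not part of the setup itself) is logging the computed class alongside each request. The log_format goes in the http context, the access_log in the server block above:

# in the http context:
log_format traffic_class '$remote_addr "$request" $status class=$traffic_class';

# in the server block:
access_log /var/log/nginx/traffic-class.log traffic_class;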