The limiting factor for erl-dns QPS is the loop that reads a UDP packet and hands it to a worker, because this loop currently runs synchronously. Workers are asynchronous, so as soon as a worker has the packet, the next UDP packet can be read.
Reading a packet in Erlang is reported to take tens of microseconds. Casting the packet to the worker takes about 100 microseconds. Getting a worker out of the worker queue is currently the slowest part of the process, taking around 1 millisecond. Originally I was using Poolboy, but pulling workers off its queue was taking 4 to 6 milliseconds.
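To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python (it is not part of erl-dns). It assumes the three steps run strictly one after another in a single loop, and it uses 50 microseconds as an assumed stand-in for "tens of microseconds". The function and constant names are made up for illustration.

```python
# Rough per-packet costs in microseconds, taken from the figures above.
READ_US = 50                        # ASSUMPTION: stand-in for "tens of microseconds"
CAST_US = 100                       # casting the packet to a worker
CHECKOUT_US = 1000                  # getting a worker from the current queue
POOLBOY_CHECKOUT_US = (4000, 6000)  # range seen with Poolboy


def max_qps(read_us, checkout_us, cast_us):
    """Upper bound on QPS for one synchronous read/checkout/cast loop."""
    per_packet_us = read_us + checkout_us + cast_us
    return 1_000_000 / per_packet_us


current = max_qps(READ_US, CHECKOUT_US, CAST_US)
poolboy_best = max_qps(READ_US, POOLBOY_CHECKOUT_US[0], CAST_US)
poolboy_worst = max_qps(READ_US, POOLBOY_CHECKOUT_US[1], CAST_US)

print(f"current queue: ~{current:.0f} QPS")
print(f"Poolboy:       ~{poolboy_worst:.0f} to ~{poolboy_best:.0f} QPS")
```

Under these assumptions a single synchronous loop tops out somewhere around 870 QPS with the current queue, and roughly 160 to 240 QPS with Poolboy. The worker checkout dominates the total, which is why it is the step worth attacking first.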
The approach I am testing now is to have multiple processes reading from the same UDP socket. With no inbound traffic this strategy does not work well: there are a lot of ealready errors because the socket is blocking. My current theory is that under high traffic loads these will go away, since there will be a constant stream of requests; however, it's possible that the number of workers listening to the socket wil