I have some early benchmark results from our work on a high-performance NATS server in Go.
Quick Summary:
We can process ~2M msgs/sec through the system, and the ingress and egress are fairly well balanced.
The basics of the architecture are intelligent buffering and careful use of IO calls, fast hashing algorithms for subject distribution and routing, and a hand-written, zero-allocation protocol parser.
In addition, I relied on quite a bit of inlining to avoid function call overhead, avoided defer entirely, and kept object allocation within the fast path to little or none. I will share more details and the code at a future date.
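To make the zero-allocation idea concrete in the meantime, here is a minimal sketch of parsing a NATS PUB control line ("PUB <subject> [reply-to] <#bytes>\r\n") without copying bytes or converting to strings. It is an illustration only, not the server's actual parser: the names pubArgs, parseSize and parsePub are hypothetical, and the sketch assumes the whole control line is already in one buffer and that the op is upper case.

```go
package main

import "fmt"

// pubArgs is a hypothetical result type. Its slices are views into the
// caller's buffer, so filling it in copies no bytes.
type pubArgs struct {
	subject []byte // sub-slice of the input line
	reply   []byte // optional reply subject, nil if absent
	size    int    // declared payload size in bytes
}

// parseSize converts ASCII digits to an int by hand, avoiding the
// []byte-to-string conversion that strconv.Atoi would need.
// It returns -1 for empty, overly long, or non-numeric input.
func parseSize(d []byte) int {
	if len(d) == 0 || len(d) > 9 {
		return -1
	}
	n := 0
	for _, c := range d {
		if c < '0' || c > '9' {
			return -1
		}
		n = n*10 + int(c-'0')
	}
	return n
}

// parsePub parses one complete control line ending in CRLF and stores
// the result in out. It uses no defer and only a fixed-size array of
// slice headers as scratch space.
func parsePub(line []byte, out *pubArgs) bool {
	n := len(line)
	if n < 2 || line[n-2] != '\r' || line[n-1] != '\n' {
		return false
	}
	line = line[:n-2]
	if len(line) < 4 || line[0] != 'P' || line[1] != 'U' || line[2] != 'B' || line[3] != ' ' {
		return false
	}

	// Split the arguments on spaces or tabs into at most three fields.
	var fields [3][]byte
	nf, start := 0, -1
	for i := 4; i <= len(line); i++ {
		if i == len(line) || line[i] == ' ' || line[i] == '\t' {
			if start >= 0 {
				if nf == len(fields) {
					return false // too many arguments
				}
				fields[nf] = line[start:i]
				nf++
				start = -1
			}
		} else if start < 0 {
			start = i
		}
	}

	switch nf {
	case 2:
		out.subject, out.reply, out.size = fields[0], nil, parseSize(fields[1])
	case 3:
		out.subject, out.reply, out.size = fields[0], fields[1], parseSize(fields[2])
	default:
		return false
	}
	return out.size >= 0
}

func main() {
	var args pubArgs
	if parsePub([]byte("PUB foo.bar INBOX.1 11\r\n"), &args) {
		fmt.Printf("subject=%s reply=%s size=%d\n", args.subject, args.reply, args.size)
	}
}
```

Because the returned slices alias the input buffer, a caller would have to finish with them (or copy them) before reusing that buffer for the next read. Whether a function like this really stays allocation-free depends on the compiler's escape analysis, so it is worth confirming with testing.AllocsPerRun or a benchmark run with -benchmem.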