Google Summer of Code 2021
Add support for checkpoint/restore of UDP socket queues
In this project, I implemented checkpoint/restore support for a UDP socket's send queue and receive queue in CRIU. I also wrote a small Linux kernel patch for dumping packets from a UDP socket's send queue.
Link to the pull request: https://github.com/checkpoint-restore/criu/pull/1571
Link to my kernel patch series: [PATCH 0/2] [PATCH 1/2] [PATCH 2/2]
- To store the dumped packets, I added a new protobuf image. The image starts with a UdpQueueEntry that holds information about the queues. Each packet is then stored as a UdpPacketEntry, which holds the address information and the packet's length, followed by the packet's data.
- Checkpoint/restore send queue: in the kernel patch, I implemented a new UDP_REPAIR socket option that turns the socket's repair mode on or off. When repair mode is on, recvmsg returns a packet from the send queue together with its destination address. However, this patch cannot dump other per-packet information, such as TTL or TOS, that can be set through the cmsg provided when the packet was sent. I also created a kernel selftest for this feature. With this patch, we can checkpoint the packet data and destination address of each packet in the send queue. When restoring, we resend each packet through the checkpointed socket.
- Checkpoint/restore receive queue: when checkpointing, the packets in the receive queue are peeked and stored in the protobuf image. When restoring, a temporary local socket is created to send each packet. To make the source IP addresses of these packets match the dumped ones, I used IP_PKTINFO (IPV6_PKTINFO for AF_INET6) and IP_TRANSPARENT (IP_TRANSPARENT is used so that the kernel does not check whether the source address belongs to a local interface). The temporary socket is bound to the same port as the original packet's source port. To reduce port collisions, I use IP_FREEBIND to bind the socket to a non-local (fake) address. However, if the port is already in use on all interfaces, we cannot bind to it and the packet is dropped.
- Two zdtm tests cover this new feature, one for AF_INET and one for AF_INET6 UDP sockets.
- The current kernel patch has not been accepted by the Linux kernel maintainers because they do not want an extra check in UDP's recvmsg fast path. I plan to solve this in a future patch by adding a new option to UDP's getsockopt instead.
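The receive-queue restore described above can be illustrated with a simplified userspace sketch. It is not the CRIU implementation: it only shows the core idea of a temporary socket, bound to the dumped source port, re-sending a packet into the restored socket's receive queue. The helper names `replay_packet` and `replay_demo` are hypothetical, everything stays on the loopback address, and the IP_PKTINFO, IP_TRANSPARENT and IP_FREEBIND steps are left out because they need extra privileges.

```python
import socket


def replay_packet(restored_sock, src_port, payload):
    """Re-inject one dumped datagram; return False if it had to be dropped."""
    tmp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        try:
            # Bind to the dumped source port so the receiver sees the same port.
            tmp.bind(("127.0.0.1", src_port))
        except OSError:
            # Port already in use: the packet cannot be restored and is dropped.
            return False
        tmp.sendto(payload, restored_sock.getsockname())
        return True
    finally:
        tmp.close()


def replay_demo(payload=b"queued datagram"):
    """Restore one packet and read it back from the restored socket."""
    restored = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    restored.bind(("127.0.0.1", 0))
    restored.settimeout(1.0)

    # Pick a currently free port to stand in for the dumped source port.
    probe = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    probe.bind(("127.0.0.1", 0))
    src_port = probe.getsockname()[1]
    probe.close()

    try:
        if not replay_packet(restored, src_port, payload):
            return None, None, src_port
        data, addr = restored.recvfrom(65535)
        return data, addr, src_port
    finally:
        restored.close()


if __name__ == "__main__":
    print(replay_demo())
```

After the replay, `recvfrom` on the restored socket reports the dumped source port, which is the property the restore step needs to preserve. Matching a non-local source IP address additionally requires the socket options listed above.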
Some interesting things
- The default peek offset of a socket is -1. It behaves like 0 but does not change after a MSG_PEEK recvmsg, so the next MSG_PEEK recvmsg returns exactly the same packet.
- If you send a packet to address A on a corked UDP socket or with MSG_MORE, and then send another packet to address B, the second packet is appended to the first one and the combined datagram is sent to address A.
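Both observations can be reproduced from userspace on a Linux machine. The sketch below is only an illustration, with hypothetical function names `peek_twice` and `send_with_msg_more`. It assumes a kernel whose UDP sockets support SO_PEEK_OFF, and it falls back to the numeric values 42 (SO_PEEK_OFF) and 0x8000 (MSG_MORE) used on most Linux architectures if the Python socket module does not export the constants.

```python
import socket

# Assumed fallbacks: values from the generic Linux headers.
SO_PEEK_OFF = getattr(socket, "SO_PEEK_OFF", 42)
MSG_MORE = getattr(socket, "MSG_MORE", 0x8000)


def _udp_socket(timeout):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("127.0.0.1", 0))
    s.settimeout(timeout)
    return s


def peek_twice(use_peek_off):
    """Queue two datagrams, then peek twice without consuming anything."""
    rx, tx = _udp_socket(1.0), _udp_socket(1.0)
    try:
        if use_peek_off:
            # A non-negative peek offset advances after each MSG_PEEK read.
            rx.setsockopt(socket.SOL_SOCKET, SO_PEEK_OFF, 0)
        for payload in (b"first", b"second"):
            tx.sendto(payload, rx.getsockname())
        return rx.recv(65535, socket.MSG_PEEK), rx.recv(65535, socket.MSG_PEEK)
    finally:
        rx.close()
        tx.close()


def send_with_msg_more():
    """Send to A with MSG_MORE, then to B; report what A and B receive."""
    a, b, tx = _udp_socket(1.0), _udp_socket(0.2), _udp_socket(1.0)
    try:
        tx.sendto(b"to-A ", MSG_MORE, a.getsockname())
        tx.sendto(b"to-B", b.getsockname())  # appended to the pending datagram
        got_a = a.recv(65535)
        try:
            got_b = b.recv(65535)
        except socket.timeout:
            got_b = None  # nothing was delivered to B
        return got_a, got_b
    finally:
        for s in (a, b, tx):
            s.close()


if __name__ == "__main__":
    print(peek_twice(False))
    print(peek_twice(True))
    print(send_with_msg_more())
```

With the default offset, both peeks return the first datagram. With the offset set to 0, the second peek is expected to move on to the next datagram, which is what makes it possible to walk a receive queue without draining it.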
I want to thank my mentors Alexander Mikhalitsyn and Andrei Vagin for being very supportive throughout the project. This project helped me learn a lot about Linux programming (kernel and userspace) and computer networking. I would love to participate in more feature development for CRIU in the future.