@danderson
Created October 7, 2019 01:16
NAT64 all in kernel mode

I'm writing this up as a gist, because I'm not sure I'll pursue any of it, but it seems worth writing down.

A basic problem with IPv6-only LANs at the moment is that you still need to reach servers that only speak legacy IPv4, which means running some kind of DNS64 and NAT64 (stateful IPv6-to-IPv4 translation).

Currently, NAT64 only exists outside the mainline kernel, typically as userspace software. This is not ideal: it limits performance (packets keep bouncing between kernel space and user space), it requires reimplementing a lot of things that already exist in the kernel (e.g. conntrack logic), and it doesn't compose with other existing kernel subsystems to implement flexible, mad network things.

Short version

Implement a virtual interface type that does SIIT. Combining such an interface with the existing NAT66 and conntrack logic, you can implement all kinds of fancy NAT64 setups, all in-kernel.

Background

The assumed topology here is that we're a dual-stack router, trying to provide legacy IP service to a v6-only LAN. In other words:

LAN (v6 only) <--> Router (dual-stack) <.-> Modern Internet
                                        |
                                        `-> Legacy Internet

Let's say the LAN is 2001:db8:cafe::/64. The router is 2001:db8:cafe::1 on the LAN, and 2001:db8:deed::1729 and 192.0.2.16 on the WAN.

We assume that the LAN already has access to a DNS server that does DNS64, synthesizing AAAA records from A records using IPv4-translated addresses (the ::ffff:0:0:0/96 range). In other words, an A record for 1.2.3.4 reaches the LAN clients as an AAAA record for ::ffff:0:1.2.3.4.
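The mapping DNS64 is assumed to perform here is purely mechanical: the 32-bit IPv4 address becomes the low 32 bits of an address in ::ffff:0:0:0/96. As a minimal sketch (the helper names `v4_to_translated` and `v4_to_translated_hex` are hypothetical, and the input is assumed to be a well-formed dotted quad without leading zeros), both the mixed notation used in this write-up and the equivalent all-hex notation can be produced like this:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: embed an IPv4 address in the IPv4-translated
# prefix ::ffff:0:0:0/96. No input validation is performed.

# Mixed notation, e.g. 1.2.3.4 -> ::ffff:0:1.2.3.4
v4_to_translated() {
  printf '::ffff:0:%s\n' "$1"
}

# Same address with the low 32 bits written as two hex groups,
# e.g. 1.2.3.4 -> ::ffff:0:102:304
v4_to_translated_hex() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  printf '::ffff:0:%x:%x\n' $(( (a << 8) | b )) $(( (c << 8) | d ))
}

v4_to_translated 1.2.3.4       # prints ::ffff:0:1.2.3.4
v4_to_translated_hex 1.2.3.4   # prints ::ffff:0:102:304
```

Both strings denote the same 128-bit address; tools such as `ip` may display the hex form.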

Packets arriving at the router from the LAN are therefore all native IPv6. Most will have native v6 source and destination addresses:

src: 2001:db8:cafe::f4bc
dst: 2001:db8:f00f::42

However, legacy sites will have been mapped by DNS64, and the packets will look like:

src: 2001:db8:cafe::f4bc
dst: ::ffff:0:1.2.3.4

Our mission is to get the latter packets out to the WAN looking like this:

src: 192.0.2.16
dst: 1.2.3.4

We also need to keep enough conntrack state to translate the replies back into IPv6 on the way in.

The SIIT interface

This is the one new piece of code we need. It's a new virtual interface type (similar to gre, ipip, tun, ...) for the Linux kernel. Call it siit0. It acts as a zero-configuration hairpin SIIT translator, doing the following to packets it receives:

  • IPv4 packets are rewritten to IPv6 (using IPv4-translated addresses) and input back into the network stack.
  • IPv6 packets whose source and destination IPs are both IPv4-translated addresses are rewritten to IPv4 and input back into the network stack.
  • Other IPv6 packets are dropped.
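To make the three rules concrete, here is a toy model of the per-packet decision siit0 would make, working on textual addresses only. It is a hypothetical sketch (`is_translated` and `siit_translate` are invented names), and it assumes addresses are written in the lowercase, compressed `::ffff:0:a.b.c.d` form used throughout this write-up; a kernel implementation would instead compare the top 96 bits of the binary address and rewrite the whole header, not just the addresses.

```shell
#!/usr/bin/env bash
# Toy model of siit0's stateless decision (hypothetical names).
# Addresses are assumed to be in lowercase "::ffff:0:a.b.c.d" form.

is_translated() {
  [[ $1 == ::ffff:0:*.*.*.* ]]
}

# Usage: siit_translate FAMILY SRC DST
# Prints the rewritten "SRC DST", or "drop".
siit_translate() {
  local family=$1 src=$2 dst=$3
  if [[ $family == 4 ]]; then
    # IPv4 -> IPv6: embed both addresses in ::ffff:0:0:0/96.
    echo "::ffff:0:$src ::ffff:0:$dst"
  elif [[ $family == 6 ]] && is_translated "$src" && is_translated "$dst"; then
    # IPv6 -> IPv4: strip the 96-bit prefix from both addresses.
    echo "${src#::ffff:0:} ${dst#::ffff:0:}"
  else
    # Anything else cannot be translated statelessly.
    echo "drop"
  fi
}

siit_translate 6 ::ffff:0:192.168.254.254 ::ffff:0:1.2.3.4  # prints 192.168.254.254 1.2.3.4
siit_translate 4 1.2.3.4 192.168.254.254                  # prints ::ffff:0:1.2.3.4 ::ffff:0:192.168.254.254
siit_translate 6 2001:db8:cafe::f4bc ::ffff:0:1.2.3.4       # prints drop
```

The third call is exactly why the setups below put a NAT66 stage in front of siit0: a native LAN source address is not translatable on its own.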

Notably, this interface carries no state at all. Packets that match its constraints are statelessly rewritten and handed back to the kernel as if they had arrived from some remote host. Packets that cannot sensibly be translated are simply dropped. It's up to the system administrator to route only "good" packets to this interface.

What can we build with this?

Turns out, this is the one missing piece that lets us implement a decent variety of NAT64 setups, under different operating conditions.

Home router

The basic home router case is: single IPv4 WAN IP, and some IPv6 subnet routed for LAN use.

We receive on lan0 the following packet:

src: 2001:db8:cafe::f4bc
dst: ::ffff:0:1.2.3.4

To get it out to the legacy internet, we need one additional thing: an extra IPv4 address that will remain entirely local to the router. Some locally-unused RFC1918 address will do the trick, say 192.168.254.254. We add the following configuration:

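# IPv6 side: send IPv4-translated destinations into siit0, and rewrite
# the LAN source to the translated form of the router-local address.
ip -6 route add ::ffff:0:0:0/96 dev siit0
ip6tables -t nat -A POSTROUTING -o siit0 -j SNAT --to-source ::ffff:0:192.168.254.254

# IPv4 side: replies to the router-local address go back into siit0;
# everything else leaves via the ISP gateway, masqueraded to wan0's address.
ip route add 192.168.254.254/32 dev siit0
ip route add default via 192.0.2.1
iptables -t nat -A POSTROUTING -o wan0 -j MASQUERADE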

(If you already have IPv4 configured and routable, the IPv4 half of this may already be done. I'm assuming you have no IPv4 routing configuration at all, so the router itself can speak IPv4 but nothing else can.)
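The setups here also assume the router forwards packets in both address families: a packet that comes back out of siit0 is treated like any other received packet, so it only reaches wan0 or lan0 if forwarding is enabled. If it isn't already, something like the following is needed, and `ip route get` can confirm that the IPv6 side points at the (proposed, not yet existing) siit0 interface:

```shell
# Enable forwarding for both families (not persistent across reboots).
sysctl -w net.ipv4.ip_forward=1
sysctl -w net.ipv6.conf.all.forwarding=1

# Should resolve via "dev siit0" once the IPv6 route above is installed.
ip -6 route get ::ffff:0:1.2.3.4
```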

With this configuration, the outbound packet path is as follows:

  • Packet arrives on lan0, source 2001:db8:cafe::f4bc, destination ::ffff:0:1.2.3.4.
  • Route lookup matches ::ffff:0:0:0/96, forwards the packet to siit0.
  • Netfilter rule does NAT66, changes the packet source to ::ffff:0:192.168.254.254.
  • siit0 translates the packet from IPv6 to IPv4, and bounces it back to the kernel.
  • Packet arrives on siit0, source 192.168.254.254 and destination 1.2.3.4.
  • Route lookup matches 0.0.0.0/0, forwards the packet to wan0.
  • Netfilter rule does NAT44, changes the packet source to 192.0.2.16.
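The outbound walk above can be replayed as a toy pipeline in which each stage only rewrites the (source, destination) pair. This is a hypothetical textual model, not kernel code: `nat66`, `siit64` and `nat44` are invented stand-ins for the ip6tables SNAT rule, siit0 and the MASQUERADE rule, with the addresses hard-coded from this example.

```shell
#!/usr/bin/env bash
# Each stage takes SRC DST as two arguments and prints "SRC DST".
nat66()  { echo "::ffff:0:192.168.254.254 $2"; }   # ip6tables SNAT, -o siit0
siit64() { echo "${1#::ffff:0:} ${2#::ffff:0:}"; } # siit0: IPv6 -> IPv4
nat44()  { echo "192.0.2.16 $2"; }                 # iptables MASQUERADE, -o wan0

hdr="2001:db8:cafe::f4bc ::ffff:0:1.2.3.4"
# $hdr is left unquoted on purpose so it splits into SRC and DST.
hdr=$(nat66 $hdr);  echo "after NAT66: $hdr"
hdr=$(siit64 $hdr); echo "after SIIT:  $hdr"
hdr=$(nat44 $hdr);  echo "after NAT44: $hdr"
```

The last line prints `after NAT44: 192.0.2.16 1.2.3.4`, matching the packet the legacy Internet sees. The return path is the same pipeline run backwards, with conntrack supplying the original addresses.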

The return packet path is:

  • Packet arrives on wan0, source 1.2.3.4, destination 192.0.2.16.
  • Conntrack rewrites the destination IP to 192.168.254.254.
  • Route lookup matches 192.168.254.254/32, forwards the packet to siit0.
  • siit0 translates the packet from IPv4 to IPv6, and bounces it back to the kernel.
  • Packet arrives on siit0, source ::ffff:0:1.2.3.4, destination ::ffff:0:192.168.254.254.
  • Conntrack rewrites the destination IP to 2001:db8:cafe::f4bc.
  • Route lookup matches 2001:db8:cafe::/64, forwards the packet to lan0.
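Both rewrites on the return path come from ordinary conntrack entries, one per address family. Assuming the conntrack-tools userspace utility is installed, the two entries for this flow could be inspected with something like:

```shell
# IPv6 entry: original source 2001:db8:cafe::f4bc; the reply tuple's
# destination is the NAT66 address ::ffff:0:192.168.254.254.
conntrack -L -f ipv6 -s 2001:db8:cafe::f4bc

# IPv4 entry: original source 192.168.254.254; the reply tuple's
# destination is the MASQUERADE address 192.0.2.16.
conntrack -L -f ipv4 -s 192.168.254.254
```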

This setup requires a double NAT: NAT66 before SIIT, and NAT44 afterwards. This is because the WAN IP 192.0.2.16 is assigned to wan0. Without NAT44, replies would be addressed to 192.0.2.16, so the first route lookup on the return path would hit the magic "local" routing table (ip route show table local) and divert the packet into the "destination is local" codepath, rather than the forwarding codepath.
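The "local" table lookup that forces the extra NAT44 stage can be inspected directly. A sketch using this example's addresses:

```shell
# 192.0.2.16 is assigned to wan0, so it appears in the "local" table (id 255).
ip route show table local | grep 192.0.2.16

# Simulate the lookup for a reply arriving on wan0 addressed to the WAN IP.
# This resolves to a "local" route, i.e. delivery to the router itself,
# rather than forwarding to siit0.
ip route get 192.0.2.16 from 1.2.3.4 iif wan0
```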

Small Business Router

This setup is similar to the home router one, except that now we don't have a single IPv4 WAN address, but two, 192.0.2.16 and 192.0.2.17.

In this setup, we can simplify a little bit and dispense with the NAT44 stage, by using 192.0.2.17 as the "intermediate" IP for SIIT. In effect, we turn 192.0.2.17 into a SIIT translator and gateway, from the ISP's point of view. The configuration is similar, but simpler:

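# IPv6 side: translated destinations go to siit0, with the LAN source
# rewritten to the IPv4-translated form of the spare WAN address.
ip -6 route add ::ffff:0:0:0/96 dev siit0
ip6tables -t nat -A POSTROUTING -o siit0 -j SNAT --to-source ::ffff:0:192.0.2.17

# IPv4 side: replies for 192.0.2.17 go back into siit0; no NAT44 needed.
ip route add 192.0.2.17/32 dev siit0
ip route add default via 192.0.2.1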

With this configuration, the outbound packet path is as follows:

  • Packet arrives on lan0, source 2001:db8:cafe::f4bc, destination ::ffff:0:1.2.3.4.
  • Route lookup matches ::ffff:0:0:0/96, forwards the packet to siit0.
  • Netfilter rule does NAT66, changes the packet source to ::ffff:0:192.0.2.17.
  • siit0 translates the packet from IPv6 to IPv4, and bounces it back to the kernel.
  • Packet arrives on siit0, source 192.0.2.17 and destination 1.2.3.4.
  • Route lookup matches 0.0.0.0/0, forwards the packet out wan0.

The return packet path is:

  • Packet arrives on wan0, source 1.2.3.4, destination 192.0.2.17.
  • Route lookup matches 192.0.2.17/32, forwards the packet to siit0.
  • siit0 translates the packet from IPv4 to IPv6, and bounces it back to the kernel.
  • Packet arrives on siit0, source ::ffff:0:1.2.3.4, destination ::ffff:0:192.0.2.17.
  • Conntrack rewrites the destination IP to 2001:db8:cafe::f4bc.
  • Route lookup matches 2001:db8:cafe::/64, forwards the packet to lan0.
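In this variant the reply is addressed to 192.0.2.17, which is not assigned to any interface, so it takes the forwarding path rather than the local one. A sketch of checking that lookup with the example addresses (siit0 being the proposed interface):

```shell
# The /32 route should win, so this should resolve via "dev siit0"
# rather than to a local route.
ip route get 192.0.2.17 from 1.2.3.4 iif wan0
```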

Enterprise NAT

Again the setup is similar, but with more IPs: this time we own all of 192.0.2.128/25 on the WAN side.

In this setup, we can alter the NAT66 step to use the entire IP range as possible source addresses, so NAT has 128 IPv4 addresses (and their port space) to allocate from instead of a single one:

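ip -6 route add ::ffff:0:0:0/96 dev siit0
# NAT66 picks source addresses from the whole translated /25.
ip6tables -t nat -A POSTROUTING -o siit0 -j SNAT --to-source ::ffff:0:192.0.2.128-::ffff:0:192.0.2.255

# Replies to any address in the /25 are handed back to siit0.
ip route add 192.0.2.128/25 dev siit0
ip route add default via 192.0.2.1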

The packet path and return path are similar to the above, just with more IPs.
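Because ip6tables only accepts IPv6 addresses, the SNAT range has to be written as IPv4-translated addresses. As a hypothetical helper (the names `v4_prefix_to_snat_range` and `int_to_v4` are invented, and the input is assumed to be a well-formed `a.b.c.d/len` prefix without leading zeros), the range for any owned IPv4 prefix can be derived like this:

```shell
#!/usr/bin/env bash
# Hypothetical helper: IPv4 prefix -> IPv4-translated address range
# suitable for an ip6tables SNAT --to-source argument.

int_to_v4() {
  local n=$1
  printf '%d.%d.%d.%d' $(( (n >> 24) & 255 )) $(( (n >> 16) & 255 )) \
    $(( (n >> 8) & 255 )) $(( n & 255 ))
}

v4_prefix_to_snat_range() {
  local addr=${1%/*} len=${1#*/} a b c d
  IFS=. read -r a b c d <<< "$addr"
  local n=$(( (a << 24) | (b << 16) | (c << 8) | d ))
  local mask=$(( (0xffffffff << (32 - len)) & 0xffffffff ))
  local first=$(( n & mask ))
  local last=$(( first | (~mask & 0xffffffff) ))
  printf '::ffff:0:%s-::ffff:0:%s\n' "$(int_to_v4 "$first")" "$(int_to_v4 "$last")"
}

v4_prefix_to_snat_range 192.0.2.128/25
# prints ::ffff:0:192.0.2.128-::ffff:0:192.0.2.255
```

The same arithmetic makes the capacity gain concrete: a /25 gives NAT66 2^(32-25) = 128 source addresses to choose from instead of one.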

Conclusion

All the setups above are notionally very similar: the siit0 interface acts like a remote 1:1 SIIT translator box. It's slightly more constrained than a full NAT64 box, in that incoming v6 packets must already use IPv4-translated addresses for both source and destination (i.e. the SIIT box doesn't implement stateful NAT), so we have to do the NAT layer ourselves with existing Netfilter rules. Still, it shows that one small addition to the kernel would enable all the major NAT64 scenarios using existing Linux network features.

Other ideas

As Netfilter modules

Rather than implement the translation as an interface, maybe we could do it as a netfilter module. Do NAT66 as above, and then as the very last step in nat/POSTROUTING, do the stateless translation right before plopping the packet onto the wire.

Similarly on the input end, in raw/PREROUTING, statelessly translate back to IPv6 and then let the entire kernel live happily as v6-only.

This is cleaner in terms of how much of the system has to care about IPv4. However, it'll interfere with normal IPv4 operation on the WAN link, for things like DHCP and other ISP configuration protocols that are assigning the IPv4 in the first place. It'd work for the SMB and Enterprise cases above, but not for the minimal "single WAN IP" case.

As an XFRM

Similar to the netfilter idea, the XFRM framework acts late in the packet processing stage, and could do the SIIT translation. The same problems exist as with the netfilter implementation.

bpfilter something something???

The more programmable the kernel network stack, obviously the more fancy we can be here. I don't know enough about bpfilter to know.

artizirk commented Dec 5, 2023

By chance I stumbled upon a BPF NAT64 implementation: https://github.com/xdp-project/bpf-examples/tree/master/nat64-bpf

cwmos commented Feb 25, 2024

What about supporting a Linux lightweight tunnel (LWT)? Then it would be possible to attach the NAT directly to a route. See the "encap" option in "man ip route".

Another option would be to create an eBPF version. Then it could be used without being upstreamed to the kernel. I think Cilium has something like this: https://github.com/cilium/cilium/blob/main/bpf/lib/nat_46x64.h

lemmi commented May 4, 2024

I have an almost complete implementation in XDP. It's only missing correct translation of the inner packet in ICMP errors (which no other eBPF examples in the wild seemed to care about so far). Otherwise it already works well.
I also tried using eBPF via LWTs, but LWT eBPF programs have very limited access to BPF helpers. You can't change the protocol from IPv4 to IPv6 and back, or adjust room for the headers. Crucially, bpf_skb_adjust_room is missing from https://elixir.bootlin.com/linux/v6.8.9/source/net/core/filter.c#L8456. I think adding it to the available functions should almost be enough to write an SIIT implementation for LWTs.
I might just have missed something, but IIRC using bpf_skb_change_head did not work.

telmich commented May 13, 2024

@lemmi that is amazing! I did my master's thesis on NAT64 in hardware (FPGA) and will try to check out your implementation in the next few weeks.

Regarding the ICMP codes, some are rather straightforward (echo request/echo reply), some are a bit IPv6-specific (such as NDP) and some actually need a mapping table.

I think both Jool and Tayga have those implemented, so either of them can serve as inspiration (I actually did not translate ICMP in my thesis, AFAIR).
