@mcastelino
Last active December 11, 2023 02:16
Using tc redirect to connect a virtual machine to a container network

Connecting a veth device to tap

  • veth device from CNI/CNM plugin: eth0
  • tap device that connects to the VM: tap0

Redirecting traffic between the two devices

tc qdisc add dev eth0 ingress
tc filter add dev eth0 parent ffff: protocol all u32 match u8 0 0 action mirred egress redirect dev tap0
tc qdisc add dev tap0 ingress
tc filter add dev tap0 parent ffff: protocol all u32 match u8 0 0 action mirred egress redirect dev eth0
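The four commands are the same two steps applied in each direction. As a rough sketch of that symmetry, the Go snippet below only builds and prints the command strings for a pair of device names, plus the matching cleanup (`tc qdisc del dev ... ingress`, which also drops the filter attached to that qdisc). The function names `redirectCommands` and `teardownCommands` are made up for this illustration; nothing here runs tc.

```go
package main

import "fmt"

// redirectCommands returns the tc invocations that cross-connect devices a and b.
// Hypothetical helper: it only formats strings, it does not execute anything.
func redirectCommands(a, b string) []string {
	var cmds []string
	for _, pair := range [][2]string{{a, b}, {b, a}} {
		src, dst := pair[0], pair[1]
		cmds = append(cmds,
			// Step 1: give the source device an ingress qdisc (handle ffff:).
			fmt.Sprintf("tc qdisc add dev %s ingress", src),
			// Step 2: match everything and redirect it to the egress of dst.
			fmt.Sprintf("tc filter add dev %s parent ffff: protocol all u32 match u8 0 0 action mirred egress redirect dev %s", src, dst),
		)
	}
	return cmds
}

// teardownCommands returns the commands that delete both ingress qdiscs.
func teardownCommands(a, b string) []string {
	return []string{
		fmt.Sprintf("tc qdisc del dev %s ingress", a),
		fmt.Sprintf("tc qdisc del dev %s ingress", b),
	}
}

func main() {
	for _, c := range redirectCommands("eth0", "tap0") {
		fmt.Println(c)
	}
	for _, c := range teardownCommands("eth0", "tap0") {
		fmt.Println(c)
	}
}
```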

tc qdisc add dev eth0 ingress

  • Add a queuing discipline
  • on dev eth0
  • attach the ingress qdisc. Here the handle defaults to ffff:
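A tc handle is a 32-bit number written as major:minor in hex, with the major number in the upper 16 bits. The sketch below shows how ffff: and the ffff:fff1 parent that appears in the tc qdisc show output in this thread map to numbers. `makeHandle` and `handleString` are illustrative names written for this note; `makeHandle` does the same packing as `netlink.MakeHandle` in the Go program in this thread.

```go
package main

import "fmt"

// makeHandle packs a tc handle: major in the upper 16 bits, minor in the lower 16.
func makeHandle(major, minor uint16) uint32 {
	return uint32(major)<<16 | uint32(minor)
}

// handleString renders a handle as hex "major:minor", omitting a zero minor,
// which is how handles such as "ffff:" are usually written.
func handleString(h uint32) string {
	major, minor := h>>16, h&0xffff
	if minor == 0 {
		return fmt.Sprintf("%x:", major)
	}
	return fmt.Sprintf("%x:%x", major, minor)
}

func main() {
	ingress := makeHandle(0xffff, 0)
	fmt.Printf("%#x is written %s\n", ingress, handleString(ingress))

	parent := makeHandle(0xffff, 0xfff1)
	fmt.Printf("%#x is written %s\n", parent, handleString(parent))
}
```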

tc filter add dev eth0 parent ffff: protocol all u32 match u8 0 0 action mirred egress redirect dev tap0

  • Add a filter
  • to device dev eth0
  • parent ffff:, the handle we are attaching to, i.e. the ingress qdisc we created before (there is no need for tc class add in the ingress case, as the ingress qdisc does not support classes).
  • protocol all
  • classifier u32
  • parameters to the classifier: match u8 0 0, i.e. AND the first byte of the packet with the mask 0 and compare the result with the value 0. The result is always 0, so the match is always true.
  • action mirred egress redirect dev tap0, redirect the packet to egress of dev tap0
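To make the "always true" match concrete, here is a small sketch of the comparison that `match u8 VALUE MASK` describes for one byte: the byte is ANDed with the mask and compared with the value. `matchU8` is a name made up for this illustration, not part of tc or any library.

```go
package main

import "fmt"

// matchU8 reports whether (b & mask) == value, the single-byte comparison
// described by `u32 match u8 VALUE MASK`.
func matchU8(b, value, mask byte) bool {
	return b&mask == value
}

func main() {
	// With value 0 and mask 0, every possible byte matches.
	all := true
	for i := 0; i < 256; i++ {
		if !matchU8(byte(i), 0, 0) {
			all = false
		}
	}
	fmt.Println("u8 0 0 matches every byte:", all)

	// For contrast, a selective match on the high nibble only.
	fmt.Println("0x45 matches u8 0x40 0xf0:", matchU8(0x45, 0x40, 0xf0))
	fmt.Println("0x60 matches u8 0x40 0xf0:", matchU8(0x60, 0x40, 0xf0))
}
```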
@mcastelino
Author

Create a new qdisc called "ingress". Qdiscs normally don't work on ingress, so this is really a special qdisc that you can consider an "alternate root" for inbound packets.
Add a new filter and attach it to node "ffff:". The ID "ffff:" is the fixed ID of the ingress qdisc.

We use the "u32" matcher with arguments "u8 0 0". This matches any packet whose first byte, when ANDed with the value 0, returns 0. In other words, all packets are selected.

@mcastelino
Author

mcastelino commented Oct 16, 2018

So sudo tc qdisc add dev eth0 handle ffff: ingress is equivalent to tc qdisc add dev eth0 ingress

This can be verified as follows

$ sudo tc qdisc show
qdisc noqueue 0: dev lo root refcnt 2
qdisc fq_codel 0: dev enp0s31f6 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn

$ sudo tc qdisc add dev enp0s31f6 ingress

$ sudo tc qdisc show
qdisc noqueue 0: dev lo root refcnt 2
qdisc fq_codel 0: dev enp0s31f6 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc ingress ffff: dev enp0s31f6 parent ffff:fff1 ----------------

$ sudo tc qdisc del dev enp0s31f6 ingress

@mcastelino
Author

mcastelino commented Oct 16, 2018

https://www.tldp.org/HOWTO/html_single/Traffic-Control-HOWTO/

A source of terminology confusion is the usage of the terms root qdisc and ingress qdisc. These are not really queuing disciplines, but rather locations onto which traffic control structures can be attached for egress (outbound traffic) and ingress (inbound traffic).

Each interface contains both. The primary and more common is the egress qdisc, known as the root qdisc. It can contain any of the queuing disciplines (qdiscs) with potential classes and class structures. The overwhelming majority of documentation applies to the root qdisc and its children. Traffic transmitted on an interface traverses the egress or root qdisc.

For traffic accepted on an interface, the ingress qdisc is traversed. With its limited utility, it allows no child class to be created, and only exists as an object onto which a filter can be attached. For practical purposes, the ingress qdisc is merely a convenient object onto which to attach a policer to limit the amount of traffic accepted on a network interface.

@amshinde

amshinde commented Oct 16, 2018

Go program that implements the above logic:

package main

import (
	"fmt"
	"os"
	"strconv"

	"github.com/vishvananda/netlink"
	"golang.org/x/sys/unix"
)

func main() {
	args := os.Args[1:]

	if len(args) != 2 {
		fmt.Println("Incorrect number of args")
		os.Exit(1)
	}

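	// Both arguments are interface indexes (the numbers shown by `ip link`).
	index1, err1 := strconv.Atoi(args[0])
	index2, err2 := strconv.Atoi(args[1])
	if err1 != nil || err2 != nil {
		fmt.Println("Both args must be integer interface indexes")
		os.Exit(1)
	}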

	fmt.Printf("network index1 : %d\n", index1)
	fmt.Printf("network index2 : %d\n", index2)

	qdisc1 := &netlink.Ingress{
		QdiscAttrs: netlink.QdiscAttrs{
			LinkIndex: index1,
			Parent:    netlink.HANDLE_INGRESS,
		},
	}

	err := netlink.QdiscAdd(qdisc1)
	if err != nil {
		fmt.Printf("Failed to add qdisc for index %d : %s\n", index1, err)
		os.Exit(1)
	}

	qdisc2 := &netlink.Ingress{
		QdiscAttrs: netlink.QdiscAttrs{
			LinkIndex: index2,
			Parent:    netlink.HANDLE_INGRESS,
		},
	}

	err = netlink.QdiscAdd(qdisc2)
	if err != nil {
		fmt.Printf("Failed to add qdisc for index %d : %s\n", index2, err)
		os.Exit(1)
	}

	filter1 := &netlink.U32{
		FilterAttrs: netlink.FilterAttrs{
			LinkIndex: index1,
			Parent:    netlink.MakeHandle(0xffff, 0),
			Protocol:  unix.ETH_P_ALL,
		},
		Actions: []netlink.Action{
			&netlink.MirredAction{
				ActionAttrs: netlink.ActionAttrs{
					Action: netlink.TC_ACT_STOLEN,
				},
				MirredAction: netlink.TCA_EGRESS_REDIR,
				Ifindex:      index2,
			},
		},
	}

	if err := netlink.FilterAdd(filter1); err != nil {
		fmt.Printf("Failed to add filter for index %d : %s\n", index1, err)
		os.Exit(1)
	}

	filter2 := &netlink.U32{
		FilterAttrs: netlink.FilterAttrs{
			LinkIndex: index2,
			Parent:    netlink.MakeHandle(0xffff, 0),
			Protocol:  unix.ETH_P_ALL,
		},
		Actions: []netlink.Action{
			&netlink.MirredAction{
				ActionAttrs: netlink.ActionAttrs{
					Action: netlink.TC_ACT_STOLEN,
				},
				MirredAction: netlink.TCA_EGRESS_REDIR,
				Ifindex:      index1,
			},
		},
	}

	if err := netlink.FilterAdd(filter2); err != nil {
		fmt.Printf("Failed to add filter for index %d : %s\n", index2, err)
		os.Exit(1)
	}
}
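The program above takes interface indexes rather than names. One way to look the indexes up, sketched here with only the Go standard library, is `net.InterfaceByName`. The `ifIndex` helper is a hypothetical name for this note. Interface indexes are per network namespace, so the lookup has to run in the same namespace that holds both devices.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// ifIndex resolves an interface name such as "eth0" to its kernel index.
func ifIndex(name string) (int, error) {
	iface, err := net.InterfaceByName(name)
	if err != nil {
		return 0, fmt.Errorf("lookup %s: %w", name, err)
	}
	return iface.Index, nil
}

func main() {
	names := os.Args[1:]
	if len(names) == 0 {
		// No arguments: list every interface visible in this namespace.
		ifaces, err := net.Interfaces()
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		for _, iface := range ifaces {
			fmt.Printf("%d %s\n", iface.Index, iface.Name)
		}
		return
	}
	for _, name := range names {
		idx, err := ifIndex(name)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Printf("%d %s\n", idx, name)
	}
}
```

Running it with `eth0 tap0` as arguments would print the two numbers to pass to the redirect program, assuming both devices exist in the current namespace.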

@hellt

hellt commented Feb 20, 2021

Hi, thank you for that nice trick.
Do you know if this tc based redirect will allow transparent forwarding of layer 2 frames which are filtered on linux bridge by default (like stp bpdu and LACP frames)?

@mcastelino
Author

mcastelino commented Feb 22, 2021

> Hi, thank you for that nice trick.
> Do you know if this tc based redirect will allow transparent forwarding of layer 2 frames which are filtered on linux bridge by default (like stp bpdu and LACP frames)?

@hellt yes. All traffic should pass through. We use this in Kata Containers with all types of CNI interfaces without issues. The performance drop is negligible.
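For background on the question: the frames a bridge holds back by default are those sent to the IEEE 802.1D reserved link-local range 01:80:C2:00:00:00 to 01:80:C2:00:00:0F (STP BPDUs use ...:00, LACP uses ...:02). The tc filter above never looks at the destination address, which is why those frames are redirected like any others. A small sketch of that address check, with `isBridgeReserved` as a made-up helper name:

```go
package main

import (
	"fmt"
	"net"
)

// isBridgeReserved reports whether mac lies in 01:80:C2:00:00:00-0F,
// the link-local range that 802.1D bridges do not forward by default.
func isBridgeReserved(mac net.HardwareAddr) bool {
	return len(mac) == 6 &&
		mac[0] == 0x01 && mac[1] == 0x80 && mac[2] == 0xc2 &&
		mac[3] == 0x00 && mac[4] == 0x00 && mac[5] <= 0x0f
}

func main() {
	for _, s := range []string{
		"01:80:c2:00:00:00", // STP BPDU destination
		"01:80:c2:00:00:02", // LACP (slow protocols) destination
		"ff:ff:ff:ff:ff:ff", // ordinary broadcast
	} {
		mac, err := net.ParseMAC(s)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s reserved=%v\n", s, isBridgeReserved(mac))
	}
}
```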

@hellt

hellt commented Feb 27, 2021

Thanks @mcastelino
that helped a lot in my similar case, where I needed to connect qemu tap interfaces to a container's interfaces

@xzhao025

Thanks for sharing this! I was struggling with LACP over a bridge; this helps!

@tamalsaha

tamalsaha commented Sep 6, 2022

There is a CNI plugin for this, https://github.com/awslabs/tc-redirect-tap, used with Firecracker.

@mari0d

mari0d commented Oct 7, 2022

@tamalsaha

lol indeed.
