Using tc redirect to connect a virtual machine to a container network

Connecting a veth device to tap

  • veth device from CNI/CNM plugin: eth0
  • tap device that connects to the VM: tap0

Redirecting traffic between the two devices

tc qdisc add dev eth0 ingress
tc filter add dev eth0 parent ffff: protocol all u32 match u8 0 0 action mirred egress redirect dev tap0
tc qdisc add dev tap0 ingress
tc filter add dev tap0 parent ffff: protocol all u32 match u8 0 0 action mirred egress redirect dev eth0
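
The four commands above follow a symmetric pattern: each device gets an ingress qdisc plus a match-all filter that redirects to the other device. A minimal Go sketch of that pattern follows. It only builds the command strings and does not execute anything; the function name redirectCommands is hypothetical and not part of any library.

```go
package main

import "fmt"

// redirectCommands (hypothetical helper) returns the tc invocations that
// cross-connect devices a and b. It only builds strings; actually running
// them requires root privileges and the tc binary.
func redirectCommands(a, b string) []string {
	var cmds []string
	// One pass per direction: a -> b, then b -> a.
	for _, p := range [][2]string{{a, b}, {b, a}} {
		src, dst := p[0], p[1]
		cmds = append(cmds,
			fmt.Sprintf("tc qdisc add dev %s ingress", src),
			fmt.Sprintf("tc filter add dev %s parent ffff: protocol all u32 match u8 0 0 action mirred egress redirect dev %s", src, dst),
		)
	}
	return cmds
}

func main() {
	for _, c := range redirectCommands("eth0", "tap0") {
		fmt.Println(c)
	}
}
```

Generating both directions from one loop makes it harder to end up with a one-way redirect by accident.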

tc qdisc add dev eth0 ingress

  • Add a queuing discipline
  • on dev eth0
  • attach the ingress qdisc. Here the handle defaults to ffff:

tc filter add dev eth0 parent ffff: protocol all u32 match u8 0 0 action mirred egress redirect dev tap0

  • Add a filter
  • to device dev eth0
  • parent ffff: is the handle we attach the filter to, i.e. the ingress qdisc we created before (there is no need for tc class add in the ingress case, as the ingress qdisc does not support classes).
  • protocol all
  • classifier u32
  • parameters to the classifier: match u8 0 0 ANDs the first byte of the packet with the mask 0 and matches if the result equals the value 0, which it always does, so every packet matches
  • action mirred egress redirect dev tap0, redirect the packet to the egress of dev tap0
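
To see why u8 0 0 selects every packet, here is a small Go sketch of the comparison described in the list above. It is only an illustration of the value/mask check, not the kernel's u32 implementation, and the function name u8Match is an assumption made for this example.

```go
package main

import "fmt"

// u8Match (hypothetical name) mirrors the check described above:
// AND the byte with the mask, then compare the result with the value.
func u8Match(b, value, mask uint8) bool {
	return b&mask == value
}

func main() {
	// With value 0 and mask 0, every possible first byte matches.
	for _, b := range []uint8{0x00, 0x45, 0xff} {
		fmt.Printf("0x%02x matches: %v\n", b, u8Match(b, 0, 0))
	}
}
```

A non-zero mask makes the filter selective: u8Match(0x45, 0x40, 0xf0) is true, while u8Match(0x65, 0x40, 0xf0) is false.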

@mcastelino mcastelino commented Oct 16, 2018

Create a new qdisc called "ingress". Qdiscs normally don't work on ingress, so this is really a special qdisc that you can consider an "alternate root" for inbound packets.
Add a new filter and attach it to node "ffff:". The ID "ffff:" is the fixed ID of the ingress qdisc.

We use the "u32" matcher with arguments "u8 0 0". This matches any packet whose first byte, when ANDed with the mask 0, equals the value 0. In other words, all packets are selected.

@mcastelino mcastelino commented Oct 16, 2018

So tc qdisc add dev eth0 handle ffff: ingress is equivalent to tc qdisc add dev eth0 ingress.

This can be verified as follows:

$ sudo tc qdisc show
qdisc noqueue 0: dev lo root refcnt 2
qdisc fq_codel 0: dev enp0s31f6 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn

$ sudo tc qdisc add dev enp0s31f6 ingress

$ sudo tc qdisc show
qdisc noqueue 0: dev lo root refcnt 2
qdisc fq_codel 0: dev enp0s31f6 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc ingress ffff: dev enp0s31f6 parent ffff:fff1 ----------------

$ sudo tc qdisc del dev enp0s31f6 ingress
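
The handles in that output are 32-bit values split into a 16-bit major and a 16-bit minor part. The Go sketch below packs and prints them in the major:minor style seen above. The helper names makeHandle and formatHandle are made up for this example, and the formatting only approximates what tc prints.

```go
package main

import "fmt"

// makeHandle (hypothetical helper) packs a major:minor pair into one 32-bit value.
func makeHandle(major, minor uint16) uint32 {
	return uint32(major)<<16 | uint32(minor)
}

// formatHandle (hypothetical helper) renders a handle as hex major:minor,
// leaving the minor part empty when it is zero.
func formatHandle(h uint32) string {
	major, minor := h>>16, h&0xffff
	if minor == 0 {
		return fmt.Sprintf("%x:", major)
	}
	return fmt.Sprintf("%x:%x", major, minor)
}

func main() {
	fmt.Println(formatHandle(makeHandle(0xffff, 0)))      // handle of the ingress qdisc
	fmt.Println(formatHandle(makeHandle(0xffff, 0xfff1))) // parent shown in the output above
}
```

The packed value 0xfffffff1 for ffff:fff1 is the special ingress parent, which is why it shows up as the parent of the ingress qdisc.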

@mcastelino mcastelino commented Oct 16, 2018

https://www.tldp.org/HOWTO/html_single/Traffic-Control-HOWTO/

A source of terminology confusion is the usage of the terms root qdisc and ingress qdisc. These are not really queuing disciplines, but rather locations onto which traffic control structures can be attached for egress (outbound traffic) and ingress (inbound traffic).

Each interface contains both. The primary and more common is the egress qdisc, known as the root qdisc. It can contain any of the queuing disciplines (qdiscs) with potential classes and class structures. The overwhelming majority of documentation applies to the root qdisc and its children. Traffic transmitted on an interface traverses the egress or root qdisc.

For traffic accepted on an interface, the ingress qdisc is traversed. With its limited utility, it allows no child class to be created, and only exists as an object onto which a filter can be attached. For practical purposes, the ingress qdisc is merely a convenient object onto which to attach a policer to limit the amount of traffic accepted on a network interface.

@amshinde amshinde commented Oct 16, 2018

Go program that implements the above logic:

package main

import (
	"fmt"
	"os"
	"strconv"

	"github.com/vishvananda/netlink"
	"golang.org/x/sys/unix"
)

func main() {
	args := os.Args[1:]

	if len(args) != 2 {
		fmt.Println("Incorrect number of args")
		os.Exit(1)
	}

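	// Both arguments are interface indexes (as listed by `ip link`), not names.
	index1, err := strconv.Atoi(args[0])
	if err != nil {
		fmt.Printf("Invalid interface index %q : %s\n", args[0], err)
		os.Exit(1)
	}

	index2, err := strconv.Atoi(args[1])
	if err != nil {
		fmt.Printf("Invalid interface index %q : %s\n", args[1], err)
		os.Exit(1)
	}

	fmt.Printf("network index1 : %d\n", index1)
	fmt.Printf("network index2 : %d\n", index2)

	// Equivalent of `tc qdisc add dev <index1> ingress`.
	qdisc1 := &netlink.Ingress{
		QdiscAttrs: netlink.QdiscAttrs{
			LinkIndex: index1,
			Parent:    netlink.HANDLE_INGRESS,
		},
	}

	err = netlink.QdiscAdd(qdisc1)
	if err != nil {
		fmt.Printf("Failed to add qdisc for index %d : %s\n", index1, err)
		os.Exit(1)
	}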

	qdisc2 := &netlink.Ingress{
		QdiscAttrs: netlink.QdiscAttrs{
			LinkIndex: index2,
			Parent:    netlink.HANDLE_INGRESS,
		},
	}

	err = netlink.QdiscAdd(qdisc2)
	if err != nil {
		fmt.Printf("Failed to add qdisc for index %d : %s\n", index2, err)
		os.Exit(1)
	}

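	// Attach a u32 filter to the ingress qdisc (ffff:) of index1 and redirect
	// every matched packet to the egress of index2. No Sel is set here; the
	// library is expected to fall back to a match-all selector in that case.
	filter1 := &netlink.U32{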
		FilterAttrs: netlink.FilterAttrs{
			LinkIndex: index1,
			Parent:    netlink.MakeHandle(0xffff, 0),
			Protocol:  unix.ETH_P_ALL,
		},
		Actions: []netlink.Action{
			&netlink.MirredAction{
				ActionAttrs: netlink.ActionAttrs{
					Action: netlink.TC_ACT_STOLEN,
				},
				MirredAction: netlink.TCA_EGRESS_REDIR,
				Ifindex:      index2,
			},
		},
	}

	if err := netlink.FilterAdd(filter1); err != nil {
		fmt.Printf("Failed to add filter for index %d : %s\n", index1, err)
		os.Exit(1)
	}

	// Same filter in the opposite direction: ingress of index2 to egress of index1.
	filter2 := &netlink.U32{
		FilterAttrs: netlink.FilterAttrs{
			LinkIndex: index2,
			Parent:    netlink.MakeHandle(0xffff, 0),
			Protocol:  unix.ETH_P_ALL,
		},
		Actions: []netlink.Action{
			&netlink.MirredAction{
				ActionAttrs: netlink.ActionAttrs{
					Action: netlink.TC_ACT_STOLEN,
				},
				MirredAction: netlink.TCA_EGRESS_REDIR,
				Ifindex:      index1,
			},
		},
	}

	if err := netlink.FilterAdd(filter2); err != nil {
		fmt.Printf("Failed to add filter for index %d : %s\n", index2, err)
		os.Exit(1)
	}
}
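
The program above takes interface indexes rather than names. As a rough sketch, assuming the devices are visible in the current network namespace, the standard library's net.InterfaceByName can translate names such as eth0 or tap0 into the indexes the program expects. The helper name ifIndex is hypothetical.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// ifIndex (hypothetical helper) resolves an interface name to its kernel index
// in the current network namespace.
func ifIndex(name string) (int, error) {
	iface, err := net.InterfaceByName(name)
	if err != nil {
		return 0, fmt.Errorf("lookup %q: %w", name, err)
	}
	return iface.Index, nil
}

func main() {
	// Print the index of every interface name given on the command line.
	for _, name := range os.Args[1:] {
		idx, err := ifIndex(name)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Printf("%s -> %d\n", name, idx)
	}
}
```

The printed indexes can then be passed as the two arguments of the redirect program.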

@hellt hellt commented Feb 20, 2021

Hi, thank you for that nice trick.
Do you know if this tc-based redirect allows transparent forwarding of layer 2 frames that a Linux bridge filters by default (like STP BPDUs and LACP frames)?

@mcastelino mcastelino commented Feb 22, 2021

@hellt yes. All traffic should pass through. We use this in Kata Containers with all types of CNI interfaces without issues. The performance drop is negligible.

@hellt hellt commented Feb 27, 2021

Thanks @mcastelino,
that helped a lot in my similar case, where I needed to connect QEMU tap interfaces to a container's interfaces.
