@jeez
Last active July 11, 2024 05:08
[TSN] Scheduled Tx Tools - Examples and Helpers for testing SO_TXTIME, and the etf and taprio qdiscs
Here we provide a testing application and scripts that can be used
to exercise the SO_TXTIME APIs, the etf qdisc and the taprio qdisc.
The example is based on a sample application (udp_tai.c) provided by
Richard Cochran as part of the RFC v1 of SO_TXTIME. We've extended
it in several ways so it may be used as an example of different
setups: per-packet Tx time only based systems, per-port Time-aware
scheduler, and a combination of those.
The documentation is split into 2 README files:
- README.etf: Provides instructions for how to setup an example to
use etf standalone. In other words, only Time-based
transmission is used.
- README.taprio: Provides instructions for how to setup an example
to use etf and taprio together. That means using
a Time-aware scheduler (i.e. 802.1Qbv) in conjunction with
time-based transmission for fine-grained control over
the Tx time of packets.
A custom tool known as 'dump-classifier' was developed so we can
verify if a taprio schedule is being respected. For more information
please check README.classifier .
To help analyze taprio scheduling characteristics, we've developed a custom
tool called 'dump-classifier'.
dump-classifier
===============
dump-classifier aims to ease the test/verification of how well an
implementation runs 802.1Qbv-like schedules.
How to compile
--------------
* Dependencies:
- libpcap-dev
Just running 'make' should work, if all the dependencies are met:
$ make
How to run
----------
$ ./dump-classifier -s <BATCH FILE> -f <FILTER FILE> -d <DUMP FILE>
<BATCH FILE> is a text file containing commands intended for use
with 'tc -batch'. This allows dump-classifier to use the same file
that was used for configuring the qdiscs.
Example:
-----<cut
qdisc replace dev enp3s0 parent root handle 100 taprio \
num_tc 3 \
map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
queues 1@0 1@1 2@2 \
base-time 1536883100000000000 \
sched-entry S 01 300000 \
sched-entry S 02 300000 \
sched-entry S 04 400000 \
clockid CLOCK_TAI
qdisc replace dev enp3s0 parent 100:1 etf \
offload delta 300000 clockid CLOCK_TAI
qdisc replace dev enp3s0 parent 100:2 etf clockid CLOCK_TAI \
delta 300000 offload deadline_mode
----->end
<FILTER FILE> allows different traffic classes to be identified in a
pcap dump file. Each line contains a traffic class name and a pcap
expression, separated by '::'. Any traffic class that doesn't have a
filter associated with it will be classified as "BE" (best effort).
The order is important, as the first line will match the first traffic
class (bit 0) in the gatemask parameter (the second field of each
'sched-entry' in the batch file), the second line will match the
second traffic class (bit 1), and so on.
Example:
-----<cut
talker :: ether dst aa:aa:aa:aa:aa:aa
----->end
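To make the bit ordering concrete, here is a minimal sketch of the gatemask
test, assuming the gatemask is written in hexadecimal as in the batch file
above. The helper name 'gate_open' is ours, for illustration only; it is not
part of dump-classifier.

```shell
#!/bin/bash
# gate_open GATEMASK_HEX TC_INDEX   (hypothetical helper)
# Succeeds (exit status 0) when bit TC_INDEX is set in the hex gatemask.
gate_open() {
    local mask=$((16#$1))
    local tc=$2
    (( (mask >> tc) & 1 ))
}

# Filter line 1 corresponds to bit 0, filter line 2 to bit 1, and so on.
for tc in 0 1 2; do
    if gate_open 02 "$tc"; then
        echo "gatemask 02: gate for traffic class $tc is open"
    else
        echo "gatemask 02: gate for traffic class $tc is closed"
    fi
done
```

With the schedule from the example, only the second filter line (bit 1) is
allowed to transmit while the 'sched-entry S 02 300000' entry is active.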
<BASE TIME> is the absolute time, in nanoseconds, at which the schedule
started. It is read from the 'base-time' parameter in the <BATCH FILE>.
If that time is before the timestamp of the first packet in the
<DUMP FILE>, the schedule will run until it reaches that timestamp.
Packets that have a timestamp before the base time will be ignored.
<DUMP FILE> is a dump file captured via tcpdump, with timestamp
precision in nanoseconds, so captured using something like this:
$ tcpdump -j adapter_unsynced --time-stamp-precision=nano -i enp2s0 -w dump.pcap
Here we present the steps taken for setting up a test that uses *only*
the ETF qdisc. That means that only Time-based transmission is exercised.
The 'talker' side of the example described below will transmit a packet
every 1ms. The packet's txtime is set through the SO_TXTIME api, and is
copied into the packet's payload.
At the 'listener' side, we capture traffic and then post-process it to
compute the delta between each packet's arrival time and their txtime.
ptp4l is used for synchronizing the PHC clocks over the network and
phc2sys is used on the 'talker' side for synchronizing the system
clock to the PHC.
CLOCK_TAI is the reference clockid used throughout the example for the
qdiscs and the applications.
# LISTENER #
1) Setup the PTP master. If using the listener end point as PTP
master, setup_clock_sync.sh can be used as shown below.
e.g.: $ sudo ip addr add 192.168.0.78/24 broadcast 192.168.0.255 dev IFACE
$ sudo ./setup_clock_sync.sh -i IFACE -m -v
This script will start ptp4l so the PHC time is propagated to the
network. The system clock and the PHC are NOT synchronized in that mode.
* Note that the TAI offset is applied, so CLOCK_REALTIME will be in
the UTC scale while CLOCK_TAI will be in the TAI scale, just like
the PHC.
2) Start capturing traffic on the listener end point. If we want to capture
traffic for 1 minute, and are expecting 1 packet per millisecond:
e.g.: $ sudo tcpdump -c 60000 -i enp3s0 -w tmp.pcap \
-j adapter_unsynced -tt --time-stamp-precision=nano \
udp port 7788
# TALKER #
3) Configure the Qdiscs on the talker side (Device Under Test, DUT).
Our DUT uses an Intel i210 NIC, and our setup here is as follows.
3.a) First, we setup mqprio as the root qdisc:
e.g.: $ sudo tc qdisc replace dev IFACE parent root handle 100 mqprio \
num_tc 3 map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
queues 1@0 1@1 2@2 hw 0
3.b) Then we setup etf with the desired config:
e.g.: $ sudo tc qdisc add dev IFACE parent 100:1 etf \
offload clockid CLOCK_TAI delta 150000
4) Setup the Device Under Testing (DUT) as PTP slave and synchronize
the local clocks.
e.g.: $ sudo ip addr add 192.168.0.77/24 broadcast 192.168.0.255 dev IFACE
$ sudo ./setup_clock_sync.sh -i IFACE -s -v
This script will start ptp4l so the PHC is synchronized to the PTP master,
and then will synchronize the system clock to PHC using phc2sys.
At this stage, based purely on empirical observations, one recommendation
is waiting for the rms value reported by ptp4l to reach a value below 15 ns,
and to remain somewhat constant after that.
* Note that the TAI offset is applied, so CLOCK_REALTIME will be in the UTC
scale while CLOCK_TAI will be in the TAI scale, just like the PHC.
5) Optionally, build and run check_clocks on both PTP master and slave.
e.g.: $ make check_clocks && sudo ./check_clocks IFACE
It reports the timestamps fetched from CLOCK_REALTIME, CLOCK_TAI and
the interface's PHC, as well as the latency for reading from each clock
and the delta between the PHC and the system clocks.
You may use this information to verify if the offsets were applied
correctly and if the PHC - CLOCK_TAI delta is not too high. Again,
based on empirical observations, we consider this value as "good enough"
if it's less than 25us, and it's been observed to get as low as 4us.
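As a rough sketch, the "good enough" check can be automated by reading the
'phc-tai delta' line that check_clocks prints. The helper name
'phc_tai_delta_ok' and the default limit argument are our own assumptions;
the 25us threshold is the empirical value mentioned above.

```shell
#!/bin/bash
# phc_tai_delta_ok [LIMIT_NS]   (hypothetical helper)
# Reads check_clocks output on stdin and succeeds when the absolute
# "phc-tai delta" value is below LIMIT_NS (default: 25000 ns, i.e. 25us).
phc_tai_delta_ok() {
    local limit=${1:-25000}
    local delta
    delta=$(awk -F'\t' '/^phc-tai delta:/ { print $2 }')
    [ -n "$delta" ] || return 2
    # Strip a leading '-' to compare the absolute value.
    [ "${delta#-}" -lt "$limit" ]
}

# Example with a canned line, in place of: sudo ./check_clocks IFACE
if printf 'phc-tai delta:\t4000\n' | phc_tai_delta_ok; then
    echo "PHC - CLOCK_TAI delta is good enough"
fi
```

On a real system the input would come from something like
'sudo ./check_clocks IFACE | phc_tai_delta_ok'.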
6) Build and run udp_tai on the talker end station
e.g.: $ gcc -o udp_tai udp_tai.c -lpthread
$ sudo ./udp_tai -i enp2s0 -P 1000000 -p 90 -d 600000
# LISTENER #
7) Analyze traffic and generate statistics.
We first use tshark for post-processing the pcap file as needed, then
we use a custom python script to compute the packets' offset from their
expected arrival time, and then compute statistics for the overall data set.
e.g.: $ tshark -r tmp.pcap --disable-protocol dcp-etsi --disable-protocol \
dcp-pft -t e -E separator=, -T fields -e frame.number \
-e frame.time_epoch -e data.data > tmp.out
$ ./txtime_offset_stats.py -f tmp.out
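The statistics step boils down to subtracting each packet's txtime from its
arrival time. The sketch below is NOT the txtime_offset_stats.py script; it
is a simplified illustration that assumes the data was already reduced to
"ARRIVAL_NS TXTIME_NS" pairs, one per line (a hypothetical format). The
function name 'offset_stats' is also ours.

```shell
#!/bin/bash
# offset_stats   (hypothetical helper)
# Reads "ARRIVAL_NS TXTIME_NS" pairs (one per line) on stdin and prints the
# count, min, max and mean of (arrival - txtime) in nanoseconds.
# Bash arithmetic is 64-bit, so nanosecond timestamps are not rounded.
offset_stats() {
    local arrival txtime delta
    local count=0 sum=0 min=0 max=0
    while read -r arrival txtime; do
        delta=$((arrival - txtime))
        if [ "$count" -eq 0 ] || [ "$delta" -lt "$min" ]; then min=$delta; fi
        if [ "$count" -eq 0 ] || [ "$delta" -gt "$max" ]; then max=$delta; fi
        sum=$((sum + delta))
        count=$((count + 1))
    done
    [ "$count" -gt 0 ] || return 1
    echo "count=$count min=$min max=$max mean=$((sum / count))"
}

# Made-up sample: three packets sent 1ms apart.
offset_stats <<EOF
1536883100000151200 1536883100000150000
1536883100001150900 1536883100001150000
1536883100002151300 1536883100002150000
EOF
```

The mean uses integer division, which is enough for a quick sanity check
before running the full statistics.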
# NOTE ON VLAN USAGE #
If your tests require that VLAN tagging is performed by the end stations, then
you must configure the kernel to do so. There are different ways to approach that,
one of them is to create a vlan interface that knows how to map from a socket
priority to the VLAN PCP.
e.g.: $ ip link add link enp2s0 name enp2s0.2 type vlan id 2 egress 2:2 3:3
$ ip link set dev enp2s0.2 up
This maps socket priority 2 to PCP 2 and 3 to 3 for egress on a VLAN with id 2.
The same can be done for ingress.
Here we present the steps taken for setting up a test that uses both
the ETF qdisc and the TAPRIO one. That means that we'll use a (Qbv-like)
port scheduler with a fixed Tx schedule for traffic classes (TC), while
using Time-based transmission for controlling the Tx time of packets within
each TC.
The 'talker' side of the example described below will have 2 applications
transmitting time-sensitive traffic following a strict cyclic schedule.
In addition to that, iperf3 is used to transmit best-effort traffic on the
port. The port schedule is thus composed of 3 time-slices, with a total
cycle-time of 1 millisecond allocated as:
- Traffic Class 0 (TC 0): duration of 300us, 'strict txtime' is used.
- Traffic Class 1 (TC 1): duration of 300us, 'deadline txtime' is used.
- Traffic Class 2 (TC 2): duration of 400us, best-effort traffic.
The system is configured so the application enqueueing packets for
TC 0 will set its packets' *Tx time* with an offset of 250us within its
time-slice.
The application enqueueing packets for TC 1 will set its packets' *deadline*
with an offset of 250us within its time-slice. However, because this TC is
using the deadline mode of SO_TXTIME + etf, a packet may be transmitted
at any time within its time-slice that is before its deadline.
Best-effort traffic is transmitted at any time during the third time-slice.
A way to visualize this cycle and its time-slices is:
|______x_|......D_|bbbbbbbbbb|
0       299      599      999us
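The mapping from a point in time to a time-slice can be sketched as below,
assuming the three slice durations listed above. The array 'SLICES' and the
function 'slice_for_offset' are names made up for this illustration.

```shell
#!/bin/bash
# Slice durations in nanoseconds, in schedule order (TC 0, TC 1, TC 2).
SLICES=(300000 300000 400000)

# slice_for_offset TIME_NS   (hypothetical helper)
# Prints the index of the time-slice that TIME_NS, measured from the start
# of the schedule, falls into. The schedule repeats every cycle.
slice_for_offset() {
    local cycle=0 d
    for d in "${SLICES[@]}"; do cycle=$((cycle + d)); done
    local t=$(( $1 % cycle ))
    local i end=0
    for i in "${!SLICES[@]}"; do
        end=$((end + SLICES[i]))
        if [ "$t" -lt "$end" ]; then
            echo "$i"
            return 0
        fi
    done
}

slice_for_offset 250000    # the 'x' in the diagram, prints 0
slice_for_offset 550000    # the 'D' in the diagram, prints 1
slice_for_offset 999999    # end of the best-effort slice, prints 2
```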
The application for each time-sensitive traffic class will transmit a packet
every 1ms. The packet's txtime is set through the SO_TXTIME api, and is
copied into the packet's payload.
At the 'listener' side, we capture traffic and then post-process it to
verify if packets are arriving outside of the time-slice they belong to.
ptp4l is used for synchronizing the PHC clocks over the network and
phc2sys is used on the 'talker' side for synchronizing the system
clock to the PHC.
CLOCK_TAI is the reference clockid used throughout the example for the
qdiscs and the applications.
# NOTE ON VLAN USAGE #
If your tests require that VLAN tagging is performed by the end stations, then
you must configure the kernel to do so. There are different ways to approach that,
one of them is to create a vlan interface that knows how to map from a socket
priority to the VLAN PCP.
e.g.: $ ip link add link enp2s0 name enp2s0.2 type vlan id 2 egress 2:2 3:3
$ ip link set dev enp2s0.2 up
This maps socket priority 2 to PCP 2 and 3 to 3 for egress on a VLAN with id 2.
The same can be done for ingress.
# TALKER #
1) Setup network
sudo ip addr add 192.168.0.77/24 broadcast 192.168.0.255 dev enp3s0
2) Setup qdiscs
The script 'config-taprio.sh' will configure taprio and ETF
automatically, with the same parameters explained below. It
will also save the configuration used in the 'taprio.batch' file,
so it can be used for analysis.
The rest of Section 2 describes taprio and etf configuration
parameters briefly.
2.1) Setup taprio with a base-time starting in 2min from now rounded down.
We must add the 37s UTC-TAI offset to the timestamp we get with 'date'.
i=$((`date +%s%N` + 37000000000 + (2 * 60 * 1000000000))) ; \
base=$(($i - ($i % 1000000000))) ; \
tc qdisc replace dev enp3s0 parent root handle 100 taprio \
num_tc 3 \
map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
queues 1@0 1@1 2@2 \
base-time $base \
sched-entry S 01 300000 \
sched-entry S 02 300000 \
sched-entry S 04 400000 \
clockid CLOCK_TAI
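We can read the above as:
- there are 3 traffic classes (num_tc 3);
- SO_PRIORITY value 3 maps to TC 0, while value 2 maps to TC 1.
Everything else maps to the other (best-effort) traffic class;
- "queues 1@0 1@1 2@2" uses the count@offset notation, meaning that TC 0
maps to queue 0, TC 1 maps to queue 1 and TC 2 maps to queues 2 and 3;
- the schedule starts at base-time and is given by the three sched-entry
lines: each one sets the gates (S) according to a hexadecimal gate mask
and holds them for an interval in nanoseconds, so TC 0 (mask 01) is open
for 300us, then TC 1 (mask 02) for 300us, then TC 2 (mask 04) for 400us;
- the reference clock is CLOCK_TAI.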
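The base-time arithmetic from step 2.1 can be isolated in a small function,
sketched below. The function name 'taprio_base_time' is ours, and the 37s
UTC-TAI offset is the value used in this README; it is an assumption that
may change if a new leap second is introduced. '%N' requires GNU date.

```shell
#!/bin/bash
TAI_OFFSET_NS=37000000000   # UTC-TAI offset used in this README (assumption)

# taprio_base_time NOW_UTC_NS DELAY_S   (hypothetical helper)
# Prints NOW_UTC_NS converted to the TAI scale, pushed DELAY_S seconds
# into the future and rounded down to a whole second.
taprio_base_time() {
    local i=$(( $1 + TAI_OFFSET_NS + $2 * 1000000000 ))
    echo $(( i - i % 1000000000 ))
}

# Base-time 2 minutes from now, as in step 2.1.
taprio_base_time "$(date +%s%N)" 120
```

Rounding down to a whole second keeps the base-time aligned with the 1ms
cycle, which makes it easier to line up the applications' txtimes later.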
2.2) Setup etf for queue TC 0:
tc qdisc replace dev enp3s0 parent 100:1 etf clockid CLOCK_TAI \
delta 200000 offload
2.3) Setup etf in deadline mode for TC 1:
tc qdisc replace dev enp3s0 parent 100:2 etf clockid CLOCK_TAI \
delta 200000 offload deadline_mode
3) Start time sync (ptp slave):
sudo ./setup_clock_sync.sh -i enp3s0 -s -v
4) Start iperf3 client:
iperf3 -c 192.168.0.78 -t 600 --fq-rate 100M
5) Start udp_tai for TC 0. Use a base-time starting in 1min from now + a
250us offset for txtime:
now=`date +%s%N` ; i=$(($now + 37000000000 + (60 * 1000000000))) ; \
base=$(($i - ($i % 1000000000) + 250000)) ; \
sudo ./udp_tai -i enp3s0 -b $base -P 1000000 -t 3 -p 90 -d 600000 \
-u 7788
To automate this process a little, the script
'run-udp-tai-tc0.sh' is provided.
6) Start udp_tai in deadline mode for TC 1. Use the txtime computed for
the previous traffic class (above) and add 300us so it falls under the
second time slice (TC 1). For example, if the instance of udp_tai executed
on the previous step printed
"txtime of 1st packet is: 1528320726000250000", then we should now do:
sudo ./udp_tai -i enp3s0 -t 2 -p 90 -D -d 600000 \
-b 1528320726000550000 -u 7798
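The 300us shift used for the TC 1 base-time can be double-checked with the
sketch below, which assumes the 300us/300us/400us schedule from this README
and a schedule base-time that is not later than the checked timestamp. The
function names are made up for illustration.

```shell
#!/bin/bash
CYCLE_NS=1000000        # 300us + 300us + 400us
TC1_START_NS=300000     # the TC 1 slice starts after the TC 0 slice
TC1_END_NS=600000       # ...and ends where the best-effort slice starts

# tc1_base_time TC0_TXTIME_NS   (hypothetical helper)
# Prints the first deadline for TC 1: the TC 0 txtime shifted by 300us.
tc1_base_time() {
    echo $(( $1 + TC1_START_NS ))
}

# in_tc1_slice TIME_NS BASE_TIME_NS   (hypothetical helper)
# Succeeds when TIME_NS falls inside the TC 1 slice of a schedule that
# started at BASE_TIME_NS.
in_tc1_slice() {
    local off=$(( ($1 - $2) % CYCLE_NS ))
    [ "$off" -ge "$TC1_START_NS" ] && [ "$off" -lt "$TC1_END_NS" ]
}

base=$(tc1_base_time 1528320726000250000)
echo "$base"    # prints 1528320726000550000
in_tc1_slice "$base" 1528320726000000000 && echo "deadline falls in the TC 1 slice"
```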
# LISTENER #
1) Setup network
sudo ip addr add 192.168.0.78/24 broadcast 192.168.0.255 dev enp3s0
2) Start time sync (ptp master)
sudo ./setup_clock_sync.sh -i enp3s0 -m -v
3) Start iperf server
iperf3 -s
4) Prepare 'dump-classifier' files. Running 'config-taprio.sh'
should produce a 'taprio.batch' file, which will be used for
verifying how well the schedule specified there was followed.
Please refer to README.classifier for further information.
--filters--
talker_strict :: udp port 7788
talker_deadline :: udp port 7798
--taprio.batch--
qdisc replace dev enp3s0 parent root handle 100 taprio \
num_tc 3 \
map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
queues 1@0 1@1 2@2 \
base-time 1536883100000000000 \
sched-entry S 01 300000 \
sched-entry S 02 300000 \
sched-entry S 04 400000 \
clockid CLOCK_TAI
qdisc replace dev enp3s0 parent 100:1 etf \
offload delta 300000 clockid CLOCK_TAI
qdisc replace dev enp3s0 parent 100:2 etf clockid CLOCK_TAI \
delta 300000 offload deadline_mode
5) Start capturing traffic:
sudo tcpdump -c 600000 -i enp3s0 -w tmp.pcap -j adapter_unsynced \
-tt --time-stamp-precision=nano
6) Use the talkers to transmit packets as described in the TALKER section above.
7) After traffic was captured, check if packets arrived outside of their
time-slices. dump-classifier takes the base-time of the schedule from the
'taprio.batch' file. For example:
./dump-classifier -d tmp.pcap -f filter -s taprio.batch | grep -v ontime
/*
* Copyright (c) 2018, Intel Corporation
*
* SPDX-License-Identifier: BSD-3-Clause
*
*/
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>
#include <net/if.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#define ONE_SEC 1000000000ULL
#define PTP_MAX_DEV_PATH 16
/* fd to clockid helpers. Copied from posix-timers.h. */
#define CLOCKFD 3
static inline clockid_t make_process_cpuclock(const unsigned int pid,
const clockid_t clock)
{
return ((~pid) << 3) | clock;
}
static inline clockid_t fd_to_clockid(const int fd)
{
return make_process_cpuclock((unsigned int) fd, CLOCKFD);
}
static inline void open_phc_fd(int* fd_ptp, char* ifname)
{
struct ethtool_ts_info interface_info = {0};
char ptp_path[PTP_MAX_DEV_PATH];
struct ifreq req = {0};
int fd_ioctl;
/* Get PHC index */
interface_info.cmd = ETHTOOL_GET_TS_INFO;
snprintf(req.ifr_name, sizeof(req.ifr_name), "%s", ifname);
req.ifr_data = (char *) &interface_info;
fd_ioctl = socket(AF_INET, SOCK_DGRAM, 0);
if (fd_ioctl < 0) {
perror("Couldn't open socket");
exit(EXIT_FAILURE);
}
if (ioctl(fd_ioctl, SIOCETHTOOL, &req) < 0) {
perror("Couldn't issue SIOCETHTOOL ioctl");
exit(EXIT_FAILURE);
}
snprintf(ptp_path, sizeof(ptp_path), "%s%d", "/dev/ptp",
interface_info.phc_index);
*fd_ptp = open(ptp_path, O_RDONLY);
if (*fd_ptp < 0) {
perror("Couldn't open the PTP fd. Did you forget to run with sudo again?");
exit(EXIT_FAILURE);
}
close(fd_ioctl);
}
int main(int argc, char** argv)
{
    struct timespec ts_rt1, ts_rt2, ts_ptp1, ts_ptp2, ts_tai1, ts_tai2;
    uint64_t rt, tai, ptp, lat_rt, lat_tai, lat_ptp;
    char ifname[IFNAMSIZ] = {0};
    int fd_ptp;

    if (argc < 2) {
        printf("You must run this as %s NET_IFACE (e.g. enp2s0)\n", argv[0]);
        return EXIT_FAILURE;
    }

    /* ifname is zero-initialized, so the copy is always NUL-terminated. */
    strncpy(ifname, argv[1], sizeof(ifname) - 1);
    open_phc_fd(&fd_ptp, ifname);

    /* Fetch timestamps for each clock. */
    clock_gettime(CLOCK_REALTIME, &ts_rt1);
    clock_gettime(CLOCK_TAI, &ts_tai1);
    clock_gettime(fd_to_clockid(fd_ptp), &ts_ptp1);
    rt = (ts_rt1.tv_sec * ONE_SEC) + ts_rt1.tv_nsec;
    tai = (ts_tai1.tv_sec * ONE_SEC) + ts_tai1.tv_nsec;
    ptp = (ts_ptp1.tv_sec * ONE_SEC) + ts_ptp1.tv_nsec;

    /* Compute clocks read latency. */
    clock_gettime(CLOCK_REALTIME, &ts_rt1);
    clock_gettime(CLOCK_REALTIME, &ts_rt2);
    lat_rt = ((ts_rt2.tv_sec * ONE_SEC) + ts_rt2.tv_nsec)
        - ((ts_rt1.tv_sec * ONE_SEC) + ts_rt1.tv_nsec);
    clock_gettime(CLOCK_TAI, &ts_tai1);
    clock_gettime(CLOCK_TAI, &ts_tai2);
    lat_tai = ((ts_tai2.tv_sec * ONE_SEC) + ts_tai2.tv_nsec)
        - ((ts_tai1.tv_sec * ONE_SEC) + ts_tai1.tv_nsec);
    clock_gettime(fd_to_clockid(fd_ptp), &ts_ptp1);
    clock_gettime(fd_to_clockid(fd_ptp), &ts_ptp2);
    lat_ptp = ((ts_ptp2.tv_sec * ONE_SEC) + ts_ptp2.tv_nsec)
        - ((ts_ptp1.tv_sec * ONE_SEC) + ts_ptp1.tv_nsec);

    /* uint64_t is not necessarily 'unsigned long long', so cast for printf. */
    printf("rt tstamp:\t%llu\n", (unsigned long long) rt);
    printf("tai tstamp:\t%llu\n", (unsigned long long) tai);
    printf("phc tstamp:\t%llu\n", (unsigned long long) ptp);
    printf("rt latency:\t%llu\n", (unsigned long long) lat_rt);
    printf("tai latency:\t%llu\n", (unsigned long long) lat_tai);
    printf("phc latency:\t%llu\n", (unsigned long long) lat_ptp);
    /* The deltas may be negative, so compute them as signed values. */
    printf("phc-rt delta:\t%lld\n", (long long) ptp - (long long) rt);
    printf("phc-tai delta:\t%lld\n", (long long) ptp - (long long) tai);

    close(fd_ptp);
    return EXIT_SUCCESS;
}
#!/bin/bash
#
# Copyright (c) 2018, Intel Corporation
#
# SPDX-License-Identifier: BSD-3-Clause
#
IFACE=$1
if [ -z "$IFACE" ]; then
    echo "You must provide the network interface as first argument"
    exit 1
fi
BATCH_FILE=etf.batch
cat > $BATCH_FILE <<EOF
qdisc replace dev $IFACE parent root handle 100 mqprio \\
num_tc 3 \\
map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \\
queues 1@0 1@1 2@2 \\
hw 0
qdisc replace dev $IFACE parent 100:1 etf \\
offload delta 300000 clockid CLOCK_TAI
qdisc replace dev $IFACE parent 100:2 etf clockid CLOCK_TAI \\
delta 300000 offload deadline_mode
EOF
tc -batch $BATCH_FILE
#!/bin/bash
#
# Copyright (c) 2018, Intel Corporation
#
# SPDX-License-Identifier: BSD-3-Clause
#
IFACE=$1
if [ -z "$IFACE" ]; then
    echo "You must provide the network interface as first argument"
    exit 1
fi
i=$((`date +%s%N` + 37000000000 + (2 * 60 * 1000000000)))
BASE_TIME=$(($i - ($i % 1000000000)))
BATCH_FILE=taprio.batch
cat > $BATCH_FILE <<EOF
qdisc replace dev $IFACE parent root handle 100 taprio \\
num_tc 3 \\
map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \\
queues 1@0 1@1 2@2 \\
base-time $BASE_TIME \\
sched-entry S 01 300000 \\
sched-entry S 02 300000 \\
sched-entry S 04 400000 \\
clockid CLOCK_TAI
qdisc replace dev $IFACE parent 100:1 etf \\
offload delta 200000 clockid CLOCK_TAI
qdisc replace dev $IFACE parent 100:2 etf clockid CLOCK_TAI \\
delta 200000 offload deadline_mode
EOF
tc -batch $BATCH_FILE
echo "Base time: $BASE_TIME"
echo "Configuration saved to: $BATCH_FILE"
/*
* Copyright (c) 2018, Intel Corporation
*
* SPDX-License-Identifier: BSD-3-Clause
*
*/
#include <argp.h>
#include <errno.h>
#include <inttypes.h>
#include <limits.h>
#include <pcap.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* Kept as an integer so nanosecond timestamps are not rounded through a double. */
#define NSEC_TO_SEC 1000000000ULL
#define NUM_FILTERS 8
#define NUM_ENTRIES 64
#define MAX_ARGS 100
/* TAPRIO */
enum {
TC_TAPRIO_CMD_SET_GATES = 0x00,
TC_TAPRIO_CMD_SET_AND_HOLD = 0x01,
TC_TAPRIO_CMD_SET_AND_RELEASE = 0x02,
};
#define NEXT_ARG() \
do { \
argv++; \
if (--argc <= 0) { \
fprintf(stderr, "Incomplete command\n"); \
exit(-1); \
} \
} while (0)
enum traffic_flags {
TRAFFIC_FLAGS_TXTIME,
};
struct tc_filter {
char *name;
struct bpf_program prog;
unsigned int flags;
};
struct sched_entry {
uint8_t command;
uint32_t gatemask;
uint32_t interval;
};
struct schedule {
struct sched_entry entries[NUM_ENTRIES];
int64_t base_time;
size_t current_entry;
size_t num_entries;
int64_t cycle_time;
};
static struct argp_option options[] = {
{"batch-file", 's', "BATCH_FILE", 0, "File containing the taprio configuration" },
{"dump-file", 'd', "DUMP_FILE", 0, "File containing the tcpdump dump" },
{"filters-file", 'f', "FILTERS_FILE", 0, "File containing the classification filters" },
{ 0 }
};
static struct tc_filter traffic_filters[NUM_FILTERS];
static FILE *batch_file, *dump_file, *filters_file;
static struct schedule schedule;
static error_t parser(int key, char *arg, struct argp_state *state)
{
switch (key) {
case 's':
batch_file = fopen(arg, "r");
if (!batch_file) {
perror("Could not open file, fopen");
exit(EXIT_FAILURE);
}
break;
case 'd':
dump_file = fopen(arg, "r");
if (!dump_file) {
perror("Could not open file, fopen");
exit(EXIT_FAILURE);
}
break;
case 'f':
filters_file = fopen(arg, "r");
if (!filters_file) {
perror("Could not open file, fopen");
exit(EXIT_FAILURE);
}
break;
}
return 0;
}
static struct argp argp = { options, parser };
static void usage(void)
{
fprintf(stderr, "dump-classifier -s <tc batch file> -d <dump-file> -f <filters-file>\n");
}
/* split command line into argument vector */
int makeargs(char *line, char *argv[], int max_args)
{
static const char ws[] = " \t\r\n";
char *cp = line;
int argc = 0;
while (*cp) {
/* skip leading whitespace */
cp += strspn(cp, ws);
if (*cp == '\0')
break;
if (argc >= (max_args - 1))
return -1;
/* word begins with quote */
if (*cp == '\'' || *cp == '"') {
char quote = *cp++;
argv[argc++] = cp;
/* find ending quote */
cp = strchr(cp, quote);
if (cp == NULL) {
fprintf(stderr, "Unterminated quoted string\n");
exit(1);
}
} else {
argv[argc++] = cp;
/* find end of word */
cp += strcspn(cp, ws);
if (*cp == '\0')
break;
}
/* separate words */
*cp++ = 0;
}
argv[argc] = NULL;
return argc;
}
/* Like glibc getline but handle continuation lines and comments */
ssize_t getcmdline(char **linep, size_t *lenp, FILE *in)
{
ssize_t cc;
char *cp;
cc = getline(linep, lenp, in);
if (cc < 0)
return cc; /* eof or error */
cp = strchr(*linep, '#');
if (cp)
*cp = '\0';
while ((cp = strstr(*linep, "\\\n")) != NULL) {
char *line1 = NULL;
size_t len1 = 0;
ssize_t cc1;
cc1 = getline(&line1, &len1, in);
if (cc1 < 0) {
fprintf(stderr, "Missing continuation line\n");
return cc1;
}
*cp = 0;
cp = strchr(line1, '#');
if (cp)
*cp = '\0';
*lenp = strlen(*linep) + strlen(line1) + 1;
*linep = realloc(*linep, *lenp);
if (!*linep) {
fprintf(stderr, "Out of memory\n");
*lenp = 0;
return -1;
}
cc += cc1 - 2;
strcat(*linep, line1);
free(line1);
}
return cc;
}
static int parse_filters(pcap_t *handle, FILE *file,
                         struct tc_filter *filters, size_t num_filters)
{
    char *name, *expression;
    size_t i = 0;
    int err;

    /* Each line is "<name> :: <pcap expression>". The 'm' modifier makes
     * fscanf allocate both strings, so only accept fully parsed lines.
     */
    while (i < num_filters && fscanf(file, "%ms :: %m[^\n]\n",
                                     &name, &expression) == 2) {
        struct tc_filter *filter = &filters[i];

        filter->name = name;
        err = pcap_compile(handle, &filter->prog, expression,
                           1, PCAP_NETMASK_UNKNOWN);
        /* The compiled program does not reference the expression text. */
        free(expression);
        if (err < 0) {
            pcap_perror(handle, "pcap_compile");
            return -EINVAL;
        }
        i++;
    }
    return i;
}
static int str_to_entry_cmd(const char *str)
{
if (strcmp(str, "S") == 0)
return TC_TAPRIO_CMD_SET_GATES;
if (strcmp(str, "H") == 0)
return TC_TAPRIO_CMD_SET_AND_HOLD;
if (strcmp(str, "R") == 0)
return TC_TAPRIO_CMD_SET_AND_RELEASE;
return -1;
}
int get_u32(uint32_t *val, const char *arg, int base)
{
    unsigned long res;
    char *ptr;

    if (!arg || !*arg)
        return -1;
    /* strtoul() only sets errno on failure, so clear it first. */
    errno = 0;
    res = strtoul(arg, &ptr, base);
    /* empty string or trailing non-digits */
    if (!ptr || ptr == arg || *ptr)
        return -1;
    /* overflow */
    if (res == ULONG_MAX && errno == ERANGE)
        return -1;
    /* in case UL > 32 bits */
    if (res > 0xFFFFFFFFUL)
        return -1;
    *val = res;
    return 0;
}
int get_s64(int64_t *val, const char *arg, int base)
{
    /* strtoll() returns 'long long', which is at least 64 bits wide. */
    long long res;
    char *ptr;

    errno = 0;
    if (!arg || !*arg)
        return -1;
    res = strtoll(arg, &ptr, base);
    /* empty string or trailing non-digits */
    if (!ptr || ptr == arg || *ptr)
        return -1;
    /* overflow */
    if ((res == LLONG_MIN || res == LLONG_MAX) && errno == ERANGE)
        return -1;
    /* in case LL > 64 bits */
    if (res > INT64_MAX || res < INT64_MIN)
        return -1;
    *val = res;
    return 0;
}
static int parse_batch_file(FILE *file, struct schedule *schedule, size_t max_entries)
{
int argc;
char *arguments[MAX_ARGS];
char **argv;
size_t len = 0;
char *line = NULL;
int err;
if (getcmdline(&line, &len, file) < 0) {
fprintf(stderr, "Could not read batch file\n");
exit(EXIT_FAILURE);
}
argc = makeargs(line, arguments, MAX_ARGS);
if (argc < 0) {
fprintf(stderr, "Could not parse arguments\n");
return -1;
}
argv = arguments;
while (argc > 0) {
if (strcmp(*argv, "sched-entry") == 0) {
struct sched_entry *e;
if (schedule->num_entries >= max_entries) {
fprintf(stderr, "The maximum number of schedule entries is %zu\n", max_entries);
return -1;
}
e = &schedule->entries[schedule->num_entries];
NEXT_ARG();
err = str_to_entry_cmd(*argv);
if (err < 0) {
fprintf(stderr, "Could not parse command (found %s)\n", *argv);
return -1;
}
e->command = err;
NEXT_ARG();
if (get_u32(&e->gatemask, *argv, 16)) {
fprintf(stderr, "Could not parse gatemask (found %s)\n", *argv);
return -1;
}
NEXT_ARG();
if (get_u32(&e->interval, *argv, 0)) {
fprintf(stderr, "Could not parse interval (found %s)\n", *argv);
return -1;
}
schedule->num_entries++;
} else if (strcmp(*argv, "base-time") == 0) {
NEXT_ARG();
if (get_s64(&schedule->base_time, *argv, 10)) {
fprintf(stderr, "Could not parse base-time (found %s)\n", *argv);
return -1;
}
}
argc--; argv++;
}
return 0;
}
/* libpcap re-uses the timeval struct for nanosecond resolution when
* PCAP_TSTAMP_PRECISION_NANO is specified.
*/
static uint64_t tv_to_nanos(const struct timeval *tv)
{
return tv->tv_sec * NSEC_TO_SEC + tv->tv_usec;
}
static struct sched_entry *next_entry(struct schedule *schedule)
{
schedule->current_entry++;
if (schedule->current_entry >= schedule->num_entries)
schedule->current_entry = 0;
return &schedule->entries[schedule->current_entry];
}
static struct sched_entry *first_entry(struct schedule *schedule)
{
schedule->current_entry = 0;
return &schedule->entries[0];
}
static struct sched_entry *advance_until(struct schedule *schedule,
uint64_t ts, uint64_t *now)
{
struct sched_entry *first, *entry;
uint64_t cycle = 0;
uint64_t n;
entry = first = first_entry(schedule);
if (!schedule->cycle_time) {
do {
cycle += entry->interval;
entry = next_entry(schedule);
} while (entry != first);
schedule->cycle_time = cycle;
}
cycle = schedule->cycle_time;
n = (ts - schedule->base_time) / cycle;
*now = schedule->base_time + (n * cycle);
do {
if (*now + entry->interval > ts)
break;
*now += entry->interval;
entry = next_entry(schedule);
} while (true);
return entry;
}
static int match_packet(const struct tc_filter *filters, int num_filters,
const struct pcap_pkthdr *hdr,
const uint8_t *frame)
{
int err;
int i;
for (i = 0; i < num_filters; i++) {
const struct tc_filter *f = &filters[i];
err = pcap_offline_filter(&f->prog, hdr, frame);
if (!err) {
/* The filter for traffic class 'i' doesn't
* match the packet
*/
continue;
}
return i;
}
/* returning 'num_filters' means that the packet matches none
* of the filters, so it's a Best Effort packet.
*/
return num_filters;
}
static int classify_frames(pcap_t *handle, const struct tc_filter *tc_filters,
int num_filters, struct schedule *schedule)
{
struct sched_entry *entry;
struct pcap_pkthdr *hdr;
const uint8_t *frame;
uint64_t now, ts;
int err;
now = schedule->base_time;
/* Ignore frames until we get to the base_time of the
* schedule. */
do {
err = pcap_next_ex(handle, &hdr, &frame);
if (err < 0) {
pcap_perror(handle, "pcap_next_ex");
return -EINVAL;
}
ts = tv_to_nanos(&hdr->ts);
} while (ts <= now);
do {
const char *name, *ontime;
int64_t offset;
int tc;
ts = tv_to_nanos(&hdr->ts);
entry = advance_until(schedule, ts, &now);
tc = match_packet(tc_filters, num_filters, hdr, frame);
if (tc < num_filters)
name = tc_filters[tc].name;
else
name = "BE";
if (entry->gatemask & (1 << tc))
ontime = "ontime";
else
ontime = "late";
offset = ts - now;
/* XXX: what more information might we need? */
printf("%" PRIu64 " %" PRIu64 " \"%s\" \"%s\" %" PRId64 " %#x\n",
ts, now, name, ontime, offset, entry->gatemask);
} while (pcap_next_ex(handle, &hdr, &frame) >= 0);
return 0;
}
static void free_filters(struct tc_filter *filters, int num_filters)
{
    int i;

    for (i = 0; i < num_filters; i++) {
        struct tc_filter *f = &filters[i];

        /* Release the compiled BPF program along with the name. */
        pcap_freecode(&f->prog);
        free(f->name);
    }
}
int main(int argc, char **argv)
{
char errbuf[PCAP_ERRBUF_SIZE];
pcap_t *handle;
int err, num;
argp_parse(&argp, argc, argv, 0, NULL, NULL);
if (!dump_file || !batch_file || !filters_file) {
usage();
exit(EXIT_FAILURE);
}
err = parse_batch_file(batch_file, &schedule, NUM_ENTRIES);
if (err < 0) {
fprintf(stderr, "Could not parse schedule file (or file empty)\n");
exit(EXIT_FAILURE);
}
/* Without any sched-entry the cycle time would be zero. */
if (schedule.num_entries == 0) {
fprintf(stderr, "No sched-entry found in the batch file\n");
exit(EXIT_FAILURE);
}
handle = pcap_fopen_offline_with_tstamp_precision(
dump_file, PCAP_TSTAMP_PRECISION_NANO, errbuf);
if (!handle) {
fprintf(stderr, "Could not parse dump file: %s\n", errbuf);
exit(EXIT_FAILURE);
}
num = parse_filters(handle, filters_file,
traffic_filters, NUM_FILTERS);
if (num < 0) {
fprintf(stderr, "Could not parse filters file\n");
exit(EXIT_FAILURE);
}
err = classify_frames(handle, traffic_filters, num, &schedule);
if (err < 0) {
fprintf(stderr, "Could not classify frames\n");
exit(EXIT_FAILURE);
}
free_filters(traffic_filters, num);
pcap_close(handle);
return 0;
}
/*
* This program demonstrates transmission of L2 frames using the
* system TAI timer.
*
* Copyright (c) 2018, Intel Corporation
*
* Copyright (C) 2017 linutronix GmbH
*
* Large portions taken from the linuxptp stack.
* Copyright (C) 2011, 2012 Richard Cochran <richardcochran@gmail.com>
*
* Some portions taken from the sgd test program.
* Copyright (C) 2015 linutronix GmbH
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License along
* with this program; if not, write to the Free Software Foundation, Inc.,
* 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
*/
#define _GNU_SOURCE /*for CPU_SET*/
#include <arpa/inet.h>
#include <errno.h>
#include <ifaddrs.h>
#include <linux/errqueue.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <net/if.h>
#include <netinet/in.h>
#include <poll.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>
#define ONE_SEC 1000000000ULL
#define DEFAULT_PERIOD 1000000
#define DEFAULT_DELAY 500000
#define DEFAULT_PRIORITY 3
#define MARKER 'a'
#ifndef SO_TXTIME
#define SO_TXTIME 61
#define SCM_TXTIME SO_TXTIME
#endif
#ifndef SO_EE_ORIGIN_TXTIME
#define SO_EE_ORIGIN_TXTIME 6
#define SO_EE_CODE_TXTIME_INVALID_PARAM 1
#define SO_EE_CODE_TXTIME_MISSED 2
#endif
#define pr_err(s) fprintf(stderr, s "\n")
#define pr_info(s) fprintf(stdout, s "\n")
/* The API for SO_TXTIME is the below struct and enum. They are defined
 * locally here; mainline kernels (4.19 and later) provide them through
 * uapi/linux/net_tstamp.h, where 'flags' is a 32-bit field.
 */
struct sock_txtime {
clockid_t clockid;
uint32_t flags;
};
enum txtime_flags {
SOF_TXTIME_DEADLINE_MODE = (1 << 0),
SOF_TXTIME_REPORT_ERRORS = (1 << 1),
SOF_TXTIME_FLAGS_LAST = SOF_TXTIME_REPORT_ERRORS,
SOF_TXTIME_FLAGS_MASK = (SOF_TXTIME_FLAGS_LAST - 1) |
SOF_TXTIME_FLAGS_LAST
};
static int running = 1, use_so_txtime = 1;
static int period_nsec = DEFAULT_PERIOD;
static int waketx_delay = DEFAULT_DELAY;
static int so_priority = DEFAULT_PRIORITY;
static int use_deadline_mode = 0;
static int receive_errors = 0;
static uint64_t base_time = 0;
static uint8_t mac_addr[ETH_ALEN] = {0};
static struct sock_txtime sk_txtime;
static struct sockaddr_ll addr = {0};
static void normalize(struct timespec *ts)
{
while (ts->tv_nsec > 999999999) {
ts->tv_sec += 1;
ts->tv_nsec -= ONE_SEC;
}
while (ts->tv_nsec < 0) {
ts->tv_sec -= 1;
ts->tv_nsec += ONE_SEC;
}
}
static int sk_interface_index(int fd, const char *name)
{
struct ifreq ifreq;
int err;
memset(&ifreq, 0, sizeof(ifreq));
strncpy(ifreq.ifr_name, name, sizeof(ifreq.ifr_name) - 1);
err = ioctl(fd, SIOCGIFINDEX, &ifreq);
if (err < 0) {
pr_err("ioctl SIOCGIFINDEX failed: %m");
return err;
}
return ifreq.ifr_ifindex;
}
static int l2_open_socket(const char *name, clockid_t clkid)
{
int fd, index;
addr.sll_family = AF_PACKET;
addr.sll_protocol = htons(ETH_P_TSN);
addr.sll_halen = ETH_ALEN;
fd = socket(AF_PACKET, SOCK_DGRAM, htons(ETH_P_TSN));
if (fd < 0) {
pr_err("socket failed: %m");
goto no_socket;
}
index = sk_interface_index(fd, name);
if (index < 0)
goto no_option;
addr.sll_ifindex = index;
if (setsockopt(fd, SOL_SOCKET, SO_PRIORITY, &so_priority, sizeof(so_priority))) {
pr_err("Couldn't set priority");
goto no_option;
}
memcpy(&addr.sll_addr, mac_addr, ETH_ALEN);
sk_txtime.clockid = clkid;
sk_txtime.flags = (use_deadline_mode | receive_errors);
if (use_so_txtime && setsockopt(fd, SOL_SOCKET, SO_TXTIME, &sk_txtime, sizeof(sk_txtime))) {
pr_err("setsockopt SO_TXTIME failed: %m");
goto no_option;
}
return fd;
no_option:
close(fd);
no_socket:
return -1;
}
static int l2_send(int fd, void *buf, int len, __u64 txtime)
{
char control[CMSG_SPACE(sizeof(txtime))] = {};
struct cmsghdr *cmsg;
struct msghdr msg;
struct iovec iov;
ssize_t cnt;
iov.iov_base = buf;
iov.iov_len = len;
memset(&msg, 0, sizeof(msg));
msg.msg_name = &addr;
msg.msg_namelen = sizeof(addr);
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
/*
* We specify the transmission time in the CMSG.
*/
if (use_so_txtime) {
msg.msg_control = control;
msg.msg_controllen = sizeof(control);
cmsg = CMSG_FIRSTHDR(&msg);
cmsg->cmsg_level = SOL_SOCKET;
cmsg->cmsg_type = SCM_TXTIME;
cmsg->cmsg_len = CMSG_LEN(sizeof(__u64));
*((__u64 *) CMSG_DATA(cmsg)) = txtime;
}
cnt = sendmsg(fd, &msg, 0);
if (cnt < 1) {
pr_err("sendmsg failed: %m");
return cnt;
}
return cnt;
}
static unsigned char tx_buffer[256];
static int process_socket_error_queue(int fd)
{
uint8_t msg_control[CMSG_SPACE(sizeof(struct sock_extended_err))];
unsigned char err_buffer[sizeof(tx_buffer)];
struct sock_extended_err *serr;
struct cmsghdr *cmsg;
__u64 tstamp = 0;
struct iovec iov = {
.iov_base = err_buffer,
.iov_len = sizeof(err_buffer)
};
struct msghdr msg = {
.msg_iov = &iov,
.msg_iovlen = 1,
.msg_control = msg_control,
.msg_controllen = sizeof(msg_control)
};
if (recvmsg(fd, &msg, MSG_ERRQUEUE) == -1) {
pr_err("recvmsg failed");
return -1;
}
cmsg = CMSG_FIRSTHDR(&msg);
while (cmsg != NULL) {
serr = (void *) CMSG_DATA(cmsg);
if (serr->ee_origin == SO_EE_ORIGIN_TXTIME) {
tstamp = ((__u64) serr->ee_data << 32) + serr->ee_info;
switch(serr->ee_code) {
case SO_EE_CODE_TXTIME_INVALID_PARAM:
fprintf(stderr, "packet with tstamp %llu dropped due to invalid params\n", tstamp);
return 0;
case SO_EE_CODE_TXTIME_MISSED:
fprintf(stderr, "packet with tstamp %llu dropped due to missed deadline\n", tstamp);
return 0;
default:
return -1;
}
}
cmsg = CMSG_NXTHDR(&msg, cmsg);
}
return 0;
}
static int run_nanosleep(clockid_t clkid, int fd)
{
struct timespec ts;
int cnt, err;
__u64 txtime;
struct pollfd p_fd = {
.fd = fd,
};
memset(tx_buffer, MARKER, sizeof(tx_buffer));
/* If no base-time was specified, start one to two seconds in the
* future.
*/
if (base_time == 0) {
clock_gettime(clkid, &ts);
ts.tv_sec += 1;
ts.tv_nsec = ONE_SEC - waketx_delay;
} else {
ts.tv_sec = base_time / ONE_SEC;
ts.tv_nsec = (base_time % ONE_SEC) - waketx_delay;
}
normalize(&ts);
txtime = ts.tv_sec * ONE_SEC + ts.tv_nsec;
txtime += waketx_delay;
fprintf(stderr, "\ntxtime of 1st packet is: %llu", txtime);
while (running) {
memcpy(tx_buffer, &txtime, sizeof(__u64));
err = clock_nanosleep(clkid, TIMER_ABSTIME, &ts, NULL);
switch (err) {
case 0:
cnt = l2_send(fd, tx_buffer, sizeof(tx_buffer), txtime);
if (cnt != sizeof(tx_buffer)) {
pr_err("send failed");
}
ts.tv_nsec += period_nsec;
normalize(&ts);
txtime += period_nsec;
/* Check if errors are pending on the error queue. */
err = poll(&p_fd, 1, 0);
if (err == 1 && p_fd.revents & POLLERR) {
if (!process_socket_error_queue(fd))
return -ECANCELED;
}
break;
case EINTR:
continue;
default:
fprintf(stderr, "clock_nanosleep returned %d: %s\n",
err, strerror(err));
return err;
}
}
return 0;
}
static int set_realtime(pthread_t thread, int priority, int cpu)
{
cpu_set_t cpuset;
struct sched_param sp;
int err, policy;
int min = sched_get_priority_min(SCHED_FIFO);
int max = sched_get_priority_max(SCHED_FIFO);
fprintf(stderr, "min %d max %d\n", min, max);
if (priority < 0) {
return 0;
}
err = pthread_getschedparam(thread, &policy, &sp);
if (err) {
fprintf(stderr, "pthread_getschedparam: %s\n", strerror(err));
return -1;
}
sp.sched_priority = priority;
err = pthread_setschedparam(thread, SCHED_FIFO, &sp);
if (err) {
fprintf(stderr, "pthread_setschedparam: %s\n", strerror(err));
return -1;
}
if (cpu < 0) {
return 0;
}
CPU_ZERO(&cpuset);
CPU_SET(cpu, &cpuset);
err = pthread_setaffinity_np(thread, sizeof(cpu_set_t), &cpuset);
if (err) {
fprintf(stderr, "pthread_setaffinity_np: %s\n", strerror(err));
return -1;
}
return 0;
}
static void usage(char *progname)
{
fprintf(stderr,
"\n"
"usage: %s [options]\n"
"\n"
" -c [num] run on CPU 'num'\n"
" -d [num] delta from wake up to txtime in nanoseconds (default %d)\n"
" -h prints this message and exits\n"
" -i [name] use network interface 'name'\n"
" -p [num] run with RT priority 'num'\n"
" -P [num] period in nanoseconds (default %d)\n"
" -s do not use SO_TXTIME\n"
" -t [num] set SO_PRIORITY to 'num' (default %d)\n"
" -D set deadline mode for SO_TXTIME\n"
" -E enable error reporting on the socket error queue for SO_TXTIME\n"
" -b [tstamp] txtime of 1st packet as a 64bit [tstamp]. Default: now + ~2seconds\n"
" -m [mac_addr] dst MAC address\n"
"\n",
progname, DEFAULT_DELAY, DEFAULT_PERIOD, DEFAULT_PRIORITY);
}
int main(int argc, char *argv[])
{
int c, cpu = -1, err, fd, priority = -1;
clockid_t clkid = CLOCK_TAI;
char *iface = NULL, *progname;
/* Process the command line arguments. */
progname = strrchr(argv[0], '/');
progname = progname ? 1 + progname : argv[0];
while (EOF != (c = getopt(argc, argv, "c:d:hi:p:P:st:DEb:m:"))) {
switch (c) {
case 'c':
cpu = atoi(optarg);
break;
case 'd':
waketx_delay = atoi(optarg);
break;
case 'h':
usage(progname);
return 0;
case 'i':
iface = optarg;
break;
case 'p':
priority = atoi(optarg);
break;
case 'P':
period_nsec = atoi(optarg);
break;
case 's':
use_so_txtime = 0;
break;
case 't':
so_priority = atoi(optarg);
break;
case 'D':
use_deadline_mode = SOF_TXTIME_DEADLINE_MODE;
break;
case 'E':
receive_errors = SOF_TXTIME_REPORT_ERRORS;
break;
case 'b':
base_time = atoll(optarg);
break;
case 'm':
err = sscanf(optarg, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
&mac_addr[0], &mac_addr[1], &mac_addr[2],
&mac_addr[3], &mac_addr[4], &mac_addr[5]);
if (err != 6) {
printf("Invalid MAC address\n");
return -1;
}
break;
case '?':
usage(progname);
return -1;
}
}
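/* Only an all-zero address means that -m was not given. */
if (mac_addr[0] == 0 && mac_addr[1] == 0 && mac_addr[2] == 0 &&
mac_addr[3] == 0 && mac_addr[4] == 0 && mac_addr[5] == 0) {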
pr_err("Destination MAC Address must be specified.");
usage(progname);
return -1;
}
if (waketx_delay > 999999999 || waketx_delay < 0) {
pr_err("Bad wake up to transmission delay.");
usage(progname);
return -1;
}
if (period_nsec < 1000) {
pr_err("Bad period.");
usage(progname);
return -1;
}
if (!iface) {
pr_err("Need a network interface.");
usage(progname);
return -1;
}
if (set_realtime(pthread_self(), priority, cpu)) {
return -1;
}
fd = l2_open_socket(iface, clkid);
if (fd < 0) {
return -1;
}
err = run_nanosleep(clkid, fd);
close(fd);
return err;
}
PCAP_CFLAGS=$(shell pcap-config --cflags)
PCAP_LIBS=$(shell pcap-config --libs)

all: dump-classifier udp_tai

# Libraries go after the source file so that link order is correct.
dump-classifier: dump-classifier.c
	${CC} ${CFLAGS} $(PCAP_CFLAGS) -o $@ $< $(PCAP_LIBS)

udp_tai: udp_tai.c
	${CC} ${CFLAGS} -o $@ $< -lpthread

clean:
	rm -f dump-classifier udp_tai

.PHONY: all clean
#!/bin/sh
#
# Copyright (c) 2018, Intel Corporation
#
# SPDX-License-Identifier: BSD-3-Clause
#
IFACE=$1
if [ -z "$IFACE" ]; then
echo "You must provide the network interface as first argument"
exit 1
fi
# Now plus 1 minute, plus the 37 s TAI-UTC offset (date reports UTC,
# while udp_tai schedules against CLOCK_TAI)
PLUS_1MIN=$((`date +%s%N` + 37000000000 + (60 * 1000000000)))
# Base will be the "round" (whole-second) timestamp ~1 min from now, plus 250us
BASE=$(($PLUS_1MIN - ( $PLUS_1MIN % 1000000000 ) + 250000))
sudo ./udp_tai -i $IFACE -b $BASE -P 1000000 -t 3 -p 90 -d 600000 -u 7788
#!/bin/bash
#
# Copyright (c) 2018, Intel Corporation
#
# SPDX-License-Identifier: BSD-3-Clause
#
#
# DISCLAIMER
#
# This script is meant for testing purposes only.
# It provides an oversimplified approach for having a simple PTP
# network up and running, with each local node having its CLOCK_TAI
# offset adjusted.
#
# TODO:
# - find a way to fetch the TAI offset from ptp4l directly. Ivan
# suggested using pmc for that.
#
set -e
INTERFACE=none
TAI_OFFSET=37
PTP4L_VERBOSE=''
PHC2SYS_VERBOSE=''
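# '|| true' keeps 'set -e' from aborting silently when a tool is missing,
# so that test_dependencies can print a useful message instead.
if [ -z "$PTP4L" ]; then
PTP4L=$(which ptp4l || true)
fi
if [ -z "$PHC2SYS" ]; then
PHC2SYS=$(which phc2sys || true)
fi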
# On the PTP master, if started with -M parameter, synchronize the
# system clock to PHC first, then propagate that to network using ptp4l.
# We trust that the system clock was initially setup correctly or adjusted
# to some other source (i.e. NTP, GPS, etc).
#
# For this -M mode, clocks are kept synchronized by phc2sys.
# This is provided for the scenarios in which the PTP master on this network
# is also running one end of the TSN application (either the listener or the
# talker), which requires the local clocks to be synchronized.
#
# When that isn't the case (i.e. the etf experiment, in which all we care
# about is the network clock sync), then just start this script with -m
# instead so phc2sys is not used and the jitter of the network clock sync is
# not affected.
#
setup_ptp_master() {
"$PTP4L" -2 -i $INTERFACE $PTP4L_VERBOSE &
}
setup_ptp_master_and_sync() {
"$PHC2SYS" -c $INTERFACE -s CLOCK_REALTIME -w $PHC2SYS_VERBOSE &
setup_ptp_master
}
# On PTP slaves, first synchronize the PHC to the PTP master,
# then synchronize the system clock to the PHC.
setup_ptp_slave() {
"$PHC2SYS" -a -r $PHC2SYS_VERBOSE &
"$PTP4L" -2 -s -i $INTERFACE $PTP4L_VERBOSE &
}
# Use adjtimex to set the TAI offset to CLOCK_TAI.
adjust_clock_tai_offset() {
tmp_src=$(mktemp /tmp/XXXXXX.c)
tmp_bin=$(mktemp)
cat <<EOF > $tmp_src
#include <stdio.h>
#include <stdlib.h>
#include <sys/timex.h>
int main(void)
{
struct timex timex = {
.modes = ADJ_TAI,
.constant = $TAI_OFFSET
};
if (adjtimex(&timex) == -1) {
perror("adjtimex failed to set CLOCK_TAI offset");
return EXIT_FAILURE;
}
return EXIT_SUCCESS;
}
EOF
gcc -o $tmp_bin $tmp_src
$tmp_bin
rm -f $tmp_bin $tmp_src
}
test_dependencies() {
if [ ! -x "$PTP4L" ]; then
echo "ptp4l must be available from your \$PATH or set \$PTP4L."
exit -1
fi
if [ ! -x "$PHC2SYS" ]; then
echo "phc2sys must be available from your \$PATH or set \$PHC2SYS."
exit -1
fi
}
test_dependencies
ptp_master_mode=f
while getopts "Mmsvi:" opt; do
case ${opt} in
i) INTERFACE=$OPTARG ;;
m) ptp_master_mode=y ;;
s) ptp_master_mode=n ;;
M) ptp_master_mode=M ;;
v) PTP4L_VERBOSE='-m --summary_interval=5' ;
PHC2SYS_VERBOSE='-m -u 20' ;;
*) exit -1 ;;
esac
done
if [ ${INTERFACE} = none ]; then
echo "You must set the network interface using '-i'."
exit -1
fi
if [ ${ptp_master_mode} = y ]; then
setup_ptp_master
adjust_clock_tai_offset
elif [ ${ptp_master_mode} = M ]; then
setup_ptp_master_and_sync
adjust_clock_tai_offset
elif [ ${ptp_master_mode} = n ]; then
setup_ptp_slave
adjust_clock_tai_offset
else
echo "You must select PTP master (-m) OR PTP slave (-s) mode."
exit -1
fi
#!/usr/bin/env python3
#
# Copyright (c) 2018, Intel Corporation
#
# SPDX-License-Identifier: BSD-3-Clause
#
# Expected input file format is a CSV file with:
#
# <FRAME_NUMBER, FRAME_ARRIVAL_TIME, FRAME_PAYLOAD_BYTES>
# E.g.:
# 1,1521534608.000000456,00:38:89:bd:a1:93:1d:15:(...)
# 2,1521534608.001000480,00:38:89:bd:a1:93:1d:15:(...)
#
# Frame number: sequence number for each frame
# Frame arrival time: Rx HW timestamp for each frame
# Frame Payload: payload starting with 64bit timestamp (txtime)
#
# This can be easily generated with tshark with the following command line:
# $ tshark -r CAPTURE.pcap -t e -E separator=, -T fields -e frame.number \
# -e frame.time_epoch \
# -e data.data > DATA.out
#
import argparse
import csv
import struct
import math
import sys
# TAI to UTC offset. Currently that is 37 seconds.
TAI_OFFSET = 37000000000
def compute_offsets_stats(file_path):
    with open(file_path) as f:
        count = mean = total_sqr_dist = 0.0
        min_t = sys.maxsize
        max_t = -sys.maxsize
        for line in csv.reader(f):
            # Arrival time "<sec>.<nsec>" (9 fractional digits) -> integer ns.
            arrival_tstamp = int(line[1].replace('.', ''))
            # The first 8 payload bytes carry the txtime as a little-endian u64.
            data = line[2].split(':')
            txtime = ''.join(data[0:8])
            txtime = bytearray.fromhex(txtime)
            txtime = struct.unpack('<Q', txtime)
            val = float(arrival_tstamp - txtime[0])
            val = (val - TAI_OFFSET) if val > TAI_OFFSET else val
            # Update statistics.
            # Compute the mean and variance online using Welford's algorithm.
            count += 1
            min_t = val if val < min_t else min_t
            max_t = val if val > max_t else max_t
            delta = val - mean
            mean = mean + (delta / count)
            new_delta = val - mean
            total_sqr_dist += delta * new_delta
    if count == 0:
        print("no frames found in %s" % file_path)
        return
    # The sample variance needs at least two samples.
    variance = total_sqr_dist / (count - 1) if count > 1 else 0.0
    std_dev = math.sqrt(variance)
    print("min:\t\t%e" % min_t)
    print("max:\t\t%e" % max_t)
    print("jitter (pk-pk):\t%e" % (max_t - min_t))
    print("avg:\t\t%e" % mean)
    print("std dev:\t%e" % std_dev)
    print("count:\t\t%d" % count)


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '-f', dest='file_path', default=None, type=str,
        help='Path to input file (e.g. DATA.out) generated by tshark with:\
            tshark -r CAPTURE.pcap -t e -E separator=, -T\
            fields -e frame.number -e frame.time_epoch\
            -e data.data > DATA.out')
    args = parser.parse_args()
    if args.file_path is not None:
        compute_offsets_stats(args.file_path)
    else:
        parser.print_help()


if __name__ == "__main__":
    main()
/*
* This program demonstrates transmission of UDP packets using the
* system TAI timer.
*
* Copyright (C) 2017 linutronix GmbH
*
* Large portions taken from the linuxptp stack.
* Copyright (C) 2011, 2012 Richard Cochran <richardcochran@gmail.com>
*
* Some portions taken from the sgd test program.
* Copyright (C) 2015 linutronix GmbH
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License along
* with this program; if not, write to the Free Software Foundation, Inc.,
* 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
*/
#define _GNU_SOURCE /*for CPU_SET*/
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <ifaddrs.h>
#include <linux/errqueue.h>
#include <linux/ethtool.h>
#include <linux/net_tstamp.h>
#include <linux/sockios.h>
#include <net/if.h>
#include <netinet/in.h>
#include <poll.h>
#include <pthread.h>
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#define ONE_SEC 1000000000ULL
#define DEFAULT_PERIOD 1000000
#define DEFAULT_DELAY 500000
#define DEFAULT_PRIORITY 3
#define MCAST_IPADDR "239.1.1.1"
#define UDP_PORT 7788
#define MARKER 'a'
#ifndef SO_TXTIME
#define SO_TXTIME 61
#define SCM_TXTIME SO_TXTIME
#endif
#ifndef SO_EE_ORIGIN_TXTIME
#define SO_EE_ORIGIN_TXTIME 6
#define SO_EE_CODE_TXTIME_INVALID_PARAM 1
#define SO_EE_CODE_TXTIME_MISSED 2
#endif
#define pr_err(s) fprintf(stderr, s "\n")
#define pr_info(s) fprintf(stdout, s "\n")
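/* The API for SO_TXTIME is the below struct and enum, which newer kernel
 * headers provide in uapi/linux/net_tstamp.h. This file includes
 * <linux/net_tstamp.h>, so drop these local definitions if your headers
 * already define them.
 */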
struct sock_txtime {
clockid_t clockid;
uint16_t flags;
};
enum txtime_flags {
SOF_TXTIME_DEADLINE_MODE = (1 << 0),
SOF_TXTIME_REPORT_ERRORS = (1 << 1),
SOF_TXTIME_FLAGS_LAST = SOF_TXTIME_REPORT_ERRORS,
SOF_TXTIME_FLAGS_MASK = (SOF_TXTIME_FLAGS_LAST - 1) |
SOF_TXTIME_FLAGS_LAST
};
static int running = 1, use_so_txtime = 1;
static int period_nsec = DEFAULT_PERIOD;
static int waketx_delay = DEFAULT_DELAY;
static int so_priority = DEFAULT_PRIORITY;
static int udp_port = UDP_PORT;
static int use_deadline_mode = 0;
static int receive_errors = 0;
static uint64_t base_time = 0;
static struct in_addr mcast_addr;
static struct sock_txtime sk_txtime;
static int mcast_bind(int fd, int index)
{
int err;
struct ip_mreqn req;
memset(&req, 0, sizeof(req));
req.imr_ifindex = index;
err = setsockopt(fd, IPPROTO_IP, IP_MULTICAST_IF, &req, sizeof(req));
if (err) {
pr_err("setsockopt IP_MULTICAST_IF failed: %m");
return -1;
}
return 0;
}
static int mcast_join(int fd, int index, const struct sockaddr *grp,
socklen_t grplen)
{
int err, off = 0;
struct ip_mreqn req;
struct sockaddr_in *sa = (struct sockaddr_in *) grp;
memset(&req, 0, sizeof(req));
memcpy(&req.imr_multiaddr, &sa->sin_addr, sizeof(struct in_addr));
req.imr_ifindex = index;
err = setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &req, sizeof(req));
if (err) {
pr_err("setsockopt IP_ADD_MEMBERSHIP failed: %m");
return -1;
}
err = setsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP, &off, sizeof(off));
if (err) {
pr_err("setsockopt IP_MULTICAST_LOOP failed: %m");
return -1;
}
return 0;
}
static void normalize(struct timespec *ts)
{
while (ts->tv_nsec > 999999999) {
ts->tv_sec += 1;
ts->tv_nsec -= ONE_SEC;
}
while (ts->tv_nsec < 0) {
ts->tv_sec -= 1;
ts->tv_nsec += ONE_SEC;
}
}
static int sk_interface_index(int fd, const char *name)
{
struct ifreq ifreq;
int err;
memset(&ifreq, 0, sizeof(ifreq));
strncpy(ifreq.ifr_name, name, sizeof(ifreq.ifr_name) - 1);
err = ioctl(fd, SIOCGIFINDEX, &ifreq);
if (err < 0) {
pr_err("ioctl SIOCGIFINDEX failed: %m");
return err;
}
return ifreq.ifr_ifindex;
}
static int open_socket(const char *name, struct in_addr mc_addr, short port, clockid_t clkid)
{
struct sockaddr_in addr;
int fd, index, on = 1;
memset(&addr, 0, sizeof(addr));
addr.sin_family = AF_INET;
addr.sin_addr.s_addr = htonl(INADDR_ANY);
addr.sin_port = htons(port);
fd = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP);
if (fd < 0) {
pr_err("socket failed: %m");
goto no_socket;
}
index = sk_interface_index(fd, name);
if (index < 0)
goto no_option;
if (setsockopt(fd, SOL_SOCKET, SO_PRIORITY, &so_priority, sizeof(so_priority))) {
pr_err("Couldn't set priority");
goto no_option;
}
if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on))) {
pr_err("setsockopt SO_REUSEADDR failed: %m");
goto no_option;
}
if (bind(fd, (struct sockaddr *) &addr, sizeof(addr))) {
pr_err("bind failed: %m");
goto no_option;
}
if (setsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE, name, strlen(name))) {
pr_err("setsockopt SO_BINDTODEVICE failed: %m");
goto no_option;
}
addr.sin_addr = mc_addr;
if (mcast_join(fd, index, (struct sockaddr *) &addr, sizeof(addr))) {
pr_err("mcast_join failed");
goto no_option;
}
if (mcast_bind(fd, index)) {
goto no_option;
}
sk_txtime.clockid = clkid;
sk_txtime.flags = (use_deadline_mode | receive_errors);
if (use_so_txtime && setsockopt(fd, SOL_SOCKET, SO_TXTIME, &sk_txtime, sizeof(sk_txtime))) {
pr_err("setsockopt SO_TXTIME failed: %m");
goto no_option;
}
return fd;
no_option:
close(fd);
no_socket:
return -1;
}
static int udp_open(const char *name, clockid_t clkid)
{
int fd;
if (!inet_aton(MCAST_IPADDR, &mcast_addr))
return -1;
fd = open_socket(name, mcast_addr, udp_port, clkid);
return fd;
}
static int udp_send(int fd, void *buf, int len, __u64 txtime)
{
char control[CMSG_SPACE(sizeof(txtime))] = {};
struct sockaddr_in sin;
struct cmsghdr *cmsg;
struct msghdr msg;
struct iovec iov;
ssize_t cnt;
memset(&sin, 0, sizeof(sin));
sin.sin_family = AF_INET;
sin.sin_addr = mcast_addr;
sin.sin_port = htons(udp_port);
iov.iov_base = buf;
iov.iov_len = len;
memset(&msg, 0, sizeof(msg));
msg.msg_name = &sin;
msg.msg_namelen = sizeof(sin);
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
/*
* We specify the transmission time in the CMSG.
*/
if (use_so_txtime) {
msg.msg_control = control;
msg.msg_controllen = sizeof(control);
cmsg = CMSG_FIRSTHDR(&msg);
cmsg->cmsg_level = SOL_SOCKET;
cmsg->cmsg_type = SCM_TXTIME;
cmsg->cmsg_len = CMSG_LEN(sizeof(__u64));
*((__u64 *) CMSG_DATA(cmsg)) = txtime;
}
cnt = sendmsg(fd, &msg, 0);
if (cnt < 1) {
pr_err("sendmsg failed: %m");
return cnt;
}
return cnt;
}
static unsigned char tx_buffer[256];
static int process_socket_error_queue(int fd)
{
uint8_t msg_control[CMSG_SPACE(sizeof(struct sock_extended_err))];
unsigned char err_buffer[sizeof(tx_buffer)];
struct sock_extended_err *serr;
struct cmsghdr *cmsg;
__u64 tstamp = 0;
struct iovec iov = {
.iov_base = err_buffer,
.iov_len = sizeof(err_buffer)
};
struct msghdr msg = {
.msg_iov = &iov,
.msg_iovlen = 1,
.msg_control = msg_control,
.msg_controllen = sizeof(msg_control)
};
if (recvmsg(fd, &msg, MSG_ERRQUEUE) == -1) {
pr_err("recvmsg failed");
return -1;
}
cmsg = CMSG_FIRSTHDR(&msg);
while (cmsg != NULL) {
serr = (void *) CMSG_DATA(cmsg);
if (serr->ee_origin == SO_EE_ORIGIN_TXTIME) {
tstamp = ((__u64) serr->ee_data << 32) + serr->ee_info;
switch(serr->ee_code) {
case SO_EE_CODE_TXTIME_INVALID_PARAM:
fprintf(stderr, "packet with tstamp %llu dropped due to invalid params\n", tstamp);
return 0;
case SO_EE_CODE_TXTIME_MISSED:
fprintf(stderr, "packet with tstamp %llu dropped due to missed deadline\n", tstamp);
return 0;
default:
return -1;
}
}
cmsg = CMSG_NXTHDR(&msg, cmsg);
}
return 0;
}
static int run_nanosleep(clockid_t clkid, int fd)
{
struct timespec ts;
int cnt, err;
__u64 txtime;
struct pollfd p_fd = {
.fd = fd,
};
memset(tx_buffer, MARKER, sizeof(tx_buffer));
/* If no base-time was specified, start one to two seconds in the
* future.
*/
if (base_time == 0) {
clock_gettime(clkid, &ts);
ts.tv_sec += 1;
ts.tv_nsec = ONE_SEC - waketx_delay;
} else {
ts.tv_sec = base_time / ONE_SEC;
ts.tv_nsec = (base_time % ONE_SEC) - waketx_delay;
}
normalize(&ts);
txtime = ts.tv_sec * ONE_SEC + ts.tv_nsec;
txtime += waketx_delay;
fprintf(stderr, "\ntxtime of 1st packet is: %llu", txtime);
while (running) {
memcpy(tx_buffer, &txtime, sizeof(__u64));
err = clock_nanosleep(clkid, TIMER_ABSTIME, &ts, NULL);
switch (err) {
case 0:
cnt = udp_send(fd, tx_buffer, sizeof(tx_buffer), txtime);
if (cnt != sizeof(tx_buffer)) {
pr_err("udp_send failed");
}
ts.tv_nsec += period_nsec;
normalize(&ts);
txtime += period_nsec;
/* Check if errors are pending on the error queue. */
err = poll(&p_fd, 1, 0);
if (err == 1 && p_fd.revents & POLLERR) {
if (!process_socket_error_queue(fd))
return -ECANCELED;
}
break;
case EINTR:
continue;
default:
fprintf(stderr, "clock_nanosleep returned %d: %s\n",
err, strerror(err));
return err;
}
}
return 0;
}
static int set_realtime(pthread_t thread, int priority, int cpu)
{
cpu_set_t cpuset;
struct sched_param sp;
int err, policy;
int min = sched_get_priority_min(SCHED_FIFO);
int max = sched_get_priority_max(SCHED_FIFO);
fprintf(stderr, "min %d max %d\n", min, max);
if (priority < 0) {
return 0;
}
err = pthread_getschedparam(thread, &policy, &sp);
if (err) {
fprintf(stderr, "pthread_getschedparam: %s\n", strerror(err));
return -1;
}
sp.sched_priority = priority;
err = pthread_setschedparam(thread, SCHED_FIFO, &sp);
if (err) {
fprintf(stderr, "pthread_setschedparam: %s\n", strerror(err));
return -1;
}
if (cpu < 0) {
return 0;
}
CPU_ZERO(&cpuset);
CPU_SET(cpu, &cpuset);
err = pthread_setaffinity_np(thread, sizeof(cpu_set_t), &cpuset);
if (err) {
fprintf(stderr, "pthread_setaffinity_np: %s\n", strerror(err));
return -1;
}
return 0;
}
static void usage(char *progname)
{
fprintf(stderr,
"\n"
"usage: %s [options]\n"
"\n"
" -c [num] run on CPU 'num'\n"
" -d [num] delta from wake up to txtime in nanoseconds (default %d)\n"
" -h prints this message and exits\n"
" -i [name] use network interface 'name'\n"
" -p [num] run with RT priority 'num'\n"
" -P [num] period in nanoseconds (default %d)\n"
" -s do not use SO_TXTIME\n"
" -t [num] set SO_PRIORITY to 'num' (default %d)\n"
" -D set deadline mode for SO_TXTIME\n"
" -E enable error reporting on the socket error queue for SO_TXTIME\n"
" -b [tstamp] txtime of 1st packet as a 64bit [tstamp]. Default: now + ~2seconds\n"
" -u [port] use udp port 'port'\n"
"\n",
progname, DEFAULT_DELAY, DEFAULT_PERIOD, DEFAULT_PRIORITY);
}
int main(int argc, char *argv[])
{
int c, cpu = -1, err, fd, priority = -1;
clockid_t clkid = CLOCK_TAI;
char *iface = NULL, *progname;
/* Process the command line arguments. */
progname = strrchr(argv[0], '/');
progname = progname ? 1 + progname : argv[0];
while (EOF != (c = getopt(argc, argv, "c:d:hi:p:P:st:DEb:u:"))) {
switch (c) {
case 'c':
cpu = atoi(optarg);
break;
case 'd':
waketx_delay = atoi(optarg);
break;
case 'h':
usage(progname);
return 0;
case 'i':
iface = optarg;
break;
case 'p':
priority = atoi(optarg);
break;
case 'P':
period_nsec = atoi(optarg);
break;
case 's':
use_so_txtime = 0;
break;
case 't':
so_priority = atoi(optarg);
break;
case 'D':
use_deadline_mode = SOF_TXTIME_DEADLINE_MODE;
break;
case 'E':
receive_errors = SOF_TXTIME_REPORT_ERRORS;
break;
case 'b':
base_time = atoll(optarg);
break;
case 'u':
udp_port = atoi(optarg);
break;
case '?':
usage(progname);
return -1;
}
}
if (waketx_delay > 999999999 || waketx_delay < 0) {
pr_err("Bad wake up to transmission delay.");
usage(progname);
return -1;
}
if (period_nsec < 1000) {
pr_err("Bad period.");
usage(progname);
return -1;
}
if (!iface) {
pr_err("Need a network interface.");
usage(progname);
return -1;
}
if (set_realtime(pthread_self(), priority, cpu)) {
return -1;
}
fd = udp_open(iface, clkid);
if (fd < 0) {
return -1;
}
err = run_nanosleep(clkid, fd);
close(fd);
return err;
}
@KnutSchrader

Hello jeez,
the python script reports an error: line 45, in compute_offset_stats txtime = struct.unpack('<Q', txtime)
struct.error: unpack requires a buffer of 8 bytes.
I have tried python2 and python3 version 3.7.3
Regards

@KnutSchrader

Hello jeez,
another question:
output result L2_tai:
(iperf 3 running at 100 Mbit)

min: 4.670000e+02
max: 8.441950e+05
jitter (pk-pk): 8.437280e+05
avg: 4.377820e+04
std dev: 1.551230e+05
count: 60000
Is this the expected result?
Are the values in nanoseconds?
Can you please explain?
Thanks a lot
Regards

@jeez
Author

jeez commented Jul 1, 2019

Hello jeez,
the python script reports an error: line 45, in compute_offset_stats txtime = struct.unpack('<Q', txtime)
struct.error: unpack requires a buffer of 8 bytes.
I have tried python2 and python3 version 3.7.3
Regards

I've only seen that happening a few times in the past, and I don't fully recall the details. I have some recollection that it had nothing to do with the scripts, but rather with a packet's header and tshark if the first bytes clashed with some known protocol header. That's why I needed the "--disable-protocol foo" on the tshark command line. There has to be a better way of doing that but I confess I didn't look into it back then.
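
To illustrate the failure mode described above: `struct.unpack('<Q', ...)` needs exactly 8 bytes, so any CSV row whose payload column does not yield 8 leading bytes raises that `struct.error`. The sketch below uses a hypothetical helper, `parse_txtime`, that is not part of the script. It returns `None` for short payloads so that the caller can skip the row. It also accepts a payload printed without colon separators, which is an assumption about how some tshark versions format `data.data`, not something confirmed in this thread.

```python
import struct


def parse_txtime(payload_field):
    """Return the little-endian u64 txtime at the start of a hex payload.

    payload_field is the third CSV column, e.g. "00:38:89:bd:a1:93:1d:15:...".
    Returns None when fewer than 8 bytes are present.
    """
    hex_digits = payload_field.replace(':', '')
    if len(hex_digits) < 16:          # 8 bytes == 16 hex digits
        return None
    raw = bytes.fromhex(hex_digits[:16])
    return struct.unpack('<Q', raw)[0]


print(parse_txtime('00:38:89:bd:a1:93:1d:15:61:61'))
print(parse_txtime('61:61:61'))       # too short, the row would be skipped
```

A row that comes back as `None` usually means the capture or the tshark invocation needs a closer look, so counting the skipped rows is probably more useful than ignoring them silently.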

@jeez
Author

jeez commented Jul 1, 2019

Hello jeez,

I have a setup with two separate machines running Linux kernel 5.0.8, each with an I210 card.

I ran the first test case, i.e. with the ETF qdisc only, and got a result. Could you please let me know whether you think this is really unexpected behavior or something normal?

When I execute the mqprio and ETF qdisc commands on the talker side only (the commands are given below), Wireshark captures only 18 packets per second. On the other hand, if I run the same application "udp_tai" without setting up mqprio and the ETF qdisc, Wireshark captures 1000 packets per second, as expected from the udp_tai application.

Could you please let me know whether this is a bug or expected behavior: why does the number of packets captured by Wireshark drop from 1000 per second to 18 per second after the ETF qdisc is configured?

Linux kernel: 5.0.8

Listener 1
Talker 1

Talker commands:
TALKER #

ip addr add 192.168.0.77/4 broadcast 192.168.0.255 dev eth1
ifconfig eth1 up
./tc qdisc replace dev eth1 parent root handle 100 mqprio num_tc 3 map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0
./tc qdisc add dev eth1 parent 100:1 etf offload clockid CLOCK_TAI delta 150000
./ptp4l -i eth1 -p /dev/ptp1 -s

./tcpdump -c 60000 -i eth1 -w talker.pcap -j adapter_unsynced -tt udp port 7788

./udp_tai -i eth1 -P 1000000 -p 90 -d 600000

Listener Command:

ip addr add 192.168.0.78/4 broadcast 192.168.0.255 dev eth1
ifconfig eth1 up

./ptp4l -i eth1 -p /dev/ptp1 -m

./tcpdump -c 60000 -i eth1 -w listener.pcap -j adapter_unsynced -tt udp port 7788

Another question is about a point of theory; it would be helpful if you could clarify it.

In the command: ./tc qdisc replace dev eth1 parent root handle 100 mqprio num_tc 3 map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0

What is the meaning of "queues 1@0 1@1 2@2"?

I could not find a proper explanation of it. Does it mean that queue 0 and queue 1 are mapped to priority 1 and queue 2 is mapped to priority 2?

Please let me know.

Thanks and Regards
Bhaskar

Somehow I missed this comment and never got a notification for it, sorry.
I'll let @vcgomes chime in here and reply since I'm no longer working on this project and he is now the point-of-contact for it.
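
On the `queues 1@0 1@1 2@2` question: as far as the tc-mqprio man page describes it, each `queues` entry is `count@offset` for one traffic class, in order, and `map` assigns each of the 16 socket priorities to a traffic class. The sketch below expands the command quoted above on that reading. `describe_mqprio` is a hypothetical helper written for this illustration, not part of tc or of this gist.

```python
def describe_mqprio(map_str, queues_str):
    """Return {priority: (traffic_class, [Tx queue indices])}."""
    prio_to_tc = [int(tc) for tc in map_str.split()]
    tc_queues = []
    for entry in queues_str.split():
        # Each entry is "<count>@<offset>" for the next traffic class.
        count, offset = (int(x) for x in entry.split('@'))
        tc_queues.append(list(range(offset, offset + count)))
    return {prio: (tc, tc_queues[tc]) for prio, tc in enumerate(prio_to_tc)}


layout = describe_mqprio("2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2", "1@0 1@1 2@2")
for prio in range(4):
    tc, queues = layout[prio]
    print("priority %d -> traffic class %d -> Tx queue(s) %s" % (prio, tc, queues))
```

Read this way, priority 3 (the udp_tai default for `-t`) lands in traffic class 0, which owns Tx queue 0 only; priority 2 lands in traffic class 1 (queue 1); every other priority shares traffic class 2 (queues 2 and 3). That would be consistent with attaching etf to `parent 100:1`, assuming `100:1` refers to the first Tx queue.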

@jeez
Author

jeez commented Jul 1, 2019

Hello jeez,
another question:
output result L2_tai:
(iperf 3 running at 100 Mbit)

min: 4.670000e+02
max: 8.441950e+05
jitter (pk-pk): 8.437280e+05
avg: 4.377820e+04
std dev: 1.551230e+05
count: 60000
Is this the expected result?
Are the values in nanoseconds?
Can you please explain?
Thanks a lot
Regards

I'll let @vcgomes chime in here and reply since I'm no longer working on this project and he is now the point-of-contact for it.
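
On the units question, reading the stats script in this gist: the arrival timestamp is turned into integer nanoseconds by dropping the decimal point, and the txtime embedded in the payload is the nanosecond value that udp_tai passed to SO_TXTIME, so the reported numbers appear to be nanoseconds. The sketch below repeats that arithmetic for a single frame. `offset_ns` is a hypothetical name, and the sample timestamps are made up for illustration.

```python
TAI_OFFSET_NS = 37 * 1000000000  # same value as TAI_OFFSET in the script


def offset_ns(arrival_field, txtime_ns):
    """Arrival time minus txtime, in nanoseconds, as the script computes it."""
    # "<sec>.<9 fractional digits>" becomes integer nanoseconds.
    arrival_ns = int(arrival_field.replace('.', ''))
    val = arrival_ns - txtime_ns
    # The script subtracts 37 s when the two clocks differ by the TAI offset.
    return val - TAI_OFFSET_NS if val > TAI_OFFSET_NS else val


print(offset_ns("1521534608.000000456", 1521534608000000000))  # 456
```

If that reading is right, the `min` of 4.67e+02 above is 467 ns and the `max` of 8.44e+05 is about 844 us.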

@JayachandranRameshBabu

JayachandranRameshBabu commented Jul 20, 2019

Hello @btripathi123,

I have a setup with two separate machines running Linux kernel 5.0.8, each with an I210 card.

I ran the first test case, i.e. with the ETF qdisc only, and got a result. Could you please let me know whether you think this is really unexpected behavior or something normal?

When I execute the mqprio and ETF qdisc commands on the talker side only (the commands are given below), Wireshark captures only 18 packets per second. On the other hand, if I run the same application "udp_tai" without setting up mqprio and the ETF qdisc, Wireshark captures 1000 packets per second, as expected from the udp_tai application.

Could you please let me know whether this is a bug or expected behavior: why does the number of packets captured by Wireshark drop from 1000 per second to 18 per second after the ETF qdisc is configured?

Linux kernel: 5.0.8

I think the problem here is packet drops.

  • You should synchronize the system clock to the PTP hardware clock (PHC) using phc2sys. udp_tai computes each txtime from clock_gettime, i.e. from the system clock, so if the two clocks disagree, a packet's txtime may already have expired by the time it reaches the etf-enabled i210 hardware queue, and the packet is dropped.
  • Command:
    phc2sys -s <IFNAME> -w -mq -O 0

Let me know if you need any other support.

Thanks,
Jayachandran
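
A rough way to picture the clock mismatch described above, as a sketch only: udp_tai derives the txtime from the system clock, while the NIC compares it against its own PHC. `margin_seen_by_nic` is a hypothetical helper, the 600000 ns delay is the `-d 600000` from the talker command, and the 37 s figure assumes an unsynchronized system clock that lags a TAI-based PHC by the TAI-UTC offset.

```python
WAKE_TO_TX_DELAY_NS = 600000          # "-d 600000" in the talker command
TAI_UTC_OFFSET_NS = 37 * 1000000000   # assumed clock difference when unsynced


def margin_seen_by_nic(sys_now_ns, phc_minus_sys_ns, delay_ns=WAKE_TO_TX_DELAY_NS):
    """Time left until txtime from the PHC's point of view, in nanoseconds.

    A negative result means the txtime is already in the past for the NIC.
    """
    txtime_ns = sys_now_ns + delay_ns             # what the application requests
    phc_now_ns = sys_now_ns + phc_minus_sys_ns    # what the NIC believes "now" is
    return txtime_ns - phc_now_ns


now = 1521534608000000000
print(margin_seen_by_nic(now, 0))                   # clocks in sync
print(margin_seen_by_nic(now, TAI_UTC_OFFSET_NS))   # PHC ahead of system clock
```

With the clocks in sync the packet has its full 600 us of margin; with the PHC 37 s ahead, every txtime looks long expired, which would match seeing only a trickle of packets on the wire.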

@jtc-dolby

I've written an application based on the udp_tai.c example here using the i210 NIC running on 5.3.0. Everything seems to work fine, but occasionally I get a packet drop due to a scheduling error. Running the udp_tai.c example with the suggested options and the -E switch, I see the same behaviour: the application quits after a minute or so due to a packet drop. It would be nice if the app just reported the error and kept on running instead.
I have tried using the SOF_TXTIME_REPORT_ERRORS option myself but find the socket stops functioning immediately after an error occurs.
Any ideas?

@jeez

jeez commented Nov 27, 2019 via email

@JayachandranRameshBabu

Could you please share the Wireshark trace and post the steps you followed, or describe the architecture you are trying to implement? Packet drops can have many causes (my comment above describes one of them), so I suggest posting the end-to-end steps you followed.

Thanks.

@jtc-dolby

jtc-dolby commented Dec 4, 2019

Thanks for your help. Much appreciated.
My application is transmission of audio over RTP i.e. AES67. However, as a first step it would really help me to get the provided example working reliably without any packet drops. As I'm seeing the same packet drops in my own code I'm figuring the cause is the same in both cases.
The machine is an ASROCK C2550-D4i running Ubuntu 19.10, which has a couple of i210 NICs. I'm running ptp4l and phc2sys as services and I'm synced to a Sonifex AVN PTP grandmaster. I add the etf qdiscs when an interface comes up, i.e. in /etc/network/if-up.d.
I've forked this GIST here and I have a repo with the trace, standard output and executable here. When I run the example with error reporting turned on I get a single packet drop every few seconds.

update: The drops are performance related. I've got to a point where, if the machine is quiet, there are generally no drops, but opening applications generally causes a few. I would have expected the main loop to always be able to catch up before a packet risks being dropped, as 1000 packets/sec is quite slow, but that is not happening.

@JayachandranRameshBabu

Thanks for sharing the artifacts, @jtc-dolby. These are the main reasons for packet loss in your case:

  • the etf qdisc drops any packet with a txtime in the past
  • a packet expires while waiting to be dequeued
    Both cases come down to timing: the application calculates the txtime (with some lead time) and the qdisc schedules the packet accordingly for transmission to the network. To isolate the issue, check the following:
  1. Remove the qdiscs and check whether the packet drops still occur (use the commands below to remove them).
  2. Add the qdiscs again and increase the delta (fudge factor) value. For reference: http://man7.org/linux/man-pages/man8/tc-etf.8.html

Command to delete the Qdisc:

# tc qdisc delete dev <ETHIF> handle 100: parent root mqprio num_tc 3 \
            map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
            queues 1@0 1@1 2@2 \
            hw 0
# tc qdisc delete dev <ETHIF> parent 100:1 etf \
            clockid CLOCK_TAI delta 100000 offload

With delta 100000, etf schedules packets 100 us (the fudge factor) before their txtime.
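
The two drop cases and the role of delta can be summarized in a small model. This is only a simplified sketch based on the behaviour described in tc-etf(8), not the kernel's actual code; the enum and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Possible outcomes for a queued packet in this simplified model. */
enum etf_verdict {
	ETF_DROP_EXPIRED, /* txtime is already in the past: packet is dropped */
	ETF_HOLD,         /* too early: packet waits until txtime - delta     */
	ETF_DEQUEUE       /* inside the delta window: handed to the NIC       */
};

/*
 * Simplified model (an assumption, not the sch_etf source):
 *  - a packet whose txtime has passed is dropped,
 *  - otherwise it is held until now is later than txtime - delta,
 *  - after that it is dequeued towards the hardware.
 */
static enum etf_verdict etf_model(uint64_t now_ns, uint64_t txtime_ns,
				  uint64_t delta_ns)
{
	if (txtime_ns < now_ns)
		return ETF_DROP_EXPIRED;
	if (txtime_ns - now_ns >= delta_ns)
		return ETF_HOLD;
	return ETF_DEQUEUE;
}
```

In this model a larger delta widens the window in which the packet can be handed to the NIC, which is why increasing it gives the system more slack when the dequeue path is delayed.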

Please let me know the following:

  1. What is the maximum Linux scheduler latency seen on the real-time PC?
  2. What are the maximum phc2sys and ptp4l offsets?

@jtc-dolby

jtc-dolby commented Dec 5, 2019

I think I've figured out what was going on here. To start with, I was seeing two types of drops: INVALID_PARAM and MISSED_DEADLINE. Upon analyzing the actual timestamps I realised that the default wakeup time parameter used by the sample application was too small (600us). I could see that under heavy load the scheduling time was falling behind the transmit times. I increased that to 60ms and that has made all the INVALID_PARAM drops go away. This is fine as my application can have any latency, and I plan to schedule the 1ms period packets 1sec ahead of time, i.e. 1000 packets waiting in the queue.
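
The relationship between the packet period, the wake-up lead and the scheduling margin can be sketched as follows. The function names and the `wake_lead_ns` parameter are hypothetical and do not come from udp_tai.c; the numbers in the usage note are the ones quoted above.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000ULL

/* txtime of the n-th packet of a periodic stream starting at base_ns. */
static uint64_t nth_txtime(uint64_t base_ns, uint64_t period_ns, uint64_t n)
{
	return base_ns + n * period_ns;
}

/* Absolute time at which the sender should wake up to enqueue a packet. */
static uint64_t wakeup_for(uint64_t txtime_ns, uint64_t wake_lead_ns)
{
	assert(txtime_ns >= wake_lead_ns);
	return txtime_ns - wake_lead_ns;
}

/* Convert nanoseconds to a timespec, e.g. for an absolute-time sleep. */
static struct timespec ns_to_timespec(uint64_t ns)
{
	struct timespec ts;

	ts.tv_sec = (time_t)(ns / NSEC_PER_SEC);
	ts.tv_nsec = (long)(ns % NSEC_PER_SEC);
	return ts;
}

/* Remaining margin before txtime; a negative value means we are late. */
static int64_t margin_ns(uint64_t now_ns, uint64_t txtime_ns)
{
	return (int64_t)(txtime_ns - now_ns);
}
```

With a 1 ms period and a 600 us lead, a wake-up that is delayed by more than 600 us leaves a negative margin, so the txtime is already in the past when the packet is enqueued. Raising the lead to 60 ms means the sender can be late by up to 60 ms before that happens.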

Under heavy load I am still getting the missed deadline drops, as you can see here. The 60ms buffer is enough to stop the scheduling of packets (actual in chart below) from falling behind transmit, but the drops happen anyway. My guess is this will be fixed by increasing the 'delta' that you describe above, so I will try that next and report back.

To answer the questions: The real-time latency, i.e. wake-up jitter in the application, is in the 50ms range. The highest ptp4l offset I saw today was 811ns. The highest PHC2SYS offset today was 3.5us, although generally it is much lower, i.e. 100ns or so.

Did you mean to include 'delete' in the second code snippet above?

image

@jtc-dolby

I increased the 'delta' in the qdisc command to 1.2ms (increased from 300us). I can still get drops when opening Apps etc. What is a reasonable maximum? I'll try increasing the App wake-up time so I've got more margin when scheduling packets and do another trace.

@vcgomes

vcgomes commented Dec 6, 2019

Just a quick note that configuring the system for better realtime behavior will have a lot of impact, e.g. setting the CPU frequency governor to "performance", disabling power saving features (CONFIG_PCIEASPM_PERFORMANCE comes to mind), disabling Energy Efficient Ethernet on your NIC, etc. Also, consider running your TSN workload with higher priority (chrt --fifo <priority> <command>, for example).

@jtc-dolby

Thanks for the tips. I'm convinced the example code is good and this is now more a matter of machine / kernel configuration to get the necessary performance to avoid the drops on wake-up. Because the application I am interested in is professional audio, no packet drops can be tolerated. I am disappointed that there are "fudge factors" as @JayachandranRameshBabu described them, that need to be tuned to avoid the drops. Do I also need to patch the kernel with real-time extensions etc.? Is that assumed?

@JayachandranRameshBabu

JayachandranRameshBabu commented Dec 9, 2019

@jtc-dolby, as I mentioned earlier, there are multiple possible reasons for packet drops. I also asked a question about real-time tuning:

What is the maximum Linux scheduler latency seen on the real-time PC?

To help you further with real-time tuning:

Recommended energy saving optimization:
Waking up from a sleep state takes about 300us, so the CPU cannot react promptly, and in a hard real-time environment this breaks the real-time requirements. Frequency scaling also affects jitter because the CPU is monitored periodically. The following kernel configuration parameters should be set. We tested this configuration while benchmarking our TSN application.

CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE = N
CONFIG_CPU_FREQ = N    -> Frequency scaling
CONFIG_CPU_IDLE = N    -> Disable transitions to low-power states
CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND = y

User application running on dedicated cores:
Taskset is used to set or retrieve the CPU affinity of a running process given its pid, or to launch a new command with a given CPU affinity.

$ taskset -c 1 chrt 50 ./<user_application_name>

chrt changes the scheduling policy and priority of a process. Processes scheduled under one of the real-time policies (SCHED_FIFO, SCHED_RR) have a sched_priority value in the range 1 (low) to 99 (high).
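
The same setup can also be done from inside the application. The following is a minimal sketch using the standard Linux calls `sched_setaffinity` and `sched_setscheduler`; the helper names are hypothetical, and the choice of SCHED_FIFO (rather than another real-time policy) is an assumption. Switching to a real-time policy normally needs root or CAP_SYS_NICE.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <string.h>

/* Build an affinity mask that contains only the given CPU. */
static void single_cpu_mask(int cpu, cpu_set_t *mask)
{
	assert(cpu >= 0 && cpu < CPU_SETSIZE);
	CPU_ZERO(mask);
	CPU_SET(cpu, mask);
}

/*
 * Pin the calling thread to one CPU and switch it to SCHED_FIFO with
 * the given priority (1..99). Returns 0 on success, -1 on failure
 * with errno set by the failing call.
 */
static int make_realtime(int cpu, int priority)
{
	struct sched_param sp;
	cpu_set_t mask;

	single_cpu_mask(cpu, &mask);
	if (sched_setaffinity(0, sizeof(mask), &mask) != 0)
		return -1;

	memset(&sp, 0, sizeof(sp));
	sp.sched_priority = priority;
	return sched_setscheduler(0, SCHED_FIFO, &sp);
}
```

Calling `make_realtime(1, 50)` early in `main` would roughly correspond to starting the program under `taskset -c 1 chrt --fifo 50`.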

@jtc-dolby

jtc-dolby commented Dec 13, 2019

Well. It took me a while to get the new kernel running (haven't done that in a while). Because the platform I was using didn't allow me to disable secure boot I had to change platforms. I'm now running on a Xeon E3-1200 but still with i210 NIC (Supermicro X10SLL). I made the changes to the kernel as suggested but I found I was unable to change the options exactly because of configuration dependencies. Using menuconfig I deselected CONFIG_SCHED_MC_PRIO which allows me to deselect CONFIG_CPU_FREQ and CONFIG_ACPI_PROCESSOR to deselect CONFIG_CPU_IDLE. CONFIG_CPU_FREQ=N deselects everything under CONFIG_CPU_FREQ so selecting CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND was not possible. I set CONFIG_PCIEASPM_PERFORMANCE as well.
This ran significantly better in that there are now very few drops when the machine is lightly loaded. However, under a full load test and opening and closing Apps (Firefox is a good one) I still get a few drops. I increased delta in the qdiscs from 300us to 1.2ms but this doesn't seem to make much difference. I also tried disabling EEE but it made no difference (this was inactive anyway).
All the drops are MISSED_DEADLINE. Running my own app and scheduling a full second ahead of time I see exactly the same behaviour so I'm pretty sure the packet scheduling is OK. If the scheduling was getting behind I would expect to see INVALID_PARAM which I am not. I assume then that the problem is dequeuing under load. I am using max SCHED_FIFO priority everywhere and CPU affinity.
I've posted my changes to udp_tai.c to my repo. This monitors for reported dropped packets using the socket error queue which I have found very reliable and produces a csv report which shows the queuing and packet drops on a graph as above.
I would like to get access to a kernel that has been tested to be robust with this feature so I can debug the kernel differences as I assume that is where the problem is. Another option would be someone trying my modified udp_tai on a known good system. I have not changed the core of this code but just added packet drop monitoring and graph generation.

@jtc-dolby

I've been thinking about this recently and have a few questions/thoughts to help take this forward. Do people agree with me that it is difficult to use this functionality (lots of tuning required) if your goal is to never have packet drops under all load conditions? Given that some applications, for example audio, are willing to trade some packet jitter to avoid drops it seems like some control of jitter tolerance would be a useful feature. This would define how late a packet can be dequeued before it is dropped allowing users to make the trade-off between timing accuracy and packet loss. Obviously if someone needs both then the machine must be kept lightly loaded. Is such a feature feasible?

@kcn21

kcn21 commented Dec 22, 2021

Hello @jeez, I have downloaded this Scheduled Tx Tools folder to run an experiment on TAPRIO and ETF.
But when I executed the make command, I got this error.

cc -I/usr/include -lpcap -o dump-classifier dump-classifier.c
/tmp/cc9QZIEP.o: In function `parse_filters':
dump-classifier.c:(.text+0x50c): undefined reference to `pcap_compile'
dump-classifier.c:(.text+0x528): undefined reference to `pcap_perror'
/tmp/cc9QZIEP.o: In function `match_packet':
dump-classifier.c:(.text+0xe63): undefined reference to `pcap_offline_filter'
/tmp/cc9QZIEP.o: In function `classify_frames':
dump-classifier.c:(.text+0xed0): undefined reference to `pcap_next_ex'
dump-classifier.c:(.text+0xeec): undefined reference to `pcap_perror'
dump-classifier.c:(.text+0x101e): undefined reference to `pcap_next_ex'
/tmp/cc9QZIEP.o: In function `main':
dump-classifier.c:(.text+0x1183): undefined reference to `pcap_fopen_offline_with_tstamp_precision'
dump-classifier.c:(.text+0x1298): undefined reference to `pcap_close'
collect2: error: ld returned 1 exit status
Makefile:6: recipe for target 'dump-classifier' failed
make: *** [dump-classifier] Error 1

Can you tell me what the problem is?
Thank you.

@JayachandranRameshBabu

JayachandranRameshBabu commented Dec 23, 2021

Hi @kcn21,
Can you check whether you have installed the libpcap-dev library?

$ sudo apt-get install libpcap-dev

If the problem persists, then I think it is a problem with how the libpcap library is linked. In the Makefile, -lpcap should come after the dump-classifier source file on the command line. Replace the corresponding line in the Makefile with the one below and try again.

${CC} ${CFLAGS} $(PCAP_CFLAGS) -o $@ $< -lpcap

@GuoxiLin

Hello @jeez and @JayachandranRameshBabu, I set up the environment and installed libpcap-dev, and thanks to @JayachandranRameshBabu I fixed the problem with dump-classifier. But now I get this error:

cc -lpthread -o udp_tai udp_tai.c
/tmp/ccNSztn6.o: In function `set_realtime':
udp_tai.c:(.text+0xe16): undefined reference to `pthread_setaffinity_np'
collect2: error: ld returned 1 exit status
Makefile:9: recipe for target 'udp_tai' failed
make: *** [udp_tai] Error 1

Can you tell me how to fix it?
Thank you very much.

@ChuanyuXue

Hi @GuoxiLin,

I solved this problem by running gcc -pthread -o udp_tai udp_tai.c instead of using -lpthread.

@GuoxiLin

Hello there, and thank you @ChuanyuXue. I have figured out that problem, but now I am facing a new one.

After I updated iproute2 from https://github.com/jeez/iproute2.git, I ran the test described in the README:

TALKER

  1. Configure the Qdiscs on the talker side (Device Under Testing, DUT).
    Our DUT uses an Intel i210 NIC, and our setup here is as follows.

    1.a) First, we setup mqprio as the root qdisc:
    e.g.: $ sudo tc qdisc replace dev IFACE parent root handle 100 mqprio \
              num_tc 3 map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
              queues 1@0 1@1 2@2 hw 0

    1.b) Then we setup etf with the desired config:
    e.g.: $ sudo tc qdisc add dev enp2s0 parent 100:1 etf \
              offload clockid CLOCK_TAI delta 150000

tc cannot find 'mqprionum_tc' and 'etf'.

The error messages are:
Unknown qdisc "etf", hence option "offload" is unparsable
Unknown qdisc "mqprionum_tc", hence option "3" is unparsable

Could you tell me why?

Thank you very much again.

@aladin8

aladin8 commented Sep 23, 2022

Hi all,

In short: After enabling the ETF qdisc based on this example, packets are only sent for a few seconds.

In detail: I have two computers with Ubuntu 22.04 installed that are connected via an Ethernet cable. A client on computer A sends UDP packets to a server on computer B. I am examining the effects on the traffic of the ETF qdisc with a TAPRIO qdisc as its parent. The client is written in C with the socket API, using the SO_PRIORITY socket option with the SCM_TXTIME control message type.

In order to visualize the effects of the ETF qdisc, I set the txtime of each packet in a 5-second interval to the end of that 5-second interval. Therefore, I am expecting a burst every 5 seconds on the server side.

The problem is that I can only see that burst 2 or 3 times on the server side, right after changing from the PFIFO qdisc to the mentioned ETF and TAPRIO qdisc setup. After that, no packets get sent from the client side. When switching back to the PFIFO qdisc while keeping the client and server alive, the packets arrive as they should, without bursts. Changing to the TAPRIO and ETF qdiscs again while the client and server are alive produces a few bursts and nothing more afterwards.

Can somebody help me find out what can cause this behaviour?

Here are the relevant parts of my code:

`
//setting socket options:
static void setsockopt_txtime(int fd)
{
    struct sock_txtime so_txtime_val = { .clockid = CLOCK_TAI };
    struct sock_txtime so_txtime_val_read = { 0 };
    socklen_t vallen = sizeof(so_txtime_val);
    so_txtime_val.flags = (SOF_TXTIME_REPORT_ERRORS);
    if (setsockopt(fd, SOL_SOCKET, SO_TXTIME,
        &so_txtime_val, sizeof(so_txtime_val)))
        error(1, errno, "setsockopt txtime");

    if (getsockopt(fd, SOL_SOCKET, SO_TXTIME,
        &so_txtime_val_read, &vallen))
        error(1, errno, "getsockopt txtime");

    if (vallen != sizeof(so_txtime_val) ||
        memcmp(&so_txtime_val, &so_txtime_val_read, vallen))
        error(1, 0, "getsockopt txtime: mismatch");
}
//sending message and setting txtime for message
static int l2_send(int fd, void* buf, int len, __u64 txtime, struct sockaddr_in *servaddr)
{
    char control[CMSG_SPACE(sizeof(txtime))] = {};
    struct cmsghdr* cmsg;
    struct msghdr msg;
    struct iovec iov;
    ssize_t cnt;

    iov.iov_base = buf;
    iov.iov_len = len;

    memset(&msg, 0, sizeof(msg));
    msg.msg_name = (struct sockaddr*)servaddr;
    msg.msg_namelen = sizeof(*servaddr);

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    //We specify the transmission time in the CMSG.
    msg.msg_control = control;
    msg.msg_controllen = sizeof(control);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_TXTIME;
    cmsg->cmsg_len = CMSG_LEN(sizeof(__u64));
    *((__u64*)CMSG_DATA(cmsg)) = txtime;

    cnt = sendmsg(fd, &msg, 0);
    if (cnt < 1) {
        //pr_err("sendmsg failed: %m");
        printf("sending message failed!\n");
        return cnt;
    }
    printf("message sent!\ntxtime:%llu\n\n", (unsigned long long)txtime);
    return cnt;
}
double timer_difference(struct timespec* tval_before, struct timespec* tval_after, struct timespec* tval_result)
{
    clock_gettime(CLOCK_TAI, tval_after);
    timespecsub(tval_after, tval_before, tval_result);
    double time_elapsed = (double)tval_result->tv_sec + ((double)tval_result->tv_nsec / 1000000000.0f);
    return time_elapsed;
}


//Relevant part of main function for calculating txtime:
int main(int argc, char* argv[]) {
.
.
    struct timespec txtime_base,txtime_base_after,txtime_difference;    
    // Creating socket file descriptor 
    if ((sockfd = socket(AF_INET, SOCK_DGRAM, 0)) < 0) {
        perror("socket creation failed");
        exit(EXIT_FAILURE);
    }
    setsockopt_txtime(sockfd);
    memset(&servaddr, 0, sizeof(servaddr));
    // Filling server information 
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(server_port);
    inet_aton("10.0.0.70", &servaddr.sin_addr);
    if (connect(sockfd, (struct sockaddr *)&servaddr, sizeof(servaddr)))
        error(1, errno, "connect");

    clock_gettime(CLOCK_TAI, &txtime_base);
    clock_gettime(CLOCK_TAI, &txtime_base_after);
    txtime = txtime_base.tv_sec * (__u64)1000000000+ TXTIME_PERIOD*(__u64)8*(__u64)1000000000 + txtime_base.tv_nsec;
  
        while (1)
        {
            if(TXTIME_PERIOD<timer_difference(&txtime_base,&txtime_base_after,&txtime_difference)) //setting txtime here
            {
                clock_gettime(CLOCK_TAI, &txtime_base);
                txtime = txtime_base.tv_sec * (__u64)1000000000+ (__u64)TXTIME_PERIOD*(__u64)2*(__u64)1000000000 + txtime_base.tv_nsec;
                printf("in txtime if!\n");
            }
            while (time_elapsed < send_interval)  //sending interval without txtime here
            {
                
                time_elapsed = timer_difference(&tval_before, &tval_after, &tval_result);               
            } 
            txtime=txtime+(__u64)1000;       //creating 1 μs gap between packets of one burst
            l2_send(sockfd, (const char*)send_buffer, strlen(send_buffer), txtime,&servaddr);
.
.
`

@p4pe

p4pe commented Jan 10, 2023

Currently, I am working on a TSN project and I am trying to implement a TSN scenario in an Ubuntu 20.04 VM.

I use the following tc qdisc command:

tc qdisc replace dev gateway-eth0 parent root handle 100 taprio \
num_tc 8 \
map 0 1 2 3 4 5 6 7 1 1 1 1 1 1 1 1 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
base-time 1000 \
clockid CLOCK_TAI \
sched-entry S 10 300000 \
sched-entry S 32 500000 \
sched-entry S 128 200000

[Screenshot 20230110_074459]

I also use the iptables mangle table to classify the packets based on the DSCP field.
[Screenshot 20230110_020819]
From what I have read, with this qdisc command I define:

i) 8 traffic classes
ii) map priority 0 to TC0, 1 to TC1 2 to TC2 etc.
iii) TC0 is mapped to one TX queue TX-0, TC1 to TX-1, TC2 to TX2 etc.

In the sched-entry part of the command, my goal is to open the TC2, TC5, and TC7 queues for different periods.
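
One thing worth checking here is how the gate mask is written. The sketch below builds a one-bit gate mask per traffic class and formats it as a sched-entry. It assumes that tc reads the gate mask as a hexadecimal bitmask in which bit n controls traffic class n; that is my reading of iproute2 and should be verified against tc-taprio(8) for the installed version. The helper names are made up for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Gate mask in which only traffic class tc is open (bit tc set). */
static uint32_t gate_mask_for_tc(unsigned int tc)
{
	assert(tc < 32);
	return UINT32_C(1) << tc;
}

/*
 * Format one "sched-entry S <mask> <interval>" argument, writing the
 * mask in hexadecimal (assumption: tc parses the mask as hex).
 * Returns the snprintf result.
 */
static int format_sched_entry(char *out, size_t len, unsigned int tc,
			      uint32_t interval_ns)
{
	return snprintf(out, len, "sched-entry S %02" PRIx32 " %" PRIu32,
			gate_mask_for_tc(tc), interval_ns);
}
```

Under that assumption, opening only TC2, TC5 and TC7 would be written as masks 04, 20 and 80, whereas a mask written as 10 would be read as 0x10, which has only bit 4 set.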

To test this I sent ping packets using ping -Q 0x40 <IP> (priority 5) in order to send packets to traffic class 5.
[Screenshot 20230110_074819]

It seems that the traffic went to the correct queue.

But when I use ping -Q 0x16 <IP> (priority 2), the traffic goes through the last queue.
[Screenshot 20230110_075106]
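
When matching on DSCP it helps to remember that the value given to ping -Q is the whole TOS byte, of which DSCP is only the upper six bits. A minimal sketch of the conversion, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* DSCP is the upper six bits of the TOS / Traffic Class byte. */
static uint8_t tos_to_dscp(uint8_t tos)
{
	return tos >> 2;
}

/* ECN is the lower two bits of the same byte. */
static uint8_t tos_to_ecn(uint8_t tos)
{
	return tos & 0x3;
}

/* Build a TOS byte from a DSCP value (0..63) with ECN left at zero. */
static uint8_t dscp_to_tos(uint8_t dscp)
{
	assert(dscp < 64);
	return (uint8_t)(dscp << 2);
}
```

With these helpers, a TOS byte of 0x40 corresponds to DSCP 16, while 0x16 corresponds to DSCP 5 with the ECN bits set to 2, so whether a packet matches depends on the DSCP values used in the iptables rules.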

Are the sched-entries right?

Thank you.

@wangxinli123321

Hi @JayachandranRameshBabu, I'm trying to add SCM_CLOCKID and SCM_DROP_IF_LATE to the cmsg in this program, but it doesn't work and keeps printing "sendmsg failed: Invalid argument". Do you have any ideas about this problem?
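
For comparison, here is a minimal sketch of the per-socket configuration that the code posted earlier in this thread uses: the clock id and the flags are set once through the SO_TXTIME socket option with a struct sock_txtime, and only SCM_TXTIME travels in the control message. As far as I can tell, SCM_CLOCKID and SCM_DROP_IF_LATE belong to earlier RFC revisions of the patches and are not defined by mainline headers, but please verify that against your kernel. The helper names are hypothetical, and the fallback value for SO_TXTIME is an assumption for older headers.

```c
#include <assert.h>
#include <string.h>
#include <time.h>
#include <sys/socket.h>
#include <linux/net_tstamp.h>

#ifndef SO_TXTIME
#define SO_TXTIME  61
#define SCM_TXTIME SO_TXTIME
#endif
#ifndef CLOCK_TAI
#define CLOCK_TAI 11
#endif

/* Fill a struct sock_txtime with the clock and the requested flags. */
static struct sock_txtime make_txtime_config(clockid_t clockid,
					     int deadline_mode,
					     int report_errors)
{
	struct sock_txtime cfg;

	memset(&cfg, 0, sizeof(cfg));
	cfg.clockid = clockid;
	if (deadline_mode)
		cfg.flags |= SOF_TXTIME_DEADLINE_MODE;
	if (report_errors)
		cfg.flags |= SOF_TXTIME_REPORT_ERRORS;
	return cfg;
}

/* Apply the configuration to a socket. Returns 0 on success, -1 on error. */
static int enable_txtime(int fd, const struct sock_txtime *cfg)
{
	return setsockopt(fd, SOL_SOCKET, SO_TXTIME, cfg, sizeof(*cfg));
}
```

A sender would call `enable_txtime` once after creating the socket and then attach a single SCM_TXTIME control message holding the 64-bit txtime to each sendmsg call.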

@AkaCoder404

AkaCoder404 commented Apr 21, 2023

@aladin8

Hi all,

In short: After enabling ETF qdisc based on this example packets are only sent for a few seconds.

In detail: I have two computers that are connected via an Ethernet cable with Ubuntu 22.04 installed on them. I have a client on computer A) which is sending UDP packets to a server on computer B). I am examining the effects of the ETF qdisc with a TAPRIO qdisc as its parent on the traffic. I used the c socket library for the source code of the client utilizing the SO_PRIORITY socket option with the SCM_TXTIME control message type.

In order to visualize the effects of the ETF qdisc, I set the txtime of each packet in a 5-second interval to the end of that 5 second interval. Therefore, I am expecting a burst in every 5 seconds on the server side.

The problem is that I can only see that burst 2 or 3 times on the server side right after changing from PFIFO qdisc to the mentioned ETF and TAPRIO qdisc setup. After that, no packets get sent from the client side. When switching back go PFIFO qdisc while keeping the client and server alive, the packets arrive as they should, without bursts. Changing to TAPRIO and ETF qdiscs while the client and server is alive again produces a few bursts and nothing more afterwards.

Can somebody help me find out what can cause this behaviour?

Here are the relevant parts of my code:

`
//setting socket options:
static void setsockopt_txtime(int fd)
{
    struct sock_txtime so_txtime_val = { .clockid = CLOCK_TAI };
    struct sock_txtime so_txtime_val_read = { 0 };
    socklen_t vallen = sizeof(so_txtime_val);
    so_txtime_val.flags = (SOF_TXTIME_REPORT_ERRORS);
    if (setsockopt(fd, SOL_SOCKET, SO_TXTIME,
        &so_txtime_val, sizeof(so_txtime_val)))
        error(1, errno, "setsockopt txtime");

    if (getsockopt(fd, SOL_SOCKET, SO_TXTIME,
        &so_txtime_val_read, &vallen))
        error(1, errno, "getsockopt txtime");

    if (vallen != sizeof(so_txtime_val) ||
        memcmp(&so_txtime_val, &so_txtime_val_read, vallen))
        error(1, 0, "getsockopt txtime: mismatch");
}
//sending message and setting txtime for message
static int l2_send(int fd, void* buf, int len, __u64 txtime, struct sockaddr_in *servaddr)
{
    char control[CMSG_SPACE(sizeof(txtime))] = {};
    struct cmsghdr* cmsg;
    struct msghdr msg;
    struct iovec iov;
    ssize_t cnt;

    iov.iov_base = buf;
    iov.iov_len = len;

    memset(&msg, 0, sizeof(msg));
    msg.msg_name = (struct sockaddr*)servaddr;
    msg.msg_namelen = sizeof(*servaddr);

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    //We specify the transmission time in the CMSG.
    msg.msg_control = control;
    msg.msg_controllen = sizeof(control);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_TXTIME;
    cmsg->cmsg_len = CMSG_LEN(sizeof(__u64));
    *((__u64*)CMSG_DATA(cmsg)) = txtime;

    cnt = sendmsg(fd, &msg, 0);
    if (cnt < 1) {
        //pr_err("sendmsg failed: %m");
        printf("sending message failed!\n");
        return cnt;
    }
    printf("messaage sent!\ntxtime:%lf\n\n",(double)txtime);
    return cnt;
}
double timer_difference(struct timespec* tval_before, struct timespec* tval_after, struct timespec* tval_result)
{
    clock_gettime(CLOCK_TAI, tval_after);
    timespecsub(tval_after, tval_before, tval_result);
    double time_elapsed = (double)tval_result->tv_sec + ((double)tval_result->tv_nsec / 1000000000.0f);
    return time_elapsed;
}


//Relevant part of main funtion for calculating txtime:
int main(int argc, char* argv[]) {
.
.
    struct timespec txtime_base,txtime_base_after,txtime_difference;    
    // Creating socket file descriptor 
    if ((sockfd = socket(AF_INET, SOCK_DGRAM, 0)) < 0) {
        perror("socket creation failed");
        exit(EXIT_FAILURE);
    }
    setsockopt_txtime(sockfd);
    memset(&servaddr, 0, sizeof(servaddr));
    // Filling server information 
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(server_port);
    inet_aton("10.0.0.70", &servaddr.sin_addr.s_addr);
    if (connect(sockfd , &servaddr,  sizeof(servaddr)))
        error(1, errno, "connect");

    clock_gettime(CLOCK_TAI, &txtime_base);
    clock_gettime(CLOCK_TAI, &txtime_base_after);
    txtime = txtime_base.tv_sec * (__u64)1000000000+ TXTIME_PERIOD*(__u64)8*(__u64)1000000000 + txtime_base.tv_nsec;
  
        while (1)
        {
            if(TXTIME_PERIOD<timer_difference(&txtime_base,&txtime_ba