@chrisaycock
Last active December 22, 2023 11:04
Example for accessing UDP packets in a pcap file
// answer to https://quant.stackexchange.com/q/55088/35
// compile with c++ pcap_udp.cpp -lpcap
#include <iostream>
#include <pcap/pcap.h>
#include <arpa/inet.h>       // ntohs
#include <netinet/if_ether.h>
#include <netinet/ip.h>
#include <netinet/udp.h>
// return payload and length by skipping headers in a packet;
// returns nullptr with *length == 0 if the packet is not IPv4/UDP
const u_char* get_payload(const u_char *packet, u_int *length) {
    const ip* ip_;
    const udphdr* udphdr_;
    u_int ip_length;
    u_int udphdr_length;

    // Ethernet header starts with destination and source addresses
    const u_char* payload = packet;
    payload += 12;

    // search for IP header; assume other Ethernet types are vlan tags,
    // but give up after two tags so a non-IP packet (ARP, IPv6, ...)
    // cannot send the scan past the end of the packet
    int vlan_tags = 0;
    while (ntohs(*reinterpret_cast<const u_short*>(payload)) != ETHERTYPE_IP) {
        if (++vlan_tags > 2) {
            *length = 0;
            return nullptr;
        }
        payload += 4;
    }
    payload += 2;

    // IP header can vary in length
    ip_ = reinterpret_cast<const ip*>(payload);
    ip_length = ip_->ip_hl * 4;
    payload += ip_length;

    // ensure this is UDP
    if (ip_->ip_p != IPPROTO_UDP) {
        *length = 0;
        payload = nullptr;
    }
    else {
        // UDP header is static length
        udphdr_ = reinterpret_cast<const udphdr*>(payload);
        udphdr_length = sizeof(udphdr);
        *length = ntohs(udphdr_->uh_ulen) - udphdr_length;
        payload += udphdr_length;
    }

    return payload;
}
int main(int argc, char* argv[]) {
    // verify parameters
    if (argc != 2) {
        std::cerr << "Usage: " << argv[0] << " file.pcap" << std::endl;
        return 1;
    }

    // open the pcap file
    char errbuf[PCAP_ERRBUF_SIZE];
    pcap_t* p = pcap_open_offline(argv[1], errbuf);
    if (p == nullptr) {
        std::cerr << "Error: " << errbuf << std::endl;
        return 1;
    }

    bool done = false;
    while (!done) {
        // read a packet
        pcap_pkthdr* header;
        const u_char* packet;
        const u_char* payload;
        u_int payload_length;
        int ret = pcap_next_ex(p, &header, &packet);

        // handle packet
        switch (ret) {
        case 1:
            // legitimate packet
            payload = get_payload(packet, &payload_length);
            std::cout << "Got data of length " << payload_length << std::endl;
            break;
        case 0:
            // timeout (live captures only; not expected when reading a file)
            break;
        case -1:
            // error
            std::cerr << "Error: " << pcap_geterr(p) << std::endl;
            pcap_close(p);
            return 1;
        case -2:
            // end of file
            done = true;
            break;
        }
    }

    // release the pcap handle
    pcap_close(p);
    return 0;
}
@vimell

vimell commented Jun 21, 2020

This is great. Thank you so much. Is there any way to print the packet or payload to see the content? Something like
std::cout << packet << std::endl;

@chrisaycock
Author

Wireshark can show you the hexdump, but it probably won't be sensible because the packet and payload are compacted binary. You'll have to know each field by offset, plus how to interpret the enums. Also, the easiest way to verify that you're looking at the right location in the header is to print the Ethernet MAC address, IP address, and UDP port numbers; those all require some level of coding anyway.
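
For a quick look from inside the program, here is a minimal sketch of the kind of printing helpers described above. The names hex_dump, format_mac, and format_ipv4 are hypothetical and not part of libpcap. Note that streaming a const u_char* straight to std::cout treats it as a C string and stops at the first zero byte, which is why the bytes are formatted one at a time here.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: render bytes as hex, 16 per line, space separated.
std::string hex_dump(const unsigned char* data, std::size_t length) {
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < length; ++i) {
        std::snprintf(buf, sizeof(buf), "%02x", static_cast<unsigned>(data[i]));
        out += buf;
        if (i + 1 < length) {
            out += ((i + 1) % 16 == 0) ? '\n' : ' ';
        }
    }
    return out;
}

// Hypothetical helper: format the 6 bytes of a MAC address as aa:bb:cc:dd:ee:ff.
std::string format_mac(const unsigned char* mac) {
    char buf[18];
    std::snprintf(buf, sizeof(buf), "%02x:%02x:%02x:%02x:%02x:%02x",
                  static_cast<unsigned>(mac[0]), static_cast<unsigned>(mac[1]),
                  static_cast<unsigned>(mac[2]), static_cast<unsigned>(mac[3]),
                  static_cast<unsigned>(mac[4]), static_cast<unsigned>(mac[5]));
    return buf;
}

// Hypothetical helper: format 4 address bytes (network order) as a dotted quad.
std::string format_ipv4(const unsigned char* addr) {
    char buf[16];
    std::snprintf(buf, sizeof(buf), "%u.%u.%u.%u",
                  static_cast<unsigned>(addr[0]), static_cast<unsigned>(addr[1]),
                  static_cast<unsigned>(addr[2]), static_cast<unsigned>(addr[3]));
    return buf;
}
```

In the gist's main loop, something like std::cout << hex_dump(payload, payload_length) << std::endl; could go in the case 1 branch once payload is known to be non-null. The destination and source MAC addresses are the first 12 bytes of the packet, so format_mac(packet) and format_mac(packet + 6) would print them.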

@vimell

vimell commented Jun 21, 2020

Thank you so much. Got it. So once I get to the payload, how would you recommend that I convert it to a human-readable format, such as TimeStamp, Quotes, Price, etc.?

@chrisaycock
Author

The individual fields have to be decoded one-by-one. Worse, each exchange has its own format. Examples:

  • Timestamp could be microseconds since Unix epoch, nanoseconds since midnight today, or milliseconds since last Sunday.
  • Price denominator could be sent in each message, fixed upfront for all symbols, or reported for each symbol every day in a securities file.
  • Ticker symbol can be embedded in the message, or represented by a number that must be referenced in the securities file.
  • Symbols themselves differ across exchanges. (E.g., BRK/B vs BRK.B)
  • Sequence number can increment by one for each UDP packet, or by the number of individual updates in the packet.
  • Numbers might be big or little endian.
  • The message may even include trades that were routed to a different exchange; you'll have to check flags to know when to ignore.

The above should give you an idea of why feed handlers are unique for each exchange operator.
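
As a rough illustration of decoding fields one by one, here is a sketch against a made-up message layout. The layout, the DecodedUpdate struct, and the read_be, read_le, and decode_update names are all assumptions for the example and do not correspond to any real exchange's format.

```cpp
#include <cstddef>
#include <cstdint>

// Read an unsigned integer of `size` bytes (1 to 8) stored big endian.
std::uint64_t read_be(const unsigned char* p, std::size_t size) {
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < size; ++i) {
        value = (value << 8) | p[i];
    }
    return value;
}

// Read an unsigned integer of `size` bytes (1 to 8) stored little endian.
std::uint64_t read_le(const unsigned char* p, std::size_t size) {
    std::uint64_t value = 0;
    for (std::size_t i = size; i > 0; --i) {
        value = (value << 8) | p[i - 1];
    }
    return value;
}

// Hypothetical layout, NOT a real feed specification:
//   bytes 0-7   timestamp, nanoseconds since midnight, big endian
//   bytes 8-11  price numerator, big endian
//   byte  12    price scale: price = numerator / 10^scale
struct DecodedUpdate {
    std::uint64_t nanos_since_midnight;
    double price;
};

constexpr std::size_t kUpdateSize = 13;

// Decode one update; returns false if the payload is missing or too short.
bool decode_update(const unsigned char* payload, std::size_t length,
                   DecodedUpdate* out) {
    if (payload == nullptr || length < kUpdateSize) {
        return false;
    }
    out->nanos_since_midnight = read_be(payload, 8);
    const std::uint64_t numerator = read_be(payload + 8, 4);
    double denominator = 1.0;
    for (unsigned i = 0; i < payload[12]; ++i) {
        denominator *= 10.0;
    }
    out->price = static_cast<double>(numerator) / denominator;
    return true;
}
```

Reading byte by byte avoids any dependence on the host's own endianness. A feed that sends little-endian numbers would use read_le in the same places.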

Oh, and your structs can't have padding. The easiest way to handle that is to pack your data definitions to 1-byte alignment:

#pragma pack(push, 1)

struct Trade { ... };
struct Quote { ... };
struct Imbalance { ... };
struct Cross { ... };

#pragma pack(pop)
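
To make the effect of the pragma concrete, here is a minimal sketch with a filled-in struct. ExampleTrade and parse_trade are hypothetical names, and the field list is invented for illustration; a real feed's specification defines the actual fields, sizes, and byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

#pragma pack(push, 1)

// Hypothetical wire layout, for illustration only.
struct ExampleTrade {
    std::uint8_t  msg_type;   // 1 byte
    std::uint32_t symbol_id;  // 4 bytes; would sit at offset 4 without packing
    std::uint64_t price;      // 8 bytes
    std::uint32_t size;       // 4 bytes
};

#pragma pack(pop)

// With 1-byte packing the struct is exactly the sum of its field sizes.
static_assert(sizeof(ExampleTrade) == 17, "ExampleTrade must have no padding");
static_assert(offsetof(ExampleTrade, symbol_id) == 1,
              "symbol_id must directly follow msg_type");

// Copy the bytes into the struct instead of casting the payload pointer,
// which avoids reading through a misaligned reinterpreted pointer.
// Returns false if the payload is missing or too short. The fields keep
// the wire byte order, so they still need swapping if it differs from
// the host's.
bool parse_trade(const unsigned char* payload, std::size_t length,
                 ExampleTrade* out) {
    if (payload == nullptr || length < sizeof(ExampleTrade)) {
        return false;
    }
    std::memcpy(out, payload, sizeof(ExampleTrade));
    return true;
}
```

The static_assert lines are a cheap way to catch a layout that silently drifts from the specification, for example if someone removes the pragma or reorders fields.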

@vimell

vimell commented Jun 22, 2020

I see. Thank you so much. Wow, this is more work than I thought. I was hoping I could cast the payload to some human readable format such as a string and use string operators (or regex) to extract the fields.

But I guess I have to decode each field separately using byte operations. As always, I am beyond grateful for your help here. I will get back to you on Empirical later this week.
