@chrisaycock
Last active December 22, 2023 11:04
Example for accessing UDP packets in a pcap file
// answer to https://quant.stackexchange.com/q/55088/35
// compile with c++ pcap_udp.cpp -lpcap
#include <iostream>
#include <pcap/pcap.h>
#include <arpa/inet.h>  // ntohs
#include <netinet/if_ether.h>
#include <netinet/ip.h>
#include <netinet/udp.h>

// return payload and length by skipping headers in a packet
const u_char* get_payload(const u_char *packet, u_int *length) {
    const ip* ip_;
    const udphdr* udphdr_;

    u_int ip_length;
    u_int udphdr_length;

    // Ethernet header starts with destination and source addresses
    const u_char* payload = packet;
    payload += 12;

    // search for IP header; assume all other Ethernet types are vlan
    while (ntohs(*reinterpret_cast<const u_short*>(payload)) != ETHERTYPE_IP) {
        payload += 4;
    }
    payload += 2;

    // IP header can vary in length
    ip_ = reinterpret_cast<const ip*>(payload);
    ip_length = ip_->ip_hl * 4;
    payload += ip_length;

    // ensure this is UDP
    if (ip_->ip_p != IPPROTO_UDP) {
        *length = 0;
        payload = nullptr;
    }
    else {
        // UDP header is static length
        udphdr_ = reinterpret_cast<const udphdr*>(payload);
        udphdr_length = sizeof(udphdr);
        *length = ntohs(udphdr_->uh_ulen) - udphdr_length;
        payload += udphdr_length;
    }

    return payload;
}

int main(int argc, char* argv[]) {
    // verify parameters
    if (argc != 2) {
        std::cerr << "Usage: " << argv[0] << " file.pcap" << std::endl;
        return 1;
    }

    // open the pcap file
    char errbuf[PCAP_ERRBUF_SIZE];
    pcap_t* p = pcap_open_offline(argv[1], errbuf);
    if (p == nullptr) {
        std::cerr << "Error: " << errbuf << std::endl;
        return 1;
    }

    bool done = false;
    while (!done) {
        // read a packet
        pcap_pkthdr* header;
        const u_char* packet;
        const u_char* payload;
        u_int payload_length;
        int ret = pcap_next_ex(p, &header, &packet);

        // handle packet
        switch (ret) {
        case 1:
            // legitimate packet
            payload = get_payload(packet, &payload_length);
            std::cout << "Got data of length " << payload_length << std::endl;
            break;
        case 0:
            // timeout
            break;
        case -1:
            // error
            std::cerr << "Error: " << pcap_geterr(p) << std::endl;
            return 1;
        case -2:
            // end of file
            done = true;
            break;
        }
    }

    return 0;
}

@vimell

vimell commented Jun 21, 2020

Thank you so much. Got it. So once I get to the payload, how would you recommend that I convert the payload to a human-readable format, such as timestamp, quotes, price, etc.?

@chrisaycock
Author

The individual fields have to be decoded one-by-one. Worse, each exchange has its own format. Examples:

  • Timestamp could be microseconds since Unix epoch, nanoseconds since midnight today, or milliseconds since last Sunday.
  • Price denominator could be sent in each message, fixed upfront for all symbols, or reported for each symbol every day in a securities file.
  • Ticker symbol can be embedded in the message, or represented by a number that must be referenced in the securities file.
  • Symbols themselves differ across exchanges. (E.g., BRK/B vs. BRK.B)
  • Sequence number can increment by one for each UDP packet, or by the number of individual updates in the packet.
  • Numbers might be big or little endian.
  • The message may even include trades that were routed to a different exchange; you'll have to check flags to know when to ignore them.

The above should give you an idea of why feed handlers are unique for each exchange operator.
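
To make the byte-level decoding concrete, here is a minimal sketch in C++. The message layout is entirely hypothetical (a 13-byte message with a big-endian timestamp in nanoseconds since midnight, a big-endian price numerator, and a one-byte decimal scale); no real exchange is claimed to use it, and the names `read_be`, `read_le`, `DecodedTrade`, and `decode_trade` are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Assemble an n-byte big-endian unsigned integer (n <= 8) one byte at a time.
// Byte-wise assembly works regardless of host endianness or alignment.
inline std::uint64_t read_be(const unsigned char* p, std::size_t n) {
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < n; ++i) {
        v = (v << 8) | p[i];
    }
    return v;
}

// Same idea for an n-byte little-endian unsigned integer (n <= 8).
inline std::uint64_t read_le(const unsigned char* p, std::size_t n) {
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < n; ++i) {
        v |= static_cast<std::uint64_t>(p[i]) << (8 * i);
    }
    return v;
}

// HYPOTHETICAL layout, for illustration only:
//   offset  0, 8 bytes: timestamp, nanoseconds since midnight, big endian
//   offset  8, 4 bytes: price numerator, big endian
//   offset 12, 1 byte : decimal scale (price = numerator / 10^scale)
constexpr std::size_t kTradeMessageSize = 13;

struct DecodedTrade {
    std::uint64_t ns_since_midnight;
    double price;
};

// Decode one hypothetical trade message; returns false if the buffer is too short.
inline bool decode_trade(const unsigned char* payload, std::size_t length,
                         DecodedTrade* out) {
    if (payload == nullptr || length < kTradeMessageSize) {
        return false;
    }
    out->ns_since_midnight = read_be(payload, 8);
    const std::uint64_t numerator = read_be(payload + 8, 4);
    const unsigned scale = payload[12];
    double denominator = 1.0;
    for (unsigned i = 0; i < scale; ++i) {
        denominator *= 10.0;
    }
    out->price = static_cast<double>(numerator) / denominator;
    return true;
}
```

With this layout, `decode_trade(payload, payload_length, &trade)` could be called on the pointer and length returned by `get_payload`. A real feed would need the offsets, widths, byte order, and price convention taken from that exchange's specification.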

Oh, and the structs can't have padding. The easiest way to handle that is to pack your data definitions to one-byte alignment:

#pragma pack(push, 1)

struct Trade { ... };
struct Quote { ... };
struct Imbalance { ... };
struct Cross { ... };

#pragma pack(pop)
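
As a rough sketch of how a packed definition might be used, the example below fills in one struct with hypothetical fields (`TradeWire` and its members are invented for illustration, not taken from any exchange spec), checks at compile time that no padding crept in, and copies the payload bytes in with `std::memcpy` rather than casting the pointer, which sidesteps unaligned access.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

#pragma pack(push, 1)

// HYPOTHETICAL wire layout, for illustration only.
struct TradeWire {
    char          msg_type;   // 1 byte
    std::uint32_t symbol_id;  // 4 bytes, still in wire byte order after copying
    std::uint64_t timestamp;  // 8 bytes, still in wire byte order after copying
    std::uint32_t price;      // 4 bytes, still in wire byte order after copying
};

#pragma pack(pop)

// Fails to compile if the compiler inserted any padding.
static_assert(sizeof(TradeWire) == 17, "TradeWire must have no padding");

// Copy one message out of the payload; returns false if the buffer is too short.
inline bool read_trade(const unsigned char* payload, std::size_t length,
                       TradeWire* out) {
    if (payload == nullptr || length < sizeof(TradeWire)) {
        return false;
    }
    std::memcpy(out, payload, sizeof(TradeWire));
    return true;
}
```

The multi-byte fields still hold whatever byte order the exchange sent, so they would need swapping if that differs from the host's.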

@vimell

vimell commented Jun 22, 2020

I see. Thank you so much. Wow, this is more work than I thought. I was hoping I could cast the payload to some human readable format such as a string and use string operators (or regex) to extract the fields.

But I guess I have to decode each field separately using byte operations. As always, I am beyond grateful for your help here. I will get back to you on Empirical later this week.
