@vikasnkumar
Created May 12, 2014 04:00
Pegex variable rule request
# Let us assume we are parsing a binary financial feed. Each message fits the following C-like structure:
struct Msg {
    string Ticker;
    string Exchange;
    byte Type;   #### Type can be something like Stock/Forex/Future/Option
    int Count;   #### Number of KeyValuePairs
    struct KeyValue {
        string Key;
        union Value {
            int i;
            long l;
            float f;
            double d;
        };
    } *KeyValuePairs;
};
## This structure gets sent across the wire.
# Until you parse the value of Count, you cannot tell whether the following binary data is part of the KeyValuePairs array or the start of a new Msg object.
GOOG\x00NYSE\x00\x01\x00\x00\x00\x00IBM\x00NASDAQ\x00\x01\x00\x00\x00\x02ABCD\x00\x01\x02\x03\x04\x05\x06\x07\x08EFGH\x00\xA1\xB2\xC3\xD4\xE5\xF6\x07\x18
This has 2 Msg objects
Msg object 1 has the following values:
Ticker: GOOG
Exchange: NYSE
Type: 1
Count: 0
KeyValuePairs: not present since Count is 0
Msg object 2 has the following values:
Ticker: IBM
Exchange: NASDAQ
Type: 1
Count: 2
KeyValuePairs:
- { ABCD, 0x0807060504030201 }
- { EFGH, 0x1807F6E5D4C3B2A1 }
As you can see, the union Value is stored as 64 bits in little-endian format.
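As a quick check of the byte order, here is a minimal Python sketch that decodes the two 8-byte Value payloads from the sample with the standard `struct` module. Reading them as unsigned 64-bit integers is an assumption made for illustration, since the union could just as well hold a double; the helper name `decode_u64_le` is hypothetical.

```python
import struct

# The two 8-byte Value payloads copied from the sample feed.
raw_values = [
    b"\x01\x02\x03\x04\x05\x06\x07\x08",
    b"\xA1\xB2\xC3\xD4\xE5\xF6\x07\x18",
]


def decode_u64_le(raw: bytes) -> int:
    """Interpret 8 bytes as an unsigned 64-bit little-endian integer (assumed type)."""
    (value,) = struct.unpack("<Q", raw)
    return value


decoded = [decode_u64_le(raw) for raw in raw_values]
for value in decoded:
    print(f"0x{value:016X}")
# prints:
# 0x0807060504030201
# 0x1807F6E5D4C3B2A1
```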
Each string is terminated by a NUL (\x00) character, while each non-string field has a fixed, known byte length.
If the Count value in Msg 2 were parsed as 1 instead of 2, then EFGH would be treated as the start of the next Msg object instead of as part of the previous Msg object's KeyValuePairs array.
The layout above is how financial feeds are structured, and various embedded firmware images are structured the same way.
Reversing the firmware requires binary analysis, which in turn requires parsing such feeds. If Pegex can parse feeds like the one above, that would be a huge leap in the development of such tools.
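To make the dependency on Count concrete, here is a hand-written Python sketch of the parse that a variable rule would need to express. It is plain Python, not Pegex, and it rests on a few assumptions: the function names are hypothetical, each Value is read as an unsigned 64-bit little-endian integer, and the 4-byte Count is read as big-endian, because the sample bytes \x00\x00\x00\x02 only decode to 2 that way.

```python
import struct

# The sample feed from above, split across literals for readability.
SAMPLE = (
    b"GOOG\x00NYSE\x00\x01\x00\x00\x00\x00"
    b"IBM\x00NASDAQ\x00\x01\x00\x00\x00\x02"
    b"ABCD\x00\x01\x02\x03\x04\x05\x06\x07\x08"
    b"EFGH\x00\xA1\xB2\xC3\xD4\xE5\xF6\x07\x18"
)


def read_cstring(buf, pos):
    """Read a NUL-terminated ASCII string; return it and the next offset."""
    end = buf.index(b"\x00", pos)
    return buf[pos:end].decode("ascii"), end + 1


def parse_msg(buf, pos):
    """Parse one Msg starting at pos; return the message and the next offset."""
    ticker, pos = read_cstring(buf, pos)
    exchange, pos = read_cstring(buf, pos)
    msg_type = buf[pos]
    pos += 1
    # Assumption: Count is big-endian, matching the sample bytes.
    (count,) = struct.unpack_from(">i", buf, pos)
    pos += 4
    pairs = []
    # The number of KeyValue entries consumed depends on the parsed Count.
    for _ in range(count):
        key, pos = read_cstring(buf, pos)
        # Assumption: Value is read as an unsigned 64-bit little-endian integer.
        (value,) = struct.unpack_from("<Q", buf, pos)
        pos += 8
        pairs.append((key, value))
    msg = {
        "ticker": ticker,
        "exchange": exchange,
        "type": msg_type,
        "count": count,
        "pairs": pairs,
    }
    return msg, pos


def parse_feed(buf):
    """Parse consecutive Msg objects until the buffer is exhausted."""
    messages = []
    pos = 0
    while pos < len(buf):
        msg, pos = parse_msg(buf, pos)
        messages.append(msg)
    return messages


messages = parse_feed(SAMPLE)
for msg in messages:
    print(msg["ticker"], msg["exchange"], msg["type"], msg["count"], msg["pairs"])
```

The `for _ in range(count)` loop is the part a fixed grammar cannot state directly: the repetition count of the KeyValue rule comes from data parsed earlier in the same message.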
@ingydotnet

I'll write up a grammar for this.
