Created May 12, 2014 04:00 (gist vikasnkumar/fe8263ae3353acd2e9fc)
Pegex variable rule request
# Let us assume we are parsing a binary financial feed. Each message fits into the following C-like structure:
struct Msg {
    string Ticker;          // NUL-terminated
    string Exchange;        // NUL-terminated
    byte   Type;            // e.g. Stock/Forex/Future/Option
    int    Count;           // number of KeyValuePairs
    struct KeyValue {
        string Key;         // NUL-terminated
        union {
            int    i;
            long   l;
            float  f;
            double d;
        } Value;            // always 8 bytes on the wire
    } *KeyValuePairs;       // Count entries
};
## This structure is sent across the wire.
# Until you have parsed the value of Count, you cannot tell whether the binary data that follows is part of the KeyValuePairs array or the start of a new Msg object.
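This dependency is what a "variable rule" would have to express: a repetition whose count comes from a value matched earlier in the same parse. Below is a minimal Python sketch of the idea as parser combinators. It does not use Pegex; the names cstr, fixed, seq and counted are hypothetical, and the byte layout (NUL-terminated strings, 1-byte Type, 4-byte big-endian Count, 8-byte little-endian Value) is inferred from the sample stream, not taken from a spec.

```python
import struct
from typing import Any, Callable, List, Tuple

# A parser takes (buf, pos) and returns (value, new_pos).
Parser = Callable[[bytes, int], Tuple[Any, int]]


def cstr() -> Parser:
    """Match a NUL-terminated ASCII string."""
    def p(buf: bytes, pos: int) -> Tuple[Any, int]:
        end = buf.index(b"\x00", pos)  # ValueError if there is no terminator
        return buf[pos:end].decode("ascii"), end + 1
    return p


def fixed(fmt: str) -> Parser:
    """Match one fixed-width field described by a struct format such as "<Q"."""
    size = struct.calcsize(fmt)
    def p(buf: bytes, pos: int) -> Tuple[Any, int]:
        (val,) = struct.unpack_from(fmt, buf, pos)
        return val, pos + size
    return p


def seq(*parsers: Parser) -> Parser:
    """Match parsers one after another and collect their values in a list."""
    def p(buf: bytes, pos: int) -> Tuple[Any, int]:
        out: List[Any] = []
        for sub in parsers:
            val, pos = sub(buf, pos)
            out.append(val)
        return out, pos
    return p


def counted(count_parser: Parser, item_parser: Parser) -> Parser:
    """Data-dependent repetition: the parsed count decides how many items follow."""
    def p(buf: bytes, pos: int) -> Tuple[Any, int]:
        n, pos = count_parser(buf, pos)
        items: List[Any] = []
        for _ in range(n):
            item, pos = item_parser(buf, pos)
            items.append(item)
        return items, pos
    return p


# Grammar for one Msg (hypothetical layout inferred from the sample).
key_value = seq(cstr(), fixed("<Q"))            # Key, 8-byte little-endian Value
msg = seq(cstr(), cstr(), fixed(">B"),          # Ticker, Exchange, Type
          counted(fixed(">I"), key_value))      # Count, then Count KeyValues

MSG2 = (b"IBM\x00NASDAQ\x00\x01\x00\x00\x00\x02"
        b"ABCD\x00\x01\x02\x03\x04\x05\x06\x07\x08"
        b"EFGH\x00\xA1\xB2\xC3\xD4\xE5\xF6\x07\x18")
value, end = msg(MSG2, 0)
```

The only unusual piece is counted: plain PEG repetition operators (*, +, ?) have no way to take their bound from a value produced earlier in the same parse.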
GOOG\x00NYSE\x00\x01\x00\x00\x00\x00IBM\x00NASDAQ\x00\x01\x00\x00\x00\x02ABCD\x00\x01\x02\x03\x04\x05\x06\x07\x08EFGH\x00\xA1\xB2\xC3\xD4\xE5\xF6\x07\x18
This stream contains 2 Msg objects.
Msg object 1 has the following values:
  Ticker: GOOG
  Exchange: NYSE
  Type: 1
  Count: 0
  KeyValuePairs: not present, since Count is 0
Msg object 2 has the following values:
  Ticker: IBM
  Exchange: NASDAQ
  Type: 1
  Count: 2
  KeyValuePairs:
    - { ABCD, 0x0807060504030201 }
    - { EFGH, 0x1807F6E5D4C3B2A1 }
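A minimal hand-written Python decoder that reproduces the listing above, assuming the same layout inferred from the sample (1-byte Type, 4-byte big-endian Count, 8-byte little-endian Value). read_cstring, parse_msg and parse_stream are hypothetical names; the point is the loop in parse_msg, whose bound is the Count that was just read.

```python
import struct

# Sample stream from above.
SAMPLE = (
    b"GOOG\x00NYSE\x00\x01\x00\x00\x00\x00"
    b"IBM\x00NASDAQ\x00\x01\x00\x00\x00\x02"
    b"ABCD\x00\x01\x02\x03\x04\x05\x06\x07\x08"
    b"EFGH\x00\xA1\xB2\xC3\xD4\xE5\xF6\x07\x18"
)


def read_cstring(buf, pos):
    """Return (text, new_pos) for the NUL-terminated string at pos."""
    end = buf.index(b"\x00", pos)
    return buf[pos:end].decode("ascii"), end + 1


def parse_msg(buf, pos):
    """Parse one Msg starting at pos; return (msg, new_pos)."""
    ticker, pos = read_cstring(buf, pos)
    exchange, pos = read_cstring(buf, pos)
    # 1-byte Type, then 4-byte big-endian Count (as in the sample).
    msg_type, count = struct.unpack_from(">BI", buf, pos)
    pos += 5
    pairs = []
    # Count, not the data itself, decides where this Msg ends.
    for _ in range(count):
        key, pos = read_cstring(buf, pos)
        (value,) = struct.unpack_from("<Q", buf, pos)  # 8-byte little-endian
        pos += 8
        pairs.append((key, value))
    msg = {"ticker": ticker, "exchange": exchange, "type": msg_type,
           "count": count, "pairs": pairs}
    return msg, pos


def parse_stream(buf):
    """Parse consecutive Msg objects until the buffer is exhausted."""
    msgs, pos = [], 0
    while pos < len(buf):
        msg, pos = parse_msg(buf, pos)
        msgs.append(msg)
    return msgs


msgs = parse_stream(SAMPLE)
for m in msgs:
    print(m["ticker"], m["exchange"], m["type"], m["count"],
          [(k, hex(v)) for k, v in m["pairs"]])
```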
As you can see, the union Value is always stored as 64 bits in little-endian format, whereas Count in this sample is a big-endian integer (\x00\x00\x00\x02 reads as 2).
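To make the byte order concrete, here is a small Python sketch that reads the 8-byte Value of ABCD as each union member, and the Count of Msg 2 in both byte orders. It assumes, as in a C union on a little-endian machine, that the 4-byte members (int i, float f) occupy the first 4 bytes of the slot; the feed itself does not say which member is in use. decode_value is a hypothetical helper.

```python
import struct

VALUE_BYTES = b"\x01\x02\x03\x04\x05\x06\x07\x08"  # Value of key ABCD in the sample
COUNT_BYTES = b"\x00\x00\x00\x02"                  # Count of Msg 2 in the sample


def decode_value(raw):
    """Read an 8-byte little-endian Value slot as each member of the union.

    Assumption: like a C union on a little-endian machine, the 4-byte members
    (int i, float f) live in the first 4 bytes of the slot.
    """
    if len(raw) != 8:
        raise ValueError("a Value slot is exactly 8 bytes")
    return {
        "i": struct.unpack_from("<i", raw)[0],  # 32-bit signed int
        "l": struct.unpack_from("<q", raw)[0],  # 64-bit signed long
        "f": struct.unpack_from("<f", raw)[0],  # 32-bit float
        "d": struct.unpack_from("<d", raw)[0],  # 64-bit double
    }


members = decode_value(VALUE_BYTES)
count_be = struct.unpack(">I", COUNT_BYTES)[0]  # 2, matching the listing
count_le = struct.unpack("<I", COUNT_BYTES)[0]  # 0x02000000: wrong byte order
```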
Each string is terminated by a NUL (\x00) byte, but every non-string field has a fixed, known length: 1 byte for Type, 4 bytes for Count and 8 bytes for Value.
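These rules are enough to rebuild the sample stream from its field values. A minimal Python sketch, with encode_msg as a hypothetical helper; Value is treated as an unsigned 64-bit integer, and Count is written big-endian to match the sample.

```python
import struct


def encode_msg(ticker, exchange, msg_type, pairs):
    """Serialize one Msg; pairs is a list of (key, unsigned 64-bit value)."""
    out = ticker.encode("ascii") + b"\x00"            # NUL-terminated string
    out += exchange.encode("ascii") + b"\x00"
    # Type (1 byte) and Count (4 bytes, big-endian); Count comes from len(pairs).
    out += struct.pack(">BI", msg_type, len(pairs))
    for key, value in pairs:
        out += key.encode("ascii") + b"\x00"
        out += struct.pack("<Q", value)                # Value (8 bytes, little-endian)
    return out


stream = encode_msg("GOOG", "NYSE", 1, []) + encode_msg(
    "IBM", "NASDAQ", 1,
    [("ABCD", 0x0807060504030201), ("EFGH", 0x1807F6E5D4C3B2A1)],
)
```

Deriving Count from len(pairs) keeps the header and the array consistent, which is exactly the invariant a decoder has to trust.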
If the Count of Msg 2 were parsed as 1 instead of 2, EFGH would be treated as the start of the next Msg object instead of as part of Msg 2's KeyValuePairs array.
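The failure mode can be simulated by forcing Count to 1 while walking Msg 2. In this Python sketch (skip_msg and its count_override parameter are hypothetical), the leftover bytes start with EFGH\x00 and get read as a new Ticker. In this two-message sample the remaining bytes contain no NUL at all, so even the bogus Msg's Exchange cannot be found.

```python
import struct

MSG2 = (
    b"IBM\x00NASDAQ\x00\x01\x00\x00\x00\x02"
    b"ABCD\x00\x01\x02\x03\x04\x05\x06\x07\x08"
    b"EFGH\x00\xA1\xB2\xC3\xD4\xE5\xF6\x07\x18"
)


def cstring(buf, pos):
    """Return (text, new_pos) for a NUL-terminated string; ValueError if none."""
    end = buf.index(b"\x00", pos)
    return buf[pos:end].decode("latin-1"), end + 1


def skip_msg(buf, pos, count_override=None):
    """Return the offset just past one Msg.

    count_override (hypothetical) replaces the parsed Count to simulate a misread.
    """
    _, pos = cstring(buf, pos)                          # Ticker
    _, pos = cstring(buf, pos)                          # Exchange
    (count,) = struct.unpack_from(">I", buf, pos + 1)   # skip the 1-byte Type
    pos += 5
    if count_override is not None:
        count = count_override
    for _ in range(count):
        _, pos = cstring(buf, pos)                      # Key
        pos += 8                                        # 8-byte Value
    return pos


good_end = skip_msg(MSG2, 0)                   # consumes all of Msg 2
bad_end = skip_msg(MSG2, 0, count_override=1)  # stops right after ABCD's Value
leftover = MSG2[bad_end:]
next_ticker, pos = cstring(leftover, 0)        # EFGH is now read as a Ticker
try:
    cstring(leftover, pos)                     # look for the bogus Exchange
    exchange_found = True
except ValueError:
    exchange_found = False                     # no NUL left in the buffer
```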
This is how financial feeds are structured, and various embedded firmware formats are structured the same way.
Reverse-engineering firmware requires binary analysis, which in turn requires parsing data laid out like this. If Pegex could parse feeds like the one above, it would be a huge leap in the development of such tools.
I'll write up a grammar for this.