@zachriggle
Last active October 25, 2017 01:53
I fucking hate you, GDB

Lots of commands in GDB's protocol use hex-encoded data. A $ starts a packet, and all packets end with # followed by a one-byte, hex-encoded checksum.
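To make the framing concrete, here is a minimal Python sketch. It assumes the checksum is the sum of the payload bytes modulo 256, written as two lowercase hex digits, which matches the captures in this post. The helper names `checksum` and `frame` are made up for illustration and are not part of GDB.

```python
def checksum(payload: bytes) -> int:
    # Sum of all payload bytes, modulo 256.
    return sum(payload) % 256


def frame(payload: bytes) -> bytes:
    # '$' + payload + '#' + two hex digits of checksum.
    return b"$" + payload + b"#" + b"%02x" % checksum(payload)


if __name__ == "__main__":
    print(frame(b"F5"))
    print(frame(b"vFile:pread:5,3fff,0"))
```

Both calls reproduce packets seen on the wire later in this post (`$F5#7b` and `$vFile:pread:5,3fff,0#98`).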

Let's look at the protocol for the request:

remote get /proc/self/cmdline ./cmdline

Which should fetch /proc/self/cmdline and dump it to ./cmdline. It does!

$ phd cmdline
00000000  67 64 62 73  65 72 76 65  72 00 6c 6f  63 61 6c 68  │gdbs│erve│r·lo│calh│
00000010  6f 73 74 3a  31 31 31 31  00 2f 62 69  6e 2f 66 61  │ost:│1111│·/bi│n/fa│
00000020  6c 73 65 00                                         │lse·││
00000024

vFile:open

The filename is hex-encoded. The additional arguments are just hex integers.

127.000.000.001.47102-127.000.000.001.01111:
0000: 2476 4669 6c65 3a6f 7065 6e3a 3266 3730 3732 3666 3633 3266 3733 3635 3663 3636  $vFile:open:2f70726f632f73656c66
0020: 3266 3633 3664 3634 3663 3639 3665 3635 2c30 2c30 2363 36                        2f636d646c696e65,0,0#c6
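A rough sketch of how that request payload could be built, assuming the two trailing arguments are flags and mode (both 0 in the capture). `vfile_open` is a hypothetical helper, not a GDB API, and it only builds the payload between `$` and `#`.

```python
import binascii


def vfile_open(path: str, flags: int = 0, mode: int = 0) -> bytes:
    # The filename is hex-encoded; the other arguments are plain hex integers.
    hex_path = binascii.hexlify(path.encode())
    return b"vFile:open:%s,%x,%x" % (hex_path, flags, mode)


if __name__ == "__main__":
    print(vfile_open("/proc/self/cmdline"))
```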

The response is a hex file descriptor (5). Makes sense!

127.000.000.001.01111-127.000.000.001.47102:
0000: 2446 3523 3762  $F5#7b

vFile:pread

The request uses hex arguments for fd, count, and offset.

127.000.000.001.47102-127.000.000.001.01111:
0000: 2476 4669 6c65 3a70 7265 6164 3a35 2c33 6666 662c 3023 3938  $vFile:pread:5,3fff,0#98

The response is plain binary data.

127.000.000.001.01111-127.000.000.001.47102:
0000: 2446 3234 3b67 6462 7365 7276 6572 006c 6f63 616c 686f 7374 3a31 2a20 002f 6269  $F24;gdbserver.localhost:1* ./bi
0020: 6e2f 6661 6c73 6500 2363 62                                                      n/false.#cb

Wait, what?

But wait a goddamn second. The F24; prefix says the response is 0x24 bytes long. From the phd output in the intro, we see this is true! However, the data on the wire is only 0x23 bytes:

00000000  67 64 62 73  65 72 76 65  72 00 6c 6f  63 61 6c 68  │gdbs│erve│r·lo│calh│
00000010  6f 73 74 3a  31 2a 20 00  2f 62 69 6e  2f 66 61 6c  │ost:│1* ·│/bin│/fal│
00000020  73 65 00                                            │se·│
00000023
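The mismatch is easy to check by hand. The sketch below splits a reply payload into its hex result and the raw attachment after the `;`, then compares lengths. `parse_vfile_reply` is a made-up name, and the payload is the one from the capture above with the `$`, `#` and checksum stripped.

```python
def parse_vfile_reply(payload: bytes):
    # Replies look like b"F<hex result>[,<hex errno>][;<attachment>]".
    if not payload.startswith(b"F"):
        raise ValueError("not a vFile reply")
    header, _, attachment = payload[1:].partition(b";")
    result = int(header.split(b",")[0], 16)
    return result, attachment


if __name__ == "__main__":
    wire = b"F24;gdbserver\x00localhost:1* \x00/bin/false\x00"
    claimed, raw = parse_vfile_reply(wire)
    # The claimed length and the raw attachment length disagree.
    print(hex(claimed), hex(len(raw)))
```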

Also, the port is wrong. The real data shows that we're listening on port 1111. But the protocol has '1* ' instead. The shit?

Apparently, GDB does escaping and run-length encoding of its packets. Which is fucking stupid*.

Binary data in most packets is encoded either as two hexadecimal digits per byte of binary data. This allowed the traditional remote protocol to work over connections which were only seven-bit clean. Some packets designed more recently assume an eight-bit clean connection, and use a more efficient encoding to send and receive binary data.

Some packets? Which packets!?!?!

The binary data representation uses 7d (ASCII ‘}’) as an escape character. Any escaped byte is transmitted as the escape character followed by the original character XORed with 0x20. For example, the byte 0x7d would be transmitted as the two bytes 0x7d 0x5d. The bytes 0x23 (ASCII ‘#’), 0x24 (ASCII ‘$’), and 0x7d (ASCII ‘}’) must always be escaped. Responses sent by the stub must also escape 0x2a (ASCII ‘*’), so that it is not interpreted as the start of a run-length encoded sequence (described next).
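The escaping rule quoted above boils down to a few lines. This is a sketch based only on that paragraph, not on GDB's source. The names `escape` and `unescape` and the `from_stub` flag are assumptions for illustration.

```python
ESCAPE = 0x7D  # ASCII '}'


def escape(data: bytes, from_stub: bool = False) -> bytes:
    # '#', '$' and '}' are always escaped; stubs must also escape '*'.
    special = {0x23, 0x24, 0x7D}
    if from_stub:
        special.add(0x2A)
    out = bytearray()
    for b in data:
        if b in special:
            out += bytes([ESCAPE, b ^ 0x20])
        else:
            out.append(b)
    return bytes(out)


def unescape(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == ESCAPE:
            if i + 1 >= len(data):
                raise ValueError("dangling escape character")
            out.append(data[i + 1] ^ 0x20)
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


if __name__ == "__main__":
    print(escape(b"\x7d"))             # the example from the docs
    print(unescape(escape(b"a#$}b")))  # round trip
```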

Are you fucking kidding me?

Response data can be run-length encoded to save space. Run-length encoding replaces runs of identical characters with one instance of the repeated character, followed by a ‘*’ and a repeat count. The repeat count is itself sent encoded, to avoid binary characters in data: a value of n is sent as n+29. For a repeat count greater or equal to 3, this produces a printable ASCII character, e.g. a space (ASCII code 32) for a repeat count of 3. (This is because run-length encoding starts to win for counts 3 or more.) Thus, for example, ‘0* ’ is a run-length encoding of “0000”: the space character after ‘*’ means repeat the leading 0 32 - 29 = 3 more times.
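Here is a small decoder sketch for that scheme, written from the quoted description alone: a `*` means "repeat the previous byte (next byte - 29) more times". It ignores escaping entirely, and `rle_decode` is a hypothetical name, not anything from GDB.

```python
def rle_decode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b == ord("*") and out and i + 1 < len(data):
            # The count byte encodes n as n + 29; repeat the previous byte n more times.
            repeat = data[i + 1] - 29
            out += bytes([out[-1]]) * repeat
            i += 2
        else:
            out.append(b)
            i += 1
    return bytes(out)


if __name__ == "__main__":
    wire = b"gdbserver\x00localhost:1* \x00/bin/false\x00"
    decoded = rle_decode(wire)
    print(decoded, hex(len(decoded)))
```

Run on the attachment from the pread response, this expands `1* ` back into `1111` and the length comes out to the 0x24 bytes that `F24;` promised.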

Mother of shit, this is ridiculous. If bandwidth is a concern, use deflate, not "lol you have to escape curly braces".

* I understand why. The packetizer can't know about vFile and its embedded length. Things grow organically, and sometimes the least disruptive solutions have caveats. Backward compatibility is both important and difficult. I still vote that it's a shitty protocol.

@alexbecker

Two * in last quoted paragraph should be escaped. They're currently being interpreted as markdown and making the middle of the paragraph italic.

@zachriggle
Author

Thanks @alexbecker, I think I've fixed it <3
