@starhopp3r
Forked from jvns/linkers_101.md
Created November 18, 2018 14:38
How to understand what's in a binary, with code!

Step 0: A program, and prerequisites

We're going to be dealing with a "Hello, world!" program. Just one. There's going to be plenty to discuss here. I'm going to assume you're on Linux, because we're gonna be talking about ELF, and Macs use Mach-O, which I don't know anything about.

#include <stdio.h>

char *penguin = "Penguin";
char array[5] = {'a', 'b', 'c', 'd', 'e'};

int blah() {
  return 0;
}

int main() {
  char *monkey = "Monkey";
  int y = blah();
  printf("Hello worl%c!\n", 'd');
}

This program just prints "Hello world!".

Step 1: Compile it!

gcc hello.c -o hello

That's all.

Step 2: Let's look at the insides!

We're going to use a tool called nm to peer into binaries. First, we're going to look at the symbol table. This is a list of the symbols in your file. Symbols are variables, functions, and some other things that we're going to ignore.

$ nm hello
[some stuff removed]
0000000000601028 D array
00000000004004f4 T blah
000000000040043c t call_gmon_start
0000000000601030 b completed.6531
0000000000601010 W data_start
0000000000601038 b dtor_idx.6533
00000000004004d0 t frame_dummy
00000000004004ff T main
0000000000601020 D penguin
                 U printf@@GLIBC_2.2.5

You can see the whole output of nm here.

Okay, so there is some incomprehensible stuff here! But here are the ones I understand:

$ nm hello
0000000000601028 D array   <== our array!
0000000000601020 D penguin <== our variable "penguin" (a pointer to the string)

00000000004004f4 T blah   <== our function "blah"!
00000000004004ff T main   <== our function "main"!

                 U printf@@GLIBC_2.2.5 <== the "printf" function we called!

This is great. We have looked into the belly of the binary and recognized some things.
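
The letter in the middle column of the nm output is the symbol's type. As a rough sketch of how to read the letters that show up in the listing above (plus R, which nm uses for read-only data), here is a small lookup in C. The function names describe_nm_type and is_local_symbol are made up for illustration, and the descriptions are my paraphrase of the nm man page, not its exact wording.

```c
#include <stddef.h>

/* Hypothetical helper: describe the type letter nm prints in its
   middle column. Only a handful of letters are covered; anything
   else returns NULL. */
const char *describe_nm_type(char letter) {
    switch (letter) {
    case 'T': case 't': return "code in the text section";
    case 'D': case 'd': return "initialized data";
    case 'B': case 'b': return "zero-initialized data (.bss)";
    case 'R': case 'r': return "read-only data";
    case 'W': case 'w': return "weak symbol";
    case 'U':           return "undefined here, provided by another file or library";
    default:            return NULL;
    }
}

/* For the section letters T, D, B, and R, nm uses lowercase when the
   symbol is local to the file and uppercase when it is global. */
int is_local_symbol(char letter) {
    return letter == 't' || letter == 'd' || letter == 'b' || letter == 'r';
}
```

Read this way, frame_dummy (t) is code that is local to the file, main (T) is code that is visible globally, and printf (U) is something the binary uses but does not define itself.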

Step 3: Finding "penguin", "monkey", and "abcde"

Okay, so where is the string "penguin" in our program? We can look at it using objdump -s.

$ objdump -s hello
[lots of stuff removed]
Contents of section .rodata:
 400628 01000200 50656e67 75696e00 4d6f6e6b  ....Penguin.Monk
 400638 65790048 656c6c6f 20776f72 6c256321  ey.Hello worl%c!
 400648 0a00     

You can see the entire output of objdump here.

So this is telling us that there's a part of our binary that says "Penguin\0Monkey\0Hello worl%c!\n\0", and it's part of a section called .rodata.
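
C strings end at the first zero byte, which is why three separate string literals can sit back to back like this. Here is a minimal sketch that walks the same bytes and picks out each string. The rodata array is copied by hand from the dump above, starting at 0x40062c (skipping the four bytes 01 00 02 00 at the start of the line), and nth_string is a made-up helper name.

```c
#include <stddef.h>
#include <string.h>

/* The 30 bytes from 0x40062c through 0x400649 in the .rodata dump.
   The string literal's own terminator supplies the final 00 byte. */
const char rodata[] = "Penguin\0Monkey\0Hello worl%c!\n";

/* Hypothetical helper: return a pointer to the index-th
   NUL-terminated string packed in buf[0..len), or NULL if there
   are fewer strings than that. */
const char *nth_string(const char *buf, size_t len, size_t index) {
    size_t pos = 0;
    while (pos < len) {
        const char *end = memchr(buf + pos, '\0', len - pos);
        if (end == NULL) {
            return NULL;  /* unterminated tail, not a complete string */
        }
        if (index == 0) {
            return buf + pos;
        }
        index--;
        pos = (size_t)(end - buf) + 1;  /* step past the terminator */
    }
    return NULL;
}
```

Calling nth_string(rodata, sizeof rodata, 1) gives back a pointer to "Monkey", the same way the monkey variable in main ends up pointing into the middle of .rodata.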

Sections? Every binary is made up of a bunch of sections. The most important ones to understand for now are .text, .data, .rodata, and .bss.

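  • .text: Contains the machine code that runs
  • .data: Contains initialized data that the program can modify
  • .rodata: Contains read-only data, like string literals
  • .bss: Reserves space for data that starts out as zero, so it takes up no room in the file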
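
To make that concrete, here is a sketch of where a few C declarations typically end up. The placements in the comments are what I'd expect from gcc on x86-64 Linux; other compilers and flags can differ. penguin, array, and blah come from hello.c, while banner and scratch are hypothetical additions for illustration.

```c
/* From hello.c */
char *penguin = "Penguin";                 /* the pointer: .data; the bytes "Penguin\0": .rodata */
char array[5] = {'a', 'b', 'c', 'd', 'e'}; /* .data: initialized and writable */

/* Hypothetical additions, not in hello.c */
const char banner[] = "read-only";         /* .rodata: initialized and const */
int scratch[4];                            /* .bss: no initializer, starts out as all zeros */

int blah(void) {                           /* .text: the compiled code for this function */
    return 0;
}
```

If you compile something like this and run nm on it, you should see D next to penguin and array, R next to banner, B next to scratch, and T next to blah.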

Where's abcde? That's in .data:

Contents of section .data:
 601010 00000000 00000000 00000000 00000000  ................
 601020 2c064000 00000000 61626364 65000000  ,.@.....abcde...
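
The eight bytes just before abcde are worth a look too. nm put penguin at 0x601020, so those bytes are the value of the penguin pointer. Assuming this is an x86-64 binary (the addresses suggest it is), integers are stored least significant byte first, so 2c 06 40 00 00 00 00 00 reads as 0x40062c. That is four bytes into the .rodata line that starts at 400628, which is exactly where "Penguin" begins. Here is a small sketch of that arithmetic; read_le64 and the other names are made up for illustration, and the bytes are copied by hand from the dump.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: assemble 8 bytes, least significant first,
   into a 64-bit value (little-endian byte order). */
uint64_t read_le64(const unsigned char *bytes) {
    uint64_t value = 0;
    for (size_t i = 0; i < 8; i++) {
        value |= (uint64_t)bytes[i] << (8 * i);
    }
    return value;
}

/* The 8 bytes at 0x601020 in the .data dump: the penguin pointer. */
const unsigned char penguin_bytes[8] = {
    0x2c, 0x06, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00
};

/* Address of the .rodata dump line that contains "Penguin". */
const uint64_t rodata_line_addr = 0x400628;

/* How far into that .rodata line the pointer lands. */
uint64_t penguin_offset_in_line(void) {
    return read_le64(penguin_bytes) - rodata_line_addr;
}
```

So the variable lives in .data, but the string it points to lives in .rodata.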