We're going to be dealing with a "Hello, world!" program. Just one. There's going to be a bunch of fruit for discussion here. I'm going to be assuming you're on Linux, because we're gonna be talking about ELF and Macs use Mach-O and I don't know anything about Mach-O.
#include <stdio.h>
char *penguin = "Penguin";
char array[5] = {'a', 'b', 'c', 'd', 'e'};
int blah() {
return 0;
}
int main() {
char *monkey = "Monkey";
int y = blah();
printf("Hello worl%c!\n", 'd');
}
This program just prints "Hello world!".
gcc hello.c -o hello
That's all.
We're going to use a tool called nm to peer into binaries. First, we're going to look at the symbol table. This is a list of the symbols in your file. Symbols are variables, functions, and some other things that we're going to ignore.
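To make "symbols are variables and functions" a little more concrete, here's a small hypothetical file (`symbols.c` and every name in it are made up for illustration). Assuming GCC on x86-64 Linux, I'd expect `gcc -c symbols.c && nm symbols.o` to list the global variables and functions, but not the local variable and not the characters of the string literal. The letters in the comments are what I'd expect nm to print, not a guarantee.

```c
/* symbols.c: a hypothetical file to poke at with nm.
   Build and inspect with: gcc -c symbols.c && nm symbols.o */

int counter = 42;            /* global, initialized: expect "D" */
static int hidden = 7;       /* initialized but file-local: expect "d" */
const char *greeting = "hi"; /* the pointer gets a symbol ("D"); the
                                literal "hi" itself is unnamed, so no symbol */

static int helper(void) {    /* file-local function: expect "t" */
    return hidden;
}

int get_counter(void) {      /* global function: expect "T" */
    int local = helper();    /* local variable: lives on the stack, no symbol */
    return counter + local;
}
```

That's also why `monkey` and `y` from hello.c don't show up in the nm output below: they're local variables inside `main`.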
$ nm hello
[some stuff removed]
0000000000601028 D array
00000000004004f4 T blah
000000000040043c t call_gmon_start
0000000000601030 b completed.6531
0000000000601010 W data_start
0000000000601038 b dtor_idx.6533
00000000004004d0 t frame_dummy
00000000004004ff T main
0000000000601020 D penguin
U printf@@GLIBC_2.2.5
You can see the whole output of nm here.
Okay, so there's some incomprehensible stuff here! But here are the ones I understand:
$ nm hello
0000000000601028 D array <== our array!
0000000000601020 D penguin <== our pointer called penguin (the characters themselves live elsewhere)
00000000004004f4 T blah <== our function "blah"!
00000000004004ff T main <== our function "main"!
U printf@@GLIBC_2.2.5 <== the "printf" function we called!
This is great. We have looked into the belly of the binary and recognized some things.
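The letter between the address and the name is the symbol's type. Here's a sketch that decodes the letters that show up above (`nm_kind` is a hypothetical helper; the descriptions paraphrase the GNU nm man page and cover only a subset of its letters). For most letters, lowercase means the symbol is local to its file and uppercase means it's global, which is why `blah` and `main` are `T` while `frame_dummy` is `t`.

```c
#include <ctype.h>

/* Describe the one-letter symbol type that nm prints (a subset only).
   Case is folded here; for most letters lowercase means "local to its
   file" and uppercase means "global". */
const char *nm_kind(char type) {
    switch (toupper((unsigned char)type)) {
    case 'T': return "code (.text)";
    case 'D': return "initialized data (.data)";
    case 'B': return "zero-initialized data (.bss)";
    case 'R': return "read-only data (.rodata)";
    case 'U': return "undefined: defined in some other file or library";
    case 'W': return "weak symbol";
    default:  return "something else (see man nm)";
    }
}
```

So `printf@@GLIBC_2.2.5` being `U` means our binary doesn't contain printf at all; it gets it from glibc when the program runs.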
Okay, so where is the string "Penguin" in our program? We can look for it using objdump -s.
$ objdump -s hello
[lots of stuff removed]
Contents of section .rodata:
400628 01000200 50656e67 75696e00 4d6f6e6b ....Penguin.Monk
400638 65790048 656c6c6f 20776f72 6c256321 ey.Hello worl%c!
400648 0a00
You can see the entire output of objdump here.
So this is telling us that there's a part of our binary that says "Penguin\0Monkey\0Hello worl%c!\n\0", and it's part of a section called .rodata.
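Each line of objdump -s is an address, up to 16 bytes as hex (in groups of four), and the same bytes as ASCII with unprintable ones shown as dots. Here's a sketch (the function names are hypothetical) that turns the hex groups back into raw bytes, which is a handy way to check what's hiding behind those dots.

```c
#include <ctype.h>
#include <stddef.h>

/* Convert one hex digit to its value, or -1 if it isn't a hex digit. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    c = (char)tolower((unsigned char)c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decode the hex columns of an objdump -s line (e.g. "50656e67 75696e00")
   into raw bytes, skipping spaces. Returns the number of bytes written,
   or -1 on a non-hex character, an odd digit count, or overflow of out. */
long decode_hex(const char *hex, unsigned char *out, size_t cap) {
    size_t n = 0;
    while (*hex) {
        if (*hex == ' ') { hex++; continue; }
        int hi = hex_value(hex[0]);
        int lo = hex[1] ? hex_value(hex[1]) : -1;
        if (hi < 0 || lo < 0 || n == cap) return -1;
        out[n++] = (unsigned char)(hi * 16 + lo);
        hex += 2;
    }
    return (long)n;
}
```

Feeding it the .rodata hex from `50656e67` onward gives 30 bytes: "Penguin\0", "Monkey\0", and "Hello worl%c!\n\0", where the `0a` is the newline.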
Sections? Every binary is made up of a bunch of sections. The most important ones to understand for now are .text, .data, .rodata, and .bss.
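.text: Contains the machine code that runs (what you see as assembly when you disassemble it).
.data: Contains initialized global and static variables the program can change, like array and penguin.
.rodata: Contains read-only data, like our string literals.
.bss: Contains zero-initialized global and static variables. It takes up no space in the file; the binary just records how big it needs to be.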
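For the curious, here's a rough sketch of how a tool like objdump finds these sections, using the structs from Linux's `<elf.h>`. The ELF header at the start of the file says where the section header table is, and each section header names its section via an offset into a string table. `list_sections` is a hypothetical helper; the sketch assumes a 64-bit ELF file whose byte order matches the machine it runs on, like our hello.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Print the name, address and size of every section in a 64-bit ELF file,
   a bit like a stripped-down `readelf -S`. Returns the number of section
   headers, or -1 on error. Assumes the file's byte order matches the host
   and ignores ELF's rarely used extended section numbering. */
int list_sections(const char *path) {
    Elf64_Ehdr eh;
    Elf64_Shdr *sh = NULL;
    const Elf64_Shdr *strtab;
    char *names = NULL;
    int result = -1;

    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    /* The ELF header says where the section header table is and how
       many entries it has. */
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shnum == 0 || eh.e_shstrndx >= eh.e_shnum)
        goto out;

    /* Read the whole section header table in one go. */
    sh = calloc(eh.e_shnum, sizeof *sh);
    if (!sh || fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
        fread(sh, sizeof *sh, eh.e_shnum, f) != eh.e_shnum)
        goto out;

    /* Section names live in a string table section (.shstrtab);
       each header's sh_name is an offset into it. */
    strtab = &sh[eh.e_shstrndx];
    names = malloc(strtab->sh_size + 1);
    if (!names || fseek(f, (long)strtab->sh_offset, SEEK_SET) != 0 ||
        fread(names, 1, strtab->sh_size, f) != strtab->sh_size)
        goto out;
    names[strtab->sh_size] = '\0'; /* in case the table isn't terminated */

    for (int i = 0; i < eh.e_shnum; i++) {
        const char *name =
            sh[i].sh_name < strtab->sh_size ? names + sh[i].sh_name : "?";
        printf("%-20s addr=0x%08llx size=0x%llx\n", name,
               (unsigned long long)sh[i].sh_addr,
               (unsigned long long)sh[i].sh_size);
    }
    result = eh.e_shnum;

out:
    free(names);
    free(sh);
    fclose(f);
    return result;
}
```

Calling `list_sections("hello")` from a small `main` should print .text, .rodata, .data and .bss alongside the sections we're ignoring; `readelf -S hello` shows the same table with more detail.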
Where's abcde? That's in .data:
Contents of section .data:
601010 00000000 00000000 00000000 00000000 ................
601020 2c064000 00000000 61626364 65000000 ..@.....abcde...
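The bytes at 0x601020 are `2c064000 00000000`, and nm told us that's where `penguin` lives. It's an 8-byte pointer stored little-endian (least significant byte first, as on x86-64). Reassembling the bytes gives 0x40062c, which is exactly where "Penguin" starts in the .rodata dump (0x400628 plus 4 bytes). Here's a small sketch of that decoding; `read_le64` is a made-up helper name.

```c
#include <stdint.h>

/* Reassemble 8 bytes stored least-significant-byte first (little-endian,
   as on x86-64) into a 64-bit value. */
uint64_t read_le64(const unsigned char b[8]) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | b[i]; /* shift in bytes from most to least significant */
    return v;
}
```

So `penguin` in .data doesn't hold the characters themselves, just the address of the string in .rodata. `array`, right after it at 0x601028, holds the bytes abcde directly.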