@jtpaasch
Created September 27, 2022 23:50
My notes on how linkers work

How Linkers Work

These notes explain how a linker works and how to write a linker script. They are very basic; the goal is to show how linking works in the simplest possible way.

The linker for most Linux systems is ld, which has a default linker script. To see the default linker script:

ld --verbose

To specify a custom linker script:

ld -T /path/to/custom/script.ld ...

Further reading:

Example of Linking (without any linker script)

Create foo.asm:

global foo

    section .text

foo:
    mov rax, 3
    ret

Assemble it:

nasm -w+all -f elf64 -o "foo.o" "foo.asm"

Check out the hex dump:

hd foo.o

Look at the ELF header, and the section headers:

readelf -h foo.o
readelf -S foo.o

There should be no program headers in this file, since it's just an object file:

readelf -l foo.o

Check the assembly:

objdump -d foo.o

Notice that the function foo has a placeholder address (0000...), and each instruction's address is just an offset from that placeholder. The first instruction is at 0 and the next is at 5, because the first instruction is 5 bytes long and the next one starts right after it:

foo.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <foo>:
   0:	b8 03 00 00 00       	mov    $0x3,%eax
   5:	c3                   	retq  
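
The offsets in the left-hand column follow directly from the instruction lengths. As a small illustration in Python (not part of the toolchain; the function name is made up for this sketch), a running sum over the encoded lengths reproduces the 0 and 5 shown above:

```python
from itertools import accumulate

# Encoded bytes of each instruction in foo.o's .text, copied from the objdump output.
instructions = [
    ("mov $0x3,%eax", bytes.fromhex("b803000000")),
    ("retq", bytes.fromhex("c3")),
]

def instruction_offsets(insns):
    """Return the offset of each instruction from the start of the section."""
    lengths = [len(encoding) for _, encoding in insns]
    # Each offset is the total length of all the instructions before it.
    return [0] + list(accumulate(lengths))[:-1]

offsets = instruction_offsets(instructions)
for (text, _), offset in zip(instructions, offsets):
    print(f"{offset:4x}: {text}")
```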

Now make a program that calls the foo function. Create a file called main.asm:

extern foo
global _start

    section .text

_start:
    call foo     ; call foo - result will be in rax
    mov rdi, rax ; put rax into rdi
    mov rax, 60  ; code for sys_exit - will exit with number in rdi 
    syscall      ; call sys_exit

Assemble it:

nasm -w+all -f elf64 -o "main.o" "main.asm"

Check out the hex dump:

hd main.o

Look at the ELF header, and the section headers:

readelf -h main.o
readelf -S main.o

There should be no program headers in this file, since it's just an object file:

readelf -l main.o

Check the assembly:

objdump -d main.o

Now, link the files into one executable:

ld -o "main.elf" "foo.o" "main.o"

Execute it and check that the exit code is 3:

./main.elf
echo $? # should be 3

Check out the hex dump:

hd main.elf

Look at the ELF header, the section headers, and the program headers:

readelf -h main.elf
readelf -S main.elf
readelf -l main.elf

Check the assembly:

objdump -d main.elf

Notice that foo is now in the file, and that _start calls it directly. So the code from the object files has been put together into this one new file, and the placeholder addresses have been filled in so that everything connects.
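
To make "filling in the placeholder" concrete, here is a rough Python sketch of the arithmetic for one call instruction. It makes several assumptions that the notes above do not state: that the assembler leaves four zero bytes after the e8 opcode, that the relocation is PC-relative with an addend of -4, and that the linker placed foo at 0x401000 and _start at 0x401010. The function name patch_call is made up for illustration.

```python
import struct

def patch_call(text, field_offset, field_addr, target_addr, addend=-4):
    """Fill in the 4-byte displacement of a call instruction in place.

    Uses the usual PC-relative formula S + A - P, where S is the target
    address, A is the addend, and P is the address of the field being patched.
    """
    displacement = target_addr + addend - field_addr
    text[field_offset:field_offset + 4] = struct.pack("<i", displacement)

# Assumed final addresses (hypothetical, not taken from a real link).
FOO_ADDR = 0x401000
START_ADDR = 0x401010

# _start's code as assumed to appear in main.o: e8 plus four placeholder bytes,
# followed by the mov/mov/syscall sequence.
start_text = bytearray.fromhex("e800000000" "4889c7" "b83c000000" "0f05")

# The displacement field begins one byte after the e8 opcode.
patch_call(start_text, field_offset=1, field_addr=START_ADDR + 1, target_addr=FOO_ADDR)
print(start_text[:5].hex(" "))
```

The displacement is relative to the end of the call instruction, which is why the addend is -4: the field sits 4 bytes before the next instruction.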

Example of Linking with Linker Script

In a new folder, create two files, foo.asm:

global foo

    section .text

foo:
    mov rax, 3
    ret

and main.asm:

extern foo
global main

    section .text

main:
    call foo
    mov rdi, rax
    mov rax, 60
    syscall

Create a Makefile too:

all: clean
	nasm -w+all -f elf64 -o foo.o foo.asm
	nasm -w+all -f elf64 -o main.o main.asm
	ld -o main.elf foo.o main.o


.PHONY: clean
clean:
	rm -rf *.o
	rm -rf main.elf

Notice that the main.asm file does not define a _start symbol. By default, ld looks for _start as the entry point. Try to link this:

make

ld prints a warning that it cannot find _start and that it is going to start at 401000 instead, which is the beginning of foo. We want to tell the linker that main is the entry point.
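
The fallback can be pictured as a short priority list. The sketch below is my simplified reading of the order described in the ld manual (a -e option, then an ENTRY command, then the default _start symbol, then the first byte of .text, then address 0). The function name and the symbol addresses are hypothetical.

```python
def pick_entry(symbols, text_start=None, cli_entry=None, script_entry=None):
    """Pick an entry address following a simplified version of ld's fallback order.

    symbols maps defined symbol names to their addresses.
    """
    candidates = [
        (cli_entry, "-e option"),
        (script_entry, "ENTRY command"),
        ("_start", "default _start symbol"),
    ]
    for name, reason in candidates:
        if name is not None and name in symbols:
            return symbols[name], reason
    if text_start is not None:
        return text_start, "first byte of .text"
    return 0, "address 0"

# Assumed layout: foo first, main 16 bytes later.
symbols = {"foo": 0x401000, "main": 0x401010}

# No _start and no ENTRY: fall back to the start of .text, which is foo.
print(pick_entry(symbols, text_start=0x401000))
# With ENTRY (main) in a linker script, main is chosen instead.
print(pick_entry(symbols, text_start=0x401000, script_entry="main"))
```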

Create a file called custom.ld, with these contents:

ENTRY (main)

Now, change the Makefile to this:

all: clean
	nasm -w+all -f elf64 -o foo.o foo.asm
	nasm -w+all -f elf64 -o main.o main.asm
	ld -T custom.ld -o main.elf foo.o main.o


.PHONY: clean
clean:
	rm -rf *.o
	rm -rf main.elf

That tells ld to use the linker script custom.ld. Now build again:

make

Check the entry point address by looking at the elf header:

readelf -h main.elf

For me, it says the entry point is 0x10:

ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x10
  Start of program headers:          64 (bytes into file)
  Start of section headers:          4336 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         1
  Size of section headers:           64 (bytes)
  Number of section headers:         5
  Section header string table index: 4
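
The same fields can be read without readelf. This is a minimal Python sketch that assumes a 64-bit little-endian ELF file; the struct layout follows the standard ELF64 header, while the helper and tuple names are made up here. The sample bytes are a stand-in built from the readelf output above rather than read from a real main.elf.

```python
import struct
from collections import namedtuple

# Layout of the 64-byte ELF64 header, little endian.
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")
Elf64Header = namedtuple(
    "Elf64Header",
    "e_ident e_type e_machine e_version e_entry e_phoff e_shoff e_flags "
    "e_ehsize e_phentsize e_phnum e_shentsize e_shnum e_shstrndx",
)

def parse_elf64_header(data):
    """Decode the first 64 bytes of a 64-bit little-endian ELF file."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("expected ELF64, little endian")
    return Elf64Header._make(ELF64_HEADER.unpack_from(data))

# Stand-in header assembled from the values readelf printed above.
sample = ELF64_HEADER.pack(
    bytes.fromhex("7f454c46020101") + bytes(9),  # magic, class, data, version, padding
    2, 62, 1,        # EXEC, x86-64, version 1
    0x10, 64, 4336,  # entry point, program header offset, section header offset
    0, 64, 56, 1, 64, 5, 4,
)
header = parse_elf64_header(sample)
print(hex(header.e_entry), header.e_phnum, header.e_shnum)
```

To try it on a real file, pass the first 64 bytes of main.elf (for example, the result of reading the file in binary mode) to parse_elf64_header.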

Look at the assembly to see that address 0x10 is main:

objdump -d main.elf

Indeed it is:

main.elf:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <foo>:
   0:	b8 03 00 00 00       	mov    $0x3,%eax
   5:	c3                   	retq   
   6:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
   d:	00 00 00 

0000000000000010 <main>:
  10:	e8 eb ff ff ff       	callq  0 <foo>
  15:	48 89 c7             	mov    %rax,%rdi
  18:	b8 3c 00 00 00       	mov    $0x3c,%eax
  1d:	0f 05                	syscall

Notice, however, that these addresses all start at 0. The first address in the .text section is 0x00, and then everything starts counting up from there.

If we try to execute this program, it seg faults, most likely because Linux will not let an ordinary process map code at addresses this close to 0:

./main.elf
Segmentation fault

We need to tell the linker to put the code at different addresses.

In custom.ld, add this:

ENTRY (main)

SECTIONS {
  . = 0x10000;
}

Here we start a SECTIONS stanza. Then we set . to 0x10000. The dot . refers to the location counter. At the start of the SECTIONS stanza, the linker assumes the location counter is 0, so if we want it to be something different, we need to set it to something different. Here we set it to 0x10000, which tells the linker to start counting from 0x10000 instead of 0. And then, the first instruction that the linker will put into the resulting executable will be at address 0x10000, and all the other instructions the linker adds after that will be offset from there.
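
A rough way to picture the location counter is as a running address that is rounded up to each input section's alignment and then advanced by its size. The Python sketch below assumes a 16-byte alignment for each .text section and takes the sizes (6 and 15 bytes) from the instruction encodings in these notes. The function names are made up.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment

def lay_out(sections, start):
    """Assign an address to each (name, size, alignment) in order."""
    location = start  # plays the role of "." in the linker script
    placed = {}
    for name, size, alignment in sections:
        location = align_up(location, alignment)
        placed[name] = location
        location += size
    return placed

# foo is 6 bytes (mov + ret); main is 15 bytes (call + mov + mov + syscall).
inputs = [("foo.o(.text)", 6, 16), ("main.o(.text)", 15, 16)]

for start in (0x0, 0x10000):
    for name, address in lay_out(inputs, start).items():
        print(f"start={start:#x}  {name} -> {address:#x}")
```

Starting from 0 puts main at 0x10, and starting from 0x10000 puts it at 0x10010; only the starting value of the counter changes.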

To see this, rebuild the program:

make

Then look at the assembly now:

objdump -d main.elf

All of the addresses now start at 0x10000 and go up from there:

main.elf:     file format elf64-x86-64


Disassembly of section .text:

0000000000010000 <foo>:
   10000:	b8 03 00 00 00       	mov    $0x3,%eax
   10005:	c3                   	retq   
   10006:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
   1000d:	00 00 00 

0000000000010010 <main>:
   10010:	e8 eb ff ff ff       	callq  10000 <foo>
   10015:	48 89 c7             	mov    %rax,%rdi
   10018:	b8 3c 00 00 00       	mov    $0x3c,%eax
   1001d:	0f 05                	syscall

This program now executes (at least it does for me):

./main.elf
echo $?   # I get 3, as expected

By default, the linker put our code from foo.o and main.o into the .text section of the resulting executable. But we can tell the linker exactly where each piece should go.

Let's tell the linker to create a custom section in the executable called .foo, and let's put the code from foo.o's .text section inside of it. To do that, change custom.ld to this:

ENTRY (main)

SECTIONS {
  . = 0x10000;
  .foo : { foo.o(.text) }
}

Here we add a new entry to the SECTIONS stanza. This time, we define a section called .foo. What goes inside the .foo section? We say that the linker should look in foo.o, and take the code from the .text section that it finds there.

Rebuild the program:

make

Now look at the assembly:

objdump -d main.elf

Notice that the foo function now lives in its own section called .foo:

main.elf:     file format elf64-x86-64


Disassembly of section .foo:

0000000000010000 <foo>:
   10000:	b8 03 00 00 00       	mov    $0x3,%eax
   10005:	c3                   	retq   

Disassembly of section .text:

0000000000010010 <main>:
   10010:	e8 eb ff ff ff       	callq  10000 <foo>
   10015:	48 89 c7             	mov    %rax,%rdi
   10018:	b8 3c 00 00 00       	mov    $0x3c,%eax
   1001d:	0f 05                	syscall

We can also see that the linker put the code it found from main.o into the .text section. By default, it puts code from an object file's .text section into the executable's .text section, unless we say otherwise in custom.ld.

We can be explicit and tell the linker to put the code from main.o's .text section into the executable's .text section if we like. Change custom.ld to this:

ENTRY (main)

SECTIONS {
  . = 0x10000;
  .foo : { foo.o(.text) }
  .text : { main.o(.text) }
}

Rebuild, and check the assembly to see that it has placed the .text section from foo.o into the executable's .foo section, and the .text section from main.o into the executable's .text section:

make
objdump -d main.elf

And indeed, that is what I see:

main.elf:     file format elf64-x86-64


Disassembly of section .foo:

0000000000010000 <foo>:
   10000:	b8 03 00 00 00       	mov    $0x3,%eax
   10005:	c3                   	retq   

Disassembly of section .text:

0000000000010010 <main>:
   10010:	e8 eb ff ff ff       	callq  10000 <foo>
   10015:	48 89 c7             	mov    %rax,%rdi
   10018:	b8 3c 00 00 00       	mov    $0x3c,%eax
   1001d:	0f 05                	syscall

We can tell the linker to put the code from the .text sections of all object files into the executable's .text section by using * as a wildcard to match any file. Change custom.ld to this:

ENTRY (main)

SECTIONS {
  . = 0x10000;
  .text : { *(.text) }
}

This says that the .text section in the executable should be populated with the .text sections from all (i.e., *) object files.
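
The file(section) patterns can be thought of as rules that are tried in script order, where each input section is claimed by the first rule that matches it. The Python sketch below is a simplified model of that idea, not ld's actual algorithm; the rule tuples and the function name are made up, and shell-style matching via fnmatchcase stands in for ld's wildcard handling.

```python
from fnmatch import fnmatchcase

def assign_output_sections(rules, inputs):
    """Map each output section to the input sections its rules claim.

    rules:  list of (output_section, file_pattern, section_pattern)
    inputs: list of (file_name, section_name) in command-line order
    """
    placement = {}
    claimed = set()
    for out_name, file_pattern, section_pattern in rules:
        bucket = placement.setdefault(out_name, [])
        for item in inputs:
            file_name, section_name = item
            if item in claimed:
                continue  # an earlier rule already took this input section
            if fnmatchcase(file_name, file_pattern) and fnmatchcase(section_name, section_pattern):
                bucket.append(item)
                claimed.add(item)
    return placement

inputs = [("foo.o", ".text"), ("main.o", ".text")]

# .foo : { foo.o(.text) }  and  .text : { main.o(.text) }
split_rules = [(".foo", "foo.o", ".text"), (".text", "main.o", ".text")]
# .text : { *(.text) }
wildcard_rules = [(".text", "*", ".text")]

print(assign_output_sections(split_rules, inputs))
print(assign_output_sections(wildcard_rules, inputs))
```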

Rebuild and check the assembly to confirm:

make
objdump -d main.elf

And indeed, that is what I see:

main.elf:     file format elf64-x86-64


Disassembly of section .text:

0000000000010000 <foo>:
   10000:	b8 03 00 00 00       	mov    $0x3,%eax
   10005:	c3                   	retq   
   10006:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
   1000d:	00 00 00 

0000000000010010 <main>:
   10010:	e8 eb ff ff ff       	callq  10000 <foo>
   10015:	48 89 c7             	mov    %rax,%rdi
   10018:	b8 3c 00 00 00       	mov    $0x3c,%eax
   1001d:	0f 05                	syscall

Adding New Sections

In a new folder, create three assembly files. First, foo.asm:

global foo

    section .text

foo:
    mov rax, 3
    ret

Second, bar.asm:

global bar

    section .text

bar:
    mov rax, 5
    ret

And third, main.asm:

extern foo
global main

    section .text

main:
    call foo
    mov rdi, rax
    mov rax, 60
    syscall

Create a linker script, custom.ld:

ENTRY (main)

SECTIONS {
  . = 0x10000;
  .text : { *(.text) }
}

And a Makefile:

all: clean build link

build:
	nasm -w+all -f elf64 -o foo.o foo.asm
	nasm -w+all -f elf64 -o bar.o bar.asm
	nasm -w+all -f elf64 -o main.o main.asm

link:
	ld -T custom.ld -o main.elf foo.o main.o

relink:
	ld -T custom.ld -o main.elf bar.o foo.o main.o

.PHONY: clean
clean:
	rm -rf *.o
	rm -rf main.elf

Build and run, just to make sure that main.elf exits with code 3:

make
./main.elf
echo $?   # should be 3

Look at the assembly for bar.o:

objdump -d bar.o

The function bar returns 5 rather than 3 (in foo.o, the function foo returns 3):

bar.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <bar>:
   0:	b8 05 00 00 00       	mov    $0x5,%eax
   5:	c3                   	retq

We want to add the bar function to our executable. To do that, we just re-link with bar.o included, which is what the relink target in the Makefile does:

make relink

Now look at the executable:

objdump -d main.elf

We can see that bar has been included:

main.elf:     file format elf64-x86-64


Disassembly of section .text:

0000000000010000 <bar>:
   10000:	b8 05 00 00 00       	mov    $0x5,%eax
   10005:	c3                   	retq   
   10006:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
   1000d:	00 00 00 

0000000000010010 <foo>:
   10010:	b8 03 00 00 00       	mov    $0x3,%eax
   10015:	c3                   	retq   
   10016:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
   1001d:	00 00 00 

0000000000010020 <main>:
   10020:	e8 eb ff ff ff       	callq  10010 <foo>
   10025:	48 89 c7             	mov    %rax,%rdi
   10028:	b8 3c 00 00 00       	mov    $0x3c,%eax
   1002d:	0f 05                	syscall

If you like, you can change main.asm to call bar now:

extern foo
extern bar
global main

    section .text

main:
    call bar
    mov rdi, rax
    mov rax, 60
    syscall

Rebuild and relink:

make build relink

Run the new executable, and confirm that it exits with an exit code of 5:

./main.elf
echo $?   # should be 5