These notes explain how a linker works and how to write a linker script. They are very basic; the goal is to illustrate the process in the simplest possible way.
The linker on most Linux systems is ld, which has a default linker script. To see the default linker script:
ld --verbose
To specify a custom linker script:
ld -T /path/to/custom/script.ld ...
Further reading:
- https://ftp.gnu.org/old-gnu/Manuals/ld-2.9.1/html_chapter/ld_3.html (manual)
- https://home.cs.colorado.edu/~main/cs1300/doc/gnu/ld_3.html
Create foo.asm:
global foo
section .text
foo:
mov rax, 3
ret
Assemble it:
nasm -w+all -f elf64 -o "foo.o" "foo.asm"
Check out the hex dump:
hd foo.o
Look at the ELF header, and the section headers:
readelf -h foo.o
readelf -S foo.o
There should be no program headers in this file, since it's just an object file:
readelf -l foo.o
Check the assembly:
objdump -d foo.o
Notice that the function foo has a placeholder address (0000...), and each instruction's address is just an offset from 0000. For example, the first instruction is at 0 and the next is at 5, because the first instruction is 5 bytes long and the next one starts right after it:
foo.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <foo>:
0: b8 03 00 00 00 mov $0x3,%eax
5: c3 retq
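The offsets in that listing can be reproduced by hand, since each instruction starts where the previous one ended. Here is a minimal Python sketch of that counting, with the two instruction encodings copied from the objdump output above. The helper name instruction_offsets is made up for this illustration.

```python
# Machine code for foo, copied from the objdump listing above.
FOO_CODE = [
    ("mov $0x3,%eax", bytes.fromhex("b803000000")),  # 5 bytes
    ("retq", bytes.fromhex("c3")),                   # 1 byte
]

def instruction_offsets(instructions, base=0):
    """Return the offset of each instruction, counting up from base."""
    offsets = []
    position = base
    for _text, encoding in instructions:
        offsets.append(position)
        position += len(encoding)
    return offsets

for offset, (text, encoding) in zip(instruction_offsets(FOO_CODE), FOO_CODE):
    print(f"{offset:4x}: {encoding.hex(' '):<16} {text}")
```

Passing a different base (for example 0x10000) shifts every offset by the same amount, which is essentially what the linker does later when it decides where the code will live.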
Now make a program that calls the foo function. Create a file called main.asm:
extern foo
global _start
section .text
_start:
call foo ; call foo - result will be in rax
mov rdi, rax ; put rax into rdi
mov rax, 60 ; code for sys_exit - will exit with number in rdi
syscall ; call sys_exit
Assemble it:
nasm -w+all -f elf64 -o "main.o" "main.asm"
Check out the hex dump:
hd main.o
Look at the ELF header, and the section headers:
readelf -h main.o
readelf -S main.o
There should be no program headers in this file, since it's just an object file:
readelf -l main.o
Check the assembly:
objdump -d main.o
Now, link the files into one executable:
ld -o "main.elf" "foo.o" "main.o"
Execute it and check that the exit code is 3:
./main.elf
echo $? # should be 3
Check out the hex dump:
hd main.elf
Look at the ELF header, the section headers, and the program headers:
readelf -h main.elf
readelf -S main.elf
readelf -l main.elf
Check the assembly:
objdump -d main.elf
Notice that foo is now in the file, and that _start calls it directly. The code from the object files has been combined into this one new file, and the placeholder addresses have been filled in so that everything connects.
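To make "filled in" a little more concrete: on x86-64, a near call (opcode e8) stores a 32-bit displacement measured from the end of the call instruction to the target. In the object file that field is typically left as zeros, and the linker computes it once it knows where everything lives. The following is a rough Python sketch of that arithmetic. The two addresses are assumptions for illustration (check objdump -d main.elf for the real ones on your system), and patch_call is a made-up helper, not part of any tool.

```python
import struct

CALL_LENGTH = 5  # opcode e8 plus a 4-byte displacement

def patch_call(call_address, target_address):
    """Encode `call target` as it would appear at call_address."""
    next_instruction = call_address + CALL_LENGTH
    displacement = target_address - next_instruction
    # "<i" packs a signed 32-bit little-endian integer.
    return b"\xe8" + struct.pack("<i", displacement)

# Assumed addresses, for illustration only.
FOO_ADDRESS = 0x401000
CALL_SITE = 0x401010

before = bytes.fromhex("e800000000")        # typical placeholder in main.o
after = patch_call(CALL_SITE, FOO_ADDRESS)  # what the linker would write
print(before.hex(" "), "->", after.hex(" "))
```

The displacement is negative here because foo sits at a lower address than the call, which is why the encoded bytes end in ff ff ff.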
In a new folder, create two files, foo.asm:
global foo
section .text
foo:
mov rax, 3
ret
and main.asm:
extern foo
global main
section .text
main:
call foo
mov rdi, rax
mov rax, 60
syscall
Create a Makefile too:
all: clean
	nasm -w+all -f elf64 -o foo.o foo.asm
	nasm -w+all -f elf64 -o main.o main.asm
	ld -o main.elf foo.o main.o
.PHONY: clean
clean:
	rm -rf *.o
	rm -rf main.elf
Notice that main.asm does not have a _start function. By default, ld looks for a _start symbol to use as the entry point. Try to link this:
make
It prints a warning that it cannot find _start and that it will start at 401000 instead, which is the beginning of foo. We want to tell the linker that main is the entry point.
Create a file called custom.ld, with these contents:
ENTRY (main)
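For context, the way ld picks an entry point can be pictured as a fallback chain. As I read the GNU ld manual, the order is: the -e command-line option, then the ENTRY command in the linker script, then a default symbol (_start here), then the first byte of .text, and finally address 0. Below is a simplified Python sketch of that chain. The symbol addresses are assumed for illustration, and choose_entry is a made-up name, not anything ld exposes.

```python
def choose_entry(symbols, text_start, option_e=None, script_entry=None,
                 default_symbol="_start"):
    """Return (address, reason), following the fallback order described above."""
    candidates = (
        (option_e, "-e option"),
        (script_entry, "ENTRY in linker script"),
        (default_symbol, "default symbol"),
    )
    for name, reason in candidates:
        if name is not None and name in symbols:
            return symbols[name], reason
    if text_start is not None:
        return text_start, "start of .text"
    return 0, "address 0"

# Assumed addresses, for illustration only.
symbols = {"foo": 0x401000, "main": 0x401010}

for kwargs in ({}, {"script_entry": "main"}):
    address, reason = choose_entry(symbols, text_start=0x401000, **kwargs)
    print(f"{address:#x} ({reason})")
```

The first line printed corresponds to the warning case (no _start, so the start of .text is used), and the second to what ENTRY (main) asks for.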
Now, change the Makefile to this:
all: clean
	nasm -w+all -f elf64 -o foo.o foo.asm
	nasm -w+all -f elf64 -o main.o main.asm
	ld -T custom.ld -o main.elf foo.o main.o
.PHONY: clean
clean:
	rm -rf *.o
	rm -rf main.elf
That tells ld to use the linker script custom.ld. Now build again:
make
Check the entry point address by looking at the ELF header:
readelf -h main.elf
For me, it says the entry point is 0x10:
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x10
Start of program headers: 64 (bytes into file)
Start of section headers: 4336 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 1
Size of section headers: 64 (bytes)
Number of section headers: 5
Section header string table index: 4
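The entry point that readelf reports is simply a field in the first 64 bytes of the file. As a rough illustration, here is a Python sketch that unpacks an ELF64 little-endian header with the standard struct module. To keep it runnable without main.elf, it rebuilds a sample header from the readelf values shown above. read_entry_point is a hypothetical helper name.

```python
import struct

# Layout of the 64-byte ELF64 header (little endian):
# e_ident, e_type, e_machine, e_version, e_entry, e_phoff, e_shoff,
# e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")

def read_entry_point(header_bytes):
    """Return e_entry from the first 64 bytes of an ELF64 little-endian file."""
    fields = ELF64_HEADER.unpack(header_bytes[:ELF64_HEADER.size])
    ident, entry = fields[0], fields[4]
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if ident[4] != 2 or ident[5] != 1:
        raise ValueError("this sketch only handles ELF64, little endian")
    return entry

# A header rebuilt from the readelf output above.
ident = bytes.fromhex("7f454c46020101000000000000000000")
sample = ELF64_HEADER.pack(ident, 2, 62, 1, 0x10, 64, 4336, 0, 64, 56, 1, 64, 5, 4)

print(hex(read_entry_point(sample)))
```

With a real file, something like read_entry_point(open("main.elf", "rb").read(64)) should report the same value as readelf -h.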
Look at the assembly to see that address 0x10 is main:
objdump -d main.elf
Indeed it is:
main.elf: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <foo>:
0: b8 03 00 00 00 mov $0x3,%eax
5: c3 retq
6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
d: 00 00 00
0000000000000010 <main>:
10: e8 eb ff ff ff callq 0 <foo>
15: 48 89 c7 mov %rax,%rdi
18: b8 3c 00 00 00 mov $0x3c,%eax
1d: 0f 05 syscall
Notice, however, that these addresses all start at 0. The first address in the .text section is 0x00, and everything counts up from there.
If we try to execute this program, it is killed with a segmentation fault (Linux normally refuses to map a program at addresses this low):
./main.elf
Segmentation fault
We need to tell the linker to put the code at different addresses.
In custom.ld, add this:
ENTRY (main)
SECTIONS {
. = 0x10000;
}
Here we start a SECTIONS stanza and set . to 0x10000. The dot . refers to the location counter. At the start of the SECTIONS stanza, the linker assumes the location counter is 0, so if we want a different value we have to set it ourselves. Setting it to 0x10000 tells the linker to start counting from 0x10000 instead of 0. The first instruction the linker puts into the executable will then be at address 0x10000, and everything it adds after that is offset from there.
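One way to picture the location counter is as a running address that is aligned and then advanced for each piece of code the linker places. The Python sketch below is a toy model of that idea, not how ld is implemented. The sizes are read off the earlier disassembly (foo is 6 bytes, main is 15 bytes), and the 16-byte alignment is an assumption that happens to match the padding seen there.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment

def lay_out(start, pieces):
    """Place (name, size, alignment) pieces one after another from start."""
    location = start  # plays the role of "." in the linker script
    placed = {}
    for name, size, alignment in pieces:
        location = align_up(location, alignment)
        placed[name] = location
        location += size
    return placed

# Sizes from the disassembly; the 16-byte alignment is assumed.
pieces = [("foo", 6, 16), ("main", 15, 16)]

for start in (0x0, 0x10000):
    print({name: hex(addr) for name, addr in lay_out(start, pieces).items()})
```

Starting at 0 gives the layout that crashed (foo at 0, main at 0x10), while starting at 0x10000 moves both up by the same amount.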
To see this, rebuild the program:
make
Then look at the assembly now:
objdump -d main.elf
All of the addresses now start at 0x10000 and go up from there:
main.elf: file format elf64-x86-64
Disassembly of section .text:
0000000000010000 <foo>:
10000: b8 03 00 00 00 mov $0x3,%eax
10005: c3 retq
10006: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
1000d: 00 00 00
0000000000010010 <main>:
10010: e8 eb ff ff ff callq 10000 <foo>
10015: 48 89 c7 mov %rax,%rdi
10018: b8 3c 00 00 00 mov $0x3c,%eax
1001d: 0f 05 syscall
This program now executes (at least it does for me):
./main.elf
echo $? # I get 3, as expected
By default, the linker put our code from foo.o and main.o into the .text section of the resulting executable. But we can tell the linker exactly where to put things.
Let's tell the linker to create a custom section in the executable called .foo, and put the code from foo.o's .text section inside it. To do that, change custom.ld to this:
ENTRY (main)
SECTIONS {
. = 0x10000;
.foo : { foo.o(.text) }
}
Here we add a new entry to the SECTIONS stanza: a section called .foo. What goes inside the .foo section? We tell the linker to look in foo.o and take the code from the .text section it finds there.
Rebuild the program:
make
Now look at the assembly:
objdump -d main.elf
Notice that the foo function now lives in its own section called .foo:
main.elf: file format elf64-x86-64
Disassembly of section .foo:
0000000000010000 <foo>:
10000: b8 03 00 00 00 mov $0x3,%eax
10005: c3 retq
Disassembly of section .text:
0000000000010010 <main>:
10010: e8 eb ff ff ff callq 10000 <foo>
10015: 48 89 c7 mov %rax,%rdi
10018: b8 3c 00 00 00 mov $0x3c,%eax
1001d: 0f 05 syscall
We can also see that the linker put the code it found in main.o into the .text section. By default, it puts code from an object file's .text section into the executable's .text section, unless we say otherwise in custom.ld.
We can be explicit and tell the linker to put the code from main.o's .text section into the executable's .text section if we like. Change custom.ld to this:
ENTRY (main)
SECTIONS {
. = 0x10000;
.foo : { foo.o(.text) }
.text : { main.o(.text) }
}
Rebuild, and check the assembly to see that the linker has placed the .text section from foo.o into the executable's .foo section, and the .text section from main.o into the executable's .text section:
make
objdump -d main.elf
And indeed, that is what I see:
main.elf: file format elf64-x86-64
Disassembly of section .foo:
0000000000010000 <foo>:
10000: b8 03 00 00 00 mov $0x3,%eax
10005: c3 retq
Disassembly of section .text:
0000000000010010 <main>:
10010: e8 eb ff ff ff callq 10000 <foo>
10015: 48 89 c7 mov %rax,%rdi
10018: b8 3c 00 00 00 mov $0x3c,%eax
1001d: 0f 05 syscall
We can tell the linker to put the code from the .text sections of all object files into the executable's .text section by using * as a wildcard that matches any file. Change custom.ld to this:
ENTRY (main)
SECTIONS {
. = 0x10000;
.text : { *(.text) }
}
This says that the .text section in the executable should be populated with the .text sections from all (i.e., *) object files.
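The file part of a rule such as foo.o(.text) or *(.text) behaves much like a shell glob. The following Python sketch is a toy model of how such rules could assign input sections to output sections, using fnmatch from the standard library as a stand-in for the linker's own pattern matching. The function assign_sections and the rule tuples are hypothetical, and the fallback for unmatched inputs is a simplification of the linker's default behavior.

```python
from fnmatch import fnmatchcase

def assign_sections(rules, inputs):
    """Map each (file, section) input to an output section.

    rules is a list of (output_section, file_pattern, section_name) and the
    first matching rule wins. Inputs that match no rule keep their own
    section name.
    """
    result = {}
    for file_name, section in inputs:
        for output, file_pattern, wanted in rules:
            if section == wanted and fnmatchcase(file_name, file_pattern):
                result[(file_name, section)] = output
                break
        else:
            result[(file_name, section)] = section
    return result

inputs = [("foo.o", ".text"), ("main.o", ".text")]

explicit = [(".foo", "foo.o", ".text"), (".text", "main.o", ".text")]
wildcard = [(".text", "*", ".text")]

print(assign_sections(explicit, inputs))
print(assign_sections(wildcard, inputs))
```

With the explicit rules, foo.o's code lands in .foo and main.o's in .text. With the single wildcard rule, both end up in .text.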
Rebuild and check the assembly to confirm:
make
objdump -d main.elf
And indeed, that is what I see:
main.elf: file format elf64-x86-64
Disassembly of section .text:
0000000000010000 <foo>:
10000: b8 03 00 00 00 mov $0x3,%eax
10005: c3 retq
10006: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
1000d: 00 00 00
0000000000010010 <main>:
10010: e8 eb ff ff ff callq 10000 <foo>
10015: 48 89 c7 mov %rax,%rdi
10018: b8 3c 00 00 00 mov $0x3c,%eax
1001d: 0f 05 syscall
In a new folder, create three assembly files. First, foo.asm:
global foo
section .text
foo:
mov rax, 3
ret
Second, bar.asm:
global bar
section .text
bar:
mov rax, 5
ret
And third, main.asm:
extern foo
global main
section .text
main:
call foo
mov rdi, rax
mov rax, 60
syscall
Create a linker script, custom.ld:
ENTRY (main)
SECTIONS {
. = 0x10000;
.text : { *(.text) }
}
And a Makefile:
all: clean build link
build:
	nasm -w+all -f elf64 -o foo.o foo.asm
	nasm -w+all -f elf64 -o bar.o bar.asm
	nasm -w+all -f elf64 -o main.o main.asm
link:
	ld -T custom.ld -o main.elf foo.o main.o
relink:
	ld -T custom.ld -o main.elf bar.o foo.o main.o
.PHONY: clean
clean:
	rm -rf *.o
	rm -rf main.elf
Build and run, just to make sure that main.elf exits with code 3:
make
./main.elf
echo $? # should be 3
Look at the assembly for bar.o:
objdump -d bar.o
The function bar returns 5 rather than 3 (in foo.o, the function foo returns 3):
bar.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <bar>:
0: b8 05 00 00 00 mov $0x5,%eax
5: c3 retq
We want to add the bar function to our executable. To do that, we can just re-link with bar.o included. In the Makefile, this is what the relink target does:
make relink
Now look at the executable:
objdump -d main.elf
We can see that bar has been included:
main.elf: file format elf64-x86-64
Disassembly of section .text:
0000000000010000 <bar>:
10000: b8 05 00 00 00 mov $0x5,%eax
10005: c3 retq
10006: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
1000d: 00 00 00
0000000000010010 <foo>:
10010: b8 03 00 00 00 mov $0x3,%eax
10015: c3 retq
10016: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
1001d: 00 00 00
0000000000010020 <main>:
10020: e8 eb ff ff ff callq 10010 <foo>
10025: 48 89 c7 mov %rax,%rdi
10028: b8 3c 00 00 00 mov $0x3c,%eax
1002d: 0f 05 syscall
If you like, you can change main.asm to call bar now:
extern foo
extern bar
global main
section .text
main:
call bar
mov rdi, rax
mov rax, 60
syscall
Rebuild and relink:
make build relink
Run the new executable, and confirm that it exits with an exit code of 5:
./main.elf
echo $? # should be 5
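As a closing check, the layout after relinking and the bytes of the call instruction can be worked out by hand. The Python sketch below places bar, foo and main in link order starting at 0x10000, then encodes a call from the start of main to either function. The sizes come from the listings above, the 16-byte alignment is assumed, and the helper names are made up for illustration.

```python
import struct

def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment

def lay_out(start, pieces):
    """Place (name, size, alignment) pieces one after another from start."""
    location, placed = start, {}
    for name, size, alignment in pieces:
        location = align_up(location, alignment)
        placed[name] = location
        location += size
    return placed

def encode_call(call_address, target):
    """Encode a 5-byte near call (e8 + signed 32-bit displacement)."""
    return b"\xe8" + struct.pack("<i", target - (call_address + 5))

# Link order from the relink target: bar.o foo.o main.o.
addresses = lay_out(0x10000, [("bar", 6, 16), ("foo", 6, 16), ("main", 15, 16)])
call_site = addresses["main"]  # the call is main's first instruction

for target in ("foo", "bar"):
    encoded = encode_call(call_site, addresses[target])
    print(f"call {target}: {encoded.hex(' ')}")
```

If the arithmetic holds, objdump -d main.elf should show the call in main changing from e8 eb ff ff ff (to foo) to e8 db ff ff ff (to bar) once main.asm calls bar.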