- https://runestone.academy/runestone/books/published/armTutorial/index.html
- https://azeria-labs.com/writing-arm-assembly-part-1/
Filenames should normally end with an .s extension, e.g., hello-world.s, not hello-world.asm.
Comments begin with:
- @
- //
- /* My comment */
You can use # at the very start of a line to indicate a comment:
# Move 10 into r0
mov r0, #10
But you can't do this, because a # in the middle of a line is read as the start of an immediate value, not a comment:
mov r0, #10 # Move 10 into r0
Declare a section:
.section NAME
contents here...
Values for NAME can be:
- .data - Stores global data (read and write, not execute)
- .rodata - Stores global read-only data
- .bss - Has uninitialized blank space (read and write, not execute)
- .text - Stores code (read and execute, but not write)
By convention, sections appear in an assembly file in that order (the assembler itself does not require it):
.section .data
my_variable: .word 0xabc
.section .rodata
my_constant: .word 0x32
.section .bss
my_slot: .space 4
.section .text
.global _start
_start:
mov r0, #5
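mov r7, #1 @ exit syscall number (assumes Linux)
svc #0 @ exit with the status in r0; _start has no caller to return to with bx lr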
.global foo declares foo as a global symbol.
- R0-R12 are GPRs (general-purpose registers)
- R13 is SP (stack pointer)
- R14 is LR (link register) - stores the return address. The value is often pushed to the stack at the start of a function and popped at the end of the function, which effectively puts the value back in LR.
- R15 is PC (program counter)
- CPSR (current program status register) - holds the flags
In the data section:
.data
msg: .ascii "My string"
location: .byte 10
my_int32: .int 3
my_int16: .short 4
my_float: .float 10.23
my_int_array: .int 10,11,12
Specifying the size of a piece of data:
- .byte 0xab - 0xab is 1 byte long
- .hword 0xbeef - 0xbeef is 16 bits long (half word)
- .word 0xdeadbeef - 0xdeadbeef is 32 bits long (word)
Giving a piece of data a label/name:
- x: .word 0xdeadbeef - makes x the label for 0xdeadbeef
Pad to the next word (4-byte) boundary:
.align
Example:
.section .data
a: .byte 8
b: .byte 0xab
.align
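To make the effect of .align visible, here is a minimal sketch that places a word after the two bytes and exits with the distance between the two labels. The label c, the 32-bit Linux target, and the exit system call (number 1 in r7, followed by svc #0) are assumptions added for illustration; they do not come from the example above.

```asm
.section .data
a:  .byte 8                 @ offset 0
b:  .byte 0xab              @ offset 1
.align                      @ skips the 2 unused bytes at offsets 2 and 3
c:  .word 0xdeadbeef        @ hypothetical label: starts on a word boundary

.section .text
.global _start
_start:
    ldr r0, =c              @ r0 = address of c
    ldr r1, =a              @ r1 = address of a
    sub r0, r0, r1          @ r0 = number of bytes between a and c
    mov r7, #1              @ assumption: Linux exit syscall number
    svc #0                  @ exit, using r0 as the exit status
```

If .align pads to a 4-byte boundary as described, the exit status should be 4. With the .align line removed, c would follow b directly and the distance would be 2.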
To allocate empty space, use .space to specify the number of bytes:
.section .bss
my_slot: .space 4 @ allocates 4 bytes of empty space
Load the address of a stored piece of data by its label with ldr DEST, =LABEL:
.section .data
x: .word 0xdeadbeef
.section .text
ldr r1, =x @put the address of 'x' in r1
Note that ldr DEST, =LABEL loads the address of LABEL, not the value of LABEL.
Load a piece of data at an address with ldr DEST, [ADDR]:
ldr r2, [r1] @get the word starting at address stored in r1 and put it in r2
Note that this loads not the address in R1, but the value stored at the address in R1.
So =LABEL is like &LABEL in C, and [r1] is like *r1 in C.
Store a value at an address with str SRC, [ADDR]:
str r2, [r3] @put the contents of r2 at whatever address is in r3
Note that ldr and str load and store words, i.e., the bytes at the 4 consecutive addresses starting at the address in the brackets (e.g., the address held in r1).
In big-endian, bytes are numbered from left to right (or top to bottom in memory). The left-most (most significant) byte is stored at the lowest address. So, the word is built top down: start at the lowest address, that's your first byte. Then go to the next address, that's your second byte.
+----------+------+
| ADDRESS  | DATA |
+----------+------+
| 00000000 |  0a  |
+----------+------+      +-----------------------------------+
| 00000001 |  0b  | ===> |   0a   |   0b   |   0c   |   0d   |
+----------+------+      +-----------------------------------+
| 00000002 |  0c  |        byte 0   byte 1   byte 2   byte 3
+----------+------+
| 00000003 |  0d  |
+----------+------+
Little-endian is the opposite: the right-most (least significant) byte is stored at the lowest address, so the word is built bottom up. It's like the English words are in order, but the letters in each word are backwards:
+----------+------+
| ADDRESS  | DATA |
+----------+------+
| 00000000 |  0d  |
+----------+------+      +-----------------------------------+
| 00000001 |  0c  | ===> |   0a   |   0b   |   0c   |   0d   |
+----------+------+      +-----------------------------------+
| 00000002 |  0b  |        byte 0   byte 1   byte 2   byte 3
+----------+------+
| 00000003 |  0a  |
+----------+------+
ARM is little-endian by default (as is Intel x86).
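One way to check the byte order on a given machine is to store a word and then read back only the byte at its lowest address. The sketch below does that. It uses ldrb (load a single byte), which is not covered in these notes, and it assumes a 32-bit Linux target where exit is system call number 1.

```asm
.section .data
x:  .word 0x0a0b0c0d

.section .text
.global _start
_start:
    ldr  r1, =x             @ r1 = address of x
    ldrb r0, [r1]           @ r0 = the one byte stored at the lowest address of x
    mov  r7, #1             @ assumption: Linux exit syscall number
    svc  #0                 @ exit, using r0 as the exit status
```

On a little-endian system the exit status is 13 (0x0d), matching the diagram above. On a big-endian system it would be 10 (0x0a).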
ARM is a load/store architecture, which means it can only operate on values in registers. So, if you need to modify something in memory, you have to put it into a register first, do what you want to it, then put it back.
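The load, modify, store cycle described above can be sketched as follows. The label counter and its starting value are made up for illustration, and the exit sequence assumes a 32-bit Linux target.

```asm
.section .data
counter: .word 41           @ hypothetical variable

.section .text
.global _start
_start:
    ldr r1, =counter        @ r1 = address of counter
    ldr r0, [r1]            @ load: copy the value from memory into r0
    add r0, r0, #1          @ modify: the arithmetic happens in a register
    str r0, [r1]            @ store: write the result back to memory
    mov r7, #1              @ assumption: Linux exit syscall number
    svc #0                  @ exit, using r0 (42) as the exit status
```

There is no single instruction that adds 1 directly to the word at counter; the value has to pass through a register.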
Literal and computed values as instruction operands:
- #9 is the literal integer 9
- #-9 is the literal integer -9
- #0xff is the hex value 0xff
- #0b110 is the binary value 110
- #'d' is the ASCII character d
- [r0, #2] is the address formed by adding the literal integer 2 to the value in r0
- [r0, #-2] is the address formed by subtracting the integer 2 from the value in r0
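The bracketed offset forms are handy for reaching array elements. Here is a small sketch, reusing the my_int_array data from the data section example earlier. The choice of registers and the Linux exit sequence are assumptions for illustration.

```asm
.section .data
my_int_array: .int 10, 11, 12

.section .text
.global _start
_start:
    ldr r1, =my_int_array   @ r1 = address of the first element
    ldr r0, [r1, #8]        @ r0 = 12: the third element is 2 * 4 bytes in
    add r2, r1, #8          @ r2 = address of the third element
    ldr r3, [r2, #-4]       @ r3 = 11: one word before the third element
    add r0, r0, r3          @ r0 = 12 + 11 = 23
    mov r7, #1              @ assumption: Linux exit syscall number
    svc #0                  @ exit, using r0 as the exit status
```

The offsets are in bytes, not elements, so stepping through 32-bit values means moving in multiples of 4.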
Moving values:
mov r1, #100 @Place 100 in R1
mov r1, r3 @Place the value of R3 in R1
Arithmetic:
add r3, r1, #3 @r3 = r1 + 3
add r3, r1, r2 @r3 = r1 + r2
add r4, r4, #1 @r4++ (r4 + 1, and put back in r4)
Each function gets a "stack" of addresses in memory that it can use for scratch storage. The stack grows downwards, i.e., toward lower addresses, within a pre-determined range of addresses:
+----------+------+
| ADDRESS  | DATA |
+----------+------+
| 00000000 |      | <-- Bottom of the stack (lowest address)
+----------+------+
| 00000001 |      |
+----------+------+
| 00000002 |      |
+----------+------+
| 00000003 |      |
+----------+------+
         |
         V
+----------+------+
| 00000009 |      | <-- Top of the stack (highest address)
+----------+------+
Addresses below the bottom of the stack (i.e., lower than its lowest address) fall in the "red zone," which we shouldn't write to, because that memory may be used by other parts of the program.
When we add things to the stack, we start at the top and work toward the lower addresses. E.g., the first item we push on the stack will go in 00000009. The next will go in 00000008, and so on. When we pop values off the stack, we must go in reverse order (the stack is LIFO: last in, first out).
At the start of a function, SP (i.e., R13) points to the top of the stack.
We can load and store values on the stack using ldr and str, just as we can with any other addresses in memory.
To get to the next stack address, subtract the number of bytes you care about from SP to get the new address. To go back to the previous stack address, add the number of bytes you care about to SP to get the new address.
Each address on the stack has a slot that can hold one byte. So, if we want to put a 32-bit word on the stack, we have to use four slots, one for each byte of the 32-bit word. Hence, we need to subtract 4 from SP to get to the next available address, and add 4 to SP to get back to the previous one.
.section .text
.global _start
_start:
mov r0, #0xaa
mov r1, #0xbb
@ Put r0 on the stack
sub sp, sp, #4 @ move SP down 4 slots
str r0, [sp] @ put r0 on the stack, at the address of SP
@ it fills up 4 slots, 1 byte each
@ Put r1 on the stack
sub sp, sp, #4 @ move SP down 4 slots
str r1, [sp] @ put r1 in the address of SP
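@ Restore r1, then r0 (the reverse of the order they were stored in)
ldr r1, [sp] @ load into r1 the 4 bytes starting at [sp]
add sp, sp, #4 @ move SP up 4 slots
ldr r0, [sp] @ load into r0 the 4 bytes starting at [sp]
add sp, sp, #4 @ move SP up 4 slots, back to where it started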
Push a list of registers to the top of the stack:
push {r0, r1, r3}
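This stores the 4 bytes of each listed register on the stack and decrements SP by 12 (to SP - 12). Whatever order the list is written in, the lowest-numbered register ends up at the lowest address: r0 is stored at the new SP, r1 at SP + 4, and r3 at SP + 8. The result is the same as pushing r3, then r1, then r0, one at a time.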
To pop the same list of registers off the stack and put their values back into their respective registers:
pop {r0, r1, r3}
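Note that pop reads the values back starting from the lowest address: r0 is loaded from SP, r1 from SP + 4, and r3 from SP + 8, which is the same as popping r0 first, then r1, then r3. It then updates SP (to SP + 12).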
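A typical use of push and pop is to save some registers, reuse them for other work, and then get the old values back. The sketch below shows that round trip. The register values and the Linux exit sequence are assumptions chosen for illustration.

```asm
.section .text
.global _start
_start:
    mov  r0, #1
    mov  r1, #2
    mov  r3, #3
    push {r0, r1, r3}       @ save all three; SP is now 12 lower
    ldr  r4, [sp]           @ r4 = 1: the saved r0 sits at the lowest address
    mov  r0, #0             @ overwrite the originals
    mov  r1, #0
    mov  r3, #0
    pop  {r0, r1, r3}       @ r0 = 1, r1 = 2, r3 = 3 again; SP is back up by 12
    add  r0, r0, r1
    add  r0, r0, r3         @ r0 = 1 + 2 + 3 = 6
    mov  r7, #1             @ assumption: Linux exit syscall number
    svc  #0                 @ exit, using r0 as the exit status
```

The exit status is 6, which confirms that all three registers were restored after being zeroed.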
ARM uses bl and bx to jump and then return. It stores the return address in the "link register" (lr), which is r14. bl stands for "branch and link," and bx stands for "branch and exchange." The "exchange" part means that the bx instruction can switch the processor between ARM mode and Thumb mode: the lowest bit of the target address selects the mode in which execution continues (1 for Thumb, 0 for ARM).
- bl sets the lr register to the address right after the bl instruction.
- bx lr returns to the address in lr.
.section .text
.global _start
foo:
00 mov r0, #3
04 bx lr @ return to the address in lr (0x0c) and continue
_start:
08 bl foo @ put the next address (0x0c) in lr, and go to foo (0x00)
0c mov r0, r1
...
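Because every bl overwrites lr, a function that calls another function has to save its own return address first. The sketch below shows one way to do that. The function names inner and outer are hypothetical, and the code assumes ARM (not Thumb) mode on a 32-bit Linux target.

```asm
.section .text
.global _start

inner:
    mov  r0, #3
    bx   lr                 @ return to outer

outer:
    push {lr}               @ save the return address into _start
    bl   inner              @ this overwrites lr with an address inside outer
    add  r0, r0, #1         @ r0 = 3 + 1 = 4
    pop  {lr}               @ restore the return address into _start
    bx   lr                 @ return to _start

_start:
    bl   outer
    mov  r7, #1             @ assumption: Linux exit syscall number
    svc  #0                 @ exit, using r0 (4) as the exit status
```

Without the push and pop in outer, its final bx lr would jump back to the add instruction instead of returning to _start, and the function would loop forever.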
Say we have a caller function foo, and we want to call another function bar. The caller performs the following steps:
- Put any data in r0-r3 onto the stack for later.
- Put arguments for bar into r0-r3.
- Use BL bar to jump to bar.
Then, the callee (i.e., bar) takes the following steps:
- The "prologue" of the function:
  - Push lr on the stack, to save it for later.
  - Save registers r4-r9 by pushing them onto the stack. We will need to restore these registers in the epilogue of the function, in case the caller is using them.
- The "body" of the function:
  - Do any work we need to do in the function.
  - We can find the arguments passed to our function bar in r0-r3.
- The "epilogue" of the function:
  - Put any values we want to return to the caller foo in registers r0-r3.
  - Pop the stored registers r4-r9 from the stack, so as to restore them.
  - Pop lr to restore it.
  - Return to the caller foo with bx lr.
Back to the caller (i.e., foo):
- Pull out any returned values from r0-r3 that we need.
- Pop r0-r3 from the stack, to restore them. This way everything is back to the way it was when we called bar.
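The caller and callee steps above can be sketched as one small program. What bar computes (a + b + 10), the values foo keeps in r0-r3, and the Linux exit sequence are all assumptions made up for illustration. The sketch also assumes ARM (not Thumb) mode.

```asm
.section .text
.global _start

@ bar(a, b): returns a + b + 10 in r0
bar:
    push {lr}               @ prologue: save the return address
    push {r4-r9}            @ prologue: save registers the caller may be using
    mov  r4, #10            @ body: r4 is safe to overwrite now
    add  r0, r0, r1         @ body: the arguments arrive in r0 and r1
    add  r0, r0, r4         @ epilogue: the return value goes in r0
    pop  {r4-r9}            @ epilogue: restore r4-r9
    pop  {lr}               @ epilogue: restore lr
    bx   lr                 @ return to foo

@ foo calls bar(1, 2) while it is still using r0-r3 itself
foo:
    push {r4, lr}           @ foo is a callee too (of _start)
    mov  r0, #20            @ data foo wants to keep across the call
    mov  r1, #0
    mov  r2, #0
    mov  r3, #0
    push {r0-r3}            @ caller: put r0-r3 on the stack for later
    mov  r0, #1             @ caller: arguments for bar
    mov  r1, #2
    bl   bar                @ caller: jump to bar
    mov  r4, r0             @ back in foo: pull out the return value (13)
    pop  {r0-r3}            @ back in foo: restore r0-r3 (r0 is 20 again)
    add  r0, r0, r4         @ r0 = 20 + 13 = 33
    pop  {r4, lr}
    bx   lr

_start:
    bl   foo
    mov  r7, #1             @ assumption: Linux exit syscall number
    svc  #0                 @ exit, using r0 as the exit status
```

The exit status is 33. Note that foo copies the return value out of r0 before popping r0-r3; doing it the other way round would overwrite the result with the saved value.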