- Example Code and Makefile
- Vocabulary
- Assembly Code
- x86_64 Registers
- Register Sizes
- Addressing Modes
- Notable Instructions
- Calling another Function
- Jumping
- Division
- Interesting Weird Things
- --gstabs+
- Useful GDB Commands
- References
File | Description |
---|---|
`asm.s` | Hand-written assembly code for `foo` and `bar` with standard calling conventions. |
`asm-opt.s` | Same as `asm.s`, except optimized: both functions are leaf functions. |
`main.h` | Header defining our intended prototypes for `foo` and `bar`. |
`main.c` | C code that uses `foo` and `bar`. |
`Makefile` | Makefile to compile, assemble, and link all unoptimized source into the `main` executable. |
The included `Makefile` is configured to build `asm.s`, not `asm-opt.s`. One can easily add additional targets to build a version of `main` that uses the optimized code.
- assembly - the lowest-level programming language that is still intuitively readable by humans; it maps directly onto a binary encoding (machine code) that indicates which wires leading into the CPU should be on or off
- source code - any code that is hand-written, including assembly
- directive - see below
- label - see below
- instruction - see below
- `as` - GNU Assembler; see *Using as* (manual)
- AT&T syntax - read this
- leaf function - a function that calls no other functions and can complete its work using only the registers it was passed plus the non-preserved (caller-saved) registers; it does not need to set up or preserve a call frame
- generalized function - a function that calls other functions; must preserve the call frame
Assembly code has three different kinds of elements:

- Directives begin with a dot and indicate structural information useful to the assembler, linker, or debugger, but are not in and of themselves assembly instructions.
- Labels end with a colon and indicate by their position the association between names and locations.
- Instructions are the actual assembly code, typically indented to visually distinguish them from directives and labels.
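All three kinds of elements can be seen together in a tiny function. The sketch below embeds GNU assembler source in a C file as top-level `__asm__`, assuming x86-64 Linux (ELF, so no leading underscore on symbol names), GCC or Clang, and the System V AMD64 calling convention. The function name `add_two` is hypothetical.

```c
#include <assert.h>

/* Hypothetical leaf function add_two(x) = x + 2, written in AT&T syntax.
   Assumes x86-64 Linux/ELF and the System V AMD64 ABI:
   1st argument in %rdi, return value in %rax. */
__asm__(
    ".pushsection .text\n"   /* directive: switch to the code section     */
    ".globl add_two\n"       /* directive: make the symbol visible        */
    "add_two:\n"             /* label: names the address of what follows  */
    "    movq %rdi, %rax\n"  /* instruction: rax = x                      */
    "    addq $2, %rax\n"    /* instruction: rax = rax + 2                */
    "    ret\n"              /* instruction: return to the caller         */
    ".popsection\n");        /* directive: restore the previous section   */

/* C prototype so the rest of the program can call the assembly. */
long add_two(long x);
```

`.pushsection`/`.popsection` is used instead of a bare `.text` so the compiler's own notion of the current section is left undisturbed.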
These are the general purpose registers, along with the instruction pointer (`%rip`) and the flags register (`%eflags`). "Preserved" means the called function must restore the register's original value before returning (callee-saved).

Register | Usage | Preserved |
---|---|---|
`%rax` | temporary register; with variable arguments, passes the number of vector registers used; 1st return register | No |
`%rbx` | callee-saved register; optionally used as base pointer | Yes |
`%rcx` | used to pass 4th integer argument to functions | No |
`%rdx` | used to pass 3rd integer argument to functions; 2nd return register | No |
`%rsp` | stack pointer | Yes |
`%rbp` | callee-saved register; optionally used as frame pointer | Yes |
`%rsi` | used to pass 2nd integer argument to functions | No |
`%rdi` | used to pass 1st integer argument to functions | No |
`%r8` | used to pass 5th integer argument to functions | No |
`%r9` | used to pass 6th integer argument to functions | No |
`%r10` | temporary register, used for passing a function's static chain pointer | No |
`%r11` | temporary register | No |
`%r12` | callee-saved register | Yes |
`%r13` | callee-saved register | Yes |
`%r14` | callee-saved register | Yes |
`%r15` | callee-saved register | Yes |
`%rip` | instruction pointer | N/A |
`%eflags` | status / condition bits | N/A |
There are other registers for floating point operations.
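The argument registers in the table can be exercised with a leaf function that takes six integer arguments. This is a sketch assuming x86-64 Linux/ELF, GCC or Clang, and the System V AMD64 ABI. The name `sum6` is hypothetical. It only touches registers marked "No" in the Preserved column, so it does not need to save anything.

```c
#include <assert.h>

/* Hypothetical leaf function sum6(a, b, c, d, e, f).
   Integer arguments arrive in %rdi, %rsi, %rdx, %rcx, %r8, %r9 (in order),
   and the result is returned in %rax. Assumes Linux/ELF, System V AMD64 ABI. */
__asm__(
    ".pushsection .text\n"
    ".globl sum6\n"
    "sum6:\n"
    "    movq %rdi, %rax\n"   /* 1st argument */
    "    addq %rsi, %rax\n"   /* 2nd argument */
    "    addq %rdx, %rax\n"   /* 3rd argument */
    "    addq %rcx, %rax\n"   /* 4th argument */
    "    addq %r8, %rax\n"    /* 5th argument */
    "    addq %r9, %rax\n"    /* 6th argument */
    "    ret\n"               /* result is already in %rax */
    ".popsection\n");

long sum6(long a, long b, long c, long d, long e, long f);
```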
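Each general purpose register can be accessed at several widths. `%rax`, `%rbx`, `%rcx`, and `%rdx` have 64, 32, 16, 8 (high), and 8 (low) bit names. `%rsi`, `%rdi`, `%rsp`, and `%rbp` have no high-8 name (their low-8 names `%sil`, `%dil`, `%spl`, `%bpl` exist only in 64-bit mode), and `%r8` through `%r15` use suffixes instead (`%r8d`, `%r8w`, `%r8b`).

Consider the registers that all share an `a` in the middle of their name. They are all views of the same physical register: `rax` refers to the full 64-bit value, and the others refer to lower bits within it. The diagram below shows the mask for each register name within that space (0s omitted for readability):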
```
FFFF FFFF FFFF FFFF  rax - all 64 bits; r = full register
          FFFF FFFF  eax - lower 32 bits; e = extended
               FFFF  ax  - lower 16 bits
               FF    ah  - within the lower 16 bits, the high 8 bits; h = high
                 FF  al  - lower 8 bits; l = low
```
Here are the different general purpose registers along with their names at each size:

64-bit | 32-bit | 16-bit | High 8-bit | Low 8-bit |
---|---|---|---|---|
`%rax` | `%eax` | `%ax` | `%ah` | `%al` |
`%rbx` | `%ebx` | `%bx` | `%bh` | `%bl` |
`%rcx` | `%ecx` | `%cx` | `%ch` | `%cl` |
`%rdx` | `%edx` | `%dx` | `%dh` | `%dl` |
`%rsi` | `%esi` | `%si` | n/a | `%sil` |
`%rdi` | `%edi` | `%di` | n/a | `%dil` |
`%rsp` | `%esp` | `%sp` | n/a | `%spl` |
`%rbp` | `%ebp` | `%bp` | n/a | `%bpl` |
`%r8` to `%r15` | `%r8d` to `%r15d` | `%r8w` to `%r15w` | n/a | `%r8b` to `%r15b` |
`%rip` | `%eip` | `%ip` | n/a | n/a |
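The aliasing can be observed from C. The sketch below uses GCC/Clang extended inline assembly on x86-64; `struct subregs` and `read_subregisters` are hypothetical names. It loads a known value into `%rax`, then copies each narrower view into another register, zero-extended to 64 bits. The `%ah` copy uses the `Q` constraint (one of `%rax` to `%rdx`) because `%ah` cannot be encoded in an instruction that needs a REX prefix, which `%r8d` and friends would.

```c
#include <assert.h>
#include <stdint.h>

/* Values read back through each narrower name of %rax. */
struct subregs {
    uint64_t eax, ax, ah, al;
};

/* Hypothetical helper: put v in %rax, then read %eax, %ax, %ah, %al.
   %k[...] prints the 32-bit name of whichever register the compiler picked. */
struct subregs read_subregisters(uint64_t v)
{
    struct subregs s;
    __asm__("movq   %[v], %%rax\n\t"
            "movl   %%eax, %k[e]\n\t"   /* lower 32 bits             */
            "movzwl %%ax, %k[w]\n\t"    /* lower 16 bits             */
            "movzbl %%ah, %k[h]\n\t"    /* bits 8-15 (high 8 of ax)  */
            "movzbl %%al, %k[l]"        /* bits 0-7                  */
            : [e] "=&r"(s.eax), [w] "=&r"(s.ax),
              [h] "=&Q"(s.ah), [l] "=&r"(s.al)
            : [v] "r"(v)
            : "rax");                   /* we overwrite %rax */
    return s;
}
```

For `0x1122334455667788`, this should yield `eax = 0x55667788`, `ax = 0x7788`, `ah = 0x77`, and `al = 0x88`.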
`mov`, like most instructions, takes a single-letter suffix that determines the amount of data to be moved. The following names are used to describe data values of various sizes:

Suffix | Name | Bytes | Bits |
---|---|---|---|
`b` | BYTE | 1 | 8 |
`w` | WORD | 2 | 16 |
`l` | LONG (doubleword) | 4 | 32 |
`q` | QUADWORD | 8 | 64 |
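The suffix alone decides how many bytes a store writes. This sketch assumes x86-64 (little-endian) and GCC/Clang extended inline assembly; `store_all_ones` is a hypothetical name. It stores `-1` into four zeroed 64-bit memory slots, once with each suffix.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: write -1 into zeroed quadwords using each suffix.
   "+m" forces each operand to live in memory, so the size of each store
   comes only from the b/w/l/q suffix. */
void store_all_ones(uint64_t out[4])
{
    uint64_t b = 0, w = 0, l = 0, q = 0;
    __asm__("movb $-1, %[b]\n\t"   /* writes 1 byte                         */
            "movw $-1, %[w]\n\t"   /* writes 2 bytes                        */
            "movl $-1, %[l]\n\t"   /* writes 4 bytes                        */
            "movq $-1, %[q]"       /* writes 8 bytes (imm32 sign-extended)  */
            : [b] "+m"(b), [w] "+m"(w), [l] "+m"(l), [q] "+m"(q));
    out[0] = b;   /* 0xFF                 */
    out[1] = w;   /* 0xFFFF               */
    out[2] = l;   /* 0xFFFFFFFF           */
    out[3] = q;   /* 0xFFFFFFFFFFFFFFFF   */
}
```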
Mode | Example | Pretend it's C |
---|---|---|
Global Symbol | `movq x, %rax` | `rax = x;` |
Immediate | `movq $56, %rax` | `rax = 56;` |
Register | `movq %rbx, %rax` | `rax = rbx;` |
Indirect | `movq (%rbx), %rax` | `rax = *rbx;` |
Base-Relative | `movq -8(%rbp), %rax` | `rax = *(long *)((char *) rbp - 8);` |
Offset-Scale-Base-Relative | `movq -16(%rbx, %rcx, 8), %rax` | `rax = *(long *)((char *) rbx + rcx * 8 - 16);` (see below) |
That last one is tricky! The offset part of a memory address can be specified directly as a static value (called a displacement) or through an address computation made up of one or more of the following components:
- disp - the displacement value; a signed 8, 16, or 32 bit value (only 8 or 32 bits in 64-bit code).
- base - the value in a general-purpose register.
- index - the value in a general-purpose register (any except `%rsp`).
- scale - the scale factor; a value of 1 (the default), 2, 4, or 8 that is multiplied by the index value.
The general form is `disp(base, index, scale)`, which dereferences the address `base + index*scale + disp`, where all arithmetic is done in bytes. For example, `-16(%rbx, %rcx, 8)` refers to the address `rbx + rcx*8 - 16`.
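The `disp(base, index, scale)` form can be tried on a C array. This sketch assumes x86-64 Linux (where `long` is 8 bytes) and GCC/Clang extended inline assembly; `load_scaled` is a hypothetical name. With 8-byte elements, `-16(base, index, 8)` reads element `index - 2`.

```c
#include <assert.h>

/* Hypothetical helper: load the quadword at base + index*8 - 16. */
long load_scaled(const long *base, long index)
{
    long result;
    __asm__("movq -16(%[base], %[index], 8), %[result]"
            : [result] "=r"(result)
            : [base] "r"(base), [index] "r"(index)
            : "memory");   /* we read memory the compiler cannot see */
    return result;
}
```

For `long a[5] = {10, 20, 30, 40, 50}`, `load_scaled(a, 4)` reads `a[2]`, which is 30.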
This is not an exhaustive list!
Instruction | Example | Pretend it's C | Notes |
---|---|---|---|
`mov` | `movq x, %rax` | `rax = x;` | |
`inc` | `incq %rax` | `++rax;` | |
`dec` | `decq %rax` | `--rax;` | |
`add` | `addq %rbx, %rax` | `rax = rax + rbx;` | |
`sub` | `subq %rbx, %rax` | `rax = rax - rbx;` | |
`imul` | `imulq %rbx, %rax` | `rax = rax * rbx;` | signed multiply |
`and` | `andq %rbx, %rax` | `rax = rax & rbx;` | |
`xor` | `xorq %rbx, %rax` | `rax = rax ^ rbx;` | |
`shr` | `shrq $4, %rax` | `rax = rax >> 4;` | unsigned (logical): fills with 0s |
`shl` | `shlq $5, %rax` | `rax = rax << 5;` | unsigned; same operation as `sal` |
`sar` | `sarq $4, %rax` | `rax = rax >> 4;` | signed (arithmetic): fills with copies of the sign bit |
`sal` | `salq $5, %rax` | `rax = rax << 5;` | signed; same operation as `shl` |
`imul` | `imulq $0x10, %rax` | `rax = rax * 16;` | immediate operand |
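The difference between `shr` and `sar` only shows up when the sign bit is set. This sketch assumes x86-64 and GCC/Clang extended inline assembly; `shr4`, `sar4`, and `imul64` are hypothetical names. Each function applies a single instruction from the table to its argument.

```c
#include <assert.h>
#include <stdint.h>

/* shrq: logical right shift; vacated high bits are filled with 0. */
uint64_t shr4(uint64_t x)
{
    __asm__("shrq $4, %[x]" : [x] "+r"(x) : : "cc");
    return x;
}

/* sarq: arithmetic right shift; vacated high bits copy the sign bit. */
int64_t sar4(int64_t x)
{
    __asm__("sarq $4, %[x]" : [x] "+r"(x) : : "cc");
    return x;
}

/* imulq, two-operand form: x = x * y (signed). */
int64_t imul64(int64_t x, int64_t y)
{
    __asm__("imulq %[y], %[x]" : [x] "+r"(x) : [y] "r"(y) : "cc");
    return x;
}
```

With the bit pattern of -32 (`0xFFFFFFFFFFFFFFE0`), `sar4` gives -2, while `shr4` gives the large positive value `0x0FFFFFFFFFFFFFFE`.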
You can use `call` and `ret` to transfer control between functions.
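Instruction | Example | Pretend it's C | Notes |
---|---|---|---|
`call` | `callq foo` | `foo();` | pushes the return address (the address of the next instruction, i.e. the updated `%rip`), then jumps to `foo` |
`ret` | `retq` | `return;` | pops the return address off the stack into `%rip` |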
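On entry to a function, the caller's return address (its next `%rip`) is at the top of the stack, so `movq (%rsp), %rax` reads it. You can also pop it, but it must be pushed back before `ret`, because `ret` pops it again.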
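A non-leaf ("generalized") function has to keep the stack usable across its own calls. The sketch below assumes x86-64 Linux/ELF, GCC or Clang, and the System V AMD64 ABI, which requires `%rsp` to be 16-byte aligned at each `call`. `add_one` and `add_one_twice` are hypothetical names. On entry, the caller's `call` has pushed 8 bytes, so the function subtracts 8 more before calling.

```c
#include <assert.h>

/* An ordinary C function that the assembly below calls. */
long add_one(long x) { return x + 1; }

/* Hypothetical non-leaf function add_one_twice(x) = add_one(add_one(x)). */
__asm__(
    ".pushsection .text\n"
    ".globl add_one_twice\n"
    "add_one_twice:\n"
    "    subq $8, %rsp\n"     /* return address took 8 bytes; realign to 16 */
    "    call add_one\n"      /* x is already in %rdi; result in %rax       */
    "    movq %rax, %rdi\n"   /* pass that result as the next 1st argument  */
    "    call add_one\n"
    "    addq $8, %rsp\n"     /* undo the alignment adjustment              */
    "    ret\n"               /* pop the return address into %rip           */
    ".popsection\n");

long add_one_twice(long x);
```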
You can use `jmp` for an unconditional jump. Other instructions (such as `je`, `jne`, `jl`, and `jge`) jump only if certain conditions hold, based on the flags set by a preceding `cmp` instruction.
Instruction | Example | Notes |
---|---|---|
`jmp` | `jmpq *%rdx` | unconditional indirect jump to the absolute address held in `%rdx` |
See this link for information about conditional statements and jumps. These provide control flow in our assembly programs.
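A `cmp` followed by a conditional jump is enough to build an `if`. This sketch assumes x86-64 Linux/ELF, GCC or Clang, and the System V AMD64 ABI; `max_long` and the label `.Lmax_done` are hypothetical names. Note the AT&T operand order: `cmpq %rsi, %rdi` sets the flags from `rdi - rsi`.

```c
#include <assert.h>

/* Hypothetical leaf function max_long(a, b): the larger of two signed values. */
__asm__(
    ".pushsection .text\n"
    ".globl max_long\n"
    "max_long:\n"
    "    movq %rdi, %rax\n"   /* assume a is the answer          */
    "    cmpq %rsi, %rdi\n"   /* set flags from a - b            */
    "    jge .Lmax_done\n"    /* signed a >= b: keep a           */
    "    movq %rsi, %rax\n"   /* otherwise the answer is b       */
    ".Lmax_done:\n"
    "    ret\n"
    ".popsection\n");

long max_long(long a, long b);
```

`jge` is the signed comparison. For unsigned values, `jae` would be the counterpart.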
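The snippet below performs unsigned `31 / 2`. The 128-bit dividend is stored in `%rdx:%rax` (high 64 bits in `%rdx`, low 64 bits in `%rax`):

```
movq $0, %rdx     # high 64 bits of the dividend
movq $31, %rax    # low 64 bits of the dividend
movq $2, %rbx     # divisor
divq %rbx         # divide %rdx:%rax by %rbx
```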
The quotient is stored in `%rax` and the remainder in `%rdx`.

For signed division, use `idivq` instead. First sign-extend `%rax` into `%rdx:%rax`, for example with the `cqto` instruction.
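Both variants can be wrapped in C for testing. This sketch assumes x86-64 and GCC/Clang extended inline assembly; `udivmod` and `sdivmod` are hypothetical names, and neither guards against division by zero (which raises a hardware exception). The `a` and `d` constraints pin values to `%rax` and `%rdx`. The remainder output is marked early-clobber (`&`) so the divisor is never placed in `%rdx`, which is overwritten before `div` reads the divisor.

```c
#include <assert.h>

/* Unsigned: zero the high half, then divide %rdx:%rax by d. */
void udivmod(unsigned long n, unsigned long d,
             unsigned long *q, unsigned long *r)
{
    unsigned long quot, rem;
    __asm__("xorl %%edx, %%edx\n\t"   /* %rdx = 0 (32-bit write zero-extends) */
            "divq %[d]"               /* %rax = quotient, %rdx = remainder    */
            : "=a"(quot), "=&d"(rem)
            : "0"(n), [d] "r"(d)
            : "cc");
    *q = quot;
    *r = rem;
}

/* Signed: sign-extend %rax into %rdx:%rax with cqto, then idivq. */
void sdivmod(long n, long d, long *q, long *r)
{
    long quot, rem;
    __asm__("cqto\n\t"                /* %rdx = all copies of %rax's sign bit */
            "idivq %[d]"              /* quotient truncates toward zero       */
            : "=a"(quot), "=&d"(rem)
            : "0"(n), [d] "r"(d)
            : "cc");
    *q = quot;
    *r = rem;
}
```

`udivmod(31, 2, ...)` mirrors the snippet above (quotient 15, remainder 1). `sdivmod(-31, 2, ...)` gives -15 and -1, because `idiv` truncates toward zero.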
32-bit instructions automatically zero the upper 32 bits of the corresponding 64-bit register, while 16-bit and 8-bit instructions leave the upper bits unchanged. For example:

```
movq $0xFFFFFFFFFFFFFFFF, %rax   # FFFF FFFF FFFF FFFF rax
movb $0, %al                     # FFFF FFFF FFFF FF00 rax
movw $0, %ax                     # FFFF FFFF FFFF 0000 rax
movl $0, %eax                    # 0000 0000 0000 0000 rax
```
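The same behavior can be checked from C. This sketch assumes x86-64 and GCC/Clang extended inline assembly; the three function names are hypothetical. Each one fills a register with all ones, then writes 0 through a narrower name: the operand modifiers `%b0`, `%w0`, and `%k0` print the 8-, 16-, and 32-bit names of whichever register the compiler chose.

```c
#include <assert.h>
#include <stdint.h>

/* 8-bit write: only the low byte changes. */
uint64_t after_movb(void)
{
    uint64_t r;
    __asm__("movq $-1, %0\n\tmovb $0, %b0" : "=r"(r));
    return r;
}

/* 16-bit write: only the low 2 bytes change. */
uint64_t after_movw(void)
{
    uint64_t r;
    __asm__("movq $-1, %0\n\tmovw $0, %w0" : "=r"(r));
    return r;
}

/* 32-bit write: the upper 32 bits are zeroed as well. */
uint64_t after_movl(void)
{
    uint64_t r;
    __asm__("movq $-1, %0\n\tmovl $0, %k0" : "=r"(r));
    return r;
}
```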
The `--gstabs+` option causes `as` to generate stabs debugging information for each assembler line, with GNU extensions that probably only `gdb` can handle, and that could make other debuggers crash or refuse to read your program. This may help when debugging assembler code.

```
$ as --gstabs+ -o asm.o asm.s
```
- `(gdb) layout regs` - text user interface (TUI) mode, displaying registers and instructions
- Use `C-p` and `C-n` while in TUI mode to go up and down in the command history (just like Emacs)
- `(gdb) x/8gx $sp` - examine the 8 quadwords starting at (and including) the stack pointer, moving toward higher addresses