@mepcotterell
Last active January 15, 2020 20:35
Assembly Notes
asm-opt.s
.text ; // place what follows in the text segment
.globl foo ; // makes the symbol foo visible to the linker
.type foo, @function ; // sets the type of the symbol; foo is intended to be a function
foo:
    movq $42, %rax ; // move 42 into return register
    ret ; // return
.text
.globl bar
.type bar, @function
bar:
    movl $21, 0(%rdi) ; // move 21 into memory pointed to by %rdi
    ret ; // return

Assembly Notes


Example Code and Makefile

| File | Description |
| --- | --- |
| asm.s | Hand-written assembly code for foo and bar with standard calling conventions. |
| asm-opt.s | Same as asm.s, except optimized: both functions are leaf functions. |
| main.h | Header defining our intended prototypes for foo and bar. |
| main.c | C code that uses foo and bar. |
| Makefile | Makefile to compile, assemble, and link all unoptimized source into the main executable. |

The included Makefile is configured to build asm.s, not asm-opt.s. One can easily add additional targets to build a version of main that uses the optimized code.

Vocabulary

  • assembly - lowest level programming language that is intuitively readable by humans; it has a direct mapping to a binary encoding that indicates what wires should be on or off leading into the CPU
  • source code - any code that is hand-written, including assembly
  • directive - see below
  • label - see below
  • instruction - see below
  • as - GNU Assembler; see Using as (manual)
  • AT&T syntax - read this
  • leaf function - a function that calls no other functions, and can complete its work within the set of registers already passed; does not need to preserve the call frame
  • generalized function - a function that calls other functions; must preserve the call frame

Assembly Code

Assembly code has three different kinds of elements:

  • Directives begin with a dot and indicate structural information useful to the assembler, linker, or debugger, but are not in and of themselves assembly instructions.

  • Labels end with a colon and indicate by their position the association between names and locations.

  • Instructions are the actual assembly code, typically indented to visually distinguish them from directives and labels.

x86_64 Registers

These are the general purpose registers.

| Register | Usage | Preserved |
| --- | --- | --- |
| %rax | temporary register; with variable arguments passes information about the number of vector registers used; 1st return register | No |
| %rbx | callee-saved register; optionally used as base pointer | Yes |
| %rcx | used to pass 4th integer argument to functions | No |
| %rdx | used to pass 3rd argument to functions; 2nd return register | No |
| %rsp | stack pointer | Yes |
| %rbp | callee-saved register; optionally used as frame pointer | Yes |
| %rsi | used to pass 2nd argument to functions | No |
| %rdi | used to pass 1st argument to functions | No |
| %r8 | used to pass 5th argument to functions | No |
| %r9 | used to pass 6th argument to functions | No |
| %r10 | temporary register, used for passing a function’s static chain pointer | No |
| %r11 | temporary register | No |
| %r12 | callee-saved register | Yes |
| %r13 | callee-saved register | Yes |
| %r14 | callee-saved register | Yes |
| %r15 | callee-saved register | Yes |
| %rip | instruction pointer | NA |
| %eflags | status / condition bits | NA |

There are other registers for floating point operations.

Register Sizes

Each general purpose register can be accessed in 64, 32, 16, 8 (high), and 8 (low) bit modes, although the 8 (high) names exist only for rax, rbx, rcx, and rdx. Consider the names that share the letter a: rax, eax, ax, ah, and al. They all refer to the same physical register rather than to separate storage. The rax name refers to the full 64 bit value. The others refer to lower bits within the register. The diagram below shows the masks for each register name within the same space (0s omitted for readability):

FFFF FFFF FFFF FFFF rax - all 64 bits; r = full register
          FFFF FFFF eax - lower 32 bits; e = extended
               FFFF  ax - lower 16 bits
               FF    ah - within the lower 16 bits, the high 8 bits; h = high
                 FF  al - lower 8 bits; l = low

Here are the different general purpose registers along with their sizes:

FFFF FFFF FFFF FFFF rax FFFF FFFF FFFF FFFF rbx FFFF FFFF FFFF FFFF rcx FFFF FFFF FFFF FFFF rdx
          FFFF FFFF eax           FFFF FFFF ebx           FFFF FFFF ecx           FFFF FFFF edx
               FFFF  ax                FFFF  bx                FFFF  cx                FFFF  dx
               FF    ah                FF    bh                FF    ch                FF    dh
                 FF  al                  FF  bl                  FF  cl                  FF  dl

FFFF FFFF FFFF FFFF rsi FFFF FFFF FFFF FFFF rdi FFFF FFFF FFFF FFFF rsp FFFF FFFF FFFF FFFF rbp
          FFFF FFFF esi           FFFF FFFF edi           FFFF FFFF esp           FFFF FFFF ebp
               FFFF  si                FFFF  di                FFFF  sp                FFFF  bp
               
FFFF FFFF FFFF FFFF rip
          FFFF FFFF eip
               FFFF  ip
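
The masks in the diagrams can also be written out in C. The sketch below is only a model: the helper names (view_eax, view_ax, view_ah, view_al) are hypothetical and simply apply the corresponding mask to a 64 bit value standing in for rax.

```c
#include <stdint.h>

/* Hypothetical helpers: each returns the bits that the named
 * sub-register would expose for a given 64-bit rax value. */
static inline uint32_t view_eax(uint64_t rax) { return (uint32_t)(rax & 0xFFFFFFFFu); }
static inline uint16_t view_ax(uint64_t rax)  { return (uint16_t)(rax & 0xFFFFu); }
static inline uint8_t  view_ah(uint64_t rax)  { return (uint8_t)((rax >> 8) & 0xFFu); }
static inline uint8_t  view_al(uint64_t rax)  { return (uint8_t)(rax & 0xFFu); }
```

For a value of 0x1122334455667788, these helpers give 0x55667788, 0x7788, 0x77, and 0x88 respectively.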

Addressing Modes

mov, like most instructions, has a single letter suffix that determines the amount of data to be moved. The following names are used to describe data values of various sizes:

| Suffix | Name | Bytes | Bits |
| --- | --- | --- | --- |
| b | BYTE | 1 | 8 |
| w | WORD | 2 | 16 |
| l | LONG | 4 | 32 |
| q | QUADWORD | 8 | 64 |

| Mode | Example | Pretend it's C |
| --- | --- | --- |
| Global Symbol | movq x, %rax | rax = x; |
| Immediate | movq $56, %rax | rax = 56; |
| Register | movq %rbx, %rax | rax = rbx; |
| Indirect | movq (%rsp), %rax | rax = *rsp; |
| Base-Relative | movq -8(%rbp), %rax | rax = *(rbp - 8); |
| Offset-Scale-Base-Relative | movq -16(%rbx, %rcx, 8), %rax | rax = *(rbx + rcx * 8 - 16); // see below |

That last one is tricky! The offset part of a memory address can be specified directly as a static value (called a displacement) or through an address computation made up of one or more of the following components:

  • disp - the displacement value; an 8, 16, or 32 bit value.
  • base - the value in a general-purpose register.
  • index - the value in a general-purpose register.
  • scale - the scale factor; a value of 1, 2, 4, or 8 that is multiplied by the index value.

The general form is disp(base, index, scale), which roughly translates to dereferencing the address base + index*scale + disp, where everything is assumed to be byte arithmetic.
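
The address computation can be sketched as a small C function. This is an illustrative model only; effective_address is a hypothetical name, and the function returns the address that would be dereferenced rather than performing the load.

```c
#include <stdint.h>

/* Model of disp(base, index, scale): computes base + index*scale + disp
 * in byte arithmetic. The displacement is signed, so it is sign-extended
 * to 64 bits before being added. */
static inline uint64_t effective_address(int32_t disp, uint64_t base,
                                         uint64_t index, uint8_t scale)
{
    return base + index * scale + (uint64_t)(int64_t)disp;
}
```

With base 0x1000, index 3, scale 8, and disp -16, which corresponds to -16(%rbx, %rcx, 8) when %rbx holds 0x1000 and %rcx holds 3, the result is 0x1008.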

Notable Instructions

This is not an exhaustive list!

| Instruction | Example | Pretend it's C | Notes |
| --- | --- | --- | --- |
| mov | movq x, %rax | rax = x; | |
| inc | incq %rax | ++rax; | |
| dec | decq %rax | --rax; | |
| add | addq %rbx, %rax | rax = rax + rbx; | |
| sub | subq %rbx, %rax | rax = rax - rbx; | |
| imul | imulq %rbx, %rax | rax = rax * rbx; | |
| and | andq %rbx, %rax | rax = rax & rbx; | |
| xor | xorq %rbx, %rax | rax = rax ^ rbx; | |
| shr | shrq $4, %rax | rax = rax >> 4; | unsigned (logical) |
| shl | shlq $5, %rax | rax = rax << 5; | unsigned |
| sar | sarq $4, %rax | rax = rax >> 4; | signed (arithmetic) |
| sal | salq $5, %rax | rax = rax << 5; | signed; same operation as shl |
| imul | imulq $0x10, %rax | rax = rax * 16; | |
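
The difference between shr and sar only shows up when the top bit is set. The following C sketch models both on a 64 bit value; model_shr and model_sar are hypothetical names, and the sign fill is done by hand because right-shifting a negative signed value is implementation-defined in C.

```c
#include <stdint.h>

/* shrq: logical shift right; vacated high bits are filled with 0. */
static inline uint64_t model_shr(uint64_t value, unsigned count)
{
    return value >> (count & 63);   /* 64-bit shifts use only the low 6 bits of the count */
}

/* sarq: arithmetic shift right; vacated high bits are filled with
 * copies of the original sign bit (bit 63). */
static inline uint64_t model_sar(uint64_t value, unsigned count)
{
    count &= 63;
    uint64_t shifted = value >> count;
    if ((value >> 63) && count > 0)
        shifted |= ~UINT64_C(0) << (64 - count);
    return shifted;
}
```

Shifting 0xFFFFFFFFFFFFFFF0 right by 4 gives 0x0FFFFFFFFFFFFFFF with model_shr and 0xFFFFFFFFFFFFFFFF with model_sar.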

Calling another Function

You can use call and ret to transfer control between functions.

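| Instruction | Example | Pretend it's C | Notes |
| --- | --- | --- | --- |
| call | callq foo | foo(); | automatically pushes the return address (the %rip of the next instruction), then jumps to foo |
| ret | retq | return; | automatically pops the return address into %rip |

On entry to a function, the return address (the caller's saved %rip) sits at the top of the stack, so you can access it by immediately popping the stack.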
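
To make the push and pop concrete, here is a toy model in C. Everything in it is an assumption for illustration: struct toy_cpu, toy_call, and toy_ret are hypothetical, the stack is an array of 8 byte slots indexed by sp, and there are no bounds checks.

```c
#include <stdint.h>

#define STACK_WORDS 16

/* Hypothetical toy machine: sp indexes 8-byte slots and the stack
 * grows downward, as it does on x86_64. */
struct toy_cpu {
    uint64_t rip;
    unsigned sp;                   /* index of the current top-of-stack slot */
    uint64_t stack[STACK_WORDS];
};

/* call: push the return address (the instruction after the call),
 * then transfer control to the target. */
static inline void toy_call(struct toy_cpu *cpu, uint64_t target,
                            uint64_t return_address)
{
    cpu->stack[--cpu->sp] = return_address;
    cpu->rip = target;
}

/* ret: pop the return address from the top of the stack into rip. */
static inline void toy_ret(struct toy_cpu *cpu)
{
    cpu->rip = cpu->stack[cpu->sp++];
}
```

Right after toy_call, the slot at stack[sp] holds the return address, which mirrors the observation that the first thing on the stack inside a callee is the caller's saved %rip.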

Jumping

You can use jmp for an unconditional jump. Other instructions are available to only jump if certain conditions are met after executing a cmp instruction.

| Instruction | Example | Notes |
| --- | --- | --- |
| jmp | jmpq *%rdx | unconditional jump; indirect, to the absolute address held in %rdx |

See this link for information about conditional statements and jumps. These provide control flow in our assembly programs.
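
As a rough picture of how cmp plus a conditional jump gives control flow, the C sketch below writes a loop with goto so that each line lines up with one jump. The function name sum_below and the instructions named in the comments are assumptions for illustration, not output from a compiler.

```c
/* Sum 0 + 1 + ... + (n - 1), written with goto to mirror
 * compare-and-jump control flow. */
static inline long sum_below(long n)
{
    long i = 0, total = 0;
loop:
    if (i >= n) goto done;   /* compare i with n; jump if greater or equal (think cmp + jge) */
    total += i;
    i++;
    goto loop;               /* unconditional jump back (think jmp) */
done:
    return total;
}
```

Calling sum_below(5) adds 0 through 4 and returns 10.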

Division

The snippet below performs the unsigned division 31 / 2. divq takes its 128 bit dividend from %rdx:%rax, so %rdx is zeroed and 31 is placed in %rax:

movq	$0, %rdx
movq	$31, %rax
movq	$2, %rbx
divq	%rbx

The quotient is stored in %rax. The remainder is stored in %rdx.

If you need to perform signed division, use idivq instead. Before dividing, sign-extend %rax into %rdx:%rax using something like the cqto instruction.
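
The signed case can be modeled in C, where integer division also truncates toward zero. This is a sketch under stated assumptions: model_cqto and model_idivq are hypothetical names, the dividend is assumed to fit in %rax (so %rdx is just the sign fill from cqto), and the divisor is assumed to be nonzero and not to cause overflow.

```c
#include <stdint.h>

struct div_result {
    int64_t quotient;    /* what idivq would leave in %rax */
    int64_t remainder;   /* what idivq would leave in %rdx */
};

/* cqto: the value placed in %rdx is all ones if %rax is negative,
 * and all zeros otherwise. */
static inline uint64_t model_cqto(int64_t rax)
{
    return rax < 0 ? ~UINT64_C(0) : 0;
}

/* idivq after cqto. Assumes divisor != 0 and that the case
 * INT64_MIN / -1 does not occur (both fault on real hardware). */
static inline struct div_result model_idivq(int64_t rax, int64_t divisor)
{
    struct div_result r = { rax / divisor, rax % divisor };
    return r;
}
```

For -31 / 2 this gives a quotient of -15 and a remainder of -1, and model_cqto(-31) is all ones.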

Interesting Weird Things

Instructions that write a 32 bit register automatically zero the top 32 bits of the corresponding 64 bit register, while 16 or 8 bit writes leave the upper bits unchanged.

Example:
movq	$0xFFFFFFFFFFFFFFFF, %rax ; // FFFF FFFF FFFF FFFF rax
movb	$0, %al                   ; // FFFF FFFF FFFF FF00 rax
movw	$0, %ax                   ; // FFFF FFFF FFFF 0000 rax
movl	$0, %eax                  ; // 0000 0000 0000 0000 rax
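
The same behavior can be modeled in C by treating each partial write as a mask-and-merge on a 64 bit value. The helper names (write_al, write_ax, write_eax) are hypothetical; only the masking rules come from the example above.

```c
#include <stdint.h>

/* 8-bit write: replaces the low byte, keeps the upper 56 bits. */
static inline uint64_t write_al(uint64_t rax, uint8_t value)
{
    return (rax & ~UINT64_C(0xFF)) | value;
}

/* 16-bit write: replaces the low 16 bits, keeps the upper 48 bits. */
static inline uint64_t write_ax(uint64_t rax, uint16_t value)
{
    return (rax & ~UINT64_C(0xFFFF)) | value;
}

/* 32-bit write: the old contents are discarded entirely, because the
 * upper 32 bits are zeroed rather than preserved. */
static inline uint64_t write_eax(uint64_t rax, uint32_t value)
{
    (void)rax;
    return value;
}
```

Applying write_al, write_ax, and write_eax in turn with a value of 0, starting from all ones, reproduces the three register states in the comments above.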

--gstabs+

This option will cause as to generate stabs debugging information for each assembler line, with GNU extensions that probably only gdb can handle, and that could make other debuggers crash or refuse to read your program. This may help debugging assembler code.

$ as --gstabs+ -o asm.o asm.s

Useful GDB Commands

  • (gdb) layout regs -- text user interface (TUI) mode, displaying registers and instructions
    • Use C-p and C-n while in TUI mode to go up and down in the history (just like Emacs)
  • (gdb) x/8gx $sp -- examine the 8 quadwords starting at (and including) the stack pointer, printed in hexadecimal


asm.s
.text ; // place what follows in the text segment
.globl foo ; // makes the symbol foo visible to the linker
.type foo, @function ; // sets the type of the symbol; foo is intended to be a function
foo:
    pushq %rbp ; // save old call frame
    movq %rsp, %rbp ; // initialize new call frame
    movq $42, %rax ; // move 42 into return register
    movq %rbp, %rsp ; // prepare old call frame
    popq %rbp ; // restore old call frame
    ret ; // return
.text
.globl bar
.type bar, @function
bar:
    pushq %rbp ; // save old call frame
    movq %rsp, %rbp ; // initialize new call frame
    movl $21, 0(%rdi) ; // move 21 into memory pointed to by %rdi
    movq %rbp, %rsp ; // prepare old call frame
    popq %rbp ; // restore old call frame
    ret ; // return
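main.c
#include "main.h"

int main(void) {
    int x = foo(); // x is 42 after this call
    bar(&x);       // bar stores 21 into x through the pointer
    return x;      // exit status is the final value of x
} // main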
main.h
#ifndef MAIN_H
#define MAIN_H

int foo(void);
void bar(int * i);

#endif // MAIN_H
Makefile
main: main.o asm.o
	gcc -o main main.o asm.o

main.o: main.s
	as --gstabs+ -o main.o main.s

main.s: main.c main.h
	gcc -std=c17 -Wall --pedantic-errors -g -O0 -S -o main.s main.c

asm.o: asm.s main.h
	as --gstabs+ -o asm.o asm.s

clean:
	rm -f main
	rm -f main.o
	rm -f main.s
	rm -f asm.o