@loicmolinari
Last active May 19, 2024 00:54

x86-64 ASM sheet

Addressing

  • No segmentation (except for fs and gs for special purposes like threading)

  • Relative to base register

    • used for data on the stack, arrays, structs and class members
    • [base + index * scale + immediate_offset]
    • base is mandatory, can be any 64-bit register
    • index can be any 64-bit register except rsp
    • scale can be 1, 2, 4, or 8
    • immediate_offset (called displacement with Gas) relative to the base register
    • Gas syntax is immediate_offset(base, index, scale)
  • RIP-relative (a.k.a. PC-relative)

    • used for static data
    • contains a 32-bit sign-extended offset relative to the instruction pointer
    • explicitly specified using mov eax, [rel label] or the default rel / default abs directives with NASM (uses 32-bit absolute addressing otherwise)
    • explicitly specified using mov label(%rip), %eax with Gas
  • 32-bit absolute

    • 32-bit constant address sign-extended to 64 bits
    • works for addresses below 2^31
    • don't use for simple memory operands since RIP-relative addressing is shorter, faster (no need for relocations) and works everywhere
    • can be used to access static arrays with an index register, as in mov ebx, [intarray + rsi*4], but this doesn't work for Windows and Linux DLLs or for MacOSX exes and DLLs because addresses are above 2^32 (gcc and clang use it for Linux exes; MASM uses image-base-relative addressing for Windows exes)
    • an alternative that works everywhere is to first load the static array address into rbx using lea with a RIP-relative address and then address relative to this base register (lea rbx, [rel array1] then mov eax, [rbx + rcx*4]); other static arrays can then be accessed relative to the same base (mov [(array2-array1) + rbx + rcx*4], eax)
  • 64-bit absolute

    • mov eax, dword [qword a]
    • can only be used with mov and registers al, ax, eax or rax (src or dst)
    • can't contain a segment, base or index register
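
The address arithmetic behind the base-relative and RIP-relative forms above can be sketched in C. This is only a model of the calculation, assuming the operands are already available as integers; the function names are illustrative and do not belong to any assembler or library.

```c
#include <assert.h>
#include <stdint.h>

/* Address computed by [base + index * scale + immediate_offset]. */
uint64_t effective_address(uint64_t base, uint64_t index,
                           unsigned scale, int32_t disp)
{
    /* Only these four scale factors can be encoded. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    /* The 32-bit offset is sign-extended; the sum wraps modulo 2^64. */
    return base + index * scale + (uint64_t)(int64_t)disp;
}

/* RIP-relative: the offset is applied to the address of the next instruction. */
uint64_t rip_relative_address(uint64_t next_rip, int32_t disp)
{
    return next_rip + (uint64_t)(int64_t)disp;
}
```

For example, mov eax, [rbx + rcx*4 + 8] with rbx = 0x1000 and rcx = 3 corresponds to effective_address(0x1000, 3, 4, 8).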

Position-Independent Code (PIC)

  • Easier and faster than the 32-bit Global Offset Table (GOT) technique since RIP-relative is position independent (note that the technique to access static arrays with an index register described earlier is position independent too)

General purpose registers

bit 0 - 63   bit 0 - 31   bit 0 - 15   bit 8 - 15   bit 0 - 7
rax          eax          ax           ah           al
rbx          ebx          bx           bh           bl
rcx          ecx          cx           ch           cl
rdx          edx          dx           dh           dl
rsi          esi          si           -            sil
rdi          edi          di           -            dil
rbp          ebp          bp           -            bpl
rsp          esp          sp           -            spl
r8           r8d          r8w          -            r8b
r9           r9d          r9w          -            r9b
r10          r10d         r10w         -            r10b
r11          r11d         r11w         -            r11b
r12          r12d         r12w         -            r12b
r13          r13d         r13w         -            r13b
r14          r14d         r14w         -            r14b
r15          r15d         r15w         -            r15b
rflags       -            flags        -            -
rip          -            -            -            -
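
The sub-register columns can be read as bit-field views of the 64-bit register. The C sketch below models them with masks and shifts, using rax as the example. The write rules (a 32-bit write zero-extends into the full register, 16-bit and 8-bit writes leave the other bits alone) are standard x86-64 behaviour rather than something stated in the table, and the function names are made up for illustration.

```c
#include <stdint.h>

/* Reading the views of a 64-bit register value. */
uint32_t view_eax(uint64_t rax) { return (uint32_t)rax; }        /* bits 0-31 */
uint16_t view_ax(uint64_t rax)  { return (uint16_t)rax; }        /* bits 0-15 */
uint8_t  view_ah(uint64_t rax)  { return (uint8_t)(rax >> 8); }  /* bits 8-15 */
uint8_t  view_al(uint64_t rax)  { return (uint8_t)rax; }         /* bits 0-7  */

/* Writing eax clears bits 32-63 of rax. */
uint64_t write_eax(uint64_t rax, uint32_t v) { (void)rax; return v; }

/* Writing ax, ah or al keeps every other bit of rax. */
uint64_t write_ax(uint64_t rax, uint16_t v) { return (rax & ~0xFFFFull) | v; }
uint64_t write_ah(uint64_t rax, uint8_t v)
{
    return (rax & ~0xFF00ull) | ((uint64_t)v << 8);
}
uint64_t write_al(uint64_t rax, uint8_t v)  { return (rax & ~0xFFull) | v; }
```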

rflags register

  • CF (Carry Flag, bit 0) — Set if an arithmetic operation generates a carry or a borrow out of the most-significant bit of the result; cleared otherwise. This flag indicates an overflow condition for unsigned-integer arithmetic. It is also used in multiple-precision arithmetic.
  • PF (Parity Flag, bit 2) — Set if the least-significant byte of the result contains an even number of 1 bits; cleared otherwise.
  • AF (Auxiliary carry Flag, bit 4) — Set if an arithmetic operation generates a carry or a borrow out of bit 3 of the result; cleared otherwise. This flag is used in binary-coded decimal (BCD) arithmetic.
  • ZF (Zero Flag, bit 6) — Set if the result is zero; cleared otherwise.
  • SF (Sign Flag, bit 7) — Set equal to the most-significant bit of the result, which is the sign bit of a signed integer. (0 indicates a positive value and 1 indicates a negative value.)
  • OF (Overflow Flag, bit 11) — Set if the integer result is too large a positive number or too small a negative number (excluding the sign-bit) to fit in the destination operand; cleared otherwise. This flag indicates an overflow condition for signed-integer (two’s complement) arithmetic.
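
As a rough illustration of the flag definitions above, the following C sketch derives the six status flags for an 8-bit ADD. The struct and function names are hypothetical; real code would simply read the flags with SETcc, Jcc or LAHF after the instruction.

```c
#include <stdbool.h>
#include <stdint.h>

struct flags { bool cf, pf, af, zf, sf, of; };

/* Status flags produced by an 8-bit ADD of a and b. */
struct flags add8_flags(uint8_t a, uint8_t b)
{
    unsigned wide = (unsigned)a + b;   /* keeps the carry out of bit 7 */
    uint8_t r = (uint8_t)wide;
    struct flags f;

    f.cf = wide > 0xFF;                        /* unsigned overflow */
    f.af = ((a & 0x0F) + (b & 0x0F)) > 0x0F;   /* carry out of bit 3 */
    f.zf = r == 0;
    f.sf = (r & 0x80) != 0;                    /* copy of the sign bit */
    /* Signed overflow: both inputs share a sign that the result lacks. */
    f.of = (~(a ^ b) & (a ^ r) & 0x80) != 0;

    /* PF: set when the low byte holds an even number of 1 bits. */
    uint8_t p = r;
    p ^= p >> 4;
    p ^= p >> 2;
    p ^= p >> 1;
    f.pf = (p & 1) == 0;
    return f;
}
```

Adding 0x7F and 0x01 sets OF and SF but not CF (signed overflow only), while adding 0xFF and 0x01 sets CF and ZF but not OF (unsigned overflow only).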

Saturation and wraparound modes (of the instruction set)

  • Wraparound arithmetic — With wraparound arithmetic, a true out-of-range result is truncated (that is, the carry or overflow bit is ignored and only the least significant bits of the result are returned to the destination). Wraparound arithmetic is suitable for applications that control the range of operands to prevent out-of-range results. If the range of operands is not controlled, however, wraparound arithmetic can lead to large errors. For example, adding two large signed numbers can cause positive overflow and produce a negative result.
  • Signed saturation arithmetic — With signed saturation arithmetic, out-of-range results are limited to the representable range of signed integers for the integer size being operated on. For example, if positive overflow occurs when operating on signed word integers, the result is saturated to 7FFFH, which is the largest positive integer that can be represented in 16 bits; if negative overflow occurs, the result is saturated to 8000H.
  • Unsigned saturation arithmetic — With unsigned saturation arithmetic, out-of-range results are limited to the representable range of unsigned integers for the integer size. So, positive overflow when operating on unsigned byte integers results in FFH being returned and negative overflow results in 00H being returned.
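
The three modes can be compared side by side with a scalar C model of one element. This is a sketch of the arithmetic only, with illustrative function names; the packed instructions apply the same rule to every element of a register.

```c
#include <stdint.h>

/* Wraparound: keep only the low 16 bits of the sum.
   The final conversion to int16_t is implementation-defined in C and
   wraps in two's complement on mainstream compilers. */
int16_t add_wrap_i16(int16_t a, int16_t b)
{
    return (int16_t)(uint16_t)((uint16_t)a + (uint16_t)b);
}

/* Signed saturation: clamp to the range 8000H..7FFFH. */
int16_t add_sat_i16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + b;
    if (sum > INT16_MAX) return INT16_MAX;   /* 7FFFH */
    if (sum < INT16_MIN) return INT16_MIN;   /* 8000H */
    return (int16_t)sum;
}

/* Unsigned saturation: clamp to the range 00H..FFH. */
uint8_t add_sat_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;
    return sum > 0xFF ? 0xFF : (uint8_t)sum;
}

uint8_t sub_sat_u8(uint8_t a, uint8_t b)
{
    return a < b ? 0x00 : (uint8_t)(a - b);
}
```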

Stack frames

Data transfer instructions

  • MOV — Move data between general-purpose registers; move data between memory and general-purpose or segment registers; move immediates to general-purpose registers.
  • CMOVcc — Conditional move.
  • XCHG — Exchange.
  • BSWAP — Byte swap.
  • XADD — Exchange and add.
  • CMPXCHG — Compare and exchange.
  • CMPXCHG8B / CMPXCHG16B — Compare and exchange 8/16 bytes.
  • PUSH — Push onto stack.
  • POP — Pop off of stack.
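  • PUSHA / PUSHAD — Push general-purpose registers onto stack (not available in 64-bit mode).
  • POPA / POPAD — Pop general-purpose registers from stack (not available in 64-bit mode).
  • CWD / CDQ / CQO — Convert word to doubleword / doubleword to quadword / quadword to double quadword (sign-extends ax, eax or rax into dx, edx or rdx).
  • CBW / CWDE / CDQE — Convert byte to word / word to doubleword / doubleword to quadword within the rax register (sign-extends al, ax or eax).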
  • MOVSX / MOVSXD — Move and sign extend.
  • MOVZX — Move and zero extend.
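
A few of the data transfer instructions are easiest to understand by their effect on the bits. The C sketch below models zero extension, sign extension and byte swapping; the function names are illustrative only.

```c
#include <stdint.h>

/* MOVZX: widen an 8-bit value, filling the upper bits with zeros. */
uint32_t movzx_8_to_32(uint8_t v) { return v; }

/* MOVSX: widen an 8-bit value, copying its sign bit into the upper bits. */
uint32_t movsx_8_to_32(uint8_t v)
{
    uint32_t r = v;
    if (v & 0x80) r |= 0xFFFFFF00u;
    return r;
}

/* MOVSXD (and CDQE on eax): sign-extend a 32-bit value to 64 bits. */
uint64_t movsxd_32_to_64(uint32_t v)
{
    uint64_t r = v;
    if (v & 0x80000000u) r |= 0xFFFFFFFF00000000ull;
    return r;
}

/* BSWAP on a 32-bit register: reverse the byte order. */
uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}
```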

Binary arithmetic instructions

  • ADCX — Unsigned integer add with carry.
  • ADOX — Unsigned integer add with overflow.
  • ADD — Integer add.
  • ADC — Add with carry.
  • SUB — Subtract.
  • SBB — Subtract with borrow.
  • IMUL — Signed multiply.
  • MUL — Unsigned multiply.
  • IDIV — Signed divide.
  • DIV — Unsigned divide.
  • INC — Increment.
  • DEC — Decrement.
  • NEG — Negate.
  • CMP — Compare.
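
ADC and SBB exist mainly to chain the carry flag across several registers for multiple-precision arithmetic. The following C sketch shows the idea for 128-bit values stored as two 64-bit limbs (index 0 is the low limb); the layout and function names are assumptions made for the example.

```c
#include <stdint.h>

/* ADD on the low limbs, then ADC on the high limbs. */
void add128(const uint64_t a[2], const uint64_t b[2], uint64_t out[2])
{
    uint64_t lo = a[0] + b[0];       /* ADD */
    uint64_t carry = lo < a[0];      /* CF: the unsigned sum wrapped */
    out[0] = lo;
    out[1] = a[1] + b[1] + carry;    /* ADC */
}

/* SUB on the low limbs, then SBB on the high limbs. */
void sub128(const uint64_t a[2], const uint64_t b[2], uint64_t out[2])
{
    uint64_t borrow = a[0] < b[0];   /* CF: a borrow was needed */
    out[0] = a[0] - b[0];            /* SUB */
    out[1] = a[1] - b[1] - borrow;   /* SBB */
}
```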

Logical instructions

  • AND — Perform bitwise logical AND.
  • OR — Perform bitwise logical OR.
  • XOR — Perform bitwise logical exclusive OR.
  • NOT — Perform bitwise logical NOT.

Shift and rotate instructions

Bit and byte instructions

  • BT — Bit test.
  • BTS — Bit test and set.
  • BTR — Bit test and reset.
  • BTC — Bit test and complement.
  • BSF — Bit scan forward.
  • BSR — Bit scan reverse.
  • SETcc — Set byte on condition.
  • TEST — Logical compare.
  • CRC32 — Provides hardware acceleration to calculate cyclic redundancy checks for fast and efficient implementation of data integrity protocols.
  • POPCNT — This instruction calculates the number of bits set to 1 in the second operand (source) and returns the count in the first operand (a destination register).
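
The bit instructions above map onto short bit-twiddling expressions. The C sketch below models the register forms with 64-bit operands; it assumes a non-zero source for BSF and BSR, since the destination is undefined when the source is zero. Function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* BT: value of the selected bit (the CPU copies it into CF). */
int bt(uint64_t v, unsigned bit) { return (int)((v >> (bit & 63)) & 1); }

/* BTS / BTR / BTC: new operand after setting, clearing or toggling the bit. */
uint64_t bts(uint64_t v, unsigned bit) { return v |  (1ull << (bit & 63)); }
uint64_t btr(uint64_t v, unsigned bit) { return v & ~(1ull << (bit & 63)); }
uint64_t btc(uint64_t v, unsigned bit) { return v ^  (1ull << (bit & 63)); }

/* BSF: index of the lowest set bit. */
unsigned bsf(uint64_t v)
{
    assert(v != 0);
    unsigned i = 0;
    while (!(v & 1)) { v >>= 1; i++; }
    return i;
}

/* BSR: index of the highest set bit. */
unsigned bsr(uint64_t v)
{
    assert(v != 0);
    unsigned i = 0;
    while (v >>= 1) i++;
    return i;
}

/* POPCNT: number of bits set to 1. */
unsigned popcnt(uint64_t v)
{
    unsigned n = 0;
    for (; v; v &= v - 1) n++;   /* each step clears the lowest set bit */
    return n;
}
```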

Control transfer instructions

String instructions

rflags control instructions

  • STC — Set carry flag.
  • CLC — Clear the carry flag.
  • CMC — Complement the carry flag.
  • CLD — Clear the direction flag.
  • STD — Set direction flag.
  • LAHF — Load flags into ah register.
  • SAHF — Store ah register into flags.
  • PUSHF / PUSHFQ — Push rflags onto stack.
  • POPF / POPFQ — Pop rflags from stack.
  • STI — Set interrupt flag.
  • CLI — Clear the interrupt flag.

Miscellaneous instructions

  • LEA — Load effective address.
  • NOP — No operation.
  • UD — Undefined instruction.
  • XLAT / XLATB — Table lookup translation.
  • CPUID — Processor identification.
  • MOVBE — Move data after swapping data bytes.
  • PREFETCHW — Prefetch data into cache in anticipation of write.
  • CLFLUSH — Flushes and invalidates a memory operand and its associated cache line from all levels of the processor’s cache hierarchy.
  • CLFLUSHOPT — Flushes and invalidates a memory operand and its associated cache line from all levels of the processor’s cache hierarchy with optimized memory system throughput.
  • RDRAND — Retrieves a random number generated from hardware.
  • RDSEED — Retrieves a hardware-generated random value intended for seeding a software random number generator.

User-mode extended states save/restore instructions

  • XSAVE — Save processor extended states to memory.
  • XSAVEC — Save processor extended states with compaction to memory.
  • XSAVEOPT — Save processor extended states to memory, optimized.
  • XRSTOR — Restore processor extended states from memory.
  • XGETBV — Reads the state of an extended control register.

Bit manipulation instructions (BMI1, BMI2)

  • ANDN — Bitwise AND of the inverted first source operand with the second source operand.
  • BEXTR — Contiguous bitwise extract.
  • BLSI — Extract lowest set bit.
  • BLSMSK — Get a mask of all bits up to and including the lowest set bit.
  • BLSR — Reset lowest set bit.
  • BZHI — Zero high bits starting from specified bit position.
  • LZCNT — Count the number of leading zero bits.
  • MULX — Unsigned multiply without affecting arithmetic flags.
  • PDEP — Parallel deposit of bits using a mask.
  • PEXT — Parallel extraction of bits using a mask.
  • RORX — Rotate right without affecting arithmetic flags.
  • SARX / SHLX / SHRX — Shift arithmetic right / logical left / logical right without affecting flags.
  • TZCNT — Count the number of trailing zero bits.
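
Most BMI instructions replace a well-known bit trick with a single operation. The C sketch below gives portable equivalents for some of them on 64-bit operands; it models the results only (not the flags), restricts BZHI to indices below 64, and uses illustrative function names rather than compiler intrinsics.

```c
#include <assert.h>
#include <stdint.h>

uint64_t andn(uint64_t a, uint64_t b) { return ~a & b; }     /* ANDN   */
uint64_t blsi(uint64_t v)   { return v & (0 - v); }          /* BLSI   */
uint64_t blsmsk(uint64_t v) { return v ^ (v - 1); }          /* BLSMSK */
uint64_t blsr(uint64_t v)   { return v & (v - 1); }          /* BLSR   */

/* BZHI: clear every bit at position n and above. */
uint64_t bzhi(uint64_t v, unsigned n)
{
    assert(n < 64);
    return v & ((1ull << n) - 1);
}

/* PEXT: gather the bits of v selected by mask into the low bits. */
uint64_t pext(uint64_t v, uint64_t mask)
{
    uint64_t out = 0, bit = 1;
    for (; mask; mask &= mask - 1) {
        if (v & mask & (0 - mask)) out |= bit;   /* lowest remaining mask bit */
        bit <<= 1;
    }
    return out;
}

/* PDEP: scatter the low bits of v to the positions selected by mask. */
uint64_t pdep(uint64_t v, uint64_t mask)
{
    uint64_t out = 0, bit = 1;
    for (; mask; mask &= mask - 1) {
        if (v & bit) out |= mask & (0 - mask);
        bit <<= 1;
    }
    return out;
}
```

PEXT and PDEP are inverses over the masked bits: pdep(pext(v, m), m) equals v & m.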

x87 FPU overview

  • x87 FPU state is aliased to the MMX state, so care must be taken when making transitions to MMX instructions to prevent incoherent or unexpected results.

x87 FPU data transfer instructions

  • FLD — Load floating-point value.
  • FST / FSTP — Store floating-point value without/with pop.
  • FILD — Load integer.
  • FIST / FISTP — Store integer without/with pop.
  • FBLD — Load BCD.
  • FBSTP — Store BCD and pop.
  • FXCH — Exchange registers.
  • FCMOVcc — Floating-point conditional move.

x87 FPU basic arithmetic instructions

x87 FPU comparison instructions

x87 FPU transcendental instructions

x87 FPU load constants instructions

x87 FPU control instructions

x87 FPU and SIMD state management instructions

  • FXSAVE — Save x87 FPU and SIMD state.
  • FXRSTOR — Restore x87 FPU and SIMD state.

MMX overview

  • SIMD execution model to handle 64-bit packed integer data.
  • Eight new 64-bit data registers, called MMX registers.
  • Three new packed data types:
    • 64-bit packed byte integers (signed and unsigned)
    • 64-bit packed word integers (signed and unsigned)
    • 64-bit packed doubleword integers (signed and unsigned)
  • MMX state is aliased to the x87 FPU state, so care must be taken when making transitions to x87 FPU instructions to prevent incoherent or unexpected results.

MMX data transfer instructions

  • MOVD / MOVQ — Move doubleword/quadword from/to MMX registers.

MMX conversion instructions

MMX packed arithmetic instructions

  • PADDB / PADDW / PADDD — Add packed byte/word/doubleword integers.
  • PADDSB / PADDSW — Add packed signed byte/word integers with signed saturation.
  • PADDUSB / PADDUSW — Add packed unsigned byte/word integers with unsigned saturation.
  • PSUBB / PSUBW / PSUBD — Subtract packed byte/word/doubleword integers.
  • PSUBSB / PSUBSW — Subtract packed signed byte/word integers with signed saturation.
  • PSUBUSB / PSUBUSW — Subtract packed unsigned byte/word integers with unsigned saturation.
  • PMULHW — Multiply packed signed word integers and store high result.
  • PMULLW — Multiply packed signed word integers and store low result.
  • PMADDWD — Multiply and add packed word integers.
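
The multiply instructions work on the full 32-bit product of each pair of 16-bit words and differ only in which part they keep. The C sketch below models them for one 64-bit MMX register treated as four words; the array representation and function names are assumptions made for the example.

```c
#include <stdint.h>

/* PMULLW / PMULHW: low and high 16 bits of each signed 16x16 product. */
void pmul_words(const int16_t a[4], const int16_t b[4],
                uint16_t low[4], uint16_t high[4])
{
    for (int i = 0; i < 4; i++) {
        uint32_t p = (uint32_t)((int32_t)a[i] * b[i]);
        low[i]  = (uint16_t)p;           /* PMULLW */
        high[i] = (uint16_t)(p >> 16);   /* PMULHW */
    }
}

/* PMADDWD: multiply word pairs, then add adjacent 32-bit products. */
void pmaddwd(const int16_t a[4], const int16_t b[4], int32_t out[2])
{
    for (int i = 0; i < 2; i++) {
        uint32_t p0 = (uint32_t)((int32_t)a[2 * i] * b[2 * i]);
        uint32_t p1 = (uint32_t)((int32_t)a[2 * i + 1] * b[2 * i + 1]);
        /* Unsigned addition so the single wrapping case
           (all four inputs equal to -32768) is not undefined in C. */
        out[i] = (int32_t)(p0 + p1);
    }
}
```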

MMX comparison instructions

MMX logical instructions

  • PAND — Bitwise logical AND.
  • PANDN — Bitwise logical AND NOT.
  • POR — Bitwise logical OR.
  • PXOR — Bitwise logical exclusive OR.

MMX shift and rotate instructions

MMX state management instructions

  • EMMS — Empty MMX state.

SSE overview

  • Expand the SIMD execution model by adding facilities for handling packed and scalar single-precision floating-point values contained in 128-bit registers.
  • Sixteen (eight in 32-bit mode) new 128-bit XMM registers, each holding four packed single-precision floating-point values.
  • 128-bit packed and scalar single-precision floating-point instructions.
  • Enhancements to MMX instruction set with new operations on packed integer operands located in MMX registers.
  • Explicit prefetching of data, control of the cacheability of data, control of the ordering of store operations.

SSE data transfer instructions

  • MOVAPS — Move four aligned packed single-precision floating-point values between XMM registers or between XMM register and memory.
  • MOVUPS — Move four unaligned packed single-precision floating-point values between XMM registers or between XMM register and memory.
  • MOVHPS — Move two packed single-precision floating-point values to and from the high quadword of an XMM register and memory.
  • MOVHLPS — Move two packed single-precision floating-point values from the high quadword of an XMM register to the low quadword of another XMM register.
  • MOVLPS — Move two packed single-precision floating-point values to and from the low quadword of an XMM register and memory.
  • MOVLHPS — Move two packed single-precision floating-point values from the low quadword of an XMM register to the high quadword of another XMM register.
  • MOVMSKPS — Extract sign mask from four packed single-precision floating-point values.
  • MOVSS — Move scalar single-precision floating-point value between XMM registers or between an XMM register and memory.
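
MOVMSKPS is the usual way to turn a packed comparison result into an integer that can be tested or branched on. The C sketch below models it on an array of four floats, assuming the IEEE 754 single-precision format that SSE uses; the function name is illustrative.

```c
#include <stdint.h>
#include <string.h>

/* MOVMSKPS: collect the sign bit of each of the four values into bits 0-3. */
unsigned movmskps(const float v[4])
{
    unsigned mask = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t bits;
        memcpy(&bits, &v[i], sizeof bits);   /* read the raw encoding */
        mask |= (unsigned)(bits >> 31) << i;
    }
    return mask;
}
```

Note that the sign bit is read as is, so negative zero contributes a 1 just like any other negative value.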

SSE packed arithmetic instructions

  • ADDPS — Add packed single-precision floating-point values.
  • ADDSS — Add scalar single-precision floating-point values.
  • SUBPS — Subtract packed single-precision floating-point values.
  • SUBSS — Subtract scalar single-precision floating-point values.
  • MULPS — Multiply packed single-precision floating-point values.
  • MULSS — Multiply scalar single-precision floating-point values.
  • DIVPS — Divide packed single-precision floating-point values.
  • DIVSS — Divide scalar single-precision floating-point values.
  • RCPPS — Compute reciprocals of packed single-precision floating-point values.
  • RCPSS — Compute reciprocal of scalar single-precision floating-point values.
  • SQRTPS — Compute square roots of packed single-precision floating-point values.
  • SQRTSS — Compute square root of scalar single-precision floating-point values.
  • RSQRTPS — Compute reciprocals of square roots of packed single-precision floating-point values.
  • RSQRTSS — Compute reciprocal of square root of scalar single-precision floating-point values.
  • MAXPS — Return maximum packed single-precision floating-point values.
  • MAXSS — Return maximum scalar single-precision floating-point values.
  • MINPS — Return minimum packed single-precision floating-point values.
  • MINSS — Return minimum scalar single-precision floating-point values.
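
Every operation in this list comes in a packed (PS) and a scalar (SS) form. The C sketch below shows the difference for addition, modelling an XMM register as an array of four floats with element 0 as the lowest; the function names are illustrative.

```c
/* ADDPS: add all four elements. */
void addps(float dst[4], const float src[4])
{
    for (int i = 0; i < 4; i++)
        dst[i] += src[i];
}

/* ADDSS: add the lowest element only; elements 1-3 of dst are unchanged. */
void addss(float dst[4], const float src[4])
{
    dst[0] += src[0];
}
```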

SSE comparison instructions

  • CMPPS — Compare packed single-precision floating-point values.
  • CMPSS — Compare scalar single-precision floating-point values.
  • COMISS — Perform ordered comparison of scalar single-precision floating-point values and set flags in rflags register.
  • UCOMISS — Perform unordered comparison of scalar single-precision floating-point values and set flags in rflags register.

SSE logical instructions

  • ANDPS — Perform bitwise logical AND of packed single-precision floating-point values.
  • ANDNPS — Perform bitwise logical AND NOT of packed single-precision floating-point values.
  • ORPS — Perform bitwise logical OR of packed single-precision floating-point values.
  • XORPS — Perform bitwise logical XOR of packed single-precision floating-point values.

SSE shuffle and unpack instructions

  • SHUFPS — Shuffles values in packed single-precision floating-point operands.
  • UNPCKHPS — Unpacks and interleaves the two high-order values from two single-precision floating-point operands.
  • UNPCKLPS — Unpacks and interleaves the two low-order values from two single-precision floating-point operands.
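
The element selection performed by these three instructions can be written out explicitly. The C sketch below models an XMM register as an array of four floats (element 0 is the lowest) and follows the two-operand SSE forms, where dst is both the first source and the destination; the function names are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* SHUFPS dst, src, imm8: elements 0-1 are picked from dst, 2-3 from src,
   each by a 2-bit field of imm8. */
void shufps(float dst[4], const float src[4], uint8_t imm8)
{
    float r[4];
    r[0] = dst[imm8 & 3];
    r[1] = dst[(imm8 >> 2) & 3];
    r[2] = src[(imm8 >> 4) & 3];
    r[3] = src[(imm8 >> 6) & 3];
    memcpy(dst, r, sizeof r);
}

/* UNPCKLPS: interleave the two low-order elements of dst and src. */
void unpcklps(float dst[4], const float src[4])
{
    float r[4] = { dst[0], src[0], dst[1], src[1] };
    memcpy(dst, r, sizeof r);
}

/* UNPCKHPS: interleave the two high-order elements of dst and src. */
void unpckhps(float dst[4], const float src[4])
{
    float r[4] = { dst[2], src[2], dst[3], src[3] };
    memcpy(dst, r, sizeof r);
}
```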

SSE conversion instructions

  • CVTPI2PS — Convert packed doubleword integers to packed single-precision floating-point values.
  • CVTSI2SS — Convert doubleword integer to scalar single-precision floating-point value.
  • CVTPS2PI — Convert packed single-precision floating-point values to packed doubleword integers.
  • CVTTPS2PI — Convert with truncation packed single-precision floating-point values to packed doubleword integers.
  • CVTSS2SI — Convert a scalar single-precision floating-point value to a doubleword integer.
  • CVTTSS2SI — Convert with truncation a scalar single-precision floating-point value to a scalar doubleword integer.

SSE MXCSR management instructions

  • LDMXCSR — Load MXCSR register.
  • STMXCSR — Save MXCSR register state.

SSE 64-bit integer instructions (MMX enhancements)

  • PAVGB / PAVGW — Compute average of packed unsigned byte/word integers.
  • PEXTRW — Extract word.
  • PINSRW — Insert word.
  • PMAXUB — Maximum of packed unsigned byte integers.
  • PMAXSW — Maximum of packed signed word integers.
  • PMINUB — Minimum of packed unsigned byte integers.
  • PMINSW — Minimum of packed signed word integers.
  • PMOVMSKB — Move byte mask.
  • PMULHUW — Multiply packed unsigned word integers and store high result.
  • PSADBW — Compute sum of absolute differences.
  • PSHUFW — Shuffle packed integer word in MMX register.

SSE cacheability control, prefetch and ordering instructions

  • MASKMOVQ — Non-temporal store of selected bytes from an MMX register into memory.
  • MOVNTQ — Non-temporal store of quadword from an MMX register into memory.
  • MOVNTPS — Non-temporal store of four packed single-precision floating-point values from an XMM register into memory.
  • PREFETCHh — Load 32 or more bytes from memory to a selected level of the processor’s cache hierarchy.
  • SFENCE — Serializes store operations.

SSE2 overview

  • Packed and scalar 128-bit double-precision floating-point instructions.
  • Additional 64-bit and 128-bit packed byte/word/doubleword/quadword integer instructions.
  • 128-bit versions of integer instructions introduced with MMX and SSE.
  • Additional cacheability-control and instruction-ordering instructions.

SSE2 FP64 data movement instructions

  • MOVAPD — Move two aligned packed double-precision floating-point values between XMM registers or between an XMM register and memory.
  • MOVUPD — Move two unaligned packed double-precision floating-point values between XMM registers or between an XMM register and memory.
  • MOVHPD — Move high packed double-precision floating-point value to and from the high quadword of an XMM register and memory.
  • MOVLPD — Move low packed double-precision floating-point value to and from the low quadword of an XMM register and memory.
  • MOVMSKPD — Extract sign mask from two packed double-precision floating-point values.
  • MOVSD — Move scalar double-precision floating-point value between XMM registers or between an XMM register and memory.

SSE2 FP64 packed arithmetic instructions

  • ADDPD — Add packed double-precision floating-point values.
  • ADDSD — Add scalar double-precision floating-point values.
  • SUBPD — Subtract packed double-precision floating-point values.
  • SUBSD — Subtract scalar double-precision floating-point values.
  • MULPD — Multiply packed double-precision floating-point values.
  • MULSD — Multiply scalar double-precision floating-point values.
  • DIVPD — Divide packed double-precision floating-point values.
  • DIVSD — Divide scalar double-precision floating-point values.
  • SQRTPD — Compute packed square roots of packed double-precision floating-point values.
  • SQRTSD — Compute scalar square root of scalar double-precision floating-point values.
  • MAXPD — Return maximum packed double-precision floating-point values.
  • MAXSD — Return maximum scalar double-precision floating-point values.
  • MINPD — Return minimum packed double-precision floating-point values.
  • MINSD — Return minimum scalar double-precision floating-point values.

SSE2 FP64 logical instructions

  • ANDPD — Perform bitwise logical AND of packed double-precision floating-point values.
  • ANDNPD — Perform bitwise logical AND NOT of packed double-precision floating-point values.
  • ORPD — Perform bitwise logical OR of packed double-precision floating-point values.
  • XORPD — Perform bitwise logical XOR of packed double-precision floating-point values.

SSE2 FP64 compare instructions

  • CMPPD — Compare packed double-precision floating-point values.
  • CMPSD — Compare scalar double-precision floating-point values.
  • COMISD — Perform ordered comparison of scalar double-precision floating-point values and set flags in rflags register.
  • UCOMISD — Perform unordered comparison of scalar double-precision floating-point values and set flags in rflags register.

SSE2 FP64 shuffle and unpack instructions

  • SHUFPD — Shuffles values in packed double-precision floating-point operands.
  • UNPCKHPD — Unpacks and interleaves the high values from two packed double-precision floating-point operands.
  • UNPCKLPD — Unpacks and interleaves the low values from two packed double-precision floating-point operands.

SSE2 FP64 conversion instructions

  • CVTPD2PI — Convert packed double-precision floating-point values to packed doubleword integers.
  • CVTTPD2PI — Convert with truncation packed double-precision floating-point values to packed doubleword integers.
  • CVTPI2PD — Convert packed doubleword integers to packed double-precision floating-point values.
  • CVTPD2DQ — Convert packed double-precision floating-point values to packed doubleword integers.
  • CVTTPD2DQ — Convert with truncation packed double-precision floating-point values to packed doubleword integers.
  • CVTDQ2PD — Convert packed doubleword integers to packed double-precision floating-point values.
  • CVTPS2PD — Convert packed single-precision floating-point values to packed double-precision floating-point values.
  • CVTPD2PS — Convert packed double-precision floating-point values to packed single-precision floating-point values.
  • CVTSS2SD — Convert scalar single-precision floating-point values to scalar double-precision floating-point values.
  • CVTSD2SS — Convert scalar double-precision floating-point values to scalar single-precision floating-point values.
  • CVTSD2SI — Convert scalar double-precision floating-point values to a doubleword integer.
  • CVTTSD2SI — Convert with truncation scalar double-precision floating-point values to scalar doubleword integers.
  • CVTSI2SD — Convert doubleword integer to scalar double-precision floating-point value.

SSE2 FP32 instructions (SSE enhancements)

  • CVTDQ2PS — Convert packed doubleword integers to packed single-precision floating-point values.
  • CVTPS2DQ — Convert packed single-precision floating-point values to packed doubleword integers.
  • CVTTPS2DQ — Convert with truncation packed single-precision floating-point values to packed doubleword integers.

SSE2 integer instructions

  • MOVDQA — Move aligned double quadword.
  • MOVDQU — Move unaligned double quadword.
  • MOVQ2DQ — Move quadword integer from MMX to XMM registers.
  • MOVDQ2Q — Move quadword integer from XMM to MMX registers.
  • PMULUDQ — Multiply packed unsigned doubleword integers.
  • PADDQ — Add packed quadword integers.
  • PSUBQ — Subtract packed quadword integers.
  • PSHUFLW — Shuffle packed low words.
  • PSHUFHW — Shuffle packed high words.
  • PSHUFD — Shuffle packed doublewords.
  • PSLLDQ — Shift double quadword left logical.
  • PSRLDQ — Shift double quadword right logical.
  • PUNPCKHQDQ — Unpack high quadwords.
  • PUNPCKLQDQ — Unpack low quadwords.

SSE2 cacheability control and ordering instructions

  • CLFLUSH — Flush cacheline.
  • LFENCE — Serializes load operations.
  • MFENCE — Serializes load and store operations.
  • PAUSE — Improves the performance of “spin-wait loops”.
  • MASKMOVDQU — Non-temporal store of selected bytes from an XMM register into memory.
  • MOVNTPD — Non-temporal store of two packed double-precision floating-point values from an XMM register into memory.
  • MOVNTDQ — Non-temporal store of double quadword from an XMM register into memory.
  • MOVNTI — Non-temporal store of a doubleword from a general-purpose register into memory.

References

Instruction tables

Examples

Utils
