yanghu/assembly.md

## assembly.md

      
    Raw
  

              assembly.md
            
          
    ##IA32 Registers:


Eight registers(first six are general purpose):

eax(ax/ah/al)
ecx
edx
ebx
esi/si
edi/di
esp/sp (Stack pointer)
ebp/bp (Frame pointer)


The lower order bits can be accessed without altering higher bits.


Build code with level 1 optimization:
gcc -O1 -S code.c
gcc -O1 -o code.c
gcc -O1 prog code.o main.c

Check binary file::
objdump -d main.o

Operand specifiers


$Imm: Immediate(liteal) value
Ea: register R[Ea]
Imm: Absolute memoery address value M[Imm]
(Ea): Indirect memoery address M[R[Ea]]
Imm(Eb): Base+displacement M[Imm+R[Eb]]
(Eb,ei): Indexed. M[R[Eb]+R[Ei]]
Imm(Eb,Ei): Indexted M[Imm+R[Eb]+R[Ei]]
(,Ei,s): Scale indexed:  M[R[Ei]*s]
Imm( ,Ei,s): Scaled indexed:  M[Imm+R[Ei]*s]
(Eb,Ei,s): Scaled Indexed: M[R[Eb]+R[Ei]*s]
Imm(Eb,Ei,s): Scaled indexed: M[Imm+R[Eb]+R[Ei]*s]

Note: scaling factor s must be either 1,2,4,8
Data movement instructions

MOV can only move from immediate value(literal) to register/memory, register to register/memory, and memory to register.  If need to move between memory, needs two steps: load value from memory to register, then write register value to destination.


MOV S,D: move value from S to D

movb
movw
movl


MOVS S,D: sign-extended move (byte to word, byte to dw, word to dw)

movsbw
movsbl
movswl


MOVZ S,D: zero-extended move (byte to word, byte to dw, word to dw)

movzbw
movzbl
mvzwl


PUSH S: reduce ESP by 4 (points to next word in memory), then move S into memoery where stack pointer ESP points to,   Equivalent:
subl $4, %esp    #R[%esp] <- R[%esp] - 4
movl S, (%esp)   #M[R[%esp]] <- S


POP D: Pop double word which ESP points to, put it to D, then move ESP.   Equivalent:
movl (%esp), D    #M[R[%esp]] -> D
addl $4, %esp     #R[%esp] <- R[%esp] + 4


Arithmatic and logical operations

Load effective address (leal): it's a variant of movl. First operand is a form of a memoery reference. but does not use it to reference memory, but just copy the value (of the expression) to the second operand. Compilers often find clever uses of leal that have nothing to do with effective address computations. The destination operand must be a register.
## if %edx saves value of x, then 
leal 7(%edx, %edx, 4), %eax    # saves 5x+7 into %eax.  (first operand is a scaled indexed memery expression)


INC D: increment
DEC D: decrement
NEG D: negate
NOT D: complement
ADD S,D: add, and destination is D
SUB S,D: subtract: D <- D-S
IMUL S,D: multiply D <- D*S
XOR S,D
OR S,D
AND S,D
SAL k,D: left shift D <- D<<k
SHL k,D: left shift (same as SAL)
SAR k,D: arithmetic right shift D <- D >>(A) k
SHR k,D: logical right shift D <- D >>(L) k

Special arithmetic operations


imull S: signed full multiply. Multiply S and R[%eax] and saves results in R[%edx]:R[%eax]
mull S: unsigned full multiply
cltd: Convert R[%eax] into quad word(64bit), store into R[%edx]:R[%eax]
idivl S: signed divide R[%edx] <- R[%edx]:R[%eax] mod S |  R[%eax] <- R[%edx]:R[%eax] / S
divl S: unsigned divide

Control

Condition codes


CF: Carry Flag: the most recent operation generated a carry out of the most significant bit. used to detect overflow for unsigned operations
ZF: Zero Flag: the most recent operation yielded zero
SF: Sign Flag: The most recent operation yielded a negative value
OF: Overflow Flag: The most recent operation caused a two's-complement overflow - either negative or positive

For example, we perform ADD for C assignments t = a + b.
CF:  (unsgiend) t < (unsigned) a   #Unsigned overflow
ZF: (t==0)   #Zero
SF: (t<0)    #Negative
OF:  (a<0 == b<0) && (t<0 != a<0)   #signed overflow    (a,b同号, t, a异号)

The leal instruction doesn't alter any codes, since it's intended for address operation.

For logical operations such as XOR, carry and overflow flags are set to 0.
For shift operations, carry flag is set to the last bit shifted out, while over flow flag is 0
Inc and Dec set overflow and zero flags, but leave carry flag unchanged

CMP and TEST
CMP acts like sub, but doesn't update destination, just set flags.
TEST acts like AND, but doesn't update destination, jsut set flags.
Accessing the condition codes

Rather than directly read them, there are three common ways to acess codes:

set a single byte to 0 or 1 depending on some combination of the condietion codes
We can conditionally jump to some other part of the program
we can conditionally transfer data

SET instructions

For the first case, we uset SET instructions. the descriptions apply to the case where a comparison instruction has been executed, setting the condition codes.
Note that SET only sets one byte.(%al,%bl,etc.), we need to clear high-order 24 bits. like this setl %al ;   movzbl %al, %eax

sete D/setz D: Equal/zero  D <- ZF
setne D/setnz D: not equal/not zero D <- ~ZF
sets D: negative D<-SF
setns D: non negative
setg D/setnle D: greater(signed >) : D <- ~(SF^OF)&~ZF
setge D/setnl D: greater or equal (signed >=): D <- ~(SF^OF)
setl D/setnge D: Less
setle D/setng D: Less or equal(signed <=) D <- (SF^OF)|ZF
seta D/setnbe D: Above (unsigned >) D <- ~CF & ~ZF
setae D/ setnb: Above or equal (unsigned >=) D <- ~CF
setb D/ setnae D: Below D <- CF
setbe D/ setna D: Below or equal D <- CF|ZF

Jump instructions

jump instructions jumps to _label_s in assembly code. some instructions jumps unconditionally, some depends on flags.
Conditional jumps either jump, or continue executing.
Direct jump: target is encoded as part of the instruction, like a label.
indirect jump: target is read from a register or a memory location. written using "*" followed by an operand specificier:
jmp *%eax     #uses the value in register %eax as the jump target
jmp *(%eax)   #reads jump target from memory which is addressed using value in %eax

Jump encodings: Most commonly are PC relative. They encode differences between the address of the target instruction, and the address of the instruction immediately following the jump.  Or use absolute address, which is 4 bytes.

jump instructions


jmp Label: jump to label
jmp *Operand: jump to address specified by Operand
je Label/jz: equal/zero
jne Label/jnz: not equal/not zero
js: negative
jns non negative
jg/jnle: greater
jge/jnl: greater or equal
jl/jnge: less
jle/jng: less or equal
ja/jnbe: above
jae/jnb: above or equal
jb/jnae: below
jbe/jna: below or equal


###Conditional move instructions
Conditional data movement insteaad of control, which is better suited for pipelining and usually better performance

cmovge S, D
cmovnz S, D
...

For control branching, processor needs branch prediction during pipelining to process future instructions, when the prediction misses, it can incur a serious penalty. While for conditional move(data branching), pipeline is always full and the flow of control does not depend on data.
However, if the branches are expensive to calculate, using data branching may not be good because both branches(then-expr and else-expr) needs to be calculated.
Switch statements and jump table

A jump table is an array where entry i is the address of a code segment implementing the action of the program should be taken when the switch index equals i.
jmp  *.L8(, %eax, 4)    # example of switch. where L8 is a lable, 
#and it is followed by actual labels that we jumps to. %eax stores 
#the index. the actual jump destination is the value stored at (4*R[%eax] + L8)


## procedure.md

      
    Raw
  

              procedure.md
            
          
    Stack Frame Structure

IA32 programs make use of program stack to support procedure calls.
Machine use stack to pass arguments, to store return information, to save registers for later restoration, and for local storage.
The portion of the stack allocated for a single procedure call is called a stack frame. The topmost stack frame is delimited by two pointers, %ebp serving as the frame pointer, and %esp as stack pointer. The stack pointer can move while the procedure is executing.
Suppose procedure P(caller) calls procedure Q(the callee). The arguments to Q are contained with the stack frame for P. In addition, when P calls Q, the return address within P where the program should resume execution when it returns from Q is pushed into stack, forming the end of P's stack frame.
The stack frame of Q starts with the saved value of the frame pointer(Q's frame's starting position, which is a copy of register %ebp), followed by copies of any other saved register values.
The arguments passed to Q are positioned at offset 4+4i to frame pointer %ebp, the first one offset is 8. The stack grows toward lower address and top of stack is pointed to by %esp

Transfering control

the instructions supporting procedure calls and returns are:

call Label
call *Operand: Procedure call.
leave : Prepare stack for return
ret: Return from call

call pushes a return address on the stack, and jump to the start of the called procedure. the return address is the address of the instruction immediately following the call in the program, so that execution will resume when returns.
ret pops an address of the stack, and jumps to its location. The proper use is to execute ret when the stack pointer is pointing to the place where call stored its return address.
leave prepares the stack for returning. it is equivalent to
movl %ebp, %esp  #set stack point points to the beginning of frame
popl %ebp        #restore saved %ebp and set stack pointer to end of caller's frame
                 # (which is the return address,e.g., the instruction after call instruction)

Program counter: %eip (or %rip for 64bit cpu)
Register Usage Conventions


caller-save registers: %eax, %edx, %ecx


callee-save registers: %ebx, %esi, %edi
When procedure Q is called by P, it can overwrite caller-save registers without destroying any data required by P(caller). (because caller P saved those registers).
while Q must save values of any callee-saved registers (%ebx, %esi, %edi) and restore them before returning, because P (or some higher-level procedure) may need these values for its future computations.
In addition, stack pointer and frame pointers are store using registers %ebp and %esp