- Number representation
- Unsigned: represented in traditional binary
- On n-bit architecture, (2^n) total
- (Umax=2^n-1) (minus one because of zero)
- All operations are technically modulo (2^n)
- Ints are not integers!
- Signed magnitude: represents signed numbers by using first bit as sign
- Represents (-(2^{n-1}-1) \textrm{ to } 2^{n-1}-1) with a duplicate 0
- Does not work well with arithmetic
- Two's Complement: alternate to signed magnitude good for arithmetic
- Operation and representation
- Represents negatives by flipping all bits and adding 1
- First bit is sign bit - 1 represents negative
- Can be passed through arithmetic with no further modification
- Ex. 24 = 0..011000 -> 1..100111 (1's Comp) -> 1..101000 (2's Comp) = -24
- Ex. -24 = 1..101000 -> 0..010111 (1's Comp) -> 0..011000 (2's Comp) = 24
- Ex. 6 + (-5) = 0..0110 + 1..1011 = 0..0001 = 1
- Ex. (-7) + 3 = 1..1001 + 0..0011 = 1..1100 = -4
- Alternative method: flip all bits to the left of the rightmost 1
- On n-bit architecture, (Tmax=2^(n-1)-1, Tmin=-2^(n-1))
- Special case (Tmin = -2^{63} = 10..0b: 2sComp(Tmin) = 2sComp(10..0) = 10..0 = Tmin)
- So 2sComp(Tmin) = Tmin or -Tmin = Tmin (because the magnitude of Tmin is greater than Tmax)
- Floats (32 bits)
- IEEE 754 Standard
- (Sign (1b)) ( Exponent (8b) ) ( Fraction (23b) )
- Exponent sign is part of 8-bits by adding a bias to the field value mod 2^31
- Exponent base is 2 not 10
- IEEE 754 Standard
- Floats are not reals!
- Doubles are just double-precision floats so 64-bits
- Unsigned: represented in traditional binary
- Funky arithmetic
- Example 1: Is (x^2 >= 0) always?
- For unsigned ints, yes because they are unsigned
- For signed ints, no because arithmetic is mod n which can be negative
- For floats, yes
- Example 2: Is ((x+y)+z = x+(y+z)) always?
- For ints (signed and unsigned), yes because modular arithmetic is associative
- For floats, no because float addition varies with order of magnitude
- ((1e20 + -1e20) + 3.14 \rightarrow 3.14)
- (1e20 + (-1e20 + 3.14) \rightarrow 0)
- Example 1: Is (x^2 >= 0) always?
- Computer arithmetic
- Cannot generate random values - always follows strict mathematical rules
- Not all usual mathematical properties hold for computer data types
- Int operations follow "ring" properties
- Floating point operations follow "ordering" properties (not field though)
- GR#1 Ints are not Integers, Floats are not Reals
- GR#2 You have to Assembly language
- Probably won't be used after this course because compilers are better than you
- Textbook uses variant of x86 called y86-64
- GR#3 Memory matters
- Not infinite
- Must be allocated and managed
- Pointer errors are very hard to debug
- GR#4 There's more to performance than asymptotic complexity
- Ex. row major order iteration of 2D array is much faster than column major
- Fortran is column major, but most other languages are row major
- Ex. row major order iteration of 2D array is much faster than column major
- More integer representation details
- Unsigned:
(\begin{align} & X[n] & * & X[n-1] & * & ... & * & X[2] & * & X[1] & * & X[0] \ & 2^n & * & 2^{n-1} & * & ... & * & 2^2 & * & 2^1 & * & 2^0 \end{align}) - Signed:
- In 2's complement, positive values are the same as their unsigned counterparts
- The magnitude of 2's complement can be found by doing 2's complement operation
- 2's complement value: (VT(x) = -X[w-1] * 2^{w-1} + \sum_{i=0}^{w-2}{X[i] * 2^{i}})
- Unsigned:
- GR#5 Computers do more than execute programs
- Bits vs. Byte
- No intrinsic reason that a byte must be 8 bits but it was decided by Fred Brook
- Used to be 6 bits per byte (because it was enough to store a char)
- All modern computers have Byte Addressable Memory (each byte accessed individually)
- Word is usually 4 bytes (grouped by address last two digits), short usually 2 bytes
- By grouping into words and shorts, some memory can address an entire chunk of 4 bytes
- Half byte is a nibble (each nibble is a hex digit)
- Little Endian vs Big Endian
- For some value 0xFF4292A1,
- FF is MSB=Most Significant Byte, A1 is LSB=Least Significant Byte
- Little Endian starts with LSB in memory
- Big Endian starts with MSB in memory
- For some value 0xFF4292A1,
- Fractional binary numbers
- Ex: (1011.101b = 12^3 + 02^2 + 12^1 + 12^0 + 12^{-1} + 02^{-2} + 1*2^{-3} \ = 8 + 2 + 1 + \frac{1}{2} + \frac{1}{8} = 11 + 0.5 + 0.125 = 11.625)
- Numbers of form (0.111111...) are just below 1 (like
$0.999...$ is just below 1 in decimal)- Represented (1.0 - \varepsilon)
- Floating point representation
- ((-1)^s M 2^E) where
$s$ is sign bit,$E$ is exponent of 2,$M$ is fraction field - Single Precision: 32 bits = (1 s) + (8 E) + (23 M)
- Double Precision: 64 bits = (1 s) + (11 E) + (52 M)
- Extended Precision: 80 bits (Intel only) = (1 s) + (15 E) + (63 or 64 M)
- "Normalized" Values
- If E=00..0 or E=11..1, value is "special"
- Exponent coded as biased value (E=Exp-Bias)
- ((-1)^s M 2^E) where
- Examples
- 357 = 0000000101100101 (16-bit 2's complement)
- -357 = 1111111010011011 (16-bit 2's complement)
- 357.0 = 0 10000111 011001000..0 (IEEE 754 standard float)
- Sign = 0 because positive
- Exp = 8 + Bias = 8 + 127 = 135
- Frac = After decimal of 1.00100101 (with trailing 0s)
- Floating point special values
- Denormalized values
- When Exp = 000..0
- Exponent value: E = 1 - Bias (instead of E = 0 - Bias as would be normal for Exp=0)
- 0 before decimal point, not 1 like usual
- Cases:
- Exp=000..0, Frac=000..0
- Represents zero (positive and negative are distinct)
- Exp=000..0, Frac
$\neq$ 000..0- Numbers closest to zero, equispaced
- Exp=000..0, Frac=000..0
- Smallest normalized value (Exp = 000..01)
- E = Exp - Bias = 1 - 127 = -126
- Value = (1-2^{126})
- Cannot be normalized
- Infinity
- When Exp=111..1, frac = 000..0
- represents
$\pm\infty$ - 1.0/0.0 = -1.0/-0.0 =
$+\infty$ , 1.0/-0.0 =$-\infty$
- represents
- When Exp=111..1, frac = 000..0
- Not a Number (NaN)
- When Exp=111..1, frac
$\neq$ 000..0
- When Exp=111..1, frac
- Denormalized values
- More Examples
- Gap size in large floats:
- [\begin{align}
& 0 \ 11111110 \ 111..11 \
- & 0 \ 11111110 \ 111..10 \ \hline & 0 \ 11111110 \ 000..01 \ = & 0.000..01 * 2^{127} \textrm{ (Must be normalized)} \ = & 1.000..00 * 2^{127-23} = 1.0 * 104 \ \approx & 2*10^{31} \textrm{ (HUGE!)} \end{align}]
- [\begin{align}
& 0 \ 11111110 \ 111..11 \
- Gap size in large floats:
- Float Rounding
- Basic idea:
- First computer exact result
- Make fit into desired precision
- Possibly overflow if exponent is too large
- Possibly round to fit in Frac
- Basic idea:
- Float Multiplication
- Exact result:
- Sign bit (s_1 \oplus s_2)
- Significand (M_1 \times M_2)
- Exponent (E_1 + E_2)
- Then fix result to fit correctly in float
- Exact result:
- Float Addition
- Must first get binary points aligned:
- ((-1)^{s_1}\cdot M_1\cdot 2^{E_1} + (-1)^{s_2}\cdot M_2\cdot 2^{E_2})
- Then fix correclty to float
- Must first get binary points aligned:
- Mathematical properties of FP addition
- Closed under addition?
- Yes but may produce infinity or NaN
- Commutative?
- Yes
- Associative? No: (in case of numbers of different magnitudes as seen earlier)
- 0 is Additive Identity?
- Yes
- Every element has additive inverse?
- Almost: everything except infinities and NaNs
- Monotonicity ((a \geq b \Rightarrow a+c \geq b+c))
- Almost: except for infinities and NaNs
- Closed under addition?
- Mathematical properties of FP multiplication
- Compare to Commutative ring
- Closed under multiplication?
- Yes (but may generate infinities and NaNs)
- Commutative?
- Yes
- Associative?
- No: possibility of overflow and inexactness of rounding (as in addition)
- 1 is Multiplicative Identity?
- Yes
- Multiplication distributive over addition?
- No: possibility of overflow and inexactness of rounding (as in associativity)
- Closed under multiplication?
- Monotonicity?
- Almost: all but infinities and NaNs
- Compare to Commutative ring
- Floating point specifications in C
- C guarantees two levels of precision
- float: 32 bits (1 + 8 + 25)
- double: 64 bits (1 + 11 + 52)
- Some machines offer a long double
- Conversions/Casting
- double/float
$\rightarrow$ int- truncates fractional part (like rounding toward zero): (\lfloor frac \rfloor)
- Not defined when out of range or NaN (usually sets to TMin)
- int
$\rightarrow$ double- Exact conversion as long as int has
$\leq$ 53 bit word size
- Exact conversion as long as int has
- double/float
- C guarantees two levels of precision
- FP Puzzles
- Assume
int x = _; float f = _; double d = _;
and neither d nor f is NaN. -
x == (int) (float) x
?- Not true for large numbers because large x cannot be represented in float without loss of precision.
-
x == (int) (double) x
?- True because 32-bit int largest number can be represented in
$\leq$ 53 bits without loss of precision.
- True because 32-bit int largest number can be represented in
-
f == (float) (double) f
?- True because conversion from float to double can be done without loss of precision and conversion back should always work because number is within range of float.
-
d == (double) (float) d
?- Not true because large doubles would lose precision when converted to float.
-
f == -(-f)
?- True because negation only affects the sign bit which can be easily reverted with another negation.
-
2/3 == 2/3.0
?- Not true because
2/3
does integer division where2/3.0
does floating point division.
- Not true because
-
d < 0.0
$\rightarrow$ d*2 < 0.0
?- True because multiplying by 2 does not affect sign bit.
-
d > f
$\rightarrow$ -f < -d
?- In comparing float to double, the float is promoted to double so this should always apply.
-
d * d >= 0.0
?- True because sign bit is XORed so the resulting sign will always be positive.
-
(d+f)-d == f
?- Not true because FP multiplication is not associative.
- Assume
- C Primer - Highlights
- Preprocessor
-
#include <X.h>
or#include "X.h"
includes header X.h -
#define X Y
replaces all instances of Y with X
-
- Command line arguments
- Arrays and structures
- Pointers and dynamic memory
- malloc
- free
- Preprocessor
- Intel x86 Processors
- Dominate laptop/desktop/server industry
- Backwards compatible back to 8086 (1978)
- Complex instruction set computer (CISC)
- Many different instructions with many different formats
- Only a few are regularly used
- Hard to match performance of Reduced Instruction Set Computers (RISC)
- But Intel has done just that.
- Many different instructions with many different formats
- Intel's Competitor: Advanced Micro Devices (AMD)
- Historically follows behind Intel: a little bit slower, a lot cheaper
- Built Opteron: tough competitor to Pentium 4
- Developed x86_64 successfully after Intel failed with its Itanium
- Most processors support 64-bit mode these days, but many programs operate in only 32 bits.
- Definitions
- Architecture (also Instruction Set Architecture - ISA):
- Only details of processor design that an assembly code writer must understand
- Architecture (also Instruction Set Architecture - ISA):
- Assembly/Machine Code View
- CPU:
- Program Counter (PC)
- Registers
- Condition Codes (what each instruction does)
- Arithmetic Logic Unit (ALU)
- Memory:
- Code
- Data
- Stack
- Bus connection:
- Addresses
- Data
- Instructions
- CPU:
- Turning C code into object code
- C program (x.c) -> (Compiler
gcc -S
) - Asm program (GAS syntax) (x.s) -> (Assembler
gcc
oras
) - Object program (x.o) -> (Linker
gcc
orld
) <~ Static Libraries (*.a) - Executable program (x) <~ Dynamic Libraries (*.so)
- C program (x.c) -> (Compiler
- Example of C -> Asm
- C:
long plus(long x, long y); void sumstore(long x, long y, long *dest) { long t = plus(x, y); *dest = t; //* }
- Assembly:
sumstore: pushq %rbx movq %rdx, %rbx call plus movq %rax, (%rbx) popq %rbx ret
- Machine Code:
0x0400595: 0x53 0x48 ... 0x5B 0xC3
- Missing everything except links between functions and things (provided by linker file)
- C:
- Assembly Characteristics:
- Data Types
- "Integer" data of 1, 2, 4, or 8 bytes
- Data values
- Addresses (untyped pointers)
- Floating point data of 4, 8, or 10 bytes
- Code: byte sequences encoding series of instructions
- No aggregate types such as arrays or structures
- Just continuously allocated bytes in memory
- "Integer" data of 1, 2, 4, or 8 bytes
- Operations
- Perform arithmetic operations (ALU)
- Move data between memory locations and registers
- Conditional and unconditional branching
- Data Types
- Disassembler:
-
objdump -d <executable>
converts machine code to more readable form - assembly - Also possible using
gdb
- Note: Reverse engineering is forbidden by Microsoft End User License Agreement
-
- x86-64 Integer Registers
- 8-byte examples
- %rax
- %rbx
- %rcx
- %rdx
- %rsi
- %rdi
- %rsp -> stack pointer
- %rbp -> "frame" pointer
- %r8
- ...
- %r15
- %rax (8-byte) -> %eax (4-byte) -> %ax (2-byte) -> %ah/%al (1-byte)
- 8-byte examples
- Moving Data
-
movq
moves quad word (4 words = 8 bytes) - Operand Types
- Constant integer data
- Prefixed with $ (i.e.
$0x400, $ -533) - Encoded with 1, 2, or 4 bytes
- Prefixed with $ (i.e.
- Register
- %rax and others seen previously
- %rsp and sometimes %rbp reserved
- Others sometimes have special purposes
- Memory Address
- Simplest example: (%rax) gives address stored in %rax
- Constant integer data
- Details of movq (source -> dest): [ movq \begin{cases} Immediate \quad \begin{cases} Register \ \ Memory \ \end{cases} \ Register \quad\quad \begin{cases} Register \ Memory \quad\quad \ \end{cases} \ Memory \end{cases} ]
-
leaq
- load effective address quad
- equivalent to
p = &x[i];
- Cool optimization:
- C code:
long m12(long x) { return x*12; }
- Assembly:
leaq (%rdi,%rdi,2), %rax # t <- x + x*2 (equivalent to t = 3*x) salq $2, %rax # return t << 2 (equivalent to return 4*t)
- C code:
- Control flow
- Temporary data
%rax
- Location of runtime stack
%rsp
- Location of current code control point
%rip
- Status of recent tests:
- CF: carry flag - carry out from addition is set
- ZF: zero flag - if t == 0
- SF: sign flag (signed) - if t < 0
- OF: overflow flag (signed) - sign bit is not what it should be (i.e. adding two positive numbers gives negative)
- Control flags set by ALU ops but not by
leaq
- Jumps
jX Condition Description jmp 1 Unconditional je ZF Equal / Zero jne ~ZF Not Equal / Not Zero js SF Negative jns ~SF Nonnegative jg ~(SF^OF)&~ZF > (signed) jge ~(SF^OF) >= (signed) jl (SF^OF) < (signed) jle (SF^OF) ZF ja ~CF&~ZF Above (unsigned) jb CF Below (unsigned) - Bad cases for conditional move
- Expensive computations -
val = Test(x) ? Hard1(x) : Hard2(x);
- Risky computations -
val = p ? *p : 0;
- Computations with side effects -
val = x > 0 ? x*=7 : x+=3;
- Expensive computations -
- Temporary data
- Jump Tables (switch statement)
.section .rodata
.align
.L4:
.quad .L2
.quad .L1
.quad .L3
.quad .L5
- Stack
- %rsp points to top of stack
- Stack is 'upside down' because stack bottom is at highest address and stack grows down with %rsp at lowest address element
- Heap is usually below stack
- Parameters to procedures
- First 6 parameters are stored in %rdi, %rsi, %rdx, %rcx, %r8, %r9
- Remaining parameters are pushed onto the stack if necessary with arg 7 on top of stack
- Return of function is stored in %rax
- Other registers should be pushed and popped unless it is known that the register can be clobbered without issue
- Recursive procedures
- Any language supporting recursion must implement a stack
- Parameters pneumonic
rdi
- Dolly'srsi
- Specialrdx
- Dressrcx
- Costsr8
- $89r9
- ^
- For stack parameters you do not pop them off, just address
(%rbp)
for param 7 and8(%rbp)
for param 8 if param 7 is 8 bytes long - Array allocation
T A[l]
is array length l of elements type T named A- In memeory, this is just a strip of memory with
l*sizeof(T)
bytes which can be accessed usingA[2]
or just*(A + 2*sizeof(T))
.
- Arrays
- C:
#define ZLEN 5 typedef zip_dig char[]; void zincr(zip_dig z) { size_t i; for (i = 0; i < ZLEN; i++) { z[i]++; } }
- ASM:
# %rdi = z movl $0, %eax # i = 0 jmp .L3 # goto middle .L4: # loop: addl $1, (%rdi,%rax,4) # z[i]++ addq $1, %rax # i++ .L3: # middle: cmpq $4, %rax # test i == 4 jbe .L4 # if <=, goto loop rep; ret
- Nothing to see here.
- Hardware Control Language (HCL)
- Similar to Hardware Description Language (HDL)
- Most prevalently Verilog and VHDL
- Conventions
- Single bits are low caps, word-level structures are caps
- Similar to Hardware Description Language (HDL)
- Decoders
- (n \times 2^n) Decoder takes
$n$ inputs and has$2^n$ outputs where output i is 1 and all others are 0 when the inputs represent i in binary - Used to do correct action based on opcode section of instruction
- (n \times 2^n) Decoder takes
- Multiplexers (MUX)
- Has
$n$ selector inputs and$2^n$ data inputs with 1 output where the output is equal to the$i^{th}$ data input when the selector inputs represent i - Used in selecting which register to use
- HCL 1-bit multiplexer:
bool out = (s && a) || (!s && b);
- Has
- HCL Word-Level Examples
- Minimum of 3
int Min3 = [ A < B && A < C : A; B < A && B < C : B; 1 : C; ];
- 4-Way Multiplexer
int Out4 = [ !s1 && !s0 : D0; !s1 : D1; !s2 : D2; 1 : D3; ];
- Minimum of 3
- Arithmetic Logic Unit (ALU)
- Uses a multiplexer to output the appropriate operator on inputs
- Sets flags OF, ZF, CF, SF in the process
- Flip Flops
- RS Flip Flop
- Two NOR gates with back flow to opposite gate
- Usually has a clock input to prevent race conditions
- input R,S: 01 -> set to 1, 10 -> reset to 0, 00 -> no action (for reading)
- 2 outputs Q, Q' (or Q+, Q-) where Q is value of
- RS Flip Flop
- Latching
- Random Access Memory (RAM)
- From CPU to Memory, there are three lines:
- 1-way CPU to Memory: control bus (read or write, a few other things)
- 1-way CPU to Memory: address bus (may be chunked)
- 2-way: data bus carries values to and from memory
- From CPU to Memory, there are three lines:
- HCL Overview
- Data Types
- bool : Boolean
- a, b, c, ...
- int : words (usually thinking 64 bit words)
- A, B, C, ...
- bool : Boolean
- Operations
- Logic operations:
&&, ||, !
- Word Comparisons:
==, !=, <, <=, >, >=
- Set Membership:
A in { B, C, D }
- Logic operations:
- Word Expressions:
- Case Expressions:
int Z = [ (case 1) : (value); (case 2) : (value); 1 : (default case); ];
- Case Expressions:
- Data Types
- Y86-64 Instruction Set
- Instructions
- halt: 00
- nop: 10
- cmovXX rA, rB : 2(fn) (rA) (rB)
- irmovq V, rB : 30 (F) (rB) (--V--) // Value into register
- rmmovq rA, D(rB) : 40 (rA) (rB) (--D--) // Reg to Mem
- mrmovq D(rA) rB : 50 (rA) (rB) (--D--) // Mem to reg
- OPq rA rB : 6(fn) rA rB
- jXX Dest : 7(fn) (Dest)
- call Dest : 80 (Dest)
- ret : 90
- pushq rA : A0 (rA) (F)
- popq rA : B0 (rA) (F)
- Functions for OPq
- addq : 60
- subq : 61
- andq : 62
- xorq : 63
- Jumps for jXX
- jmp : 70
- jle : 71
- jl : 72
- je : 73
- jne : 74
- jqe : 75
- jg : 76
- Instructions
- Sequential Hardware Structure
- Fetch: get value pointed by program counter (PC) (aka instruction pointer, IP) and move to instruction register (IR); increment PC
- Decode: Figure out what the command is
- Execute: Execute the command (in the case of ALU stuff)
- Memory: Move store memory stuff
- Write Back: write to registers
- PC: Start back from top
- Pipelining is when multiple stages are occuring simultaneously (why the process is split into 5 subprocedures)
- Instruction example
- For instruction
OPq rA, rB : 6(fn) (rA) (rB)
- Fetch: read two bytes
- Decode: read operand registers
- Execute: perform ALU operation, set condition codes
- Memory: Do nothing
- Write Back: Update register
- PC Update: Increment PC by 2 (2 because the OPq has an extra byte of registers)
- Exactly what happens in
OPq rA, rB : 6(fn) (rA) (rB)
- Fetch
- (\textrm{icode:ifun} \leftarrow M_1[PC]) // Read instruction byte
- (rA:rB \leftarrow M_1[PC+1]) // Read register byte
- (valP \leftarrow PC+2) // Computer next PC
- Decode
- (valA \leftarrow R[rA]) // Read operand A
- (valB \leftarrow R[rB]) // Read operand B
- Execute
- (valE \leftarrow valB\textrm{ OP }valA) // Perform ALU operation
- (\textrm{set CC}) // Set condition code register
- Memory // NOP
- Write Back
- (R[rB] \leftarrow valE) // Write back result
- PC Update
- (PC \leftarrow valP) // Update PC
- Fetch
- Notes on other opcodes
- For ops like rmmovq which have large data parts (i.e. D is 8 bytes), use (M_8) instead of (M_1)
- For stack operations, one register represents no register (represented F above and usually 0xF in 15-register system)
- In popq Decode:
- (valA \leftarrow R[%rsp]) // valA is used to put value at old %rsp into rA
- (valb \leftarrow R[%rsp]) // valB is used to calculate %rsp + 8 for new %rsp
- cmovXX moves rA into rB only when the condition is true
- If condition is false, 0xF is used as destination
- For instruction
- Y86-64 Instruction Mapping
0x1000: call 0x12300
- Fetch: (icode:ifun \leftarrow M_1[PC] \ valC \leftarrow M_8[PC+1] \ valP \leftarrow PC + 9)
- Decode: (valA \leftarrow R[rsp] \ valB \leftarrow R[rsp])
- Execute: (valE \leftarrow valB + (-8))
- Memory: (M_8[valE] \leftarrow valP)
- Write Back: (R[rsp] \leftarrow valE)
- PC Update: (PC \leftarrow valC)
ret
(0x90)- Fetch: (icode:ifun \leftarrow M_1[PC])
- Decode: (valA \leftarrow R[rsp] \ valB \leftarrow R[rsp])
- Execute: (valE \leftarrow valB + 8)
- Memory: (valM \leftarrow M_8[valA])
- Write Back: (R[rsp] \leftarrow valE)
- PC Update: (PC \leftarrow valM)
- (Insert stuff here)
- iaddq V, rB
- Fetch: (icode:ifun \leftarrow M_1[PC] \ rA:rB \leftarrow M_1[PC+1] \ valC \leftarrow M_8[PC+2] \ valP \leftarrow PC + 10)
- Decode: (valB \leftarrow R[rB])
- Execute: (valE \leftarrow valB + valC)
- Memory:
- Write Back: (R[rB] \leftarrow valE)
- PC Update: (PC \leftarrow valP)
- HW Convert to assembly:
void swappair(long a[], n) { // Assumes n is length of a and n is even int i; long swap; for (i = 0; i < n; i = i + 2) { swap = a[i]; a[i] = a[i+1]; a[i+1] = swap; } }
- N/A
- #TODO