hdamron17/LectureNotes.md

## LectureNotes.md

      
    Raw
  

              LectureNotes.md
            
          
    CSCE 212H: Introduction to Computer Architecture

2018-01-18


Number representation

Unsigned: represented in traditional binary

On n-bit architecture, (2^n) total
(Umax=2^n-1) (minus one because of zero)
All operations are technically modulo (2^n)
Ints are not integers!


Signed magnitude: represents signed numbers by using first bit as sign

Represents (-(2^{n-1}-1) \textrm{ to } 2^{n-1}-1) with a duplicate 0
Does not work well with arithmetic


Two's Complement: alternate to signed magnitude good for arithmetic

Operation and representation
Represents negatives by flipping all bits and adding 1
First bit is sign bit - 1 represents negative
Can be passed through arithmetic with no further modification
Ex.  24 = 0..011000 -> 1..100111 (1's Comp) -> 1..101000 (2's Comp) = -24
Ex. -24 = 1..101000 -> 0..010111 (1's Comp) -> 0..011000 (2's Comp) =  24
Ex. 6 + (-5) = 0..0110 + 1..1011 = 0..0001 = 1
Ex. (-7) + 3 = 1..1001 + 0..0011 = 1..1100 = -4
Alternative method: flip all bits to the left of the rightmost 1
On n-bit architecture, (Tmax=2^(n-1)-1, Tmin=-2^(n-1))
Special case (Tmin = -2^{63} = 10..0b: 2sComp(Tmin) = 2sComp(10..0) = 10..0 = Tmin)
So 2sComp(Tmin) = Tmin or -Tmin = Tmin (because the magnitude of Tmin is greater than Tmax)


Floats (32 bits)

IEEE 754 Standard

(Sign (1b)) (   Exponent (8b)   ) (          Fraction (23b)          )
Exponent sign is part of 8-bits by adding a bias to the field value mod 2^31
Exponent base is 2 not 10


Floats are not reals!
Doubles are just double-precision floats so 64-bits


Funky arithmetic

Example 1: Is (x^2 >= 0) always?

For unsigned ints, yes because they are unsigned
For signed ints, no because arithmetic is mod n which can be negative
For floats, yes


Example 2: Is ((x+y)+z = x+(y+z)) always?

For ints (signed and unsigned), yes because modular arithmetic is associative
For floats, no because float addition varies with order of magnitude

((1e20 + -1e20) + 3.14 \rightarrow 3.14)
(1e20 + (-1e20 + 3.14) \rightarrow 0)


Computer arithmetic

Cannot generate random values - always follows strict mathematical rules
Not all usual mathematical properties hold for computer data types
Int operations follow "ring" properties
Floating point operations follow "ordering" properties (not field though)


GR#1 Ints are not Integers, Floats are not Reals
GR#2 You have to Assembly language

Probably won't be used after this course because compilers are better than you
Textbook uses variant of x86 called y86-64


2018-01-23


GR#3 Memory matters

Not infinite
Must be allocated and managed
Pointer errors are very hard to debug


GR#4 There's more to performance than asymptotic complexity

Ex. row major order iteration of 2D array is much faster than column major

Fortran is column major, but most other languages are row major


More integer representation details

Unsigned: 

(\begin{align}
& X[n] & * & X[n-1] & * & ... & * & X[2] & * & X[1] & * & X[0] \
& 2^n & * & 2^{n-1} & * & ... & * & 2^2 & * & 2^1 & * & 2^0
\end{align})
Signed:
In 2's complement, positive values are the same as their unsigned counterparts
The magnitude of 2's complement can be found by doing 2's complement operation
2's complement value: (VT(x) = -X[w-1] * 2^{w-1} + \sum_{i=0}^{w-2}{X[i] * 2^{i}})


GR#5 Computers do more than execute programs
Bits vs. Byte
No intrinsic reason that a byte must be 8 bits but it was decided by Fred Brook
Used to be 6 bits per byte (because it was enough to store a char)
All modern computers have Byte Addressable Memory (each byte accessed individually)
Word is usually 4 bytes (grouped by address last two digits), short usually 2 bytes
By grouping into words and shorts, some memory can address an entire chunk of 4 bytes
Half byte is a nibble (each nibble is a hex digit)

2018-01-25


Little Endian vs Big Endian

For some value 0xFF4292A1,

FF is MSB=Most Significant Byte, A1 is LSB=Least Significant Byte
Little Endian starts with LSB in memory
Big Endian starts with MSB in memory


Fractional binary numbers

Ex: (1011.101b = 12^3 + 02^2 + 12^1 + 12^0 + 12^{-1} + 02^{-2} + 1*2^{-3} \
= 8 + 2 + 1 + \frac{1}{2} + \frac{1}{8} = 11 + 0.5 + 0.125 = 11.625)
Numbers of form (0.111111...) are just below 1 (like $0.999...$ is just below 1 in decimal)

Represented (1.0 - \varepsilon)


Floating point representation

((-1)^s M 2^E) where $s$ is sign bit, $E$ is exponent of 2, $M$ is fraction field
Single Precision: 32 bits = (1 s) + (8 E) + (23 M)
Double Precision: 64 bits = (1 s) + (11 E) + (52 M)
Extended Precision: 80 bits (Intel only) = (1 s) + (15 E) + (63 or 64 M)
"Normalized" Values

If E=00..0 or E=11..1, value is "special"
Exponent coded as biased value (E=Exp-Bias)


2018-01-30


Examples

357 = 0000000101100101 (16-bit 2's complement)
-357 = 1111111010011011 (16-bit 2's complement)
357.0 = 0 10000111 011001000..0 (IEEE 754 standard float)

Sign = 0 because positive
Exp = 8 + Bias = 8 + 127 = 135
Frac = After decimal of 1.00100101 (with trailing 0s)


Floating point special values

Denormalized values

When Exp = 000..0
Exponent value: E = 1 - Bias (instead of E = 0 - Bias as would be normal for Exp=0)
0 before decimal point, not 1 like usual
Cases:

Exp=000..0, Frac=000..0

Represents zero (positive and negative are distinct)


Exp=000..0, Frac $\neq$ 000..0

Numbers closest to zero, equispaced


Smallest normalized value (Exp = 000..01)

E = Exp - Bias = 1 - 127 = -126
Value = (1-2^{126})
Cannot be normalized


Infinity

When Exp=111..1, frac = 000..0

represents $\pm\infty$

1.0/0.0 = -1.0/-0.0 = $+\infty$, 1.0/-0.0 = $-\infty$


Not a Number (NaN)

When Exp=111..1, frac $\neq$ 000..0


More Examples

Gap size in large floats:

[\begin{align}
& 0 \ 11111110 \ 111..11 \

& 0 \ 11111110 \ 111..10 \
\hline
& 0 \ 11111110 \ 000..01 \
= & 0.000..01 * 2^{127} \textrm{ (Must be normalized)} \
= & 1.000..00 * 2^{127-23} = 1.0 * 104 \
\approx & 2*10^{31} \textrm{ (HUGE!)}
\end{align}]


Float Rounding

Basic idea:

First computer exact result
Make fit into desired precision

Possibly overflow if exponent is too large
Possibly round to fit in Frac


Float Multiplication

Exact result:

Sign bit (s_1 \oplus s_2)
Significand (M_1 \times M_2)
Exponent (E_1 + E_2)


Then fix result to fit correctly in float


Float Addition

Must first get binary points aligned:

((-1)^{s_1}\cdot M_1\cdot 2^{E_1} + (-1)^{s_2}\cdot M_2\cdot 2^{E_2})
Then fix correclty to float


Mathematical properties of FP addition

Closed under addition?

Yes but may produce infinity or NaN


Commutative?

Yes


Associative?
No: (in case of numbers of different magnitudes as seen earlier)
0 is Additive Identity?

Yes


Every element has additive inverse?

Almost: everything except infinities and NaNs


Monotonicity ((a \geq b \Rightarrow a+c \geq b+c))

Almost: except for infinities and NaNs


2018-02-01


Mathematical properties of FP multiplication

Compare to Commutative ring

Closed under multiplication?

Yes (but may generate infinities and NaNs)


Commutative?

Yes


Associative?

No: possibility of overflow and inexactness of rounding (as in addition)


1 is Multiplicative Identity?

Yes


Multiplication distributive over addition?

No: possibility of overflow and inexactness of rounding (as in associativity)


Monotonicity?

Almost: all but infinities and NaNs


Floating point specifications in C

C guarantees two levels of precision

float: 32 bits (1 + 8 + 25)
double: 64 bits (1 + 11 + 52)
Some machines offer a long double


Conversions/Casting

double/float $\rightarrow$ int

truncates fractional part (like rounding toward zero): (\lfloor frac \rfloor)
Not defined when out of range or NaN (usually sets to TMin)


int $\rightarrow$ double

Exact conversion as long as int has $\leq$ 53 bit word size


FP Puzzles

Assume int x = _; float f = _; double d = _; and neither d nor f is NaN.

x == (int) (float) x?

Not true for large numbers because large x cannot be represented in float without loss of precision.


x == (int) (double) x?

True because 32-bit int largest number can be represented in $\leq$ 53 bits without loss of precision.


f == (float) (double) f?

True because conversion from float to double can be done without loss of precision and conversion back should always work because number is within range of float.


d == (double) (float) d?

Not true because large doubles would lose precision when converted to float.


f == -(-f)?

True because negation only affects the sign bit which can be easily reverted with another negation.


2/3 == 2/3.0?

Not true because 2/3 does integer division where 2/3.0 does floating point division.


d < 0.0 $\rightarrow$ d*2 < 0.0?

True because multiplying by 2 does not affect sign bit.


d > f $\rightarrow$ -f < -d?

In comparing float to double, the float is promoted to double so this should always apply.


d * d >= 0.0?

True because sign bit is XORed so the resulting sign will always be positive.


(d+f)-d == f?

Not true because FP multiplication is not associative.


C Primer - Highlights

Preprocessor


#include <X.h> or #include "X.h" includes header X.h

#define X Y replaces all instances of Y with X


Command line arguments
Arrays and structures
Pointers and dynamic memory

malloc
free


Intel x86 Processors

Dominate laptop/desktop/server industry
Backwards compatible back to 8086 (1978)
Complex instruction set computer (CISC)

Many different instructions with many different formats

Only a few are regularly used


Hard to match performance of Reduced Instruction Set Computers (RISC)

But Intel has done just that.


Intel's Competitor: Advanced Micro Devices (AMD)

Historically follows behind Intel: a little bit slower, a lot cheaper
Built Opteron: tough competitor to Pentium 4
Developed x86_64 successfully after Intel failed with its Itanium
Most processors support 64-bit mode these days, but many programs operate in only 32 bits.


Definitions

Architecture (also Instruction Set Architecture - ISA):

Only details of processor design that an assembly code writer must understand


Assembly/Machine Code View

CPU:

Program Counter (PC)
Registers
Condition Codes (what each instruction does)
Arithmetic Logic Unit (ALU)


Memory:

Code
Data
Stack


Bus connection:

Addresses
Data
Instructions


2018-02-06


Turning C code into object code

C program (x.c) -> (Compiler gcc -S)
Asm program (GAS syntax) (x.s) -> (Assembler gcc or as)
Object program (x.o) -> (Linker gcc or ld) <~ Static Libraries (*.a)
Executable program (x) <~ Dynamic Libraries (*.so)


Example of C -> Asm

C:
long plus(long x, long y);

void sumstore(long x, long y, long *dest) {
  long t = plus(x, y);
  *dest = t; //*
}

Assembly:
sumstore:
  pushq   %rbx
  movq    %rdx, %rbx
  call    plus
  movq    %rax, (%rbx)
  popq    %rbx
  ret


Machine Code:
0x0400595:
  0x53
  0x48
  ...
  0x5B
  0xC3


Missing everything except links between functions and things (provided by linker file)


Assembly Characteristics:

Data Types

"Integer" data of 1, 2, 4, or 8 bytes

Data values
Addresses (untyped pointers)


Floating point data of 4, 8, or 10 bytes
Code: byte sequences encoding series of instructions
No aggregate types such as arrays or structures

Just continuously allocated bytes in memory


Operations

Perform arithmetic operations (ALU)
Move data between memory locations and registers
Conditional and unconditional branching


Disassembler:


objdump -d <executable> converts machine code to more readable form - assembly
Also possible using gdb

Note: Reverse engineering is forbidden by Microsoft End User License Agreement


x86-64 Integer Registers

8-byte examples

%rax
%rbx
%rcx
%rdx
%rsi
%rdi
%rsp -> stack pointer
%rbp -> "frame" pointer
%r8
...
%r15


%rax (8-byte) -> %eax (4-byte) -> %ax (2-byte) -> %ah/%al (1-byte)


Moving Data


movq moves quad word (4 words = 8 bytes)
Operand Types

Constant integer data

Prefixed with $ (i.e. $0x400, $-533)
Encoded with 1, 2, or 4 bytes


Register

%rax and others seen previously
%rsp and sometimes %rbp reserved
Others sometimes have special purposes


Memory Address

Simplest example: (%rax) gives address stored in %rax


Details of movq (source -> dest):
[
movq
\begin{cases}
Immediate \quad
\begin{cases}
Register \ \
Memory \
\end{cases} \
Register \quad\quad
\begin{cases}
Register \
Memory \quad\quad \
\end{cases} \
Memory
\end{cases}
]


2018-02-08


leaq

load effective address quad
equivalent to p = &x[i];


Cool optimization:

C code:
long m12(long x) {
  return x*12;
}


Assembly:
leaq (%rdi,%rdi,2), %rax  # t <- x + x*2 (equivalent to t = 3*x)
salq $2, %rax             # return t << 2 (equivalent to return 4*t)


Control flow

Temporary data %rax
Location of runtime stack %rsp
Location of current code control point %rip
Status of recent tests:

CF: carry flag - carry out from addition is set
ZF: zero flag - if t == 0
SF: sign flag (signed) - if t < 0
OF: overflow flag (signed) - sign bit is not what it should be (i.e. adding two positive numbers gives negative)


Control flags set by ALU ops but not by leaq
Jumps


jX
Condition
Description


jmp
1
Unconditional


je
ZF
Equal / Zero


jne
~ZF
Not Equal / Not Zero


js
SF
Negative


jns
~SF
Nonnegative


jg
~(SF^OF)&~ZF
> (signed)


jge
~(SF^OF)
>= (signed)


jl
(SF^OF)
< (signed)


jle
(SF^OF)
ZF


ja
~CF&~ZF
Above (unsigned)


jb
CF
Below (unsigned)


Bad cases for conditional move

Expensive computations - val = Test(x) ? Hard1(x) : Hard2(x);
Risky computations - val = p ? *p : 0;
Computations with side effects - val = x > 0 ? x*=7 : x+=3;


2018-02-13


Jump Tables (switch statement)

.section    .rodata
    .align
.L4:
    .quad   .L2
    .quad   .L1
    .quad   .L3
    .quad   .L5


Stack

%rsp points to top of stack
Stack is 'upside down' because stack bottom is at highest address and stack grows down with %rsp at lowest address element
Heap is usually below stack


Parameters to procedures

First 6 parameters are stored in %rdi, %rsi, %rdx, %rcx, %r8, %r9
Remaining parameters are pushed onto the stack if necessary with arg 7 on top of stack
Return of function is stored in %rax
Other registers should be pushed and popped unless it is known that the register can be clobbered without issue


Recursive procedures

Any language supporting recursion must implement a stack


2018-02-15


Parameters pneumonic

rdi - Dolly's
rsi - Special
rdx - Dress
rcx - Costs
r8  - $89
r9  - ^


For stack parameters you do not pop them off, just address (%rbp) for param 7 and 8(%rbp) for param 8 if param 7 is 8 bytes long
Array allocation

T A[l] is array length l of elements type T named A
In memeory, this is just a strip of memory with l*sizeof(T) bytes which can be accessed using A[2] or just *(A + 2*sizeof(T)).


2018-02-20


Arrays

C:

#define ZLEN 5
typedef zip_dig char[];

void zincr(zip_dig z) {
  size_t i;
  for (i = 0; i < ZLEN; i++) {
    z[i]++;
  }
}

ASM:

  # %rdi = z
  movl  $0,  %eax           #   i = 0
  jmp   .L3                 #   goto middle
.L4:                        # loop:
  addl  $1,  (%rdi,%rax,4)  #   z[i]++
  addq  $1,  %rax           #   i++
.L3:                        # middle:
  cmpq  $4,  %rax           #   test i == 4
  jbe   .L4                 #   if <=, goto loop
  rep; ret


2018-02-22


Nothing to see here.

2018-03-05


Hardware Control Language (HCL)

Similar to Hardware Description Language (HDL)

Most prevalently Verilog and VHDL


Conventions

Single bits are low caps, word-level structures are caps


Decoders

(n \times 2^n) Decoder takes $n$ inputs and has $2^n$ outputs where output i is 1 and all others are 0 when the inputs represent i in binary
Used to do correct action based on opcode section of instruction


Multiplexers (MUX)

Has $n$ selector inputs and $2^n$ data inputs with 1 output where the output is equal to the $i^{th}$ data input when the selector inputs represent i
Used in selecting which register to use
HCL 1-bit multiplexer: bool out = (s && a) || (!s && b);


HCL Word-Level Examples

Minimum of 3
int Min3 = [
  A < B && A < C : A;
  B < A && B < C : B;
  1              : C;
];

4-Way Multiplexer
int Out4 = [
  !s1 && !s0 : D0;
  !s1        : D1;
  !s2        : D2;
  1          : D3;
];


Arithmetic Logic Unit (ALU)

Uses a multiplexer to output the appropriate operator on inputs
Sets flags OF, ZF, CF, SF in the process


Flip Flops

RS Flip Flop

Two NOR gates with back flow to opposite gate
Usually has a clock input to prevent race conditions
input R,S: 01 -> set to 1, 10 -> reset to 0, 00 -> no action (for reading)
2 outputs Q, Q' (or Q+, Q-) where Q is value of


Latching

Transparent 1-Bit Latch


Random Access Memory (RAM)

From CPU to Memory, there are three lines:

1-way CPU to Memory: control bus (read or write, a few other things)
1-way CPU to Memory: address bus (may be chunked)
2-way: data bus carries values to and from memory


HCL Overview

Data Types

bool : Boolean

a, b, c, ...


int : words (usually thinking 64 bit words)

A, B, C, ...


Operations

Logic operations: &&, ||, !

Word Comparisons: ==, !=, <, <=, >, >=

Set Membership: A in { B, C, D }


Word Expressions:

Case Expressions:
int Z = [ (case 1) : (value); (case 2) : (value); 1 : (default case); ];


2018-03-08


Y86-64 Instruction Set

Instructions

halt: 00
nop: 10
cmovXX rA, rB : 2(fn) (rA) (rB)
irmovq V, rB : 30 (F) (rB) (--V--)   // Value into register
rmmovq rA, D(rB) : 40 (rA) (rB) (--D--)  // Reg to Mem
mrmovq D(rA) rB : 50  (rA) (rB) (--D--)  // Mem to reg
OPq rA rB : 6(fn) rA rB
jXX Dest : 7(fn) (Dest)
call Dest : 80 (Dest)
ret : 90
pushq rA : A0 (rA) (F)
popq rA : B0 (rA) (F)


Functions for OPq

addq : 60
subq : 61
andq : 62
xorq : 63


Jumps for jXX

jmp : 70
jle : 71
jl : 72
je : 73
jne : 74
jqe : 75
jg : 76


Sequential Hardware Structure

Fetch: get value pointed by program counter (PC) (aka instruction pointer, IP) and move to instruction register (IR); increment PC
Decode: Figure out what the command is
Execute: Execute the command (in the case of ALU stuff)
Memory: Move store memory stuff
Write Back: write to registers
PC: Start back from top


Pipelining is when multiple stages are occuring simultaneously (why the process is split into 5 subprocedures)
Instruction example

For instruction OPq rA, rB : 6(fn) (rA) (rB)

Fetch: read two bytes
Decode: read operand registers
Execute: perform ALU operation, set condition codes
Memory: Do nothing
Write Back: Update register
PC Update: Increment PC by 2 (2 because the OPq has an extra byte of registers)


Exactly what happens in OPq rA, rB : 6(fn) (rA) (rB)

Fetch

(\textrm{icode:ifun} \leftarrow M_1[PC])   // Read instruction byte
(rA:rB \leftarrow M_1[PC+1])               // Read register byte
(valP \leftarrow PC+2)                     // Computer next PC


Decode

(valA \leftarrow R[rA])                    // Read operand A
(valB \leftarrow R[rB])                    // Read operand B


Execute

(valE \leftarrow valB\textrm{ OP }valA)    // Perform ALU operation
(\textrm{set CC})                          // Set condition code register


Memory
  // NOP
Write Back

(R[rB] \leftarrow valE)                    // Write back result


PC Update

(PC \leftarrow valP)                       // Update PC


Notes on other opcodes

For ops like rmmovq which have large data parts (i.e. D is 8 bytes), use (M_8) instead of (M_1)
For stack operations, one register represents no register (represented F above and usually 0xF in 15-register system)
In popq Decode:

(valA \leftarrow R[%rsp])   // valA is used to put value at old %rsp into rA
(valb \leftarrow R[%rsp])   // valB is used to calculate %rsp + 8 for new %rsp


cmovXX moves rA into rB only when the condition is true

If condition is false, 0xF is used as destination


2018-03-22


Y86-64 Instruction Mapping

0x1000: call 0x12300

Fetch: (icode:ifun \leftarrow M_1[PC] \ valC \leftarrow M_8[PC+1] \ valP \leftarrow PC + 9)
Decode: (valA \leftarrow R[rsp] \ valB \leftarrow R[rsp])
Execute: (valE \leftarrow valB + (-8))
Memory: (M_8[valE] \leftarrow valP)
Write Back: (R[rsp] \leftarrow valE)
PC Update: (PC \leftarrow valC)


ret (0x90)

Fetch: (icode:ifun \leftarrow M_1[PC])
Decode: (valA \leftarrow R[rsp] \ valB \leftarrow R[rsp])
Execute: (valE \leftarrow valB + 8)
Memory: (valM \leftarrow M_8[valA])
Write Back: (R[rsp] \leftarrow valE)
PC Update: (PC \leftarrow valM)


2018-03-29


(Insert stuff here)

2018-04-03


iaddq V, rB

Fetch: (icode:ifun \leftarrow M_1[PC] \ rA:rB \leftarrow M_1[PC+1] \ valC \leftarrow M_8[PC+2] \ valP \leftarrow PC + 10)
Decode: (valB \leftarrow R[rB])
Execute: (valE \leftarrow valB + valC)
Memory:
Write Back: (R[rB] \leftarrow valE)
PC Update: (PC \leftarrow valP)


2018-04-05


HW Convert to assembly:
void swappair(long a[], n) {
  // Assumes n is length of a and n is even
  int i;
  long swap;
  for (i = 0; i < n; i = i + 2) {
    swap = a[i];
    a[i] = a[i+1];
    a[i+1] = swap;
  }
}


2018-04-24


N/A

2018-04-26


#TODO
jX	Condition	Description
jmp	1	Unconditional
je	ZF	Equal / Zero
jne	~ZF	Not Equal / Not Zero
js	SF	Negative
jns	~SF	Nonnegative
jg	~(SF^OF)&~ZF	> (signed)
jge	~(SF^OF)	>= (signed)
jl	(SF^OF)	< (signed)
jle	(SF^OF)	ZF
ja	~CF&~ZF	Above (unsigned)
jb	CF	Below (unsigned)