143mailliw/reference.md

## reference.md

      
    Raw
  

              reference.md
            
          
    Table of Contents


Table of Contents
The CPU

The Instruction Decoder

Instruction Formats

R (Register-based ALU)
I (Immediate)
J (Jump)


Pseudo-direct Addressing
PC-relative Addressing


Endianness


Programming

Execution Flow

Labels
Unconditional Jumps
Conditional Branching
Subroutines (Procedures & Functions)


Pseudo-Instructions
Accessing Memory

Instructions
Offsets
Sizes
The Stack


Bitwise Logical Operations

Left Shift and Right Shift


lo and hi


C

Hexadecimal and Binary Numbers
Pointers

A Note on Arrays


Appendix

Definitions

ALU
Bus
Register


Other Resources


The CPU

In CSE 120, you built most of a CPU. This included the ALU, program counter, memory bus, and some data flow control circuitry. This is, infact, the core of the CPU, and it will perform most computational operations on it's own. However, programming in assembly makes relevant quite a bit of knowledge that was mostly ignored in CSE 120, thanks to it's role as a dual-purpose design class.
The Instruction Decoder

You directly controlled the circuitry of your CPU in CSE 120, but in 230 we're controlling it using instructions. This is made possible by the Instruction Decoder, which takes an instruction (like add), and turns it in to the relevant control signals to drive components like the memory bus and ALU.
For our example instruction, add, the instruction decoder will recieve the instruction (how it recieves this instruction is dependant on the CPU architecture), and direct the relevant circuitry to perform the operation using the control signals (like the inputs in CSE 120). There are 3 types of inputs for instructions to use (though not all instructions will use all inputs), and they are as follows:

The opcode (operation code), which is the part of the instruction that represents the actual operation to perform, like add, sub, or and. This section of the instruction is decoded into control signals to instruct various components, like the ALU and data buses, what they will be doing this clock cycle. For an add instruction, the instruction decoder would tell the ALU that the current operation is an arithmetic add operation.
The sources, which are the registers that the instruction is acting upon. These are decoded into control signals that inform the data bus and register file (which is where the registers are stored) what registers to send and where to send them to.
The destination, which is the register that the resulting data from the instruction is stored into. Like sources, this input is decoded into control signals that inform the data bus where and how to store data.

Instruction Formats

In most (if not all) CPU designs, an instruction must fit in to a single register. For the version of MIPS we use in class, these registers are 32-bits. This, however, leads to a problem: if an address itself can be 32-bits long, how could we store all of our instructions, which could have more than just an address? This is where the several instruction formats come in to play. The instruction formats allow for different parameters to be stored for different instructions, bypassing the above limitation all together.
The 32-bit MIPS architecture we use in class has the following instruction formats:
R (Register-based ALU)

This is the format that most instructions (like sll, add, and, etc...) use.

    
        opcode rs rt rd shamt funct
    

opcode (operation code, 6 bits): The operation code of the current instruction. For R format instructions, the current operation is determined by funct, and the opcode is always 0.
rs (register source, 5 bits): The first source for the operation.
rt (register target, 5 bits): The second source for the operation.
rd (register destination, 5 bits): The destination register for the result of the operation to be stored in.
shamt (shift amount, 5 bits): The amount to shift the source by. Only used for sll and srl.
funct (function, 6 bits): The actual operation to execute. As mentioned before, all R instructions have an opcode of 0 and the actual operation is determined by this value.

I (Immediate)

This is the format that all immediate instructions (like lui and addi) use.

    
        opcode rs rt immediate
    

opcode (operation code, 6 bits): The operation code of the current instruction.
rs (register source, 5 bits): The source for the operation.
rt (register target, 5 bits): The destination register for the result of the operation to be stored in.
immediate (immediate, 16 bits): The constant value source for the operation.

J (Jump)

This is the format used by j and jal, for jumping to an address.

    
        opcode address
    

opcode (operation code, 6 bits): The operation code of the current instruction.
address (address in program, 26 bits): The address in the current program to jump to.

Pseudo-direct Addressing

Since all instructions require a 6-bit opcode, jump (J) instructions only have 26-bits to store an address. However, memory addresses are 1-word (32-bits) long, which means we can't store a whole address - we're short 6 bits.
To work around this, we use pseudo-direct addressing, which allows us to store most of an address in 26-bits. The first 4 bits are dropped, and the first 4 bits from the program counter are used instead. Then, the remaining 2 bits are saved by simply dropping the last 2-bits of the address. We can drop the two bits since words are always 4 bytes long, meaning their location in memory is always divisible by 4, which means the first two bits (which store 2 and 1) are never used.
For example, let's say we want to store the address 0x0874, which is 00000000 00000000 00001000 01110100 in 32-bit binary. We can split it up like so:
0000 00000000000000001000011101 00

Then, we can drop the 0000 and 00:
00000000000000001000011101

When it comes time to to use the address, we can simply take the value of the program counter and over-write these 26-bits of it to get our address back:
     00000000000000001000011101
0000 01101010100110001100100100 00

This will work for any program that is 256 megabytes or less in size - any more than that, and you'll need to change the 29th bit of the address, which isn't possible. There are several ways to work around that, too, of course:

If you're just loading a new section of the program, you could load a 32-bit value using li and store that value in the program counter using jr.
You could bank your memory, which is where you swap sections of memory between usable and unusable areas in order to access all of it.
You could load the current value of the program counter by jumping and linking to a function that stores $ra in $v0 and then immediately returns.

In any case, I don't see us needing to write programs with more than 2.5 million instructions in them in this class.
PC-relative Addressing

Branch instruction are all I-format instructions, with a 16 bit immediate value. This means we can't store a whole address, and we can't use pseudo-direct addressing. We need a way to store the address within the limit of 16-bits, but directly storing an address is impossible - so we compromise.
Since branch instructions are often used to jump to labels that are shortly after or before the current code, storing a signed count of how many instructions we need to jump back/forth will work for most cases. This is PC-relative addressing - instead of storing the position of the current instruction, we store how many instructions we need to jump, with negative numbers going backwards.
For example, take the instruction bne $at $t0 Label, where Label is 4 instructions ahead of bne (meaning there is 3 instructions between bne and Label). This means, when we assemble our instruction into machine code, the address will simply be stored as 4 in binary. If we were to go back 4 instructions, it would be stored as -4.
Endianness

Endianness refers to where the CPU places the most signifigant bytes of a number while working with it. A big-endian CPU stores the most signifigant byte of the number first, and a little-endian CPU stores the most sigifigant byte last.
For example, let's say we stored the 32-bit number 0x1234ABCD in to memory location 4000.
On a big-endian CPU, this will be stored in the manner you will expect, with memory location 4000 containing 0x12, memory location 4001 containing 0x34, and so on:


Location
Contents


4000
00010010 (0x12)


4001
00110100 (0x34)


4002
10101011 (0xAB)


4003
11001101 (0xCD)


However, on a little-endian CPU (like the one we're using in class), the bytes will be stored in reverse order, with memory location 4000 containing 0xCD, memory location 4001 containing 0xAB, and so on:


Location
Contents


4000
11001101 (0xCD)


4001
10101011 (0xAB)


4002
00110100 (0x34)


4003
00010010 (0x12)


Programming

The instructions and registers used here are specifically for 32-bit MIPS assembly, which is what we're using in class. The general concepts should be applicable to other CPU architectures, but not directly.
Execution Flow

Labels

A key part of controlling execution in assembly is labels, which allows you to specify points to jump to in your code using jump or branch instructions. A label can be declared like this:
label:
    inst $1, $2, $3

Note:
A useful trick for the Zylabs simulator is to jump to an empty label placed at the end of the program, which you can jump to at the end of your program to end the simulation.
For example:
main 
    # main code for your program     
    j exit # this will end your program

exit:
    # do not put anything after the label

Unconditional Jumps

MIPS has several instructions for jumping unconditionally:

j <address> or j <label> (which will be turned in to the other form by the compiler), which sets the program counter (which stores what the next instruction will be) to the specified address or label
jr <$register>, which sets the program counter to the value of the specified register. This is useful for returning from a procedure, using the $ra register.
jal <address/label> works similarly to j, but it stores the next instruction (pre-jump, so the instruction immediately after jal) in $ra, so that the section you jump to is able to return to the previous section of code using jr $ra. This is what allows for functions and procedures to be implemented.

Conditional Branching

MIPS only has two instructions for conditional branching:

beq <$left>, <$right>, <address/label> branches to <address/label> if <$left> is equal to <$right>.
bne <$left>, <$right>, <address/label> branches to <address/label> if <$left> is not equal to <$right>.

Additionally, the Zylabs simulator supports the following pseudo-instructions that allow you to perform more advanced comparisons:

bge <$left>, <$right>, <address/label> branches to <address/label> if <$left> is greater than <$right>.
bgt <$left>, <$right>, <address/label> branches to <address/label> if <$left> is greater than or equal to <$right>.
ble <$left>, <$right>, <address/label> branches to <address/label> if <$left> is less than <$right>.
blt <$left>, <$right>, <address/label> branches to <address/label> if <$left> is less than or equal to <$right>.

Subroutines (Procedures & Functions)

Subroutines are reusable sections of code used to implement procedures and functions. Unlike a simple label (which we jump to with j), subroutines are expected to return the program to the pre-jump state after execution. MIPS has several features designed to allow for easy development using subroutines:

jal (jump and link) and jr (jump register). Before jumping, jal stores the next instruction (if it hadn't jumped) in $ra (return address), so that you can return to the previous section of code by jumping to $ra (with the jr instruction).
Temporary registers ($t0 - $t9). Unlike Saved Temporaries ($s0 - $s7), there are no promises that temporary registers will persist between subroutine calls. Saved Temporaries are not supposed to be modified during a subroutine.
The argument ($a0 - $a3) and value ($v0 and $v1) registers, which are for storing arguments and the return values (if it has any) of the subroutine.
The frame pointer ($fp), which is used to store the start of the stack frame (the starting value of the stack pointer when the subroutine begins).

There are a great many best (or even mandatory) practices you should follow while writing subroutines. Here are the most important ones:

You should store $fp in another register and store the current value of $sp in $fp before using the stack in any capacity.
Do not modify Saved Temporaries ($s0 - $s7). Only modify temporary registers. If you need more room, use the stack.
When using the stack (which you will always be doing if you're writing your code correctly), you must ensure that the stack pointer is restored to value it held before your function began. The frame pointer is useful for this.
When calling another subroutine, all registers with any important data in them (data you will use again) must be stored in the stack. Otherwise, assume this data will be lost. This includes $ra, $fp, any argument and value registers, etc.

Using the above best practices and building blocks, you can build programs with hundreds of subroutines that never conflict with each other.
If you need an example, ping me in the CSE 230 server. A full example would be too long to include here.

Note: While Indela seems to use the words "procedure" and "function" interchangeably, they don't actually mean the same thing. Procedures and functions are both two different types of subroutine: procedures are subroutines that return nothing, and a function is a subroutine that returns a value. This is the only distinction between the two: whether or not it returns a value.

Pseudo-Instructions

MIPS has a very limited instruction set due to it's RISC design, which aims to reduce the instruction set as much as possible. To compensate for this, compilers for the MIPS architecture often offer pseudo-instructions, which allow for certain tasks that do not have a hardware implementation to be performed using one "instruction". For example, MIPS lacks a move instruction, instead providing that fuctionality using the add instruction. Because of this, a move pseudo-instruction exists, which is converted by the assembler in to the previously mentioned add instruction.
The assembler temporary register, $at, is used for any temporary storage required for the implementation of the pseudo-instruction.
For a more in-depth example, consider the blt pseudo-instruction:
# the original pseudo-instruction:
blt $s3, $t0, label

# will be turned in to the regular instructions:
slt $at, $s3, $t0     # sets $at to 1 if $s3 is less than $t0, 0 if it isn't
bne $at, $zero, label # jumps to the specified label if $at isn't 0
Accessing Memory

Instructions

There are two types of instructions regarding memory: storage instructions and loading instructions. There are one of each for all 3 supported data sizes, as well as some unsigned and immediate variants for loading.
Note that the signed instructions extend the signs (so that they can be used for signed two's complement addition and subtraction, which requires the entire word) to take up the whole word. This means if you wish to copy a byte or half while ensuring that values
For loading (performed using <instruction> <$destination>, <memory location/constant>):

lb loads one byte of memory from the specified location.
lh loads one half (two bytes).
lw loads one word (four bytes).
lbu loads one unsigned byte.
lhu loads one unsigned half.
lui loads a constant half value into the upper half of a register. The remaining bits are set to 0.
li loads a constant word value.
la loads the address of a specified label.


Note:
la and li aren't available in the Zylabs simulator.  While I can't imagine we'll need to load any addresses from our program any time soon, the lack of li is quite inconvient. Instead of using li, I recommend you use addi, which is supported, instead, like this: addi <$destination> $zero <constant value>.
You could also use lui and ori to load the upper and lower half respectively, in the event you need to load more than 16 bits.

For storing (performed using <instruction> <$source>, <memory location>):

sb stores the last byte of a register into the specified location.
sh stores the last half (two bytes) of a register.
sw stores the whole register.

Offsets

The instructions for storing and loading data use a special syntax in order to specify the memory location to load from/store to. This syntax allows (or rather forces) you to provide an offset, positive or negative, to be applied to the specified register without modifing the value itself. The offest you specify can be any signed 16-bit whole number.
The syntax is simple: you simply provide the offset, and then a register surrounded by parentheses. For example:
# these are all valid offsets
0($s4), -4($s2), 4($t0), 1($at), 2048($a0)

# this isn't a valid offset because only signed 16-bit numbers are supported
312830($s0)

Note:
All instructions that take a memory location must be provided with an offset. You cannot use a plain register. For example, lh $t0, $s3 is not valid. Instead, use lh $t0, 0($s3), which does not change the position in memory.

Sizes

There are 3 important sizes to consider: word, half, and byte. A word is 4 bytes, a half is 2 bytes, and a byte is 8 bits (individual 1s and 0s). Note that all of these types correspond directly with specific types in C: an int is 1 word, a short is 1 half, and a char is 1 byte.

Note:
The size of a word can vary on other CPU architectures. When you hear a CPU referred to as "64-bit", it means that a word is 64 bits (8 bytes) long and a half is 32 bits (4 bytes) long. When a CPU is referred to as "32-bit", a word is 32 bits (4 bytes) long and a half is 16 bits (2 bytes) long. The architecture we use in class is 32-bit, meaning a word is 32 bits (4 bytes long).
The size of most C types does not change based on the CPU's word length, except for pointers, which are always 1 word long.

The Stack

The stack is the section of memory used for storing fixed-size variables. All stack operations are performed using the stack pointer, $sp.

To allocate memory on the stack (which you must do in order to store anything using stack memory), you can subtract the required amount of memory (in bytes) from the stack pointer. For example: addi $sp, $sp, -4 would allocate 1 word of stack memory.
To deallocate memory on the stack, simply add the amount you used back to the stack pointer: addi $sp, $sp, 4.
To read and write to the stack, simply use the normal instructions with the stack pointer.


Note:
There are several considerations to be made when utilizing the stack heavily:

Stack storage space is not infinite - though you almost certainly will never run out of it in this class. We won't go in to much depth on this, because the size of the stack depends on how you write your program.
Never assume that new allocations on the stack are empty (all 0s). Old (deallocated) data is not necessarily removed from memory.
As with C programming, for every allocation there must be a deallocation - and there is no errors to tell you whether or not you missed one. Keep your code easily readable, and comment often - this is especially important in assembly.


Bitwise Logical Operations

Bitwise logical operations simply apply simple logical operations to each bit (hence the name bitwise, which means with respect to each bit). For example, performing a bitwise OR operation on 1100 and 0101 would result in 1101:


Left
Right
Result


1
0
1


1
1
1


0
0
0


0
1
1


The MIPS instruction set makes available the following bitwise instructions:

and <$destination> <$left> <$right>
andi <$destination> <$left> <constant value right>
or <$destination> <$left> <$right>
ori <$destination> <$left> <constant value right>
nor <$destination> <$left> <$right>

Instructions ending in i are immediate instructions that allow for you to hard-code a 16-bit value.

Note:
All of the above operations are commutative - it does not matter which value is left or right.

Left Shift and Right Shift

The MIPS instruction set also includes left shift and right shift instructions, which allow you move bytes left and right. For example, 10010100 11010010 01101010 01101010 shifted to the left 3 bits results in 10100110 10010011 01010011 01010000.
Notice that the extra bits at the beginning are lost, and the remaning bits are now padded with 0. The right shift operation does the same thing, but in reverse: 10010100 11010010 01101010 01101010 shifted to the right 3 bits results in 00010010 10011010 01001101 01001101.
The instructions for performing these operations are sll (shift left logical), and srl (shift right logical). They both use the same operands: sxl <$destination> <$source> <constant amount>.
lo and hi

lo and hi are used to store values from instructions that produce more than 1 word of results - specifically mult, div, and divu.
For mult, which produces a 64-bit (2 word) result, hi is set to the upper 32 bits, and lo is set to the lower 32 bits. For div and divu, hi stores the remainder, and lo stores the quotient.
The values of hi and lo can be accessed using the mfhi <$destination> and mflo <$destination> instructions.
C

Hexadecimal and Binary Numbers

In C, you can represent hexadecimal numbers as 0x1234ABCD, and binary numbers as 0b10101010. Used in code, it looks like this:
int decimal = 28;
int hex = 0x1C;
int binary = 0b11100;
Pointers

Pointers are variables that point to specific locations in memory. A pointer stores a memory address (the location in memory), and can be de-referenced to read and manipulate that memory.
A pointer can be declared and used like so:
// the * is what denotes the variable is a pointer
int* pointer;
// these are also valid
int * pointer;
int *pointer;

// you can retrieve the address (location in memory) of a variable using the and operator
int a = 4; // a is stored at 0x4000
pointer = &a; // pointer = 0x4000, **not 4!!!**
Pointers can be de-referenced, allowing you to directly access the data stored at the location:
int a = 4;
int* pointer = &a;

// the same * character is used to de-reference a pointer
*pointer = 10; // a is now 10 - the pointer itself doesn't change, just the value it points to
int b = *pointer + a; // since *pointer is pointing to a, this is equivalent to doing a + a, so the value is 20
A Note on Arrays

While I'm sure you're familliar with the concept of arrays by this point in your education, arrays in C work very differently to arrays in Python or Java. Like Java, they are a fixed size, however unlike most modern programming langagues, arrays are simply managed pointers.
This means that when you see array[4], you are actually performing pointer arithmetic. This is extremely important for this class, as it means that when you see things like int A[6], $s1 = A and A[4], they are directly translatable to assembly. For that specific example, the resulting assembly for A[4] would be 16($s1), since an int is 1 word (4 bytes) long, resulting in a 16 byte offset to our initial memory location $s1.
Appendix

Definitions

ALU

The ALU (Arithemtic Logic Unit) is the part of the CPU that performs basic mathematical operations, like addition and subtraction, as well as logical operations like AND and OR.
Bus

A bus controls data flow between various components. For example, the memory bus controls reads and writes to and from the system's memory. Without a bus, the communicating circuits would be required to directly interface with each other's control signals - busses simplify this process.
Register

A register is a small amount of memory that contains a value the CPU is actively working with. Most modern CPUs have many of these, with MIPS having 32 and ARM processors (generally) having up to 43.
Other Resources


University of the Pacific provides a more detailed (but still incomplete) list of instructions than the Green Sheet, available here.
The MIPS Green Sheet itself is available on our Canvas, but it can also be found here.
Location	Contents
4000	`00010010 (0x12)`
4001	`00110100 (0x34)`
4002	`10101011 (0xAB)`
4003	`11001101 (0xCD)`