
CS370

Midterm

Question 1

Suppose as a programmer you have three options for providing mutual exclusion: disabling interrupts, a spin lock, and semaphores. What considerations would go into your decision for choosing one over the others?

  • Interrupt Control:

Disabling interrupts simply prevents the OS from preempting processes. Of the three options, this would be the worst way to provide mutex in general code: it could allow any process to take over the entire CPU. For safety and sanity's sake it should be avoided.

  • Spin Lock:

This lock causes a process to wait in a loop until the lock can be acquired, which creates an issue called busy waiting. Busy waiting is fine for kernel code and for critical sections that are guaranteed to be short. It is still problematic to let user-level programs use spin locks, because of the potential for long busy waits and the simple waste of processor time. The problem can be mitigated if each failed attempt is accompanied by yielding the CPU, but we can't assume all developers will do that.
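A minimal sketch of the yielding variant in C11 (the type and function names here are my own, not from any particular kernel):

#include <stdatomic.h>
#include <sched.h>

typedef struct {
	atomic_flag locked; // initialize with ATOMIC_FLAG_INIT
} spinlock_t;

void spin_lock(spinlock_t *l) {
	// test-and-set in a loop; yielding on failure avoids burning
	// the rest of the quantum doing nothing
	while (atomic_flag_test_and_set(&l->locked))
		sched_yield();
}

void spin_unlock(spinlock_t *l) {
	atomic_flag_clear(&l->locked);
}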

  • Semaphores:

Semaphores provide more functionality than just mutex: they can allow a set number of threads to access a resource at the same time. This type of locking does not create busy waiting and is suitable for user processes.
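For example, with POSIX semaphores (a sketch; the slot count of 3 is arbitrary):

#include <semaphore.h>

sem_t slots;

void init(void)  { sem_init(&slots, 0, 3); } // up to 3 threads at once; a count of 1 acts as a mutex
void enter(void) { sem_wait(&slots); }       // blocks (sleeps) if the count is 0 -- no busy waiting
void leave(void) { sem_post(&slots); }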

Conclusion:

It really depends on the type of system I'm designing. Am I designing a time-critical OS, for example one for a smart car? Then I could potentially use interrupt control, since it's the fastest of these.

Am I writing a kernel, on a multiprocessor system, where I know exactly how long a lock can be held in the worst-case scenario? Then I'll use a spin lock.

Am I writing a consumer OS? Then definitely a semaphore.

Or you could, you know, like some companies (cough cough, Mojang) follow this policy: https://i.imgur.com/bMu28Un.jpg (This isn't strictly true, but memes are necessary for all activities.)

Question 2

Inferno uses the x86 XCHG instruction to do mutual exclusion locking in the same manner as the test-and-set-locked instruction. Does this imply that Inferno does busy waiting?

Well, it does imply that, but in practice it isn't the case. inferno-os takes a rather interesting approach to locking: it has a wrapping function around its locks that automatically checks the lock and calls osyield(). This means the lock will not busy-wait and will not waste CPU cycles doing nothing.

So, to answer the question: the instruction does imply busy waiting, but in practice the designers thought about it and solved the issue with a wrapper function. It is also noteworthy that if they had not done this, busy waiting would have destroyed their single-threaded architecture.
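The idea looks roughly like this (a sketch with illustrative names, not Inferno's exact source; tas() stands in for the XCHG-based test-and-set):

extern int tas(int *addr);  // XCHG-based test-and-set: returns nonzero if the lock was already held
extern void osyield(void);  // give the CPU back to the scheduler

void lock(int *l) {
	while (tas(l))
		osyield(); // failed to acquire: yield instead of spinning
}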

Question 3

In Inferno the isched.runhd and isched.runtl members describe a list of processes. A process is in this list if and only if it is in the Pready state. In a sense, this is redundant. Why would the designers create this separate list when they could run through the list of all processes only considering those in the Pready state?

It is more of a design choice than redundancy. The stored references are just that: references in system memory. The two lists aren't copies of the processes; both are simply pointers to where each process actually lives in memory.

There are two reasons having two lists is a good idea:

  1. It's a tiny bit faster to just pick a process from that list and run it rather than looping over the entire list of processes.

  2. When scheduling a new process, we're guaranteed to put it at the end of the list. A process can't pop up in the middle of the loop and announce that it wants to be put into the ready state. That means the scheduler can walk the ready list without worrying about concurrent data modification (it doesn't have to lock the main process list), and it doesn't have to worry about other processes not being scheduled fairly. (A small sketch of the structures follows.)
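Roughly (field names modeled on inferno-os's isched and Prog, but simplified):

struct Prog {
	struct Prog *link;  // next process in the ready queue
	int state;          // Pready, etc.
	// ... the rest of the process entry
};

struct {
	struct Prog *runhd; // head: the next Pready process to run
	struct Prog *runtl; // tail: where newly ready processes are appended
} isched;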

Question 4

Describe how a copy of the machine registers makes its way into the process table entry for a process that's being preempted.

The "xecution" of a program in happens inside libinterep/xec.c. Before the program starts to be executed, we restore the state of the virtual machine running the code by copying the registers of the program to a variable inside xec.c. This variable handles all the register types defined in include/interp.c. These registers stimulate a set of registers on a CPU.

Once the process gets preempted, the registers are copied back from xec into the process's Prog structure and saved there. This way any other virtual CPU can take the program and run it from where it left off.
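Roughly (an illustrative sketch; the field names are assumptions rather than the exact Inferno source):

struct REG  { void *PC; void *FP; void *MP; /* ... */ };
struct Prog { struct REG R; /* saved register state */ };

void save(struct Prog *p, struct REG *r)    { p->R = *r; } // on preemption: registers -> Prog entry
void restore(struct Prog *p, struct REG *r) { *r = p->R; } // on dispatch: Prog entry -> registers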

Question 5

In what way is the Shortest Job First scheduling algorithm optimal?

Shortest Job First decreases the average turnaround time. For example, if we have tasks scheduled in the following order:

A - 5 ms

B - 100 ms

C - 10 ms

If we run them in this order, the average turnaround time is:

(5 + (5 + 100) + (5 + 100 + 10)) / 3 = 75 ms

If we use Shortest Job First scheduling, the average turnaround time is:

(5 + (5 + 10) + (5 + 10 + 100)) / 3 = 45 ms

That's a decrease of 40%. In this sense SJF is optimal: it minimizes the average time each job waits to be completed.
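A quick check of that arithmetic in C (assuming all jobs arrive at time 0):

#include <stdio.h>

double avg_turnaround(const int *burst, int n) {
	double total = 0, finish = 0;
	for (int i = 0; i < n; i++) {
		finish += burst[i]; // completion time of job i
		total += finish;
	}
	return total / n;
}

int main(void) {
	int fifo[] = {5, 100, 10}; // original order
	int sjf[]  = {5, 10, 100}; // shortest job first
	printf("FIFO: %.0f ms\n", avg_turnaround(fifo, 3)); // 75 ms
	printf("SJF:  %.0f ms\n", avg_turnaround(sjf, 3));  // 45 ms
	return 0;
}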

Question 6

In the mid-80s, AT&T sold a small machine manufactured by Convergent Technology called variously the UNIX PC, the 3B1 and the 7300. This machine used the Motorola 68010 CPU chip. This chip had 32-bit registers which could be used for data and for virtual addresses. It had a 16-bit data bus and a 24-bit address bus between the CPU and the MMU. Only the lower one-fourth of the memory space specified by the address bus was page-mapped memory. (Keep in mind that even though we may be able to generate 32-bit addresses in software, the CPU cannot address that much memory.) The addressable space on the other side of the MMU is the same size as the space managed by it.

How big is the memory space addressable by the CPU?

So let's look carefully at the problem statement: "The addressable space on the other side of the MMU is the same size as the space managed by it." This means there are no page tables; each address references a 16-bit word (2 bytes).

This scheme therefore allows 2^24 * (16 bits) = 2^25 bytes of memory to be addressed by the MMU.

The next interesting part of the problem is "Only the lower one-fourth of the memory space specified by the address bus was page-mapped memory."

The thing we need to be careful about here is what exactly we are quartering.

Addressable by the CPU: 2^24 bytes = 16 MB

Usable by the MMU: 2^25 / 4 = 2^23 = 8 MB

Question 7

How big is the space that is mapped by the paged memory management unit?

As mentioned in the question before, the memory that is mapped by the MMU is 8 MB.

Although the architecture of this CPU is a little weird, I suspect the CPU actually has an 8-bit data resolution (data bus) instead of 16 bits, with the other 8 bits used for lookahead/prefetching. The reason I get this sense is that the memory addressable by the CPU is 16 MB, and only a fourth of that is addressable by the MMU, which would be 4 MB.

So I'm going to go with a final answer of 4 MB mapped by the MMU, with the data resolution actually being 8 bits plus 8 bits for lookahead/prefetching.

Question 8

If each page holds 4096 bytes, show what the address coming out of the CPU looks like for addresses which are mapped into the paged memory space. Show which bits are used for the page number, which are used for the page offset and which have a fixed value.

Let's do a quick conversion:

4096 bytes = 2^12 bytes (therefore k = 12)

So from the most to least significant bits:

  • 8 bits ignored
  • 2 bits fixed
  • 10 bits pageID
  • 12 bits page offset

However, since we're talking about the addresses coming out of the CPU, the 8 ignored bits wouldn't even be there (a C sketch of the decomposition follows the list). So:

  • 2 bits fixed (to ensure the 4 MB MMU-mapped space)
  • 10 bits pageID
  • 12 bits page offset
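In C terms (the macro names are my own):

#include <stdint.h>

#define OFFSET_BITS 12
#define PAGE_BITS   10

#define PAGE_OFFSET(a) ((a) & ((1u << OFFSET_BITS) - 1))                // low 12 bits
#define PAGE_NUMBER(a) (((a) >> OFFSET_BITS) & ((1u << PAGE_BITS) - 1)) // next 10 bits
#define FIXED_BITS(a)  ((uint32_t)(a) >> (OFFSET_BITS + PAGE_BITS))     // top 2 bits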

Question 9

How many bits in the page table entry (PTE) are used for the page frame number (PFN)?

Okay, this question took me a little longer to see, but the answer would simply be equal to the width of the pageID. So 10 bits.

Question 10

If the PTE has two bits to record all of the page presence, the accessed information and the dirty information, how many bits does the PTE likely contain? (By likely, think about how many bits would make a natural size for storage and programming.) How many bits does that leave unused (actually reserved for future use)?

We know from question 9 that the page frame number is 10 bits. Adding 2 more bits for the aforementioned information (although I'd argue you'd need at least 3 bits for that), you get 12 bits.

Storage sizes are all based on powers of two, so it'd be natural to make this 16 bits; RAM is bought and addressed with the assumption that sizes are powers of 2.

So the PTE would contain 16 bits, leaving 4 bits unused (reserved for future use).
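One possible layout as a C bit-field (the field order is my assumption; only the widths follow from the answer):

#include <stdint.h>

typedef struct {
	uint16_t pfn   : 10; // page frame number (question 9)
	uint16_t flags : 2;  // presence/accessed/dirty, packed per the question
	uint16_t rsvd  : 4;  // unused, reserved for future use
} pte_t;                 // 16 bits total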

Question 11

How many entries does the page table have?

From question 8 we know that each page is 2^12 bytes and that we have 2^22 bytes of physical memory space. Therefore we would have 2^10 pages. For each page we need a PTE in this scheme. So the table contains 2^10 PTEs.

Question 12

If each access to the page table takes 200nS and the whole page table has to be written for each context switch how long does that part of the context switch take?

This question is a little unclear on how many accesses it takes to rewrite an entire page table, so I'm going to assume it needs one access per PTE.

From question 11 we know we have 2^10 PTEs. Accessing each takes 200 ns, so:

2^10 * 200 ns = 204,800 ns, or about 0.2 milliseconds.

Question 13

What changes would you have to make to Inferno in order to implement multiple queues for scheduling. Approach this in terms of the type of multiple queues used in CTSS. Be sure to address issues of both data structures and algorithms. Any mix of code, pseudo-code or English is acceptable, but keep in mind that your objective is to give enough detail that someone could take your answer and implement what you've designed.

In CTSS the "higher" priority actually has a lower ID. I'm going to make a small change in my explanation, with "higher" priority meaning a larger ID. It seems much more intuitive this way.

To do this I'd rip out the entire scheduling code and replace it with the following (a C sketch of the data structures appears after the description):

A prty structure, maintained as a doubly linked list, replaces inferno-os's runhd and runtl with prtyhd and prtytl.

prty contains the following:

  • int level
  • prog runhd
  • prog runtl

There is a static variable MAX_PRTY which holds the maximum level.

Add a time_t field ts to Prog to record the last Unix time this process was run.

Periodically scan the ts values; if any ts is older than a configured threshold, increase that process's priority. This could be done alongside GC.

How to increase and decrease priority:

Increasing:

If the process is currently at priority X and X != MAX_PRTY: search the doubly linked list for the prty entries whose levels are X and X+1. Remove the process from the current priority by unlinking it from that level's singly linked runhd/runtl structure, then put it at the start of level X+1's list (the head rather than the tail, to make it more like CTSS). That's simply a matter of pointing the process's next at the old runhd and then setting runhd to the process.

Decreasing:

If the process is currently at priority X and X != 0: do the above, but target X-1 instead of X+1.

Once a process is added to a list, send a signal to preempt any process running at that specific priority. If a process is preempted by this signal, restart the queue at the beginning of the list to emulate the behavior of CTSS.

Depending on the priority, a multiplier is applied to the quantum: if the priority is 4, think of it as (4+1) * base_quanta. (CTSS is much more involved with the math, but this achieves the same goals.)

Every time a process completes its quantum, its priority is decreased using the algorithm above.

When a process is executed, its ts is updated to the current system time.
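Here is the sketch of the data structures and the promotion rule (a design sketch following the prose above, not Inferno source; dequeue() is an assumed helper that unlinks a Prog from a level's run queue):

#include <time.h>

#define MAX_PRTY 7 // assumed; any fixed maximum level works

struct Prog {
	struct Prog *link; // next Prog in this level's run queue
	int prty;          // current priority level
	time_t ts;         // last time this process ran
	// ... the rest of the process entry
};

struct Prty {
	int level;          // 0 .. MAX_PRTY
	struct Prog *runhd; // head of this level's ready queue
	struct Prog *runtl; // tail of this level's ready queue
	struct Prty *next;  // doubly linked list of levels
	struct Prty *prev;
};

void dequeue(struct Prty *q, struct Prog *p); // assumed helper

// Moving a process up one level, per the "Increasing" rule.
void promote(struct Prty *cur, struct Prog *p) {
	if (cur->level == MAX_PRTY)
		return;
	dequeue(cur, p);             // unlink from runhd/runtl at level X
	struct Prty *up = cur->next; // the level X+1 entry
	p->link = up->runhd;         // insert at the head, CTSS-style
	up->runhd = p;
	if (up->runtl == NULL)
		up->runtl = p;
	p->prty = up->level;
}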

Question 14

The Control Data Corporation 6000 series computers had a novel design. They had one or two main processors which were used just for numerical computation and several special processors called Peripheral and Control Processors (PCPs). All of these processors shared the same memory space. (For the purposes of this question, focus on a single main processor and a single PCP.) One of the instructions on the PCP was the Exchange Jump (EXN). This instruction interrupted the main processor passing the value of the A register which was the address of a block of memory containing all the register values to load including the program counter. The main processor then swapped all of the current register values with those in the memory block before continuing after the interrupt.

Describe in pseudo-code how a context switch is performed by the PCP on a CDC 6000 series computer.

The instruction would send a short (in bit length) interrupt to the main CPU hardware.

Here is how it might look if implemented:

static REG *MAIN_REG;
PROG *running;

void handleREG(REG *mem) {
	// save state: current machine registers -> the running process's entry
	running->A = MAIN_REG->A; // think of it like an array copy, since A is an array
	running->B = MAIN_REG->B;
	// etc.
	running->PC = MAIN_REG->PC;

	// load state: the memory block -> machine registers
	MAIN_REG->A = mem->A;
	MAIN_REG->B = mem->B;
	// etc.
	MAIN_REG->PC = mem->PC;
}

// instruction receive
void handleEXN(void) {
	REG *payload = (REG *)MAIN_REG->A; // the A register holds the address of the saved register block
	handleREG(payload);                // exchange state with that block
}

Here is how it might look in use:

// At some point, a future state is stored in memory; this could be as simple as a new instruction pointer.
// Registers may vary in size, so these aren't fixed-size offsets.
A[0] = r0
A[1] = r1
A[2] = r2
etc

// When the PCP is ready and sure that the A register holds the address of the stored registers, it executes:
EXN


// The main CPU does its own stuff with the program...


// When the main CPU stops working:
EXN


// This means we have to swap the registers and memory.
// This is likely handled by hardware using a small cache in the CPU:
swap(memory, register):
  cache = register;
  register = memory;
  memory = cache;

swap(A[0], r0)
swap(A[1], r1)
swap(A[2], r2)
...


// If we're only ever swapping between two tasks, we can now simply go back and forth:
Task 1
EXN -> swap
Task 2
EXN -> swap
Task 1
EXN -> swap
Task 2
...


// If we have more than two tasks, we can rotate between them:
i = 0
queue[4] = array of tasks, where each task holds registers

next():
  i = (i + 1) & 3
  A = queue[i]
  EXN

Task 1
next()
Task 2
next()
Task 3
next()
Task 4
next()
Task 1
next()
Task 2
...