imduffy15/OS-Tutorials.md

## OS-Tutorials.md

      
    Raw
  

              OS-Tutorials.md
            
          
    Week 1

An OS does basically two things. What are they?

Software that provides a programmer friendly interface between application programs and the hardware by providing a virtual environment to applications.
Software that handles resource requests from application programs and prevents applications from trampling each other.
Describe the two interfaces to an operating system

Hardware interface - The operating system provides an interface for device drivers to carry out I/O tasks on our behalf.
Software interface - The operating system provides userland space for user applications.
What is the difference between a trap and an interrupt

A trap is generated by userland applications when an exception is thrown. Examples include division by zero, invalid memory access or system calls.
An interrupt is generated by hardware for example when the hardware wants the attention of the CPU. Example, key being pushed on a keyboard.
Name two sources of traps


System calls
Invalid memory access
Devision by zero

What is the difference between code executing in userland and in the kernel?

Userland code is executed in unprivileged mode.
Kernel code is executed in privileged mode.
The difference between these modes is something that the hardware understands and enforces.
The OS runs in privileged mode, it can interact with hardware directly and can manipulate hardware abstractions.
User programs run in unprivileged mode. They cannot interact with hardware directly and see the machine as a virtual machine.
What is a process

A process is a program that's in execution. A process includes the code, the processes state and the memory it can access.
What is a PCB? how big is it?

Process control block.
The OS must know specific information about processes in order manage, control and to implement the process model. The OS maintains a table called the process table, with one entry per process. These entries are called PCBs. They contain information about the process' state, its program counter, stack pointer, memory application, the state of its open files, scheduling information and everything else that must be saved when the process is switched from ready to blocked state so that it can be restarted later as if it had never been stopped.
The size of the PCB is 1.7kb.
What are the two abstractions a process implements?


Exclusive CPU
Process address space

A process thinks it is the only thing running on a computer. It believes that it has full access to the CPU and complete memory access.
Draw a diagram depicting the structure of a linux process.

        +----------------------------------+
        |                                  |
        |               KERNAL             |
        |                                  |
        +----------------------------------+
        |               STACK              | 
        +----------------------------------+ |
        |                                  | v
        |                                  |
        +----------------------------------+ ^
        |                                  | |
        |                                  | |
        |               HEAP               |
        |                                  |
        |                                  |
        +----------------------------------+
        |                                  |
        |               DATA               |
        |                                  |
        +----------------------------------+
        |               TEXT               |
        +----------------------------------+

Describe the role and contents of each of a process's major sections.

Stack - Function parameters and local variables.
Heap - Dynamically loaded memory.
Data - Static variables.
Text - Machine executable functions.
What lies above the stack, what range of addresses does it occupy?

The heap
How are processes created?

Processes are created by first forking another process and then calling exec on that newly cloned process. Exec replaces the current process with the new process.
How can we tell the parent and child apart.

By the PID returned by fork. If it is the child process this will be 0 if it is the parent process it will be the ID of the child.
After creation what does a child process often immediately do?

Executes the exec command and replaces itself with the new process.
If a child is a duplicate of its parent can we be clever with the duplication?

Yes, only duplicate process memory if the process changes something from its parent. (Copy on write)
How does a process terminate? What does it hand back to its creator?

A proces can exit by:

Calling exit.
Returning from main
Calling abort
Receiving a signal from its parent

After exiting the process returns an exit status to the parent process.
When is it safe for the OS to reuse a PID? Where is the exit status stored?

When the original process and all its children have terminated.
The exit status is stored within the PCB of its parent.
Who collects the process's exit status? What if no one collects it?

Its parent.
If the parent doesn't exist the process is adopted by init and it collects its return status.
What happens if a parent process dies before its children

It goes into a zombie state and waits for them to exit.
Draw a diagram that depicts the process life-cycle


A process sleeps in one of two states. What are they? What's the difference?

Interruptible - Responds to signals.
Uninterruptible - Does not respond to signals.
What is copy on write? Where and when is it applied?

Copy on write is where a child process only copies its parents memory when it writes to it.
Why is a child process run before its parent?

This is done in order to maximise the effectiveness of copy on write.
If the parent was run before it's child and modified all pages it would have to create a separate copy for each then when the child process executes and immediately uses exec to replace itself with a new process, all copying becomes unnecessary.
What happens if we hit CTRL-C while a process is running?

CTRL+C will send a SIGINT to the process running.
How many different kind of signals are there? Name some of them?

There are 31 signals.

SIGHUP
SIGINT
SIGALRM
SIGUSR1
SIGUSR2

What is the default action for most signals?

Terminate the process and dump its core.
What can a process do to override the default action?

Mask it with sigprocmask
Why are signals important to programmers?

Signals are useful to programmers because they help avoid the issue of busy waiting, by sleeping until a signal is received the process avoids the need to loop and repeatedly check to see if the event has occurred.
How is a signal delivered to a process? When does the handler run?

A signal is sent to a process by the kernel marking it as pending in its PCB.
When the process next returns from the kernel the signal handler for the pending signal will run.
How is a handler installed


sigaction
sigset

How are signals blocked?

sigprocmask
Why are signals blocked while a handler runs

To avoid deadlocking and infinite signal calls.
Explain the behaviour of the following program

static void handler(int signo) {
 printf("Wake-up!\n"); //print wakeup
 alarm(5); // set alarm for 5 ms now
}

int main() {
 sigset_t set; // initialise the set
 
 sigemptyset(&set); // empty the set
 
 sigset(SIGALRM, handler) // set a handler for sigalrm
 
 alarm(5); // set alarm for 5 ms from now.
 
 while(1) {
  sigsuspend(&set); // Release blocked signals and wait for interrupt.
 }
 
 return 0;
}
Program will print "Wake-up!" every 5 ms due to the "alarms" being fired.
Week 2

Describe the advantages of multithreaded over a single-threaded process


Access to multiple cores
Threads execute in parallel
Design by module
Other thread can run should another one be blocked

Draw a diagram depicting the organisation of a multithreaded process.

+----------------------------------+
|                                  |
|               KERNAL             |
|                                  |
+----------------------------------+
|               STACK              | |
+----------------------------------+ |
|               STACK              | v
+----------------------------------+ |
|                                  | v
|                                  |
+----------------------------------+ ^
|                                  | |
|                                  | |
|               HEAP               |
|                                  |
|                                  |
+----------------------------------+
|                                  |
|               DATA               |
|                                  |
+----------------------------------+
|               TEXT               |
+----------------------------------+

Where can thread support be implemented?

Userspace using libraries.
Kernel space using system calls
What are the advantages of userspace thread support


More efficient, kernel is not involved.
Less system calls are made.

What are the disadvantages of userspace thread support?


Single thread can block whole process
Programmer has to schedule threads

What is another name for userspace thread support?

n to 1
Under userspace thread support, a thread switch will never be initiated by ________

kernel
What is another name for kernel space thread support?

1 to 1
What are the advantages of kernel space thread support?


Kernel can see threads
Single thread does not block whole process

What are the disadvantages of kernel space thread support


Lots of system calls

Combining userspace and kernel space thread support gives ________ thread support

hybrid
What is another name of the approach in the previous question?

m to n model
Several user level threads associated with a single kernel thread are _____ threads.

unbound threads
A single user thread assigned its own personal kernel thread is a ______ thread.

bound thread
What is unusual about Linux's approach to implementing thread support?

Linux sees them as processes sharing address space
What are the advantages and disadvantages to linux's approach?

Advantages:

Simplistic
No new data structures

Disadvantages:

There may of been an efficiency gain in implementing explicit thread support.

What system call is used to create a thread in linux?

clone
What are pthreads? When did they appear? Is there an alternative?

pthread is linux's implementation of the posix thread.
They first appeared in 1996.
Yes, there is windows implementation of it, winthread.
Explain the elements of the following prototypes:

int pthread_create(pthread_t *thread,
const pthread_attr_t *attr,
const *(*start_routine)(void *),
void arg*);

void pthread_exit(void *value_ptr);

Pointer to thread
Thread attributes such as stack size, priority, detached/non-detached.
handler/function name
handler/function arguments

What is wrong with this code?

typedef struct {int a, b; } pair_t;

void * add(void *p_in) {
 pair_t *p = (pair_t *) p_in;
 printf("Answer is: %d\n", p->a + p->b);
}

void adder(int x, int y) {
   pthread_t t;
   pair_t p;
   
   p.a = x;
   p.b = y;
   pthread(&t, NULL, add (void *) &p);
}
What are the options for making arguments available to a thread?

Global data or pass a struct containing the data.
How does pthread voluntarily terminate?

pthread_exit
How do we wait for a pthread to terminate?

pthread_join
We have seen zombie processes. Can we have zombie threads?

Yes they exist if the thread hasn't exited cleanly.
What are the options for dealing with zombie threads?

Detach it.
What are thread attributes? What is going on in this code

pthread_t t;
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_setstacksize(&attr, 20 * 1024 * 1024);
pthread_attr_setdetached_state(&attr, PTHREAD_CREATE_DEATACHED);
pthread_create(&t, &attr, startroute, arg);
We create a detached thread with a custom stack size.
What is going on in this code?

void *foo(void *p) {
 pthread_mutex_lock(&mutex);
 // Update p
 pthread_mutex_unlock(&mutex);
 return 0;
}

pthread_mutex_t mutex;
pthread_mutex_init(&mutex, NULL);
pthread_create(&t1, NULL, foo, (void *) n);
pthread_create(&t2, NULL foo, (void *) n);
We create two threads that modified shared data based on a mutex becoming available.
Week 3

What do POSIX condition variables allow our threads to do

Condition variables allows a thread to wait for something to become available.
How do we wait on on a condition variable?

pthread_cond_wait
What are the parameters to pthread_cond_wait

first - condition waiting on
second - mutex
What do condition variables make explicit that java's synchronisation approach makes implicit?

The condition that is being waited on.
How do we wake up a thread waiting on a condition variable?

pthread_cond_signal
Whats going on in this code:

pthread_mutex_lock(&mutex);

while(occupied == size) {
 pthread_cond_wait(&room, &mutex);
}

/* insert data */

pthread_cond_signal(&data);
pthread_mutex_unlock(&mutex);

It waits on a mutex to become available. When the mutex is available it inserts data, sends a signal off to notify other threads that data has been added and it unlocks the mutex.
Draw a diagram depicting how a thread, mutex and condition variable interact


Why is it imperative that the code above uses while rather than if?

To avoid a spontaneous wake up
What is a spontaneous wake up?

A spontaneous wake up is when a thread returns from pthread_cond_wait without any thread having called pthread_cond_signal or pthread_cond_broadcast.
Being woken only serves as a hint that the condition being waited on might have changed.
Being woken serves only as a ____ that the condition waited on has changed

hint
How do we terminate a thread

pthread_cancel to request termination from main
pthread_exit to exit from the thread itself
A cancellation request is sent using ___ and the recipient marked with a ___

pthread_cancel and pending cancel
Why can't we simply terminate a thread immediately when a cancellation request is issued?


It might be in the process of changing shared data
It might hold some lock

The effect of a pending cancel depends on the ___ of the target thread

Cancellation state
The cancellation state of a thread can be ___ or ___

Enabled or Disabled
If the cancellation state is enabled the effect of the pending cancel depends on whether the ___ is ___ or ___

Cancel type is async or deferred
If the cancel type is ___ the cancel is acted upon immediately and any ___ are executed and the thread terminates.


async
clean up handlers

If the cancel type is ___ the cancel is not acted upon until next ___ is reached


deferred
cancellation point

A thread sets its cancellation state with ___

pthread_setcancelstate
Which functions are cancellation points?

pthread_cond_wait
Why is it important you know which functions are cancellation points

If we do not know where cancellation points exist it is possible that we may end up never realising a mutex.
Since pthread_cond_wait is a cancellation point what can go wrong here

pthread_mutex_lock(&mutex);
while(should_wait) {
 pthread_cond_wait(&cv, &mutex);
}
pthread_mutex_unlock(&mutex);

The thread might cancel and the mutex will still be locked.
Explain what is going on in the code below:

pthread_mutex_lock(&mutex);
pthread_cleanup_push(pthread_mutex_unlock, &mutex);
while(should_wait) {
 pthread_cond_wait(&cv, &mutex);
}
pthread_mutex_unlock(&mutex);
pthread_cleanup_pop(1);

We register handler to unlock the mutex after a cancellation point has been reached.
What must be done with errno so it works in a multithread process?

Each thread should be given its own errno.
Some libraries were originally written under the assumption that they would only ever be called from a single-threaded process. Their functions were not ___

thread-safe
What is the difference between gethostbyname and gethostbyname_r

gethostbyname - not thread safe.
gethostnamebyname_r - is thread safe.
What is wrong with this code:

computation_state_t state;

void handler(int signal) {
 display(&state);
}

void procedure(void) {
 sigset(SIGINT, handler);
 
 while(true) {
  update_state(&state);
 }
}

The state could be corrupted. update_state is not async-safe.
Will a POSIX mutex solve the above problem

No this will result in a deadlock.
Functions that can be safely called in signal handlers are ___

async signal safe.
___ functions are async-signal-safe

functions ending with _r are async-signal-safe
How can we solve the problem without the state being corrupted?

Use threads.
Does a process block signals or do threads do so individually? if we wend a signal to a multithreaded process which thread receives the signal?

A process blocks signals.
The thread that gets the signal is chosen randomly from the set that does not have the signal blocked.
How have we solved the problem here?

computation_state_t state;
sigset_t set;

int main () {
 pthread_t thread;
 sigemptyset(&set);
 sigaddset(&set, SIGINT);
 pthread_sigmask(SIG_BLOCK, &set, 0);
 pthread_create(&thread, 0 , monitor, 0);
 long_running_procedure();
}

void long_running_procedure() {
 while(true) {
  pthread_mutex_lock(&mutex);
  update_state(&state);
  pthread_mutex_unlock(&mutex);
 }
}

void *monitor() {
 int sig;
 
 while(true) {
  sigwait(&set, sig);
  pthread_mutex_lock(&mutex);
  display(&state);
  pthread_mutex_unlock(&mutex);
 }
 
 return NULL;
}

We solved the problem here by using a threads.
We updated our state inside a thread and we wrap a mutex around it.
Now even if our function wasn't thread-safe it will still safely update the state because it must acquire the mutex before it can do so.
Do thread-safe and reentrant mean the same thing? which is stricter?

They do not mean the same thing. Reentrant is stricter.
Week 4

I/O devices are generally classified as either ___ devices or ____ devices.


Block Devices
Character Devices

What are the differences between these two device classes? Give examples of each class

A block device stores information in fixed-sized blocks, each within its own address and transfers are in terms of one or more blocks.
Hard disks, CD-ROMs and USB sticks are blocked devices.
A character devices delivers or accepts a stream of bytes, without regard to any block structure.
Printers, modems and mice are all character devices
What is a device controller? Name a controller than can control a class of devices?

Controller is hardware or a firmware that controls the operating of a hardware device. An example is a SCSI controller.
What does a device driver talk to? Do we need a different driver for every single device?

It talks to the device controller.
We do not need a different driver for every single device.
What is a bus? Name some different types of bus.

A bus is a set of wires and a well-defined protocol for sending a set of messages across those wires.
This includes things such as:

PCI
USB
SCSI
IDE

Draw a diagram illustrating the relation between CPU, memory buses and devices

-- come back to this --

What is memory mapped I/O

Instead of having special methods for accessing the values to be read or written, just get them from memory or put them to memory.
The device is connected directly to certain main memory locations
What is DMA? What is a DMA Controller? How many DMA controllers are there?

DMA, direct memory access allows I/O requests to be processed without tying up the CPU.
A DMA controller is hardware that allows for DMA. It may be for one device or shared with multiple.
It varies based on the DMA configuration.
What happened when an interrupt arrives?

The CPU stops what its doing and uses the device number to index an interrupt vector which leads it to the interrupt service routine for that interrupt.
What is programmed I/O what is the problem with it?

Programmed I/O is where the CPU does all the work to make an I/O request.
The process stays on the CPU until all data is received. This results in a busy wait.
What is interrupt driver I/O?

Interrupt I/O allows the CPU to do something useful while data is being received.
The process gives up the CPU, sits on a queue and waits for its I/O request to be completed. When the I/O operation is completed an interrupted will be generated which makes the process as run-able again.
Draw a diagram depicting the various layers involved in performing I/O

- come back to this -

Since all devices are different, all drivers must be different. How does the OS use them?

The operating exposes a well-defined driver interface which third parties use to write their drivers.
If drivers did not implement the same interface what would be required every time a new device was to be supported

Recompiling the kernel
In addition to looking the same to the OS, Unix devices look the same on the outside too. What do they look like? What is major vs. minor device number?

In unix all devices look like files.
A devices file inode contains a major device number and a minor device number. The major number selects a driver and the minor number identifies a particular device.
Given a stream of characters arriving from modem if we block on each character, what's the problem?

We are repeatedly blocking/unblocking a progress for a single character. It is inefficient.
What about allocating a buffer in userland, kernel fills it and interrupts process when full?

There is a possibility that newly arriving characters will overwrite earlier ones
What about doing buffering in the kernel

Same problem again but it is delayed.
What is double-buffering? how does it solve the problem?

While one buffer is full and copied to userspace another is being filled with arriving data.
By the time the second buffer is full the first has been emptied and can start accepting data again.
A disk drive consists of a number of ___ each with one or two recordable ___

platters
surface
Each surface is divided into a number of concentric ___

tracks
Each track is made up of a number of equal sized ___

sectors
Do all tracks contain the same number of sectors?

Not on modern disks
The set of tracks selected by the ___ forms a ___

heads
cylinder
What are the steps involved in performing a read from the disk? where is the bottle neck?


Position in the heads over the correct cylinder. Seek time
Rotate the disk until the desired sector is under the head. rotational latency
Rotate the sector under the head. transfer time

Seek time is the main bottle neck.
What is cylinder skew?

Cylinder skew is the amount by which sector 0 in one track is offset from sector 0 in neighbouring tracks.
Name 3 disk arm scheduling algorithms


Elevator
First in first out
Shortest seek time first

Consider a disk drive with the following characteristics

rotation speed: 10,000 RPM
number of surfaces: 8
sector size: 512 byte
sector/track: 750 (average)
tracks/surface: 100,000
average seek time: 4ms
one track seek time: 0.2ms
maximum seek time: 10 ms
Calculate:
capacity:
surfaces * tracks * sectors * sector size

8 * 100,000 * 750 * 512 = 307200000000 bytes

maximum transfer rate (single track):
bytes read in 1 second = tracks in 1 second * bytes per track
                       = (RPM / 60) * sectors per track * bytes per sector
                       = (10,000 / 60) * 750 * 512
                       = 64000000 bytes

maximum transfer rate (single cylinder):
Due to head skewing its the same.
maximum transfer rate (multiple cylinders):
bytes per cylinder / (time to read 1 cylinder + time to move to next cylinder)

bytes per cylinder = bytes per track * track per cylinder * number of surfaces
                   = 750 * 512 * 8
                   = 3072000

Time to read 1 track = 60 / RPM
                     = 60 / 10000
                     = 0.006 s
                     
Time to read 1 cylinder = time to read 1 track * tracks per cylinder
                        = 0.006 * 8
                        = 0.048 s

Time to move to next cylinder = 0.0002 s

Time to read 1 cylinder + time to move to next cylinder = 0.0482 s

So we have: 3072000 / 0.0482
          = 63.7 million bytes/s

cylinder skew:
(one track seek time * RPM * sectors per track) / 60

(0.0002 * 10000 * 750) / 60 = 25

maximum transfer rate (multiple cylinders with no cylinder skew):
bytes per cylinder / (time to read 1 cylinder + time for 1 revolution)

total time = time to read 1 cylinder + time for 1 revolution
           = 0.048 + time for 1 revolution
           = 0.048 + ( 60/10000 )
           = 0.040 + 0.006
           = 0.054s
           
So we have: 3072000 / 0.054
          = 56.9 million bytes/s

Week 5

Combining CPUs in parallel has been proven a good idea. Can we do the same with disks to prove I/O performance? What is this approach called?

Yes, this approach is known as RAID, redundant array of independent disks.
What does RAID stand for? and SLED? How is RAID usually implemented?

RAID - Redundant array of independent disks.
SLED - Single large expensive drive
A SCSI controller handles all implementation. The array is just exposed to the operating system as a single disk.
Is the operating system aware it is dealing with a RAID? Who does all the work?

The operating system is not aware, all of the implementation is taken care of by the SCSI controller.
RAID uses data striping. What does that mean?

Data striping describes how the files will be stored. It involves splitting the files across x disks.
Why does stripping improve performance for multiple independent I/O requests?

It allows requests to be serviced in parallel
Why does striping improve performance for single multi-block requests?

It allows for reading in parallel
What is fine-grained striping

Fine grained arrays interleave data in small units so all I/O requests, regardless of size, access all disks in the array.
This gives the advantage of high transfer rates for all I/O requests however it has two disadvantages:

Only one I/O request can be serviced at any instant
All disks must waste time positing for each request

What is coarse grained

Coarse grained arrays interleave data in large units. This means:

Small I/O requests access only a subset of the disks and several can be served simultaneously
Large requests benefit from high transfer rates

What is the advantage of fine-grained striping?

Every disk is involved resulting in high transfer rates
What is the disadvantage of fine-grained striping?

Only one I/O request can be serviced at any instant. All disks must waste time positioning for each request.
What are the advantages of coarse grained stripping?


Small I/O requests access only a subset of the disks and several can be served simultaneously.
Large requests benefit from high transfer rates

The larger the RAID the greater performance benefits. What's the draw back?


Not all requests will benefit
Increased chance of failure

How is the reliability issue addressed in RAID?

We assign extra disks for redundancy purposes.
What are the two major considerations when it comes to implementing redundancy?


Do a direct 1:1 mirror of the disk(s).
Use parity information to figure out what data has been lost.

Describe RAID Level 0. Why is it not considered a true RAID?


Data striped across all disks in an array
No parity

Advantages:

Good performance: With N disks, roughly N times speedup.

Disadvantages:

Poor reliability: one disk failure results in data loss.

It is not considered true RAID as it failed to implement redundancy
Describe RAID Level 1. What are its advantages and disadvantages?

Keep a mirrored copy of the data
Advantages:

Good reliability: one disk failure is OK.
Good read performance

Disadvantages:

High cost, one data disk requires one parity disk

Describe RAID2, is it ever implemented?

A data sector striped across data disks
Compute error-correcting parity and store in parity disks
Advantages:

Good reliability with higher storage utilisation than mirroring.

Disadvantages:

Unnecessary cost: disk can already detect failure
Poor random performance

It is never implemented
Describe RAID3

Single parity disk (XOR of each stripe of a data sector)
Advantages:

Same reliability with one disk failure as RAID2 since disk controller can determine what disk fails.
Higher storage utilisation

Disadvantages:

Poor random performance

Describe RAID4

A set of data sectors striped across data disk
Advantages

Same reliability as RAID3
Good random read performance

Disadvantages:

Poor random write and read-modify-write performance

Describe RAID5

Parity sectors distributed across all disks
Advantages

Good performance


What is the MBR? Where is it stored? What does it contain? What reads it?

Master boot record.
It is located at sector 0 of the disk. It contains the partition table.
The BIOS reads it.
What is an inode?

It is the datastructure that represents a file.
It contains information like size, permissions, owner, group, accessed time, modified time, created time.
What is a directory?

File of directory entries
What is a directory entry?

Data structure representing the contents of a directory
How long is a filename? Where are filenames stored?

Filenames are a variable length
A filename is stored within a directory entry or within the heap.
How do we check a directory for a file? Can we speed up this process?

Looking up a file name involves linearly searching a direcrory from beginning to end.
We can use a hash table within each directory to speed up the process
Week 6

A shared file shows up in more than one directory listening and thus requires two directory entries. Sharing poses a problem where a directory antry contains pointers to blocks. What is the problem?

When one version gets edited how does the other one get updated.
What goes into a directory entry when inodes are used?

The filename and the inode pointer
inodes allow two kinds of file sharing to be implemented. Name the two methods?

Hard link, soft link
A ___ is a directory entry that points directory to the inode of a shared file.

Hard Link
A ___ is a directory entry that points to an inode that points to the inode of the shared file.

Soft Link
What is one simple way of improving file system performance

Use buffer cache which is a collection of disk blocks retained in memory.
There may be thousands of blocks in the buffer cache. How do we search it efficiently?

Tree or Hash
The buffer cache is of finite size. This occasionally blocks must be removed. In what order do we maintain the cache to facilitate such removal?

Least recently used to Most recently used
If an inode block were reguarly modified it would remain in the cache for an extended period? Why is this a potential problem?

It would stay in the cache, this means we could experience a data loss should the machine cash.
What two questions must the buffer cache ask when deciding what to do with a block?


How likely is it to be used again
Is it essential to file system consistency

Blocks are likely to be required again in the near future go to the ___ while blocks unlikely to be used again in the near future go to the ___ of the buffer cache.

back, front.
Irrespective of where it goes in the list, what happens with blocks essential to the file system consistency?

They get written out straight away
If a modified block is not essential to file system consistency how long should it be kept in the buffer cache before we write it back to disk?

30 seconds
How much data could we lose on linux? what about on windows?

Linux - 30 seconds worth
Windows - None
what is block read-ahead?

Read blocks based on assumptions
how do cylinder groups improve file system performance?

Cylinder groups involve culstering files and their inodes in the same cylinder group. This results in fast reading.
What are the actions required to delete a file? What can go wrong?


Remove the directory entry for the file
mark the files inode as free
mark the files data blocks as free

What is the log used for in a journaling file system? What benefit does it bring?

The log is used to hold pending actions to be carried out. It allows us to run its contents when restoring from a system crash.
Logged operations must be idempotent. What does this mean? Why must operations have this property?

It means that running them several times should have no harmful side effects.
It must have this property as log operations can be carried out multiple times.
Unix merges different file systems into a single simeless directory structure. Does windows do the same thing?

No within windows each file system has its own distinct drive name.
What is it that enables these different file systems to co-exist and be used without any modification to the linux kernel?

The virtual file system. It abstracts the common operations that all file systems must support and moves them into an implementation-independent layer that calls the file system implementation.
Week 7

To log onto a unix workstatio a user enters a username and password and hits return. What happenms between hitting return and a command shell appearing?

The password entered is hashed, compard with that the username's password in the password and if they match the user is access is granted
How many IDs does a process have associated with it? List them.

A process has 6 IDs associated with it.

Effective User
Effective Group
Real User
Real Group
Saved User
Saved Group

Where do the IDs associated with your login shell come from

They come from a mapping with the passwords file, specifically /etc/passwd
Where do the IDs associated with processes launched by your shell come from

Real and effective are inheritied
Saved is copied from the effective
Where do the IDs associated with files created by a process come from?

The processes effective uids and guid
Where are the permissions on file x stored?

Within its inode.
What is a directory? What is it composed of?

A directory is simplely a file of directory entries.
What is a directory entry? What is it composed of?

A directory entry is composed of files.
It contains a filename and an inode pointer
What do rwx mean when wpplied to a file

Read write and execute
What do rwx mean when applied to a directory

Read write and traverse
A user wants to delete /a/b/c/foo what must be true for deletion?

The user will need execute access on a and b. They will need write access on c.
Permissions on your home directory are drwx-----x. Should you be concerned?

A little, this means it is possible for people to guess file names within your directory and open them.
Write a command to set the permissions on the file foo such that you and your classmates can read it but no one else can. Change it to allow you to read and write it. How does the kernel determine whether or grant or deny access?

chmod 440 foo
chmod 640 foo
The kernel applies access based on the following:

If the effective user id is 0 grant access.
If the effective user id matches the owner of the file and the appropriate permission bit is set, grant access. Otherwise deny
If the effective group id of the process or one of its supplementary group IDs matches the group ID of the file and the appropriate permision bit is set, grant acccess. Otherwise deny access.
If the appropriate other permission bit is set grant access, otherwise deny.

A process cfreates a file. How are the owner group and permissions determined?

They are taken from the effective UID and GUID of the process. The permissions are determined based of the umask value.
A multiuser workstation has a single shared /tmp directory for handly temporary files. What's unusual about this directory?

All users are allowed to write to it however users are not able to delete each others files. This is done by setting a sticky bit.
How can you change your password if you cannot edit the password file?

The program used to change the password, passwd has a sticky bit set. This sticky bit says that when the process is executed the effective owner and group should be taken from the owner and group of passwd rather than the user executing it.
This is known as setuid
If the setuid bit is set on an executable what happens when we exec it

The effective user and group is taken from the excutables owner/group.
You load a CD containing a setuser ID root program on a lab machine. You run the program. Does it run with root privileges?

No, we disable setuid operations on mount.
Does the passwd  program adhere to the principle of least privilege?

No. it has full root permission
Why is it important that you understand the setuser-ID-related system calls?

In order to avoid accidently giving somebody route privileges by not correctly dropping access.
How does a process permanently drop privileges? Temporarily?

Permanently - setreuid it will modify the real and effective user id, and if needs be the saved user id.
Temporarily - seteuid will modify the effective uid. We can gain privileges again by copying over the real id.
What is confusing about the setreuid system call?

In some cases it will modify real, effective and saved user ID. In other it will only modify real and effective.
When and how often are access permions on a requested resource checked? Given file descriptors are inherited across exec, what are the security implications?

Permissions are only checked on the open system call. Since file descriptors are inherited across exec a untrusted program may access an already open file descriptor.
Your setuser ID root program wishes to create a file owned by the real user. How should you do it?


Temporarily drop privileges and become the user.

Your setuser ID root program wishes to check if the real user should be allowed to access a particular file. What are the potential security pitfalls in the coding check.

An attacker could replace the file between the time access and open are executed. A better way to do this is to temporarily drop privileges before open.
Week 8

Why are buffer overflow vulerabilities so serious? What does a buffer overflow vulnerability allow an attacker to do?

They allow the execution of arbitrary code by an attacker on a victims machine.
What causes buffer overflow vulnerabilities? Which functions are the main culprits?

Poor C programming practices combined with a failure to properly santise user input.

gets
strcpy
strcat
scanf
memcpy

The code below implements the risky gets function. What is missing from this code?

char *gets(char *dest) {
 int c = getc();
 char *p = dest;
 while(c != EOF && c != '\n') {
  *p++ = c;
  c = getc();
 }
 *p = '\0'
 return dest;
}

We use the gets function and tthere is no way to specify a limit on a number of characters to be read. Thus we could keep writing to the stack and overwrite the return address.
Study the code below. Draw a picture of the stack just before gets function is called.

void foo() {
 char bufffer[8];
 
 puts("Please enter your name:");
 gets(buffer);
}

+----------------------------------+
|                                  |
|       Stack Frame for main       |
|                                  |
+----------------------------------+
|          Return address          |
+----------------------------------+
|            Saved %ebp            |
+----------------------------------+
|    [7][6][5][4][3][2][1][0]      |
+----------------------------------+

Explain how you could exploit a buffer overflow vulnerability to launch a shell

Modify the return address to run arbirary code that will launch a shell.
What is this?

static char sneakystring[] =
"\xeb\x12\x5e\x31\xc0\x88\x46\x07"
"\x50\x56\x31\xd2\x89\xe1\x89\xf3"
"\xb0\x0b\xcd\x80\xe8\xe9\xff\xff"
"\xff\x2f\x62\x69\x6e\x2f\x73\x68";

Machine code to launch a shell
Exploiting a buffer overflow vulnerability requires doing 2 things. What are they?

Placing a payload into the address space.
Jumping to that address space
Name 5 approaches to defending against buffer overflow attacks


Write correct code
Make areas of process address space non-executable
Use the compiler to do bounds checking
Address space randomisation

Redraw the stack diagram from above when the code is compiled with StackGuard enabled

+----------------------------------+
|                                  |
|       Stack Frame for main       |
|                                  |
+----------------------------------+
|          Return address          |
+----------------------------------+
|              Canary              |
+----------------------------------+
|            Saved %ebp            |
+----------------------------------+
|    [7][6][5][4][3][2][1][0]      |
+----------------------------------+

Name the 3 types of canary used by StackGuard. What is the difference between them?


Terminator canaries: detect runaway strings
Random canaries: detect sequential writes
Random XOR canaries: Detect random access memory writes

Study the ccode below. Whats wrong with the stackguard here?

Canary is only checked on return
We can modify the clearance flag.
Why does address space randomisation make buffer overflow vulnerabilities difficult to exploit?

Address space randomisation makes the stack segment and the start locations of other segments random. Guessing a new return address on the stack is made more difficult if for each invocation of a program the stack starts at a new random address.
Week 9

Linux implements Discretionary Access Control. What does this mean?

The object owner has full control over their own files.
Mary downloads and runs Angry Birds. What are the risks under DAC? With what privileges does the AngryBirds process run?

AngryBirds can access all of marys files.
It runs with all of marys privileges
What is the principle of least privileges? Does linux apply it?

Only give the needed access to the processes.
Linux does not apply it?
How do system administrators typically try to defend against attacks?


Firewalls
Patches
Strong passwords

What is a zero-day vulnerability? And a zero-day attack?


A zero-day is a new vulnerability that no patch exists for.
A zero-day attack is an attack that exploits the zero-day exploit.

What is the zero day problem


Can defend against them but can't prevent them

What is defence in depth?

Using multiple layers of security
What is Mandatory access control?

Access to files is mandated by a system policy
What is SELinux?

A linux module made by the NSA for MAC in linux.
What are the major components of SELinux, draw a diagram depiciting how they interact with DAC.


Security Policy
Access Vector Cache
Policy Enforcement Server
selinuxfs


Any action not ___ by the security policy is ___

Expressly permitted.
Denied
What kinds of permissions can be controlled under selinux

The main type of permission control is type enforcement. This causes program sandboxing through consigning each program to a domain and restricting eac domain to privileges sufficient only for its purpose.
Role-based access control is also available. It is used to prevent unauthorized SELinux users enterining particular domains.
What is an SELinux domain? What role does it play?

A domain is a sandbox. A sandbox constraits access based on two different things:

Role-based access control prevents users from entering unauthorized domains.
Type enforcement limits each domain only ot those privileges it requires in order to do its job.

A process ___ to a new domain by executing a program that serves as an ___ to that domain.

Transitions
Entry point
When is a domain transition allowed?

When the role is authorized to access the domain
What is the idea behind a targeted SELinux policy?

You only sandbox high risk applications
In selinux every subject and object is labelled with a  ____

A security context which contains an selinux user, a role and a domain.
The blink program blinks messages from messages on the LED display. Ordinary users can run the blink executable. Write the SELinux policy rules to enforce this behaviour. What is the benefit?

/usr/bin/blink {system_u, object_r, blink_exec_t}
/etc/blink/msgs.txt {system_u, object_r, blink_data_t}
/dev/led {system_u, object_r, blink_data_t}

allow blink_t blink_data_t : file {read write};
allow user_t blink_exec_t : file {execute};
allow blink_t blink_exec_t : file {entrypoint};
allow user_t blink_t : process {transition};
type_transition user_t blink_exec_t : process blink_t;
role user_t types blink_t

This results in the blink application executing within a sandbox. Users can not explicitly access /dev/led.
What is the difference between an SELinux access vs a transition decision?


Access decisions state what the user is allowed access to.
Transition decisions state if the user is allow to transition from one domain to another.

Draw a diagram depicting the domain transition that occurs whewn blink is executed?

          execute                         transition
bash ------------------> /bin/blink --------------------> blink    {user_u, user_r, blink_t}
{user_u,                 {system_u                         /\
user_r, user_t}           object_r,               /dev/led    /etc/blink/messages.txt
                          blink_exec_t}                   {system_u
                                                          object_r
                                                          blink_data_t}

What is the primary criticism of SELinux?

It is complex and difficult to maintain.
Week 10

What is a monolithic kernel? How is it constructed? Can you name any?

A monolithic kernel is one where all kernel components are linked together into a single program that runs in a privileged mode.
Windows and Linux
What are the advantages of a monolithic kernel design? What are the disadvantages?


Data structures can be easily shared and kernel components can easily communicate with one another as there is just one address space and all kernel components inhabit it together.


A bug in one component can affect the others. A big can bring down the entire OS.


What is a microkernel? how is one constructed? Can you name any?

A microkernel implements only the loewest level details of memory management and virtual memory threads and I/O.
Higher level abstractions such as processes, file systems and device drivers are implemented in servers that run in user space.
Applications communicate with these servers to invokve the kernel rather than with the kernel directly.
Mach and Chorus
What are the advantages of a microkernel? And the disadvantages?


Increased security, if a buggy component fails it does not affect the entire operating system.
Development is straight forward: debuggers are userland applications so debugging a file system implemented in a user land is simpler than debugging one in kernel mode

What does "virtual" mean in computer science?

Virtueal means ot is not physically existing but emulated by software
Who invented virtual machines in the 1960s? What motivated them to do it?

IBM invented them. They did so because they were having difficulty building a multiuser time-sharing system.
What is a hypervisor? A hypervisor implements multiple ___ into each of which is installed ___.

A hypervisor is a virtual machine monitor.

virtual machines
single-user time sharing system

What is the differene between a type-1 hypervisor and a type-2 hypervisor? Can you name any?


A hypervisor running on real hardware is a type-1 hypervisor
A hypervisor running on a host OS is a type-2 hypervisor

Type-1 hypervisors include:

Xen Server
VMWare ESXi

Type 2 hypervisors include:

VMWare workstation
VirtualBox

Why is virtualization good for a company running a number of servers that provide critical services? What causes a machine to crash: hardware or software?

It is a good idea because of server consolidation and service isolation. This means instead of the company having a large amount of physical machines to provide criticial services they have one very powerful machine that runs a hypervisor. From this they then use virtual machines to isolate the criticial services from each other.
Software normally causes a machine to crash.
#### What is a virtual appliance? What can you find at turnkeylinux.org?
A virtual appliance is a runtime environment and an operating system that is bundled together and shipped as a single unit. This is done to save on installation headaches.
Virtual appliances.
Why is virtualization a good idea when legacy hardward is required?

Some applications may require legacy hardware. By using virtualization we can emulate this hardware.
How can virtualization help software testers?

Software developers can test their software on a single machine against a variety of operating systems in a convenient manner without the need to reboot
How can virtualization help operating system developers

Debugging and testing operating systems that are running as applications on virtual machines is simpler than debugging a live oeprating system executing on real hardware
How does an operating system find out what hardware is available to it?

The operating system asks what hardware is available on bootup.
So how can we stop an operating system from seeing / modifying the real hardware?

Intercept requests to the host hardware and spoof responses
To make a guest OS believe it is dealing with a real machine, what must we virtualize?

Memory, CPU and I/O
#### In what mode is the guest operating system executing? Why is this unusual?
Userland. This is unusual because the kernel of the guest operating system is going to need to fulfil system calls.
What does it mean if a hypervisor adpots the "trap-and-emulate" (pure virtualization) approach?

This is where the hypervisor takes control and spoofs a response for the hardware.
Will "trap-and-emulate" (pure virtualization) always work? Who answered the question?

Popek and Goldberg discovered that it will not always work due to sensitive and privileged instructions.
What is a privileged instruction?

A privileged instrsuction when executed in privileged mode execute fully but when executed outside of privileged mode causes a trap
What is a sensitive Instruction?

Sensitive instructions are those that affect the allocation of real system reources whose effects depends on the state of the real hardware
If P is the set of privileged and S is the set of sensitive instructions, what must be true for a "trap-and-emulate" (pure virtualization) approach to be possible?

Sensitive instructions must be a subset of privileged instructions.
What happens when a privileged instruction is executed in a virtual machine running a type-1 hypervisor? In what two ways may this event be handled by the hypervisor?

The real processor traps to the hypervisor. The hypervisor checks the state of the virtual machine. If its in virtual user mode then control is passed to the guest OS. If it is in virtual kernel mode then the effective of the instruction is emulated by the hypervisor.
What does the x86 popf instruction do? what is unusual about it? Why is this a problem for "trap-and-emulate" / pure virtualization?

It is sensitive but not privileged. It will not cause a trap.
If "trap-and-emulate" is not possible does that mean virtualization is impossible?

No, there are workarounds.
Draw a diagram showing where a type-2 hypervisor fits.

Who was one of the early developers of the type2-hypervisor?

Vmware
How does VMware workstation handle instructions like popf?

It uses binary translation. It listens for sensitive instructions that are not privileged and re-writes them to hypervisor instructions.
What does VT-x do?

VT-x allows the processor to notify a hypervisor when a configurable set of events occur.
What is better, binary translation or "trap-and-emulate" / pure virtualization?

Binary translation as it is not as expensive.
What happens when an application makes a system call on a type-1 hypervisor?


The trap goes to the hypervisor which sees it was generated by code executing in virtual user mode.
The hypervisor hands control to the system call handling code in the guest operating system
The guest operating system executes the system call.

What happens when a privileged instruction is executed in a virtual machine running on a type-2 hypervisor? In what two ways may this event be handled by the hypervisor?

The host hands control to the hypervisor and hands control to the system call handling the code in the guest operating system. The guest operating system then executes the system call.
What happens when an application makes a system call on a type-2 hypervisor?


A software interrupt is generated
A trap goes to the host operating system
The hypervisor registeres itself as a debugger with the host and is passed control by the host in response to the trap.
The hypervisor determines that the trap was caused by code executing in virtual user mode.
The hypervisor hands control to the system call handling code in the guest operating system.
The guest operating system executes the system call.


## process-life-cycle.png

      
    Raw
  

              process-life-cycle.png
            
          
## selinux.png

      
    Raw
  

              selinux.png
            
          
## threadmutex.png

      
    Raw
  

              threadmutex.png