- processor: controls operation of computer, data processing functions
- main memory: stores data and programs, typically volatile
- I/O modules: move data between computer and external environment, e.g. disks, internet, terminals
- system bus: communication between three systems above
- microprocessors used to be slower than multi-chip processors, but everything changed when the fire nation attacked: they got really small and physics got involved; a single chip can now contain multiple processors
- GPUs are more efficient for calculations on arrays of data, some modern processors have vector operations too
- Digital Signal Processors handle streaming signals specifically
- other specialized processing units (encryption, audio/video codec stuff, etc.) also exist
- mobile devices have System on a Chip (SoC), where main memory, I/O devices, radios, etc. are all on same chip as main processor
- fetch instruction, execute instruction, repeat
- instructions can be in four categories:
- processor-memory: transfer data between processor and memory
- processor-I/O: transfer data between processor and I/O
- data processing: arithmetic or logic op
- control: branches
- combinations also possible
- it's like ECE 222 but less detailed
- interrupts improve processor utilization by interrupting normal sequencing
- I/O devices very slow, so run other processes (or same process, if nonblocking) in meantime and wait for interrupt
- when starting instruction, processor checks for interrupt requests; if any, branches to handler instead of next instruction
- after interrupt, previously running process is resumed
- nine steps, five in hardware, four in software, for handling interrupts:
- device issues interrupt signal
- finish previous instruction
- test for interrupt requests, send ack signal
- save context (PSW, PC)
- load PC with entry location of interrupt handler
- save additional state in software (e.g. registers, stack pointer, etc.)
- process interrupt (examine status info, send commands, acks, etc.)
- restore saved state
- restore PSW, PC
- interrupts often disabled during interrupt handling, though can have multiple priority levels for handlers
- access time is inversely proportional to cost, so can't make entire memory out of fast memory
- put most frequently accessed data in more expensive but faster memory
- this is possible because of locality of reference; data accessed recently will often be accessed again soon
- use various swapping strategies to move things between layers of cache based on recent references
- levels are usually processor registers, then cache (often in processor as well), then main memory (volatile), then secondary memory (disks, SSD)
- every instruction cycle requires at least one main memory access to fetch the instruction, so cache very important to prevent slowdown
- cache maintains copy of fixed length blocks from main memory
- each block has a tag indicating which main-memory block it corresponds to
- look for word in cache before resorting to main memory lookup
- need mapping function, replacement algorithm, and write policy for cache; more details in memory chapter
- LRU usually performs best
- multiple levels of cache (L1, L2, often L3) can be used
- I/O can be:
- programmed, i.e. polling
- interrupt-driven, which doesn't block the processor while doing I/O
- DMA!
- DMA module is delegated task of transfer of complete block to/from memory or I/O device
- far more efficient for multiple-word transfers
- symmetric multiprocessors (SMP) defined as:
- two or more similar processors
- share main memory and I/O facilities, interconnected by e.g. bus
- share access to I/O devices
- processors can perform same functions
- system controlled by integrated OS
- get better performance, availability, growth, scaling
- processors can have their own caches
- multicore aka chip multiprocessor has multiple cores on single chip
- basically SMP but on single chip
- locality happens because:
- aside from branches, execution is sequential
- usually some time is spent localized in single process, not always jumping around
- loops involve small numbers of instructions repeated a lot
- array elements are usually referenced in sequence
- spatial locality is tendency to access memory locations that are clustered
- temporal locality is tendency to access same memory location again soon
- can calculate average access time using times for each memory and hit ratio
- average cost per bit depends on costs per bit in each memory and sizes of memory
- access efficiency is access time for M1 over average access time; function of hit ratio
- hit ratio is much higher than S1/S2 when there is lots of locality
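- written out as formulas (a sketch using the usual two-level notation, not from the notes: T1/T2 access times, C1/C2 costs per bit, S1/S2 sizes, H hit ratio):

```latex
T_s = H\,T_1 + (1 - H)(T_1 + T_2)          % average access time
C_s = \frac{C_1 S_1 + C_2 S_2}{S_1 + S_2}  % average cost per bit
E = \frac{T_1}{T_s} = \frac{1}{1 + (1 - H)\,T_2/T_1}  % access efficiency
```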
- objectives of OS are (ACE):
- convenience
- efficiency
- ability to evolve
- convenience is enhanced by following services:
- libraries for I/O, file management, etc.
- utilities for launching and managing tasks
- handling I/O devices
- controlling file access, system access
- error detection
- accounting
- key interfaces in system are:
- ISA: machine language instructions
- ABI: standard for binary portability across platforms; system call interface
- API: program access to hardware resources and services; high-level language library calls
- efficiency is enhanced by managing resources
- OS kernel does scheduling, allocation, etc. to efficiently manage resources
- ease of evolution enhanced by allowing:
- hardware upgrades via drivers
- new services via OS updates
- bug fixes via OS updates
- serial processing: manually run batch jobs, one after another
- simple batch systems: monitor automatically runs next job after job completes
- can do compilation of user programs by loading compiler
- can also support memory protection, timer, privileged instructions, interrupts
- programs execute in user mode, monitor executes in kernel mode
- multiprogrammed batch system: interleave processes for greater efficiency
- because I/O takes a lot of time
- requires more advanced memory management and I/O management
- time-sharing systems: handle multiple interactive jobs
- minimize response time and accept commands from terminal
- require memory protection between interactive jobs
- process:
- job in execution, instance of program running, entity that can be executed on processor
- multiprogramming requires ability to switch between processes
- time-sharing also requires this
- real-time systems also require this
- lots of problems result from multiple processes:
- improper synchronization
- failed mutual exclusion
- nondeterminate program operation (things can execute in ~random order)
- deadlocks
- process is systematic way to manage information about task
- store program, data needed by program, execution context
- threads are extension of this concept, discussed later
- memory management:
- five responsibilities:
- process isolation required when multiple processes in system
- automatic allocation/management: program shouldn't have to do its own memory allocation
- support modular programming: create, destroy, alter size of modules dynamically
- protection/access control: some memory is shared, but some is not shared
- long-term storage: manage memory on hard drives
- use virtual memory with virtual/logical addresses that get translated to real addresses
- information protection and security:
- availability: protect system from interruption
- confidentiality: can't read unauthorized data
- data integrity: can't write unauthorized data
- authenticity: verify user identity, validity of messages/data
- scheduling and resource management:
- especially important for time-sharing
- must consider fairness, differential responsiveness, efficiency
- round robin plus interrupts is pretty good
- microkernel architecture:
- OS was traditionally large monolithic kernel with all the features
- shift towards microkernel, which has only essential functions in kernel, rest in user space
- more customizable, better ability to evolve
- multithreading:
- separate executable work and process data
- can parallelize work, especially on SMP or multicore systems
- SMP: better performance, availability, incremental growth, scaling (I think this list was in Chapter 1 as well)
- distributed OS: provide illusion of single main memory, secondary memory, but distribute computation across multiple computers
- object-oriented design: write OS with OOP, eases development
- reliability R(t) is probability of correct operation up to time t given that operation was correct at t=0
- mean time to failure (MTTF) is ∫₀^∞ R(t) dt
- mean time to repair (MTTR) is how long it takes to repair or replace a faulty element
- availability is MTTF / (MTTF + MTTR)
- faults can be:
- permanent: after it occurs, always present; disk hard crashes, software bugs
- temporary: not permanent
- transient: occurs only once
- intermittent: multiple, unpredictable times
- three types of redundancy for managing faults:
- spatial (physical) redundancy: multiple components that do the same thing
- temporal redundancy: repeat a function when an error occurs; effective for temporary faults
- information redundancy: error correction codes
- OS can provide:
- process isolation: more hermetic environment, because interference from other procs can cause faults
- concurrency controls: because concurrency can cause problems
- virtual machines: even more isolation, prevent some types of permanent faults
- checkpoints and rollbacks: copy of application state in backup; often used in databases
- design issues for SMP include:
- simultaneous concurrent procs/threads: kernel data structures need to account for possibility of multiple instances of same service call running at once
- scheduling: each processor needs to have scheduling, can interfere
- synchronization: some processes might depend on other procs, also need to mutex shared resources
- memory management: need to exploit hardware parallelism; paging mechanisms need to be consistent
- reliability and fault tolerance: graceful degradation if one processor fails
- design issues for multicore:
- same issues as SMP, but multicore is EVEN MORE PARALLEL so it's just extra hard
- parallelism within application: write applications to take advantage of parallel execution
- Grand Central Dispatch on OSX/iOS is thread pool mechanism, makes this easier
- virtual machine approach: give process entire cores, let it run as if it is only program on system
- OS just acts like hypervisor, process handles own resources for most part
- originally extension to MS-DOS, replaced by Windows NT
- organizes system into:
- executive: core OS services
- I/O manager: implements all the APIs for I/O
- cache manager: for file-based I/O
- object manager: manages Windows Executive objects
- plug-and-play manager: automatic device driver installation
- power manager: shutting down idle devices, sleeping, writing all memory to disk and shutting off
- security reference monitor: manage secure access to all Objects
- virtual memory manager: virtual address management, physical memory, paging
- process/thread management: create, manage, delete procs and threads
- configuration manager: system registry
- advanced local procedure call (ALPC) facility: efficient cross-process procedure call mechanism; pretty much like RPC
- kernel: scheduling, processor management
- hardware abstraction layer (HAL): maps between generic hardware commands and platform-specific stuff
- device drivers: extend functionality of executive, mostly to do translation to HAL commands
- windowing and graphics system: GUI functions
- four types of user-mode processes:
- special system process: user-mode OS service, can manage system in more privileged ways; session manager, auth subsystem, service manager, logon proc
- service processes: printer spooler, event logger, device driver components, network services; background user-mode activity
- environment subsystems: provide different OS "personalities"; either Win32 or POSIX; converts system calls
- user applications: EXEs and DLLs
- client/server model is used on Windows, which makes it easier to use in distributed computing; used even on single system
- kernel-based services are invoked through RPC; Executive routes request to appropriate server
- simplifies the Executive, easy to add new APIs
- improves reliability since more things are running outside the kernel
- uniform means for applications to communicate with services
- suitable base for distributed computing, just route RPC over network as required
- Windows is OO, Objects are important feature; they provide:
- encapsulation: data of object protected by object's services (methods)
- object class and instance: standard OS stuff, makes creating objects easier
- inheritance: hand coded implementation, but still good because inheritance is good
- polymorphism: common set of API functions to manipulate any object; however, not fully polymorphic, many APIs are specific to specific type
- Object Security Descriptor (SD) used to restrict access to specific user
- Objects can be named so that other processes can get a handle to them and share them
- two categories of objects used to manage processor:
- Dispatcher object: threads can wait on these
- Control objects: manage operation of processor not managed by normal thread scheduling
- one of first OSes to be written in C rather than assembly, took advantage of high-level language features and ease of development
- UNIX commands/libraries interact with kernel through system call interface, and kernel interacts with hardware
- all interaction with hardware is done through file subsystem, which can use either character or block interface with hardware control system
- System V Release 4, BSD, and Solaris 10 are all more modern versions of UNIX
- SVR4 is very important, objective was to be uniform platform for commercial UNIX, incorporates most important features
- BSD had more academic focus, widely used, responsible for popularity of UNIX
- Mac OS X based on BSD
- Solaris 10 was Sun's version; fully preemptable, multithreaded kernel, full support for SMP, OO interface to file systems
- most successful commercial UNIX implementation
- originally project of Linus Torvalds, lots of collaboration over internet
- success is largely due to free software packages, FSF's GNU project
- highly modular, easily configured, maximum rice
- loadable modules are object files which can be linked and unlinked to kernel at runtime
- even though kernel is monolithic, modules make it feel less monolithic
- lots of components:
- signals: used to call into processes (notify of faults, e.g.)
- system calls: used to request kernel services
- scheduler: manages and schedules processes
- virtual memory: allocates and manages virtual memory for procs
- file systems: global namespace for files, directories, etc.
- network protocols: Sockets interface to TCP/IP
- character device drivers: manage devices like terminals, printers, modems
- block device drivers: manage secondary memory, mostly
- network device drivers: manage network interface cards, bridges, routers
- traps and faults: handle traps and faults generated by processor
- physical memory: manages pool of page frames in real memory
- interrupts: handle interrupts
Not included on exam.
- OS must interleave process execution, allocate resources to processes, and support interprocess communication
- program in execution, instance of program, entity that can execute on processor
- process can be uniquely characterized by:
- identifier
- state: e.g. running state
- priority
- program counter: address of next instruction in program to execute
- memory pointers: pointers to code, data associated with process, shared memory blocks
- context data: data in registers
- I/O status information: I/O requests, assigned I/O devices, files in use
- accounting information: amount of processor time, clock time, time limits, etc.
- all that information is stored in process control block (PCB)
- PCB enables interrupting and later resuming execution as if no interruption occurred
- trace of process is sequence of instructions executed for that process
- two-state model has only running and not running; processes handled in round-robin manner
- process creation involves creation of PCB and allocation of memory space for program/data
- processes can be created (spawned) by:
- new batch job
- interactive log-on (spawn user process when they log on)
- created by OS to provide service (e.g. automatically creating printing process)
- spawned by existing process: fork! this is a child process
- processes can be terminated for these reasons:
- normal completion
- time limit exceeded (total time limit)
- memory unavailable
- bounds violation (memory)
- protection error (resources, e.g. files)
- arithmetic error (divide by zero)
- time overrun (blocked for too long)
- I/O failure
- invalid instruction (usually a result of branching to a data area)
- privileged instruction
- data misuse (type error)
- OS intervention (deadlock management, maybe)
- parent termination (parent was terminated)
- parent request (parent does the terminating)
- five state model has ready, running, blocked, new, and exit; they work as expected
- new and exit are useful for management; gives OS time to allocate/free resources
- processes in ready state still handled round-robin
- can have multiple ready queues to support different priorities
- can have blocked queues for each resource
- swapping is useful because you can accommodate more processes at once; add suspend state
- can also add ready/suspend and blocked/suspend states if even procs in ready state can be suspended
- suspension is defined by these properties:
- process not immediately available for execution
- may or may not be waiting for event (wtf textbook); can become unblocked without unsuspending
- process was placed in suspend state by agent; itself, parent, or even OS (wtf textbook)
- process will remain suspended until agent explicitly orders unsuspension (something something Newton, also wtf textbook)
- can suspend a program to inspect its state
- memory tables track both real and secondary memory
- some reserved for OS, rest used by procs
- tracks allocation of memory to processes, protection attributes, virtual memory data
- I/O tables manage I/O devices and channels
- also tracks individual I/O requests so it can route responses
- OS may also have file tables, which are like I/O tables but for files
- process tables are the list of PCBs, pretty much
- all tables may be interlinked
- tables subject to memory management as well
- collection of resources for process referred to as process image; at least some of image must be in main memory for execution
- paging can be used to make the image seem contiguous even if it's not, or even if not all of it is in main memory
- process control block information can be grouped into three categories:
- identification (number)
- state (registers, stack pointers)
- control (scheduling and state, OS data structure stuff (`mp_next`), IPC, privileges, memory management info, resource ownership)
- can also have accounting, which is like control but separate
- problems in PCB are big problems, so protect PCBs by putting them behind handler routines
- processes can execute in user (unprivileged) mode, or kernel (privileged) mode
- mode determines allowed instructions
- mode determined by bit in PSW, changed in response to service calls and returns
- process creation steps:
- assign unique PID
- allocate space for process, including its PCB
- initialize PCB
- set linkages (i.e. put in queue)
- create/expand other data structures, like accounting
- three reasons to do process switch:
- interrupt (clock, I/O, memory fault)
- trap (error or exception condition)
- supervisor call (service call by process)
- process state change steps:
- save context of processor
- update PCB
- move to appropriate queue
- select another process
- update PCB of new process
- update memory management structures (address translation)
- restore context of new process
- OS functions can execute in three ways:
- nonprocess kernel: call into OS, which has own memory and own system stack, can do whatever
- execution within user procs: process image also includes space for OS data structs, OS stack; OS functions executed within process; only process switching is external
- process-based OS: OS functions are in their own processes, only IPC and process switching is external
- nine process states:
- user running
- kernel running (move to this state from ready/in memory, then to user running)
- ready/in memory
- asleep/in memory (blocked)
- ready/swapped
- sleeping/swapped (wtf UNIX, why is it sleeping here but asleep in memory?)
- preempted (basically same as ready/in memory, but reached by being preempted when exiting kernel running)
- created (new)
- zombie (exit)
- proc 0 is swapper process, it spawns proc 1 which is init process, all other procs have proc 1 as ancestor
- divides context into three parts:
- user-level context: program, process data, stack data, shared memory
- register context: PC, PSW, stack pointer, registers
- system-level context: process table entry, U area, region table, kernel stack (for when proc executes in kernel mode)
- PCB has state, pointers, process size, user IDs, process IDs, event descriptor (for blocking), priority, signals, timers, `P_link`, memory status
- U area has process table pointer, user IDs, timers, signals, login terminal, error field, return value, I/O params, file params, file descriptor table, limit fields, permission modes
- fork is used to create process
- copies process image, except for shared memory
- returns ID of child to parent, returns 0 to child
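- a minimal C sketch of that return-value convention, using the standard POSIX calls:

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();            /* copies the process image; both processes continue here */
    if (pid < 0) {
        perror("fork");
        exit(1);
    } else if (pid == 0) {
        /* fork returned 0: this is the child */
        printf("child %d (parent %d)\n", getpid(), getppid());
    } else {
        /* fork returned the child's PID: this is the parent */
        printf("parent %d spawned child %d\n", getpid(), pid);
        wait(NULL);                /* reap the child so it doesn't stay a zombie */
    }
    return 0;
}
```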
- separate resource ownership and scheduling/execution; thread/lightweight process is executable, process holds resources
- multithreading is multiple threads per process
- threads have own stacks, thread control block, access to memory/resources of process
- takes less time to create/terminate/switch between threads
- more efficient interthread communication because don't have to invoke kernel to IPC
- threads can be used for:
- foreground vs background work
- asynchronous processing
- speed of execution (some threads can get blocked, but other work can still occur)
- modular program structure
- suspension and termination of a process affect all its threads
- thread synchronization is a concern since threads share memory, addressed in chapters 5 and 6
- threads can be user-level or kernel-level
- user-level threads managed by library in user space
- blocking on messages from other threads can be handled in user space
- light weight, doesn't require switch to kernel mode to manage threads
- scheduling can be application specific
- OS independent
- if one thread blocks on system resource, kernel will block entire process
- can solve this problem with jacketing, wrap kernel requests with nonblocking versions so that kernel doesn't block whole process
- kernel-level threads are handled by kernel
- kernel scheduler manages scheduling, switching; processes request threads through API
- kernel can schedule multiple threads on multiple processors
- kernel routines can be multithreaded
- operations are more costly though
- can combine approaches, where some ULTs are mapped to KLTs but some are just in user space
- TODO: read about 1:M and M:N threads, where they can migrate between domains
- speedup from multicore is 1/((1-f) + f/N), where f of execution time is parallelizable and there are N cores
- overheads mean that sometimes multicore is actually worse
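- plugging assumed numbers into the formula to see the shape:

```latex
\text{speedup} = \frac{1}{(1-f) + f/N},
\qquad f = 0.9,\ N = 8:\quad \frac{1}{0.1 + 0.9/8} \approx 4.7
```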
- some applications can scale really well with more cores:
- multithreaded native applications
- multiprocess applications (databases)
- Java applications (JVM is super multithreaded)
- multiinstance applications
- Valve reprogrammed Source engine to take advantage of multithreading
- found that coarse threading (major modules separate, occasionally synchronize with timeline thread) offered speedup of around 1.2
- found that fine-grained threading (similar/identical tasks spread across processors) was hard to program
- used hybrid threading, where both types of threading are used
- assign sound mixing to own processor, e.g.
- scene rendering split up into multiple threads in hierarchy
- use single-writer-multiple-readers to allow threads to continue to execute, not block each other
- application is one or more processes
- processes hold resources, have one or more threads
- thread can be scheduled for execution
- job object is collection of processes that can be managed as a group
- thread pool is collection of worker threads
- fiber is unit of execution manually scheduled by application; run in context of thread that schedules them
- like threads within threads
- user-mode scheduling (UMS) allows applications to schedule their own threads
- more efficient for short-duration work items with few system calls
- Windows 8 changes a bunch of things:
- only one Store app can run at once
- apps can be terminated randomly, must save entire relevant state to disk each time app is suspended
- background task API lets apps perform small tasks in background, like handling push notifications
- Windows process are objects
- process can be new, or copy of existing process
- process and thread both have synchronization capabilities
- logged-on users are given security ID; this is shared with literally all processes run by user, which is very secure
- how is this different than UNIX, where we just record user who ran proc? calling it a security ID doesn't make it more secure
- process points to linked list of virtual memory blocks, used by memory manager
- process has object table with related objects, like threads and files
- thread processor affinity is set of processors that are allowed to run a thread; process processor affinity is same thing but for process as a whole
- Windows process states are:
- ready
- standby (intermediate between ready and running)
- running
- waiting (blocked)
- transition (intermediate between waiting and ready; e.g. some pages are paged out)
- terminated (exit)
- complicated stuff happens to support both Win32 and POSIX
- Windows doesn't automatically create thread for new process
- Win32 always has initial thread, so it makes a second request for an initial thread
- POSIX doesn't support threads, so it makes a second request for an initial thread, and returns only process information to creator
- Solaris has four thread-related concepts:
- process
- user-level threads
- lightweight processes (mapping between ULTs and kernel threads)
- kernel threads (fundamental entities that can be scheduled)
- always exactly one kernel thread for each LWP, one LWP for each ULT
- can also have kernel threads that execute specific system functions, not bound to LWP
- can swap out ULT library, just have to fit LWP interface to be mapped to kernel threads
- replaces traditional UNIX processor state block with list of data blocks for LWPs; LWP block has:
- LWP identifier
- priority
- signal mask
- saved values of user-level registers
- kernel stack
- resource usage and profiling data
- pointer to kernel thread
- pointer back to process structure
- thread states are:
- IDL (new?)
- RUN (ready)
- ONPROC (running)
- PINNED (interrupted by interrupt thread)
- SLEEP (blocked)
- STOP (stopped, for debugging purposes)
- ZOMBIE (exit)
- FREE (resources released, waiting for thread data structure to be removed)
- interrupts are implemented as threads as well
- set of kernel threads dedicated to handling interrupts
- kernel controls access to data structures, syncs among interrupts using mutexes
- have highest priority
- pin current thread when running interrupt thread
- process or task is represented by `task_struct` data structure with:
- state
- scheduling information (priorities, normal/realtime, time quantum)
- identifiers
- IPC data
- links (to parent, siblings, children)
- times and timers (creation time, processor time consumed)
- file system (owned files, current/root directories of process)
- address space (virtual memory)
- processor-specific context (registers, stack info)
- process states are:
- running (consists of both ready and executing)
- interruptible (blocked, waiting for event)
- uninterruptible (blocked, waiting on hardware conditions, will not handle signals)
- stopped (halted, probably for debugging purposes)
- zombie (exit)
- no thread support, one thread per process
- instead, use pthread (POSIX thread) library, which creates kernel-level process for each ULT
- kernel-level processes can share group ID to share resources like files and memory, less need for context switching
- clone instead of fork preserves resources
- fork is just clone with clone flags cleared
- six types of namespaces (mnt, pid, net, ipc, uts, and user); namespaces provide processes with different view of system, like a virtual machine
- mnt (mount): different mount namespaces provide different views of filesystem; can only access files in visible part of filesystem
- uts (UNIX timesharing): allows group of machines within NIS domain to share config files
- ipc: isolates certain IPC resources
- pid: processes in different PID namespaces can have same PID
- net (network): isolation of resources associated with networking
- user: container with own set of UIDs, separate from parent
Not included on exam.
- multiprogramming is running multiple processes on a single processor
- multiprocessing is multiple processors
- distributed programming is running processes on multiple, distributed computer systems
- concurrency can occur within:
- multiple applications
- structured application (single app programmed as multiple concurrent processes)
- operating system structure (OS implemented as set of processes/threads)
- even if only multiprogramming, can still have concurrency problems e.g. race conditions
- bugs can be hard to find because of nondeterminism
- race conditions can be prevented by disallowing multiple processes from being in a critical section at the same time
- functioning of process (incl output) must be independent of speed of execution and other concurrent processes
- processes can have different degrees of relation to each other
- unaware: interact by competition
- use of critical resource is a critical section, needs mutual exclusion
- deadlock and starvation can happen
- indirectly aware: cooperate by sharing
- shared resources are now shared data
- reading is fine, but writing still needs mutual exclusion, deadlock and starvation still possible
- directly aware: cooperate by communication
- don't need mutual exclusion for this sort of interaction
- deadlock and starvation still possible; e.g. can have loop of processes waiting for messages from each other
- requirements for supporting mutual exclusion:
- mutual exclusion is enforced; only one proc at once is in critical section
- process that halts in its noncritical section must not interfere with other processes
- no deadlock or starvation
- if nothing in critical section, things should be able to enter critical section
- no assumptions about relative speeds
- process remains in critical section for finite time
- possible to do mutual exclusion with only software, but error prone
- can (almost?) always find race condition that makes the software solution break
- can disable interrupts, which prevents any other process from running
- does not work in multiprocessor environment
- special machine instructions can do two operations as single atomic operation
- `compare&swap` checks whether `*word` is equal to `testval`; if so, it sets `*word` to `newval`; always returns old value
- shared bolt initialized to 0; when entering critical section, try to swap a 1 into bolt
- afterwards, clear bolt
- `exchange` instruction just swaps two variables
- shared bolt initialized to 0; exchange with key, initialized to 1
- when swap goes through (key is 0), do critical section
- afterwards, set bolt back to 0 (equivalently, swap key with bolt again)
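- both entry protocols sketched with C11 atomics standing in for the machine instructions (bolt as above; function names made up):

```c
#include <stdatomic.h>

atomic_int bolt = 0;                     /* shared; 0 = critical section free */

void enter_compare_and_swap(void) {
    int expected = 0;
    /* atomically: if bolt == 0, set it to 1; otherwise loop (busy wait) */
    while (!atomic_compare_exchange_weak(&bolt, &expected, 1))
        expected = 0;                    /* a failed CAS overwrites expected; reset it */
}

void enter_exchange(void) {
    /* keep swapping a 1 into bolt; reading back a 0 means we got in */
    while (atomic_exchange(&bolt, 1) != 0)
        ;                                /* busy wait */
}

void leave(void) {
    atomic_store(&bolt, 0);              /* clear bolt */
}
```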
- hardware instructions guarantee mutual exclusion
- however, busy waiting is employed
- starvation and deadlock still possible
- want higher-level tools
- semaphore has internal count
- semWait decreases semaphore value; if value is negative, block process
- semSignal increments semaphore value; if value is not positive, unblock one of the blocked processes
- initial value of semaphore is number of processes that may be in critical section at once
- binary semaphore is variant
- count is always 1 or 0
- instead of going below 0, just keep at 0
- set back to 1 if no processes remain in critical section
- mutual exclusion lock is called mutex, basically just a binary semaphore
- lock before using resource, unlock afterwards
- process that locks mutex must be one that unlocks it
- strong semaphore guarantees that process that has been blocked for the longest will be first to be unblocked
- weak semaphore does not specify order of unblocking
- typically assume strong semaphores; honestly you have to work pretty hard to make your semaphore weak
- strong semaphores prevent starvation
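- a sketch (not how an OS actually implements them) of semWait/semSignal on top of pthreads; the wakeups counter guards against spurious wakeups, and since pthread_cond_signal wakes an arbitrary waiter this is a weak semaphore:

```c
#include <pthread.h>

/* count < 0 means |count| processes are blocked; initialize count to the
   number of processes allowed in the critical section at once, wakeups to 0 */
typedef struct {
    int count;
    int wakeups;                        /* pending signals; guards spurious wakeups */
    pthread_mutex_t lock;
    pthread_cond_t  cond;
} semaphore;

void semWait(semaphore *s) {
    pthread_mutex_lock(&s->lock);
    s->count--;
    if (s->count < 0) {                 /* value went negative: block */
        do {
            pthread_cond_wait(&s->cond, &s->lock);
        } while (s->wakeups == 0);
        s->wakeups--;
    }
    pthread_mutex_unlock(&s->lock);
}

void semSignal(semaphore *s) {
    pthread_mutex_lock(&s->lock);
    s->count++;
    if (s->count <= 0) {                /* someone is blocked: unblock one */
        s->wakeups++;
        pthread_cond_signal(&s->cond);  /* arbitrary waiter: a weak semaphore */
    }
    pthread_mutex_unlock(&s->lock);
}
```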
- 1+ producers generating some type of data, putting in buffer
- 1+ consumers taking items out of buffer
- only one agent can access buffer at once
- want to prevent taking from empty buffer or adding to full buffer
- if you use a binary semaphore, you have to use two semaphores and be careful about when you do your signals
- save local copy of count after decrementing, consume resource, and then wait for new item if consumed last item
- may not actually need to wait if more items have already been added to buffer in meantime
- general semaphores are much easier; have one for lock on buffer, another (n) to track number of elements in buffer
- producer signals n after appending, consumers wait on n before taking
- with finite buffer, have additional general semaphore (e) that tracks number of free spaces in buffer
- producers wait on e before appending, consumers signals e after taking
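- the finite-buffer scheme above, sketched with POSIX semaphores (s = buffer lock, n = items, e = empty slots; capacity arbitrary):

```c
#include <semaphore.h>

#define N 16                     /* buffer capacity, arbitrary for the sketch */

int buffer[N];
int in = 0, out = 0;

sem_t s;                         /* binary: mutual exclusion on the buffer */
sem_t n;                         /* counts items in the buffer */
sem_t e;                         /* counts empty slots */

void producer(int item) {
    sem_wait(&e);                /* wait for a free slot */
    sem_wait(&s);                /* lock the buffer */
    buffer[in] = item;
    in = (in + 1) % N;
    sem_post(&s);
    sem_post(&n);                /* one more item available */
}

int consumer(void) {
    sem_wait(&n);                /* wait for an item */
    sem_wait(&s);
    int item = buffer[out];
    out = (out + 1) % N;
    sem_post(&s);
    sem_post(&e);                /* one more free slot */
    return item;
}

/* initialization somewhere in main:
   sem_init(&s, 0, 1); sem_init(&n, 0, 0); sem_init(&e, 0, N); */
```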
- semaphores implemented by OS using hardware mechanism already discussed (or even software, maybe)
- problem with semaphores is that they are scattered throughout code; want even higher-level tool
- monitor has one or more procedures, initialization sequence, and internal data
- local data variables only accessible by monitor procedures
- process enters monitor by invoking a procedure
- only one process may execute in monitor at once; other processes that try to enter are blocked
- monitors also include synchronization tool in form of conditions
- cwait(c) blocks on condition c; csignal(c) unblocks a process blocked on c, if any, otherwise does nothing
- for producer and consumer problem, have local variables for buffer, buffer indices, and count, and conditions notfull and notempty
- append method waits on notfull if currently full, signals notempty after appending
- take method waits on notempty if currently empty, signals notfull after taking
- csignal is usually last thing in procedure; if it isn't, can block signalling proc into special urgent queue, wait until monitor becomes free again
- or be like Concurrent Pascal and forbid csignal from not being last operation in procedure
- instead of Hoare's model (described above), can use Mesa monitor instead
- csignal replaced with cnotify, which unblocks a process waiting on that condition but does not immediately run it; it will run at some later convenient time
- fewer process switches, no constraints on when cnotify is run
- must be careful to use while(blah) { wait } instead of if, since e.g. queue may have emptied again since notempty was signalled
- can also add cbroadcast, which notifies all waiting procedures
- useful if doing e.g. producer/consumer with variable length blocks, or otherwise uncertain about which/how many procs to unblock
- finally, Mesa monitor is more modular, because waiters can determine their own conditions for resuming, and signallers only need to know that some waiters might want to resume
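- a sketch of the bounded-buffer monitor with pthreads, whose condition variables already behave like Mesa's cnotify (hence the while loops, as advised above):

```c
#include <pthread.h>

#define N 16

/* the mutex plays the monitor lock; condvars play notfull/notempty */
static int buffer[N], in = 0, out = 0, count = 0;
static pthread_mutex_t monitor = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t notfull  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t notempty = PTHREAD_COND_INITIALIZER;

void append(int item) {
    pthread_mutex_lock(&monitor);
    while (count == N)                     /* cnotify semantics: must recheck */
        pthread_cond_wait(&notfull, &monitor);
    buffer[in] = item;
    in = (in + 1) % N;
    count++;
    pthread_cond_signal(&notempty);        /* cnotify(notempty) */
    pthread_mutex_unlock(&monitor);
}

int take(void) {
    pthread_mutex_lock(&monitor);
    while (count == 0)
        pthread_cond_wait(&notempty, &monitor);
    int item = buffer[out];
    out = (out + 1) % N;
    count--;
    pthread_cond_signal(&notfull);         /* cnotify(notfull) */
    pthread_mutex_unlock(&monitor);
    return item;
}
```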
- can send and receive messages to other processes
- three possibilities for blocking/nonblocking:
- blocking send and receive: rendezvous, tight synchronization
- nonblocking send, blocking receive: sender continues on, receiver must wait for message; most natural; don't know if message was received
- nonblocking send and receive: neither need to wait; receive returns null if no message waiting
- addressing can be:
- direct: target destination process specifically
- can require that receiver request to receive from sender explicitly (explicit addressing) or just accept from anyone (implicit addressing)
- indirect: send to and receive from shared data structure consisting of message queue called mailbox
- allows (one|many)-to-(one|many); many-to-one is called port; one-to-many is broadcast; one-to-one is private
- mailboxes can be owned by receiving process, or by creating process, or by OS
- messages have header (message type, destination, source, length, control info) and data
- FIFO is most common for queue, could allow receiver to look through message queue
- can do mutual exclusion by sending null message to mailbox; processes take message before entering critical section, put back afterwards
- can solve finite-buffer producer/consumer with mayproduce and mayconsume mailboxes
- initialize mayproduce with number of messages equal to capacity
- producer takes a message from mayproduce, produces, and sends produced content to mayconsume
- consumer takes content from mayconsume, consumes, and sends null message to mayproduce
- any number of readers may simultaneously read, but only one writer may write at once, and while writer is writing, no reading
- readers or writers may have priority
- annoying to do with binary semaphores
- easier to do with message passing
- see examples in text, p. 241, 243-4
- deadlocks involve conflicting needs for resources by two or more processes
- can draw joint progress diagram to see when deadlock can occur
- axes are progress of processes, with lines for lock A, release B, etc.
- shade areas of resulting grid where single resource is needed by both processes; can't enter those regions
- if stuck in concave region (can't go up or right without hitting shaded area), deadlock is inevitable
- finding clear lattice path to top right means avoiding deadlock is possible
- reusable resources can be released and reused; things like processor, I/O channels, memory, devices, files, etc.
- consumable resources are produced and consumed; things like interrupts, signals, messages, info in I/O buffers
- resource allocation graph represents processes as circles, resources as square with dot
- arrow from proc to resource is a request
- arrow from resource dot to proc is held resource
- for consumable resources, holding becomes producing
- for deadlock, need three conditions for possibility and then one more for existence
- mutual exclusion: only one process may use a resource at once
- hold and wait: a process may hold resources while waiting for other resources
- no preemption: can't forcibly take resources from processes
- circular wait (for existence): closed chain of processes such that each process holds at least one resource needed by next process in chain (i.e. cycle in resource allocation graph)
- prevent deadlocks by eliminating one of the conditions required for deadlock
- mutual exclusion: can't really prevent, but for something like reads, deadlock can't occur because multiple readers are allowed
- hold and wait: require that procs allocate all resources simultaneously
- impractical because programs are modular
- can't know in advance how many resources will be needed
- inefficient because a lot of waiting happens
- no preemption: allow preemption of resources
- processes can be suspended to reclaim memory (and sometimes other resources too)
- circular wait: define linear ordering of resources
- all processes must allocate resources in order specified by linear ordering
- inefficient like hold and wait
- dynamically prevent deadlock by disallowing process initialization or resource allocation if deadlock would become possible
- define following symbols:
- resource vector R (total amount of resource)
- available vector V (current free amount of resource)
- claim matrix C: max resource requirements for each process
- allocation matrix A: current resource allocation for each process
- most conservative algorithm would only allow process initialization if system could meet all max resource requests simultaneously
- banker's algorithm tries to keep system in safe state
- safe state is one in which there is a sequence of resource allocations that does not lead to deadlock
- unsafe state is state that is not safe
- basically, find process that can run to completion, and then free all its resources, and complete until everything is done
- requires assumptions: maximum resource req stated in advance, processes are independent (no messages), fixed number of resources, completed processes hold no resources
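- the safe-state test at the heart of the banker's algorithm, sketched in C using the C/A/V definitions above (dimensions are arbitrary):

```c
#include <stdbool.h>
#include <string.h>

#define NPROC 4
#define NRES  3

/* given claim C, allocation A, and available V, is there an order in
   which every process can run to completion? */
bool is_safe(int C[NPROC][NRES], int A[NPROC][NRES], int V[NRES]) {
    int work[NRES];
    bool done[NPROC] = { false };
    memcpy(work, V, sizeof work);

    for (int finished = 0; finished < NPROC; ) {
        bool progress = false;
        for (int p = 0; p < NPROC; p++) {
            if (done[p]) continue;
            bool can_run = true;
            for (int r = 0; r < NRES; r++)
                if (C[p][r] - A[p][r] > work[r]) { can_run = false; break; }
            if (can_run) {                 /* p can finish: release its resources */
                for (int r = 0; r < NRES; r++) work[r] += A[p][r];
                done[p] = true;
                finished++;
                progress = true;
            }
        }
        if (!progress) return false;       /* nobody can finish: unsafe state */
    }
    return true;
}
```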
- detect deadlock and then resolve
- use modified version of banker's algorithm
- replace claim matrix C with request matrix Q (what is currently requested)
- perform banker's algorithm, marking processes that complete
- at end, unmarked processes are deadlocked
- recover by:
- aborting all deadlocked processes; actually very common
- back up deadlocked processes to checkpoints, then restart
- successively abort deadlocked processes until deadlock goes away
- successively preempt resources until deadlock goes away; roll back preempted processes to before they acquired those resources
- for latter two recoveries, can select process with:
- least amount of processor time so far
- least output so far
- most time remaining
- least total resources allocated
- lowest priority
- group resources into resource classes, use linear ordering strategy on classes, use class-specific strategy within each class
- best strategy for specific resource depends on resource:
- swappable space: prevent with hold-and-wait, or maybe deadlock avoidance
- process resources (tape drives, files): avoidance usually effective
- main memory: prevent with preemption (by swapping)
- internal resources (I/O channels): prevent with resource ordering
- circle of (five) philosophers, forks between philosophers, must claim fork on both sides to eat
- lots of solutions
- use semaphore, prevent more than four philosophers from entering room at once
- check that both forks are free before grabbing
- put down left fork if right fork is not available, then block on right fork
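- the room-semaphore solution from above, sketched with POSIX semaphores (at most four philosophers may try to pick up forks at once, so the circular wait can't close):

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdint.h>

#define NPHIL 5

sem_t fork_[NPHIL];          /* one binary semaphore per fork */
sem_t room;                  /* at most NPHIL - 1 philosophers at once */

void *philosopher(void *arg) {
    int i = (int)(intptr_t)arg;
    for (;;) {
        /* think(); */
        sem_wait(&room);                     /* prevents the all-hold-left-fork deadlock */
        sem_wait(&fork_[i]);                 /* left fork */
        sem_wait(&fork_[(i + 1) % NPHIL]);   /* right fork */
        /* eat(); */
        sem_post(&fork_[(i + 1) % NPHIL]);
        sem_post(&fork_[i]);
        sem_post(&room);
    }
    return NULL;
}

/* init: sem_init(&room, 0, NPHIL - 1); sem_init(&fork_[i], 0, 1) for each i */
```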
- pipes: circular buffer that allows processes to communicate using producer-consumer model
- messages: block of bytes with type, msgsnd and msgrcv
- each process has a mailbox
- receive by FIFO or by type
- block when sending to full queue or reading from empty queue (or queue without requested type)
- shared memory: common block of virtual memory
- each process can have read-only or read-write permissions on block
- must enforce own mutual exclusion constraints
- semaphores: super-general semaphores, can increment and decrement by values greater than 1
- kernel does all operations atomically
- semaphores are actually sets of semaphores
- single `sem_op` operation, takes operation for each semaphore in set, operation is just a value
- if positive, increment value of semaphore and awaken all procs waiting for increase
- if 0: if semaphore value is 0, continue; otherwise, wait on value equaling 0
- if negative and abs value lte current semaphore value, add arg to value; if result is 0, awaken procs waiting for 0
- if negative and abs value gt current semaphore value, wait on value increasing
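- a sketch of the real semget/semop interface for the mutual-exclusion case (error handling omitted):

```c
#include <sys/ipc.h>
#include <sys/sem.h>

int semid;   /* e.g. semget(IPC_PRIVATE, 1, IPC_CREAT | 0600),
                then semctl(semid, 0, SETVAL, 1) to initialize the value */

void enter(void) {
    struct sembuf op = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };
    semop(semid, &op, 1);        /* negative op: blocks until it can subtract */
}

void leave(void) {
    struct sembuf op = { .sem_num = 0, .sem_op = +1, .sem_flg = 0 };
    semop(semid, &op, 1);        /* positive op: increments, wakes waiters */
}
```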
- signals: informs proc of async event; like an interrupt with no priority
- each signal is single bit
- procs send signals to each other, or sent by kernel
- procs can respond by performing default operation, invoking signal-handler function, or ignoring signal
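- the handler option, sketched with the standard sigaction call:

```c
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t got = 0;

static void handler(int signo) {
    got = signo;                 /* async-signal-safe: just record it */
}

int main(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;     /* alternatives: SIG_DFL (default), SIG_IGN (ignore) */
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    pause();                     /* sleep until a signal is delivered */
    printf("caught signal %d\n", (int)got);
    return 0;
}
```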
- all concurrency mechanisms in UNIX plus more
- also has real-time signals:
- can deliver by priority
- multiple signals can queue
- can send value along with signal
- also has atomic operations on variables, both integer and bitmap
- must be implemented on architecture that implements Linux
- on multiprocessor systems, variable is locked until operation completes
- spinlocks are common mechanism for protecting critical sections
- problem is busy-waiting, but okay if resources usually freed quickly
- different spinlocks depending on interrupt status:
- plain: critical section not executed by interrupt handlers, or if interrupts are disabled
- `_irq`: use if interrupts are always enabled
- `_irqsave`: use if not known whether interrupts will be enabled (saves and restores interrupt state)
- `_bh`: disables/enables bottom halves (softirqs, i.e. deferred interrupt work) so the lock can also be taken by bottom-half code without deadlock
- implemented differently on multiprocessor systems
- reader-writer spinlock is more complicated spinlock
- multiple readers can acquire lock, but only one writer
- counter counts number of readers, flag indicates whether spinlock is released by all readers/writers
- prioritizes readers over writers
- Linux has binary and counting semaphores
- also reader-writer semaphores: allow multiple readers but only single writers
- counting semaphore for readers but binary semaphore (mutex) for writers
- some compilers and/or hardware reorder memory accesses
- reordering doesn't change results for sequential code, but can break concurrent algorithms
- `rmb` barrier prevents reads from moving across the barrier, `wmb` prevents writes, `mb` does both
- dictate behaviour of compiler and processor, but execute no operation of their own at runtime
- variants for compiler-only barriers and barriers that apply to hardware only if on SMP system
Not included on exam.
- frame: fixed-length block of main memory
- page: fixed-length block of data in secondary memory; can be copied to frame
- segment: variable-length block of data in secondary memory
- requirements: relocation, protection, sharing, logical organization, physical organization (RP PLS)
- relocation: to unload and reload processes, must be able to load to different locations
- branch and data references need to be translated to physical memory addresses
- protection: memory must be protected from other processes; must be handled by hardware, not OS
- sharing: processes need to be able to share memory
- logical organization: dividing user programs/data into modules (e.g. segmentation) allows:
- independent writing and compilation of modules, resolve refs at runtime
- different degrees of protection for different modules
- sharing of modules
- physical organization: cache management should be the job of the OS, not user programmer
- virtual memory uses segmentation and/or paging, but first, talk about partitioning
- internal fragmentation is unused space within partitions
- external fragmentation is unused space outside of partitions
- fixed partitioning partitions memory into blocks of fixed size
- equal-size partitions: simple, but programs can be too big, utilization is inefficient, limited max #
- variable-size partitions: can get better usage, fit large programs; need to consider assignment strategy:
- assign-to-smallest-that-fits, with queue for each partition: okay, but we can use larger partition for smaller process if needed, so use only a single queue
- dynamic partitioning: partitions resized to fit program exactly; no internal fragmentation, but inefficient due to external fragmentation and compaction
- best-fit algorithm puts process in free space closest to needed size; generally sucks
- first-fit algorithm puts process in first space large enough; generally best and fastest
- next-fit algorithm puts process in next space large enough after previous allocation; similar to first fit, may require more frequent compaction
- replacement algorithms also needed; see later notes
- buddy system: partitions of size 2^K, break up and recombine as needed
- better than normal fixed and dynamic partitioning, but not as good as virtual memory; used in some parallel systems
- relocation is needed for things that aren't fixed partitioning
- store logical addresses in code (e.g. relative addresses, delta from known point), translate to physical address for access
- check bounds during address translation (protection)
- copy pages from secondary memory to frames in main memory
- maintain page table for each process
- logical addresses contain page number and offset
- translated using page table lookup to frame number and offset
- hardware must know how to do address translation with page table
- when swapping in a process, load pages into whichever free frames are available, create page table
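- an illustrative sketch of the page-number/offset split (assumes a one-level page table and 4 KB pages; real hardware does the walk itself):

```c
#include <stdint.h>

#define OFFSET_BITS 12                     /* 4 KB pages, just for illustration */
#define PAGE_SIZE   (1u << OFFSET_BITS)

/* split a logical address into (page number, offset), look the page up,
   and rebuild the real address from the frame number */
uint32_t translate(uint32_t logical, const uint32_t *page_table) {
    uint32_t page   = logical >> OFFSET_BITS;
    uint32_t offset = logical & (PAGE_SIZE - 1);
    uint32_t frame  = page_table[page];    /* in hardware this is the table lookup */
    return (frame << OFFSET_BITS) | offset;
}
```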
- segments are like unequal-size pages; there is a maximum size
- segmentation usually visible to programmer, unlike paging
- fixed number of bits at beginning of logical address used to identify segment, like with paging
- however, some logical addresses will be invalid if segment is not maximum size
- segment table must store size to know whether addresses are invalid
- can combine segmentation and paging (how this is done is not described in this chapter)
TODO
- virtual memory: a scheme by which secondary memory can be addressed as though it were in main memory
- limited by address length and secondary memory size, but not by main memory size
- virtual address: logical address for virtual memory, basically
- virtual address space: virtual storage space for a given process
- address space: range of memory addresses available to process
- real address: physical address in main memory, basically
- central idea of virtual memory: if memory references are all logical addresses and process is broken into pieces (pages/segments), then not all pages need to be loaded into memory at once
- portion of process that is actually in main memory at one time is the resident set
- attempts to access memory outside of resident set generate memory access fault; OS loads appropriate piece
- more processes can be executing in main memory at once, and single process can be larger than main memory
- virtual memory isn't too inefficient because of locality
- if too much time is spent loading pieces, it's called thrashing and it's bad
- however, locality lets OS make predictions about which pieces will be needed soon when doing replacement
- page table must now indicate whether page is actually present in main memory, and whether it's been changed since being loaded
- page table can also be huge because of amount of memory managed, so it usually can't fit in main memory
- solution: put it in virtual memory, use a smaller "root page table" to manage pages of user page table
- inverted page table: rather than using a page table for each process, hash page number into single large page table with chaining
- rows in table have page number, process identifier, control bits, and chain pointer
- translation lookaside buffer: cache for address translation; avoid having to do additional main memory access to look at page table every time
- lookup in TLB can be really fast because of hardware parallelization (associative mapping)
- CPU cache of main memory is also in this pipeline
- page size is important
- large pages have more internal fragmentation, but smaller page fault rate and page table size
- also more suitable for rotational secondary memory (transferring larger blocks at once)
- small pages are reverse of large pages
- there is an optimal middle ground for page size (measuring optimality by page fault rate)
- generally between 512 bytes and 8 Kbytes; can be as large as a Gbyte
- dynamic segment sizes make lots of things better
- can handle growing data structures
- programs can be altered/recompiled independently; each component requests segment of required size
- sharing is better (because shared segment can be exactly right size?)
- protection is more flexible (protect segments, rather than pages which are arbitrarily divided)
- unique segment table for each process
- rows contain present flag, change flag, other control bits
- also segment length, process identifier, etc.
- segment table can be big so it needs to be in main memory, like the page table
- segmentation can be combined with paging
- variable length segments that are broken into a number of fixed pages
- virtual address has segment #, page # within that segment, and offset
- protection and sharing are better with segments!
- segment can be in segment table of more than one process
- common scheme is ring-protection scheme: rings can access data of higher-number rings, but can only access services provided by lower-number rings
- OS must decide which algorithms to use; also whether to use paging, but everyone uses paging (usually in combination with segmentation)
- fetch policy determines when a page should be brought into main memory
- can just fetch only when page fault occurs
- prepaging is when additional pages are brought in (to exploit spin disks that make it more efficient to load many pages at once)
- when processes are resumed, all pages that were in main memory when it was suspended are restored
- placement policy determines where in main memory to put a process piece
- important in pure segmentation, but if pages are involved the physical location doesn't really matter at all
- some research has been done on nonuniform memory access, where further memory takes longer to access
- replacement policy determines which pages to replace when loading new pages
- distinct from resident set management (later section)
- some frames may be locked (e.g. kernel, key control structures) and will never be replaced
- optimal policy (theoretical) has omniscience and can predict exactly which page is best to remove
- LRU is LRU; expensive but almost optimal
- FIFO treats page frames as circular buffer; very simple but pretty bad
- clock policy is like FIFO, but you skip a frame if it has been used since the last time the clock went around; gets functionality close to LRU (see the sketch after this list)
- can use more bits, e.g. to prefer replacing pages that haven't been modified, but costs performance because the hand might have to cycle multiple times
- page buffering keeps a free page list and modified page list; pages still available for reading, but they are essentially queued for replacement when needed
- pages spend some time in buffer before being removed so that if they are used again in that time, they can be saved
- overhead is like FIFO, but performance closer to optimal
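- a sketch of the clock policy from this list (one use bit per frame; sizes arbitrary):

```c
#include <stdbool.h>

#define NFRAMES 64                     /* arbitrary for the sketch */

static bool use_bit[NFRAMES];          /* set on every reference to the frame's page */
static int  hand = 0;

/* choose a victim frame: frames used since the last sweep get a second
   chance (use bit cleared, frame skipped); first unused frame is replaced */
int clock_select(void) {
    for (;;) {
        if (!use_bit[hand]) {
            int victim = hand;
            hand = (hand + 1) % NFRAMES;
            return victim;
        }
        use_bit[hand] = false;
        hand = (hand + 1) % NFRAMES;
    }
}
```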
- resident set management addresses how many pages per process should be kept in main memory
- beyond a point, extra pages are superfluous; fewer pages per process means more processes can be in memory, but too few increases page fault rate
- fixed-allocation gives each process a fixed number of frames, determined at process creation time; may be determined by application type
- replacements always occur within the frames assigned to that process
- variable-allocation allows variation over time of allocated frame count; high page fault indicates that more are needed, low means that less are needed
- much more overhead on part of OS
- a local replacement strategy selects frames from within those assigned to process causing fault; global chooses from anywhere
- fixed-allocation means local, variable-allocation usually means global, but...
- variable allocation with local scope can use the working set strategy
- working set is set of pages that process has referenced in past interval of time (window size)
- strategy monitors working set, periodically removes pages not in working set, only executes process if working set is in memory
- graph of working set size over time is periods of flat with big bumps between them (transient periods)
- optimal window size is hard to determine
- another strategy is the page fault frequency algorithm
- on page fault, check number of page references since last page fault; if less than F, add frame; else, remove all pages that haven't been used since last time pages were removed
- can be refined with two thresholds
- performs poorly during transient periods
- another strategy is variable-interval sampled working set
- has minimum and maximum duration of sampling interval (M, L) in virtual time
- if Q page faults occur between M and L, sample early, otherwise suspend and sample after L
- cleaning policy is opposite of fetch policy; determines when dirty pages get written back to secondary memory
- demand cleaning writes pages to secondary memory only when they are selected for replacement
- precleaning writes pages out before they need to be written out; batching is more efficient
- only a Sith deals in one cleaning policy exclusively
- when doing precleaning, write out only pages that are replaceable under replacement policy
- load control determines how many processes should reside in main memory
- processor utilization versus multiprogramming level rises to a peak, then falls off steeply as thrashing sets in
- maximum utilization occurs (experimentally) when mean time between faults is equal to mean time spent handling faults
- can monitor the fault rate and increase or decrease the multiprogramming level at runtime as well
- process to be replaced when swapping in can be decided by various strategies:
- lowest-priority process
- faulting process (whichever process just caused the fault)
- last process activated (least likely to have its working set in memory)
- process with smallest resident set (least effort to remove, but penalizes strong locality)
- largest process (obtain most free frames)
- largest remaining execution window (approximates shortest-processing-time-first scheduling)
- UNIX used to use variable partitioning with no virtual memory, but now UNIX and Solaris use paged virtual memory
- Solaris (and SVR4) separate the paging system (memory management for user procs) and the kernel memory allocator (which does memory management for the kernel)
- paging system has very-important-to-remember data structures:
- page table is per process, rows contain page frame number, age, and various control bits
- disk block descriptor is associated with each page, which describes which swap device and block in that device contains the page
- page frame data table contains information about each frame in real memory, used by replacement algorithm
- swap-use table exists per swap device, has entry for each page on device, contains page number and reference count
- uses modified clock policy with two hands; fronthand clears reference bit, backhand moves unreferenced pages to list of pages to be paged out
- scanrate is rate at which hands scan through page list
- handspread is gap between fronthand and backhand
- hands speed up when there are fewer free pages (in order to free more)
- kernel memory allocator must support dynamic allocation for multiple kernel functions
- however, size of blocks is much smaller than typical machine page size, so full paging is inefficient
- instead, use variant of buddy system called lazy buddy system
- essentially, don't recombine blocks because block of same size might be needed soon; maintain pool of "locally free" blocks
- Linux uses a three level page table
- page directory exists per active process, is the size of a single page, rows point to pages in page middle directory
- page middle directory has entries pointing to pages in page table
- page table has entries pointing to individual virtual pages of a process
- virtual address therefore consists of four components: global directory, middle directory, page table, and offset (split sketched after this sub-list)
- 64-bit Alpha processor provides hardware support for three levels of paging
- Linux handles two-level hardware paging by defining the size of the middle directory to be 1, with optimizations done at compile time
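A sketch of the four-way split; the field widths are illustrative, not the actual kernel constants (on two-level hardware the middle field would simply be 0 bits wide):

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative widths: 12-bit offset, 9-bit page table index,
 * 9-bit middle directory index, remainder for the global directory. */
#define OFFSET_BITS 12
#define TABLE_BITS   9
#define MIDDLE_BITS  9

void split_vaddr(uint32_t va)
{
    uint32_t offset = va & ((1u << OFFSET_BITS) - 1);
    uint32_t table  = (va >> OFFSET_BITS) & ((1u << TABLE_BITS) - 1);
    uint32_t middle = (va >> (OFFSET_BITS + TABLE_BITS))
                          & ((1u << MIDDLE_BITS) - 1);
    uint32_t global = va >> (OFFSET_BITS + TABLE_BITS + MIDDLE_BITS);

    printf("pgd=%u pmd=%u pte=%u offset=%u\n",
           global, middle, table, offset);
}
```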
- Linux used to use clock policy with 8-bit age variable; accesses increase age, hand sweeps decrease age
- now uses "split LRU algorithm"; splits memory into two zones, maintains active and inactive list for each zone
- daemon kswapd periodically performs page reclamation
- first reference to a page sets the `PG_referenced` flag; a second reference moves it to the active list
- `PG_referenced` is reset if the second reference doesn't happen before the daemon runs
- active pages require two timeouts before being moved back to the inactive list
- within kernel, Linux uses slab allocation to allocate smaller chunks within pages (this is complex and is not covered in text)
Not included on exam.
- 32-bit processes in Windows each have a separate 32-bit address space, allowing 4 Gbytes of virtual memory (though half is used by the OS)
- on 64-bit platforms, 8 Tbytes of user address space is available (Windows 7)
- a guard page at each end of user virtual space is set aside to catch errors; remainder is divided into pages
- pages can be available, reserved (not used but reserved for future use, e.g. growing stack), or committed (actually in use, possibly in main memory)
- Windows uses variable allocation local scope
- when memory is plentiful, resident sets are allowed to grow (with some beatdown on rapidly growing resident sets)
- when memory is scarce, memory manager recovers memory by removing less-used pages
- Windows 8 has a new virtual memory system to handle Windows Store apps
- paging holds items that haven't been accessed in a while, whereas swapping holds items that were recently taken out of memory
- essentially two levels of cache
- only Store apps can use the swap space
- whole application gets swapped
Not included on exam.
- Android is an extension to normal Linux with extra features
- ASHMem provides anonymous shared memory as file descriptors
- Pmem allocates virtual memory so that it is physically contiguous; useful for hardware without virtual memory support
- Low Memory Killer kills apps to free memory, since phones don't usually have swap capability
- long term scheduling: decision to add to pool of processes to be executed
- decision driven by desired degree of multiprogramming
- which job can be decided by FCFS, or maybe take into account priority, expected execution time, I/O reqs, etc.
- tries to keep balanced usage of different resources (processor, different I/O devices)
- medium term scheduling: decision to add to the number of processes that are in main memory
- also manages multiprogramming degree, but more dynamically
- more motivated by memory concerns though
- short term scheduling: deciding which available process will be executed by processor
- also known as dispatcher
- invoked by timer, interrupts, system calls, signals
- lots of algorithms, described in next section
- main objective is to allocate processor time to optimize some set of criteria
- criteria can be user-oriented:
- response time: time from submission of request until response begins to be received
- turnaround time: time from submission of process until its completion
- deadlines: number of processes that finish by their deadline
- predictability: jobs are consistent when run multiple times (in terms of completion time)
- or system-oriented:
- efficient utilization of resources (balance)
- throughput (number of processes completed per unit of time)
- fairness to multiple users (no user/process gets starved)
- processes can have priorities, higher priority always chosen over lower priority
- not worth writing out details of all the algorithms, look at table 9.3 on page 405
- FCFS selects process with longest waiting time to execute; does not preempt
- process can still change if it blocks on I/O
- favours processor-bound processes over I/O-bound
- Round Robin uses time slicing to cycle through processes with preemption
- fair, no starvation, but still favours processor-bound over I/O-bound
- virtual round robin avoids unfairness with respect to processor- versus I/O-bound
- when something blocks on I/O, it gets put in a separate queue; prioritized for resumption, then run for the time remaining from its quantum when it blocked
- shortest process next uses estimated completion times and runs the shortest process next; no preemption
- completion time is estimated from past execution times
- can use a normal average, but an exponential average (where more recent results are weighted higher) is better; see the sketch after this sub-list
- risk of starving longer processes
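The exponential average as code; the weight `a` below is an illustrative choice, not a prescribed value:

```c
/* Exponential average of observed burst times:
 *   S[n+1] = a * T[n] + (1 - a) * S[n]
 * Larger `a` weights recent observations more heavily. */
double exp_avg(double prev_estimate, double observed, double a)
{
    return a * observed + (1.0 - a) * prev_estimate;
}

/* usage: estimate = exp_avg(estimate, last_burst, 0.8); */
```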
- shortest remaining time is like SPN, but can preempt when new process is scheduled
- takes into account time already spent running a process when estimating remaining duration
- highest response ratio next chooses process with maximum (w + s) / s, where w is time spent waiting and s is expected service time
- similar to SPN or SRT, but doesn't starve long processes because they'll eventually have waited long enough
- feedback uses multiple queues
- processes are admitted into highest priority queue
- each time a process is preempted, it gets put into a lower priority queue
- lowest queue is treated in round robin fashion
- can avoid major turnaround time penalties by allowing longer time quanta on lower priority levels
- fair-share scheduling is designed to fairly distribute resources among sets of processes (e.g. processes belonging to different users)
- track CPU usage by each group of processes
- use complicated formula involving process priorities, intended distribution of processor time, and CPU usage for specific process
- traditional UNIX uses multilevel feedback within each priority level with one second preemption
- has adjustable "nice" priority, but cannot adjust to be outside its "band", which is base priority assigned by process type
- from highest to lowest, swapper, block I/O, file manipulation, character I/O, and user processes
- suited to general purpose time sharing and provide efficient use of I/O devices
- multiprocessor systems can be:
- loosely coupled/distributed: relatively autonomous systems
- functionally specialized: specific processors for specific tasks
- tightly coupled multiprocessor: shared main memory, integrated control of OS
- granularity is frequency of synchronization between processes
- fine granularity is under 20 instructions between synchronizations
- very coarse is 2000 to 1M instructions
- independent is never, e.g. time sharing
- design issues include:
- assignment of procs to processors
- static allocation assigns proc to processor, never changes; short-term queue maintained for each processor
- dynamic load balancing prevents processors from being idle
- master/slave architecture has kernel running on only one processor; master responsible for all scheduling
- peer architecture allows all processors to self-schedule; OS must resolve conflicts, e.g. two processors trying to claim same proc
- use of multiprogramming on individual processors
- processes can still get blocked, so should use multiprogramming for the same reason it's normally used
- sometimes, if many processors are available, it may be better for performance to not do multiprogramming; might want all threads of a proc to be running at once
- dispatching of procs
- simpler scheduling algorithms may be acceptable since processor resources more plentiful
- most systems have a single queue feeding into common pool of processors
- thread scheduling raises additional concerns
- medium-grain to fine-grain synchronization; significant interaction
- four strategies for handling multiprocessor thread scheduling
- load sharing: global queue of ready threads is maintained, not grouped by process
- distinct from load balancing, which is more long-term
- work distributed evenly across processors
- no centralized scheduler required
- global queue can be organized using priority etc., as in chapter 9
- FCFS actually does best on multiprocessing systems
- load sharing has disadvantage of central queue under mutex, worse caching (since procs switch processors), all threads of proc rarely run at same time
- gang scheduling: set of related threads scheduled to run on set of processors all at once
- designed to handle threads requiring finer-grained synchronization
- can weight run time for each proc by number of threads to minimize waste
- dedicated processor assignment: assign all proc threads to processors, run until complete
- can be viewed as extreme form of gang scheduling
- good for highly parallel system, processor utilization isn't as important as normal
- completely avoids process switching
- too many threads can be a problem, causes more preemption; limit number of active threads to number of processors
- processor allocation more closely resembles memory allocation; can adapt ideas
- activity working set is the minimum number of activities (threads) that must be simultaneously scheduled before the application is allowed to run
- processor fragmentation is when there are unused processors left over, but not enough to do useful work
- dynamic scheduling: number of threads in process can change during execution
- OS is responsible for partitioning processors among jobs; jobs use their currently assigned processors to execute a subset of their runnable tasks/threads
- when jobs request processors, use idle processors to satisfy the request
- if none available, allocate single processor if the job doesn't already have one
- otherwise, wait until processors become available
- when another job releases processor, assign processors to jobs that are waiting
- one loop of round robin, then FCFS (i.e. dump remaining to first job in list)
- on multicore systems, reducing off-chip memory access becomes more important than processor utilization
- adjacent cores may share some caches, additional factor to consider
- beyond scope of text
- correctness depends not only on logical result, but also time at which results are produced
- hard real-time task must meet its deadline; unacceptable damage or fatal error if fail
- soft real-time task has associated deadline that is desirable but not mandatory
- aperiodic task can have deadline associated with start and/or finish time
- periodic task must be run "once per period T" or "exactly T units apart"
- real-time systems have five unique requirements:
- determinism: operations must be performed at fixed, predetermined times
- good measure is maximum delay from arrival of high-priority interrupt until servicing begins
- responsiveness: measures how long it takes to service interrupt (not start service)
- user control: user must have fine-grained control over task priority
- distinguish between hard and soft tasks, deadlines, etc.
- what algorithms to use for paging, swapping, keeping procs in main memory, etc.
- reliability: transient failures far more costly in real-time, must avoid
- fail-soft operation: ability to fail in a way that preserves capability and data
- a system is stable if, when it is impossible to meet all deadlines, it can still meet deadlines of most critical tasks
- short term scheduler is key
- enforce stricter priorities
- minimize interrupt latency
- provide more precise timing characteristics
- clock-based preemption to ensure long-running low-priority tasks don't run during critical times
- real-time scheduling can be classified based on...
- whether it does schedulability analysis
- whether that analysis is static or dynamic
- whether the result of analysis produces a schedule or a plan
- classes of algorithms are therefore:
- static table-driven: static analysis of feasible schedules, creates a schedule that determines when tasks must begin executing
- good for periodic tasks/schedules
- earliest-deadline-first schedulers are usually in this category
- static priority-driven preemptive: as above, but no schedule is created; instead, priorities are assigned
- example is rate monotonic algorithm (see later)
- dynamic planning-based: feasibility determined at run time, arriving task accepted only if feasible; creates a schedule or plan
- attempt to create new schedule when new task arrives, reject if not possible
- dynamic best effort: no feasibility analysis, tries to meet all deadlines and aborts processes that miss deadline
- assign priority dynamically based on characteristics of tasks
- used when tasks are aperiodic, no scheduling possible
- can't know in advance whether deadlines will be met
- deadline scheduling requires lots of information about processes
- ready time: when it is available for execution
- starting deadline: when it must start by
- completion deadline: when it must finish by
- processing time: time required to execute to completion
- resource requirements: resources (other than the processor) needed while executing
- priority: relative importance of task (hard vs soft)
- subtask structure: task may be decomposed into hard and soft subtask(s)
- scheduling task with earliest deadline minimizes number of missed deadlines
- when starting deadline is specified, nonpreemptive scheduling makes sense; ending deadline => preemptive
- with ending deadlines, earliest-deadline-first makes most sense, implement via preemption
- with starting deadlines, best strategy is to schedule the earliest deadline to run, regardless of whether it's ready or not; once it has started, let it run to completion; no preemption needed
- rate monotonic scheduling is used for periodic tasks; highest priority goes to lowest period task
- for a given process, C/T is computation time over period, i.e. fraction of usage required
- RMS requires that the sum of fractions is less than n * (2^(1/n) - 1) (which converges to ln(2)) to guarantee that all tasks will be scheduled successfully
- however, earliest-deadline scheduling only requires that the sum of fractions is less than 1, so it's better
- RMS adopted in practice because ln(2) is only a conservative bound; can actually get ~90% utilization (bound check sketched below)
- soft real-time components can absorb processor time not used by RMS
- stability easier to achieve with RMS: static priorities make it straightforward to guarantee that the most critical tasks meet their deadlines
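A quick check of the bound above (illustrative helper, not from the text; needs -lm):

```c
#include <math.h>
#include <stdio.h>

/* Tasks are guaranteed schedulable under RMS if
 *   sum(C_i / T_i) <= n * (2^(1/n) - 1);
 * under EDF the sum only has to be <= 1. */
int rms_schedulable(const double *C, const double *T, int n)
{
    double u = 0.0;
    for (int i = 0; i < n; i++)
        u += C[i] / T[i];
    double bound = n * (pow(2.0, 1.0 / n) - 1.0);
    printf("utilization %.3f, RMS bound %.3f\n", u, bound);
    return u <= bound;
}
```

For three tasks the bound is 3 * (2^(1/3) - 1) ≈ 0.780, well below the EDF bound of 1.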
- priority inversion is when a high-priority task is forced to wait on a low-priority task
- e.g. low-priority task has locked a resource
- can cause problems if low-priority task never runs because medium-priority task is not blocked
- solution is priority inheritance
- can also use priority ceiling; priority of resource is one plus highest priority user, and any proc that locks the resource gets that priority
- Linux has three scheduling queues: `SCHED_FIFO`, `SCHED_RR`, and `SCHED_OTHER`; first two are real-time
- non-real-time procs cannot execute if a real-time proc is ready
- `SCHED_FIFO` and `SCHED_RR` are the same, except round robin does timeslicing within the same priority level
- old `SCHED_OTHER` scheduler did not scale very well; used a single runqueue for all processors with a single lock, no preemption
- new scheduler for non-real-time tasks is the O(1) scheduler
- maintains number of tasks, a priority bitmap (which queues are nonempty), and active/expired priority queue arrays for each processor
- schedule by selecting highest nonempty priority, scheduling round-robin within that priority (bitmap lookup and array swap sketched after this sub-list)
- if a process is interrupted during timeslice, it is put back in active queue; procs move to expired queue after complete timeslice
- when active queue is empty for a priority, swap active and expired queues
- time slices are between 10-200ms
- initial priorities assigned by user, dynamically adjusted over time
- favour I/O bound tasks because this produces good interactive response
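A sketch of the two O(1) pieces: finding the highest nonempty priority via the bitmap, and swapping the active/expired arrays. Sizes and names are illustrative, not the actual kernel's (the builtin is GCC/Clang):

```c
#include <stdint.h>

#define NPRIO 140   /* illustrative priority-level count */
#define NWORDS ((NPRIO + 63) / 64)

struct prio_array {
    uint64_t bitmap[NWORDS];  /* bit set => that queue is nonempty */
    /* per-priority task lists omitted */
};

struct cpu_rq {
    struct prio_array *active, *expired;
    struct prio_array arrays[2];
};

/* Highest-priority nonempty queue (lower index = higher priority),
 * or -1 if the array is empty. */
int find_first_prio(const struct prio_array *a)
{
    for (int w = 0; w < NWORDS; w++)
        if (a->bitmap[w])
            return w * 64 + __builtin_ctzll(a->bitmap[w]);
    return -1;
}

/* When the active array drains, swap the two pointers in O(1). */
void swap_arrays(struct cpu_rq *rq)
{
    struct prio_array *tmp = rq->active;
    rq->active = rq->expired;
    rq->expired = tmp;
}
```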
- real time scheduler is a bit different
- real-time tasks do not have dynamic priorities
- `SCHED_FIFO` tasks do not have assigned timeslices
- `SCHED_RR` tasks are never moved to the expired queue; just returned to the same priority queue
- 160 priority levels, top 60 for real-time (ensuring they run first), next 40 for kernel, last 60 for user (time-shared)
- dispatch queue associated with each priority, execute round-robin
- has bitmap for nonempty queues, just like Linux; it's called `dqactmap`
- insertion of preemption points, safe place for preemption to occur
- when a preemption point is reached, check the `kprunrun` flag; preempt if a higher-priority real-time process is ready
- dynamic priority reassignment within time-shared class
- time quantum for time-shared varies based on priority, longest for lowest-priority
- real-time procs have fixed priority
- 256 priority levels into five priority classes:
- top 64 for bottom-half kernel, scheduled by interrupts
- next 64 for top-half kernel, which run until blocked or done
- next 32 for real-time user; run until blocked or preempted by higher-priority thread
- next 64 for time-sharing user; adjusts priorities based on processor usage
- last 32 for idle user, run only when no time-sharing or real-time threads to run
- latest version of FreeBSD scheduler addresses SMP/multicore support
- supports processor affinity (only migrate thread when necessary)
- better support for multithreading on multicore
- improves scheduling performance; cost is no longer a function of the number of threads
- queue structure split up; three queues for each processor
- each queue has 32 priority classes, internal queue for each
- two queues used for kernel, real-time, time-sharing; last queue just used for idle class
- the two queues are designated `current` and `next`
- exactly like active and expired for Linux scheduling
- kernel and real-time threads always added to `current` queue
- time-sharing threads assigned to `current` queue iff interactive
- interactivity scoring determines whether a thread is interactive
- ratio of run time versus voluntary sleep time is below a certain threshold => interactive
- scaling factor of max interactivity score / 2 is used to normalize results
- score is actually the scaling factor times (run < sleep ? run / sleep : 1 + sleep / run); sketched below
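That formula written out directly (the maximum score and names are illustrative):

```c
#define MAX_INTERACT 100              /* illustrative maximum score */
#define SCALING (MAX_INTERACT / 2)    /* the "max / 2" scaling factor */

/* Low score = interactive: the thread voluntarily sleeps more than
 * it runs. `run` and `sleep_t` are accumulated times in any
 * consistent unit. */
int interact_score(double run, double sleep_t)
{
    if (run <= 0.0 && sleep_t <= 0.0)
        return 0;                     /* no history yet */
    if (run < sleep_t)
        return (int)(SCALING * (run / sleep_t));
    return (int)(SCALING * (1.0 + sleep_t / run));
}
```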
- thread migration i.e. processor affinity
- take advantage of local caches
- don't move threads by default
- pull mechanism allows an idle processor to pull a thread from a nonidle processor
- push mechanism is a periodic task; twice per second, equalizes most-loaded and least-loaded processors in system
Not included on exam.
- group into human readable (printers, video display, keyboard, mouse), machine readable (disk drives, USB keys, sensors, controllers), and communication (digital line drivers, modems)
- can also classify by properties:
- data rate
- application: what device is being used for influences software choices
- complexity of control: complexity of required interface (simple for printer, complex for disk)
- unit of transfer: byte stream or larger blocks
- data representation: encoding schemes; character code, parity
- error conditions: types of errors, correction, reporting
- I/O can be performed in three different ways
- programmed I/O: process busy waits for I/O operation to complete
- interrupt-driven I/O: processor issues I/O command for process; can either block process until operation completes, or allow it to continue running and interrupt when the operation returns
- direct memory access: separate DMA module controls exchange of data, interrupts only after entire block has been transferred
- evolved from direct control, to programmed I/O, to DMA, to separate I/O processor, to separate I/O computer
- I/O channel is another term for I/O module
- to use DMA, it is given read/write bit, address of I/O device, starting location in memory, and number of words to read/write
- DMA transfers entire block and sends interrupt
- processor then handles data result
- various DMA architectures are supported:
- can be directly on main bus, separate from I/O devices
- can be integrated into I/O devices
- can act as interface to separate I/O bus; can only access I/O through DMA
- efficiency very important since I/O operations are often bottleneck
- generality is other important consideration; want to handle all devices in a uniform manner
- divide operation into uniform logical structure:
- logical I/O: deal with device as logical resource, not concerned with device details
- manage general I/O functions; read, write, open, close
- device I/O: convert requested operations into I/O instructions, channel commands
- also deals with buffering
- scheduling and control: actual queueing and scheduling of I/O operations, handling interrupts; the layer that interacts with the hardware
- for file system, there are additional layers:
- directory management: converts symbolic file names into references to files themselves
- file system: deals with logical structure of files; open, close, read, write
- physical organization: convert logical references to files into secondary storage addresses, accounting for track and sector structure
- when transferring a block to main memory, the page containing target addresses must be locked into main memory, which interferes with swapping considerations
- same thing happens with output
- to avoid problems, perform input in advance of requests and output some time after requests; known as buffering
- simplest buffer is single buffer
- buffer in main memory is used for I/O operations
- when transfer is complete, block is moved to user space and next block of memory is requested
- usually next block will be needed
- process can be swapped out while buffering is happening because buffering happens in system memory, not user memory
- have to be careful; if swapping process out requires use of same I/O device that is currently buffering, cannot swap out until operation completes, when it may no longer be appropriate
- double buffer has two buffers; load into one, swap pointers rather than transfer to user space; results in speedup
- circular buffer aims to smooth out flow of bursty I/O device; list of buffers treated like circular queue, essentially just double buffer with more buffers
- buffering in general smooths out peaks in demand, but does not allow I/O to keep pace with processor in long term
- processing is orders of magnitude faster than disk speed; makes sense to use some computation to optimize disk access
- three main delays associated with disk transfer:
- seek time: time to move head into correct position (select track)
- mainly optimized by reducing disk size (tracks closer together, less physical distance to move)
- typical modern seek time is under 10ms
- rotational delay: time for disk to rotate so that head reaches beginning of sector
- sum of seek and rotational is access time
- average rotational delay is 1/(2r), where r is revolutions per second
- transfer time: time required to read the sector as it moves under head
- T = b / (rN), where T is transfer time, b is bytes to transfer, N is bytes per track, and r is revolutions per second
- Ta = Ts + 1/(2r) + b/(rN) gives total average access time, where Ts is average seek time (worked example after this list)
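A worked example with made-up but plausible drive parameters:

```c
#include <stdio.h>

/* Ta = Ts + 1/(2r) + b/(rN): 4 ms seek, 7500 rpm (r = 125 rev/s),
 * 512,000 bytes per track, one 4096-byte sector transferred. */
int main(void)
{
    double Ts = 0.004;      /* average seek time, seconds */
    double r  = 125.0;      /* revolutions per second */
    double N  = 512000.0;   /* bytes per track */
    double b  = 4096.0;     /* bytes to transfer */

    double Ta = Ts + 1.0 / (2.0 * r) + b / (r * N);
    /* approx 0.004 + 0.004 + 0.000064 s = 8.064 ms: seek and
       rotation dominate; the transfer itself is tiny */
    printf("average access time = %.3f ms\n", Ta * 1000.0);
    return 0;
}
```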
- there are also the queueing delays
- need to wait for device to become available
- need to wait for channel to device to become available, if I/O channel is shared
- some systems do seek, and then estimate when rotation will complete and suspend and resume communication accordingly; sometimes ends up missing the sector, called RPS miss
- huge delays are incurred if file is very fragmented and cannot be read in continuous transfer
- can improve disk efficiency by not doing accesses in random order
- FIFO is essentially random
- priority doesn't optimize disk access, but usually requests from high priority will be from same proc, which will be close together
- LIFO actually does pretty well, since last request probably requires least movement from previously finished request; can still be pretty random
- if track position is known, can use shortest service time first (SSTF); choose request closest to current head position (sketch after this list)
- SCAN allows arm to move in one direction only, satisfying all requests as it moves
- reverse direction at end (either at last track, or with LOOK, when no more requests in that direction)
- prevents starvation that can occur in SSTF
- C-SCAN is like SCAN, but moves to beginning after last track instead of reversing
- reduces maximum delay experienced by new requests
- N-step-SCAN and FSCAN prevent "arm stickiness"
- N-step-SCAN processes requests in batches of N; for large N, essentially equivalent to SCAN
- FSCAN uses two subqueues; during scan of one queue, new requests added to other queue; process all old requests before new requests
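A sketch of SSTF selection over a pending-request array (illustrative helper, not from the text):

```c
#include <stdlib.h>

/* Pick the pending track closest to the current head position and
 * remove it from the array (swap with the last element). Repeating
 * this greedily is exactly what makes starvation possible. */
int sstf_next(int *pending, int *count, int head)
{
    int best = 0;
    for (int i = 1; i < *count; i++)
        if (abs(pending[i] - head) < abs(pending[best] - head))
            best = i;
    int track = pending[best];
    pending[best] = pending[--*count];
    return track;
}
```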
- seven levels of RAID, but all share:
- set of physical disks viewed as single logical drive
- data distributed across disks
- redundant disk capacity is used to store parity information, can recover data if a disk fails
- RAID 0 and 1 don't support this
- RAID 0 has no redundancy, but data is distributed using striping (consecutive blocks on consecutive disks)
- supports high data transfer capacity and high I/O request rate
- RAID 1 just mirrors data using 2N disks
- read request handled by either copy, write requests done in parallel
- RAID 2 and 3 read/write all disks simultaneously at same position
- RAID 2 uses Hamming code for error correction; not usually needed
- RAID 3 uses single extra disk with parity bit, can restore data if any individual drive fails
- RAID 4 does parity like RAID 3, but on a block level
- RAID 5 does distributed parity, so the parity blocks are distributed throughout the drives (XOR parity sketched after this list)
- RAID 6 does dual redundancy, so there are two parity blocks for each stripe
- up to two disks can fail while still allowing full recovery of data
- 30% drop in overall write performance, but read comparable to RAID 5
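A sketch of the XOR parity behind the parity-based levels: the parity strip is the XOR of the data strips, so XOR-ing the parity with the survivors rebuilds a lost strip (disk count and strip size are illustrative):

```c
#include <string.h>

#define NDATA  3     /* data disks, illustrative */
#define STRIPE 4096  /* bytes per strip */

void compute_parity(unsigned char data[NDATA][STRIPE],
                    unsigned char parity[STRIPE])
{
    memset(parity, 0, STRIPE);
    for (int d = 0; d < NDATA; d++)
        for (int i = 0; i < STRIPE; i++)
            parity[i] ^= data[d][i];
}

/* Rebuild disk `lost` in place from the parity and the other strips. */
void rebuild(unsigned char data[NDATA][STRIPE],
             const unsigned char parity[STRIPE], int lost)
{
    memcpy(data[lost], parity, STRIPE);
    for (int d = 0; d < NDATA; d++)
        if (d != lost)
            for (int i = 0; i < STRIPE; i++)
                data[lost][i] ^= data[d][i];
}
```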
- can cache disk sectors in main memory
- when cached block is requested, can do transfer from within main memory, or just pass back pointer to shared memory location
- replacement policy must also be considered; LRU is best, LFU may be more appropriate
- frequency-based replacement is also a thing
- blocks are organized into stack, as in LRU; top part of stack is designated as new section
- rereferencing in new section does not increase access count
- replacement is done by selecting lowest count in old section
- can refine by having three sections instead of two; reference counts incremented on both middle and old, but only replace from old
- I/O devices associated with a special file managed by file system, read and written as files
- buffer cache is essentially disk cache
- block transfers handled through buffer cache
- maintain free list of slots in cache, device list (list of buffers associated with each disk), and driver I/O queue (list of buffers waiting for I/O on a particular device)
- index into buffer cache using free list or hash table
- block replacement done using LRU; free list preserves LRU order
- separate buffering (character queue) used for character-oriented devices; producer/consumer model is used; reading character destroys it
- unbuffered I/O using DMA is used to maximize speed
- category of device (disk drive, tape drive, terminal, communication line, printer) determines which strategy is used
- similar to UNIX
- disk scheduler is called Linux Elevator
- essentially just LOOK, which is SCAN with reversing when no more records in current direction
- if new request is on same sector or immediately adjacent to existing request, merge requests
- if requests currently in request queue are old, insert at tail of queue
- if suitable location, insert in sorted order
- otherwise, tail of queue
- enhanced by deadline scheduler:
- read requests are more urgent than write requests because they block
- for each request, there is an expiration time, which is shorter for reads
- normally requests are taken from sorted queue, but taken from FIFO queue when head of queue is expired (along with next few requests)
- solves starvation problem, also read vs write problem
- enhanced by anticipatory I/O scheduler
- successive reads are likely, so delay slightly after serving a read to see if next read request comes in
- maintains page cache for particular system files and virtual memory, separate buffer cache for block I/O
- page cache has benefits of allowing writing in bulk, and caches things because caches are good
- do we have to repeat these benefits every time we talk about any sort of cache?
- dirty pages written out when free memory falls below a threshold, or when the dirty pages get older than a threshold
Not included on exam.
- four I/O-related kernel components:
- cache manager: file caching for all file systems
- dynamically adjusts size of cache for particular file based on physical memory size
- lazy writer periodically writes things to disk, which allows it to be more efficient because batch writing is efficient
- like seriously, every time we talk about caches
- file system drivers: file system driver treated like any other driver, route file requests through this system, which sends to hardware device adapter
- network drivers: integrated networking capabilities for accessing remote file systems
- implemented as software drivers rather than part of the Windows Executive
- hardware device drivers: software drivers that access hardware registers of peripheral devices
- Windows I/O can be asynchronous or synchronous
- async is more efficient for caller, but needs to be signalled
- signal using event on file object: proc continues and stops once it actually needs data from I/O operation, then checks file object
- signal event object: allows multiple simultaneous I/O requests against single device/file; event object created for each request, can wait on some subset of requests
- async procedure call: requests are made with user-mode routines to call when I/O completes
- I/O completion ports: used on servers to optimize thread use; threads wait on the completion port, kernel wakes a waiting thread to handle each I/O completion
- polling: async I/O requests write status and transfer count into memory, proc just checks these values
- Windows also has software support for RAID
- hardware RAID is what was discussed earlier; software RAID is non-contiguous disk space combined into logical partition(s)
- implements RAID 1 and RAID 5
- volume shadow copies are efficient ways of making snapshots of volumes for backup; implemented by software driver that makes copies before data is overwritten
- uses BitLocker to encrypt entire partition, supports multiple methods of supplying cryptographic key
- input and output almost always done to/from files (except for real-time)
- files have life outside individual applications, need dedicated management system
- files stored on disk
- files sharable between processes, access permissions permitting
- operations are: create, delete, open, close, read, write
- attributes maintained with metadata about permissions, owner, creation time, access time, etc.
- file structure can be organized into hierarchy
- field is basic element of data
- record is collection of related fields
- file is collection of similar records
- database is collection of related data, probably in multiple files/tables
- database should support sensible operations
- sometimes files are just bytes
- objectives for file management system
- meet data management requirements of user; storage and operations
- guarantee that data in file is valid
- optimize performance
- provide I/O support for various storage devices
- minimize or eliminate potential for lost data
- provide standardized set of I/O interface routines to user procs
- provide I/O support for multiple users at once
- reiterate layers from previous chapter, but it's changed a bit
- device driver: interacts with device directly
- basic file system: deals with blocks of data exchanged with driver; concerned with placement
- basic I/O supervisor: responsible for I/O init and termination
- selects device, schedules disk accesses
- is part of OS
- logical I/O: enables user access to records; general purpose record I/O capabilities
- access method is main difference between systems; see 12.2
- file organization is logical structuring of records; influenced by blocking strategy, allocation strategy
- important criteria:
- short access time
- ease of update
- economy of storage
- simple maintenance
- reliability
- five organizations: pile, sequential, indexed sequential, indexed, direct (hashed)
- pile file: variable-length records, variable set of fields in each record
- have to search through entire file to find desired record
- uses space well though
- sequential file: fixed-length fields in fixed-length records
- only field values need to be stored
- key field uniquely identifies record; records sorted by key so you can binary search
- indexed sequential file: add index and overflow file
- index allows searching of key without so many cache misses etc. because index is smaller than entire file
- when a record is inserted, it goes into the overflow file, and the pointer of the preceding record in the main file is updated to point to it
- periodically merge overflow back into main file
- indexed file: employ multiple indexes to search on different keys
- don't bother sorting records, only access via index
- can also have variable-length records
- direct (hashed) file: hash key and go directly to storage location
- space penalties because of empty space
Not included on exam.
- file directory contains information about files including attributes, location, ownership, etc.
- directory itself is a file, though generally accessed indirectly via system routines
- must support search, create/delete file, list directory, update directory (update file attributes)
- simple list not well suited; unique naming problems, size problems, also want to conceal parts from users that may not have access
- two level scheme has master directory and subdirectory for each user
- more flexible approach is to allow nesting of subdirectories, which is what we would expect as people who have seen file systems before
- symbolic name consists of directory names and file name; this is a pathname
- working directory means you don't need to retype full path each time
- access rights to a file can be:
- none: none
- knowledge: know that the file exists
- execution: can execute file but not copy it (proprietary program)
- reading: can read file, copy; also allows execution since you could execute your own copy
- appending: can append, but can't modify or delete content
- updating: all writing
- changing protection: change access rights for other users
- deletion: deletion
- rights can be granted to specific users, groups, or everyone
- can handle simultaneous access by locking whole file, or just specific records during an update
- records are unit of structured file, blocks are unit of I/O with secondary storage
- larger blocks means more records can be transferred at once, but this is less efficient if records are accessed randomly because increased overhead
- three methods of blocking:
- fixed blocking: fixed-length records, fixed integral number stored in each block
- may be unused space at end of block
- variable-length spanned blocking: variable-length records, records can span blocks
- no unused space
- variable-length unspanned blocking: variable length records, records cannot span blocks
- usually some unused space because not every gap can be filled
- always some unused space because of how blocks fit into tracks and because of hardware-enforced gaps
- allocation of blocks to files on physical disk is kind of like allocating blocks of memory in main memory
- files can grow over time; a portion is a contiguous set of blocks allocated to a file
- need data structure to keep track of portions assigned to file; example is FAT
- preallocation policy requires that maximum file size be declared at creation; dynamic allocation allocates portions as needed, which is more common because it's more flexible
- portion size is important consideration:
- contiguity of space is good for performance; larger portions do that
- too many portions increases size of management tables
- fixed-size portions simplifies reallocation
- variable or small-fixed size portions minimizes wasted space
- as a result, choose variable and large contiguous portions, or small fixed size portions (blocks)
- with variable size, can do first-fit, best-fit, nearest-fit (nearest to other parts of same file), mostly like chapter 7
- three methods for actual file allocation:
- contiguous allocation finds contiguous set of blocks; requires preallocation
- easy to tell where desired block of file will be
- external fragmentation occurs
- chained allocation stores each file as a linked list
- have to iterate through file blocks to find required block; not necessarily big loss if files accessed sequentially anyway
- easy to add blocks as needed (dynamic allocation)
- indexed allocation uses an index
- index kept in separate block, points to all blocks in file
- most popular because it's literally just the best
- free space is managed by disk allocation table
- can be a bitmap, which is simplest and least overhead; also makes it easy to find contiguous blocks (run scan sketched after this sub-list)
- bitmap can be too large to search through though; instead chain together free blocks
- often chain of free blocks is very fragmented, so new allocations will be fragmented too
- indexing treats free space as a file and uses index table
- index table should be based on variable-size free portions so it doesn't have to index every block
- free block list assigns number to each block and maintains list of numbers
- this can get really big, too large to store in main memory; treat as stack and keep only top of stack in memory
- could also treat as queue, keeping only head of queue in main memory
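A sketch of scanning a bitmap for a run of contiguous free blocks (sizes and the 0-means-free convention are illustrative):

```c
#include <stdint.h>

#define NBLOCKS 8192

static uint8_t bitmap[NBLOCKS / 8];  /* bit clear = block free */

/* Return the first block of a run of n contiguous free blocks,
 * or -1 if no such run exists. */
int find_free_run(int n)
{
    int run = 0;
    for (int b = 0; b < NBLOCKS; b++) {
        int used = bitmap[b / 8] & (1 << (b % 8));
        run = used ? 0 : run + 1;
        if (run == n)
            return b - n + 1;
    }
    return -1;
}
```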
- volume is logical disk
- often, one volume is one disk, but sometimes divide disk into partitions that each appear as a volume
- can also treat multiple disks as a single volume
- stuff can get corrupted if system crashes; have to be careful about write order to prevent problems
- six types of files on UNIX:
- regular/ordinary: arbitrary data in zero or more data blocks
- directory: list of file names plus pointers to inodes
- special: no data, provide mechanism to map to physical devices
- named pipes: interprocess communications facility
- links: alternative name for existing file
- symbolic links: data file that contains name of linked file
- all files administered by inodes (index nodes), which contain key information about file:
- type and access mode
- file owner and group access identifiers
- creation time, read time, write time, inode update time
- size of file in bytes
- sequence of block pointers
- number of physical blocks
- number of directory entries that reference file
- kernel and user-settable flags
- generation number of file (random number assigned to inode; used to detect deleted files)
- blocksize (sometimes larger than file system blocksize)
- plus additional attribute entries
- there is inode table/list on disk that contains inodes of all files in file system
- file allocation is multi-level: 12 direct data blocks, then single-, double-, triple-indirect layers which work as expected (level lookup sketched below)
- keeps inode fixed size
- small files accessed with little indirection
- very large maximum file size
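A sketch of picking the indirection level for a logical block number, assuming 12 direct pointers and `ppb` pointers per indirect block (e.g. 4096-byte blocks with 4-byte pointers give ppb = 1024; values illustrative):

```c
#define NDIRECT 12

/* 0 = direct, 1/2/3 = single/double/triple indirect,
 * -1 = beyond the maximum file size. */
int block_level(long lbn, long ppb)
{
    if (lbn < NDIRECT)               return 0;
    lbn -= NDIRECT;
    if (lbn < ppb)                   return 1;
    lbn -= ppb;
    if (lbn < ppb * ppb)             return 2;
    lbn -= ppb * ppb;
    if (lbn < ppb * ppb * ppb)       return 3;
    return -1;
}
```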
- directories are normal directories, references to contained inodes
- UNIX file system has single partition with boot block, superblock (info about file system), inode table, and data blocks
Not included on exam.
- Linux has virtual file system, which present single uniform interface to whatever file system is used (so you can swap ext4 out for, like, eCryptfs-GCM)
- files have symbolic names, owner, protection, and other properties
- need to map logical operations to operation specific to the OS
- make system call for file operation, which is passed to mapping function for file system
- usually mapping function is pretty simple (1:1), but can be more complex
- for example, FAT directories are not files, so directory access requests have to be translated
- VFS is object-oriented but written in C, so objects are just structs
- superblock object: mounted file system
- has device, block size, dirty flag, file system type, flags, pointer to root directory
- inode object: specific file
- holds all the information about named file except for name and actual contents
- also has inode operations object which describes available operations
- dentry object: directory entry, either directory or file name
- pointer to inode and superblock, as well as parent and child dentries
- file object: open file associated with process
- associated dentry object, file system, usage counter, user id, group id, file pointer
- also includes inode operations object
- three caches are used to improve performance:
- inode cache: stores recently visited inodes
- directory cache: caches mapping between directory names and inode numbers; speeds up ls
- buffer cache: independent of file systems, integrated into data buffer mechanisms
- basically cache closer to drive level
Not included on exam.
Not included on exam.
- processes have associated privileges regarding memory, files, instructions
- root access is highest level of privilege
- intruders can be:
- masquerader: not authorized to use system, but penetrates access controls to exploit legitimate user
- misfeasor: legitimate user who accesses unauthorized materials, or who misuses authorization
- clandestine user: seizes control of system and uses control to evade auditing and access controls, suppress audit collection
- can exploit software bugs, or acquire protected information e.g. passwords
- malicious software is programs that exploit vulnerabilities
- parasitic malware is a fragment of some legit program; virus, logic bomb, backdoor
- malware can also be independent, scheduled to run by OS; worm, bot program
- malware can be activated by trigger (logic bomb, backdoor, bot program), others replicates itself to be activated later (virus, worm)
- intrusion detection is a countermeasure
- can be host-based IDS (monitor host and events on host for suspicious activity) or network-based (monitor network traffic)
- three logical components:
- sensor: collects data (from network packets, log files, system call traces)
- analyzer: analyzes data from sensors, determines if intrusion occurred
- user interface: view output from system, configure behaviour
- authentication is another countermeasure
- consists of identification step and verification step
- four ways to verify user's identity:
- something they know (password, PIN, answers to questions)
- something they possess (keycard, smart cards, keys, all referred to as tokens)
- something they are (static biometrics: fingerprint, retina, face)
- something they do (dynamic biometrics: voice pattern, handwriting, typing rhythm)
- can do two-factor authentication
- access control is another countermeasure
- mediate between user and resources, determine what they can and cannot access
- must authenticate first
- can audit to see who accessed what, who is authorized to access what
- firewalls are another countermeasure
- act as chokepoint between system and outside connection
- enforces local security policy, defines what types of traffic can pass
- firewall itself is secure against attacks
- memory protection protects a process' memory from other processes
- buffer overflow is writing more to a buffer than there is space, which can overwrite other memory locations
- can corrupt data, unexpected transfer of control, memory access violations, and probably termination
- can even execute arbitrary code
- stack overflow is most common type of buffer overflow; e.g. `gets()` does not check the amount of data copied, so input can overwrite memory locations beyond the target buffer (sketch below)
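The classic case, sketched: an unbounded gets() versus the bounded fgets():

```c
#include <stdio.h>
#include <string.h>

int main(void)
{
    char buf[16];

    /* gets(buf);  -- unsafe: copies until newline regardless of
     * buf's size, so long input overwrites whatever sits above
     * buf on the stack (saved frame pointer, return address). */

    /* fgets bounds the copy to sizeof(buf) - 1 bytes plus a NUL. */
    if (fgets(buf, sizeof buf, stdin) != NULL) {
        buf[strcspn(buf, "\n")] = '\0';  /* strip trailing newline */
        printf("read: %s\n", buf);
    }
    return 0;
}
```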
- to exploit buffer overflow, need to:
- identify buffer overflow vulnerability that can be triggered by external input
- understand how buffer is stored in memory, potential adjacent memory that can be corrupted
- fuzzing can be used to identify vulnerabilities
- compilers can be used to defend against buffer overflow
- prevent vulnerabilities at compile time
- modern high-level languages with types and checks on permissible operations are immune to buffer overflow
- safe coding techniques (manually identify potential vulnerabilities) can also help
- language extensions can perform additional checks on operations
- Libsafe is dynamic library that can load before other code and prevent buffer overflows without even recompiling program
- stack protection mechanisms check stack frame for evidence of corruption
- Stackguard is GCC compiler extension; adds (random) canary value below old frame pointer; if canary is overwritten, program is aborted
- runtime defenses aim to prevent buffer overflows at runtime
- executable address space protection tags some pages as non-executable (hardware)
- if stack and heap are non-executable, hard to execute custom code
- however, some languages need executable code on the stack (e.g. nested functions in C, Linux signal handlers)
- address space randomization randomly moves key data structures
- if stack is moved around, can't predict target buffer's address
- guard pages are pages between critical sections of memory
- if anything attempts to write to guard page, illegal address exception is raised, program aborts
- can also put guard pages between stack frames, or between allocations on heap
- can be costly in terms of memory
- executable address space protection tags some pages as non-executable (hardware)
- easy to allow/disallow access to entire files based on user profile
- might want to apply permissions on level of record, or even part of record
- access matrix is model of permissions
- subjects on left, objects on top, access rights in cells
- access rights can be read, write, execute, plus functions in software objects
- matrix usually sparse, decompose by columns and give users access control lists
- can decompose by rows instead to get capability tickets, which list the objects and access rights authorized for a specific user
- have to make sure that access matrix info is not accessible to users
- three access control policies:
- discretionary (DAC): check identity of user and access rules for resource
- basically just access matrix as above
- objects can be processes, devices, memory locations/regions, or subjects (subjects are users)
- controller for object checks whether the requested operation is allowed for that subject on that object
- access matrix controller controls access to the access matrix
- copy flag on permission allows transfer of right (with or without copy flag) to another user
- owner can grant any right to any user
- controller of subject or owner of resource can revoke a right, read rights of subject
- creating an object requires no rights, creator becomes owner
- destroying an object requires owner
- subjects can be created and destroyed like objects
- mandatory (MAC): check security labels (how sensitive/critical resource is) with security clearances
- designed for military systems, not covered by book
- role-based (RBAC): check role of user and access rules for role
- users can have multiple roles
- one matrix for mapping users to roles, and access control matrix with roles as subjects
- users have unique user ID and can be a member of primary group and possibly other groups with group IDs
- files are owned by user and belong to a group
- each file has 12 access bits (mode-bit sketch after this sub-list)
- (read/write/execute) for (user/group/other)
- SetUID and SetGID: when file is executed, it executes with permissions of user/group
- on directory, SetGID indicates that newly created files should inherit group of directory
- Sticky bit originally indicated that file contents should be retained in memory following execution; no longer used
- Sticky bit on directory indicates that only owner of file in directory can rename, move, delete that file
- useful for shared temp directories
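A sketch of reading the 12 mode bits with stat(); /usr/bin/passwd is the usual setuid example, though the path varies by system:

```c
#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
    struct stat st;
    if (stat("/usr/bin/passwd", &st) != 0)
        return 1;

    /* Low 12 bits of st_mode: setuid, setgid, sticky, then rwx for
     * user/group/other; passwd is typically 4755 (setuid root). */
    printf("mode:   %04o\n", (unsigned)(st.st_mode & 07777));
    printf("setuid: %s\n", (st.st_mode & S_ISUID) ? "yes" : "no");
    printf("sticky: %s\n", (st.st_mode & S_ISVTX) ? "yes" : "no");
    return 0;
}
```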
- one user ID is the superuser who can do anything
- be careful when setting SetUID on file owned by superuser
- UNIX also has access control lists
- assign list of user and group IDs to file, each with three protection bits
- 9-bit permission field has same meaning as normal
- group class entry acts as mask for all other permissions granted by list
- install and patch OS
- unpatched system is vulnerable, don't connect to network while patching
- initial installation should be minimal
- boot process must be secure; limit which media system can boot from, require password for changes to BIOS
- cryptographic file system helps secure system against malicious boot process
- driver installation very critical, since those execute in kernel mode
- automatically install updates and patches
- on critical systems, don't do it automatically, stage and validate patches on test systems
- remove unnecessary services, apps, protocols
- fewer packages means fewer potential vulnerabilities
- default configuration usually optimized for ease of use, not security; don't use default
- uninstall scripts may fail to remove all components, so don't install in first place
- configure users, groups, permissions
- restrict elevated privileges to only those who need them
- only access elevated privileges when needed (i.e. don't log in as root, use sudo)
- decide on authentication system (local vs central)
- change default passwords, remove or secure default accounts
- allowable length, complexity, age of passwords can be set
- configure resource controls
- set permissions on data and resources
- yeah that's pretty much it
- install security controls (antivirus, firewalls, IDS)
- whitelist for applications can prevent malware, but not always possible to implement depending on system use
- test security
- follow checklist in security-hardening guide (apparently this is not a security-hardening guide, it's just a guide to security-hardening guides)
- monitor and analyze logs
- informs you of bad things that have already happened
- ensure sufficient space is allocated for logs
- need automated analysis of logs
- perform regular backups
- may be legal requirements for backups
- backup is process of making copies of data, archive is process of retaining copies of data for long periods of time
- decide whether online or offline, local or remote
- if attackers compromise/destroy data, don't want them to also compromise/destroy backups
- recover from security compromises
- regularly test system security
- patch and update critical software, monitor and revise configuration
Not included on exam.