1 Computer System Overview

1.1 Basic Elements

  • processor: controls operation of computer, data processing functions
  • main memory: stores data and programs, typically volatile
  • I/O modules: move data between computer and external environment, e.g. disks, internet, terminals
  • system bus: communication between the three components above

1.2 Evolution of the Microprocessor

  • microprocessors used to be slower than multi-chip processors, but everything changed when they got really small and physics got involved (shorter on-chip distances beat chip-to-chip communication)
  • a single chip can now contain multiple processors
  • GPUs are more efficient for calculations on arrays of data, some modern processors have vector operations too
  • Digital Signal Processors handle streaming signals specifically
  • other specialized processing units (encryption, audio/video codec stuff, etc.) also exist
  • mobile devices have System on a Chip (SoC), where main memory, I/O devices, radios, etc. are all on same chip as main processor

1.3 Instruction Execution

  • fetch instruction, execute instruction, repeat
  • instructions can be in four categories:
    • processor-memory: transfer data between processor and memory
    • processor-I/O: transfer data between processor and I/O
    • data processing: arithmetic or logic op
    • control: branches
  • combinations also possible
  • it's like ECE 222 but less detailed

1.4 Interrupts

  • interrupts improve processor utilization by interrupting normal sequencing
  • I/O devices very slow, so run other processes (or same process, if nonblocking) in meantime and wait for interrupt
  • when starting instruction, processor checks for interrupt requests; if any, branches to handler instead of next instruction
  • after interrupt, previously running process is resumed
  • nine steps, five in hardware, four in software, for handling interrupts:
    • device issues interrupt signal
    • finish previous instruction
    • test for interrupt requests, send ack signal
    • save context (PSW, PC)
    • load PC with entry location of interrupt handler
    • save additional state in software (e.g. registers, stack pointer, etc.)
    • process interrupt (examine status info, send commands, acks, etc.)
    • restore saved state
    • restore PSW, PC
  • interrupts often disabled during interrupt handling, though can have multiple priority levels for handlers

1.5 The Memory Hierarchy

  • access time is inversely related to cost per bit, so can't make entire memory out of fast memory
  • put most frequently accessed data in more expensive but faster memory
    • this is possible because of locality of reference; data accessed recently will often be accessed again soon
    • use various swapping strategies to move things between layers of cache based on recent references
  • levels are usually processor registers, then cache (often in processor as well), then main memory (volatile), then secondary memory (disks, SSD)

1.6 Cache Memory

  • every instruction cycle requires at least one main memory access to fetch the instruction, so cache very important to prevent slowdown
  • cache maintains copy of fixed length blocks from main memory
  • each block has a tag to indicate which main-memory block it corresponds to
  • look for word in cache before resorting to main memory lookup
  • need mapping function, replacement algorithm, and write policy for cache; more details in memory chapter
    • LRU is best
  • multiple levels of cache (L1, L2, often L3) can be used

1.7 Direct Memory Access

  • I/O can be:
    • programmed, i.e. polling
    • interrupt-driven, which doesn't block the processor while doing I/O
    • DMA!
  • DMA module is delegated task of transfer of complete block to/from memory or I/O device
  • far more efficient for multiple-word transfers

1.8 Multiprocessor and Multicore Organization

  • symmetric multiprocessors (SMP) defined as:
    • two or more similar processors
    • share main memory and I/O facilities, interconnected by e.g. bus
    • share access to I/O devices
    • processors can perform same functions
    • system controlled by integrated OS
  • get better performance, availability, growth, scaling
  • processors can have their own caches
  • multicore aka chip multiprocessor has multiple cores on single chip
    • basically SMP but on single chip

1A Performance Characteristics of Two-Level Memories

  • locality happens because:
    • except for branches, execution is sequential
    • usually some time is spent localized within a small set of procedures, not always jumping around
    • loops involve small numbers of instructions repeated a lot
    • array elements are usually referenced in sequence
  • spatial locality is tendency to access memory locations that are clustered
  • temporal locality is tendency to access same memory location again soon
  • can calculate average access time using times for each memory and hit ratio (see the sketch after this list)
  • average cost per bit depends on costs per bit in each memory and sizes of memory
  • access efficiency is access time for M1 over average access time; function of hit ratio
  • hit ratio is much higher than S1/S2 when there is lots of locality
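
A quick sketch of these formulas in C; the times, costs, sizes, and hit ratio are made-up illustration values, not from the text:

```c
#include <stdio.h>

int main(void) {
    double T1 = 0.01, T2 = 0.1;    /* access times for M1, M2 (us); assumed values */
    double C1 = 0.01, C2 = 0.001;  /* cost per bit for M1, M2; assumed values */
    double S1 = 1e6, S2 = 1e9;     /* sizes in bits; assumed values */
    double H = 0.95;               /* hit ratio */

    double Ts = H * T1 + (1 - H) * (T1 + T2);    /* a miss pays for both levels */
    double Cs = (C1 * S1 + C2 * S2) / (S1 + S2); /* average cost per bit */
    double eff = T1 / Ts;                        /* access efficiency */

    printf("Ts = %.4f us, Cs = %.6f, efficiency = %.3f\n", Ts, Cs, eff);
    return 0;
}
```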

2 Operating System Overview

2.1 Operating System Objectives and Functions

  • objectives of OS are (ACE):
    • convenience
    • efficiency
    • ability to evolve
  • convenience is enhanced by following services:
    • libraries for I/O, file management, etc.
    • utilities for launching and managing tasks
    • handling I/O devices
    • controlling file access, system access
    • error detection
    • accounting
  • key interfaces in system are:
    • ISA: machine language instructions
    • ABI: standard for binary portability across platforms; system call interface
    • API: program access to hardware resources and services; high-level language library calls
  • efficiency is enhanced by managing resources
    • OS kernel does scheduling, allocation, etc. to efficiently manage resources
  • ease of evolution enhanced by allowing:
    • hardware upgrades via drivers
    • new services via OS updates
    • bug fixes via OS updates

2.2 The Evolution of Operating Systems

  • serial processing: manually run batch jobs, one after another
  • simple batch systems: monitor automatically runs next job after job completes
    • can do compilation of user programs by loading compiler
    • can also support memory protection, timer, privileged instructions, interrupts
    • programs execute in user mode, monitor executes in kernel mode
  • multiprogrammed batch system: interleave processes for greater efficiency
    • because I/O takes a lot of time
    • requires more advanced memory management and I/O management
  • time-sharing systems: handle multiple interactive jobs
    • minimize response time and accept commands from terminal
    • memory protection requirements

2.3 Major Achievements

  • process:
    • job in execution, instance of program running, entity that can be executed on processor
    • multiprogramming requires ability to switch between processes
    • time-sharing also requires this
    • real-time systems also require this
    • lots of problems result from multiple processes:
      • improper synchronization
      • failed mutual exclusion
      • nondeterminate program operation (things can execute in ~random order)
      • deadlocks
    • process is systematic way to manage information about task
    • store program, data needed by program, execution context
    • threads are extension of this concept, discussed later
  • memory management:
    • five responsibilities:
      • process isolation required when multiple processes in system
      • automatic allocation/management: program shouldn't have to do its own memory allocation
      • support modular programming: create, destroy, alter size of modules dynamically
      • protection/access control: some memory is shared, but some is not shared
      • long-term storage: manage memory on hard drives
    • use virtual memory with virtual/logical addresses that get translated to real addresses
  • information protection and security:
    • availability: protect system from interruption
    • confidentiality: can't read unauthorized data
    • data integrity: can't write unauthorized data
    • authenticity: verify user identity, validity of messages/data
  • scheduling and resource management:
    • especially important for time-sharing
    • must consider fairness, differential responsiveness, efficiency
    • round robin plus interrupts is pretty good

2.4 Developments Leading to Modern Operating Systems

  • microkernel architecture:
    • OS was traditionally large monolithic kernel with all the features
    • shift towards microkernel, which has only essential functions in kernel, rest in user space
    • more customizable, better ability to evolve
  • multithreading:
    • separate executable work and process data
    • can parallelize work, especially on SMP or multicore systems
  • SMP: better performance, availability, incremental growth, scaling (I think this list was in Chapter 1 as well)
  • distributed OS: provide illusion of single main memory, secondary memory, but distribute computation across multiple computers
  • object-oriented design: write OS with OOP, eases development

2.5 Fault Tolerance

  • reliability R(t) is probability of correct operation up to time t given that operation was correct at t=0
  • mean time to failure (MTTF) is Integral[R(t), {t, 0, inf}]
  • mean time to repair (MTTR) is how long it takes to repair or replace a faulty element
  • availability is MTTF / (MTTF + MTTR); e.g. MTTF = 99 hours and MTTR = 1 hour gives availability 0.99
  • faults can be:
    • permanent: after it occurs, always present; disk hard crashes, software bugs
    • temporary: not permanent
      • transient: occurs only once
      • intermittent: multiple, unpredictable times
  • three types of redundancy for managing faults:
    • spatial (physical) redundancy: multiple components that do the same thing
    • temporal redundancy: repeat a function when an error occurs; effective for temporary faults
    • information redundancy: error correction codes
  • OS can provide:
    • process isolation: more hermetic environment, because interference from other procs can cause faults
    • concurrency controls: because concurrency can cause problems
    • virtual machines: even more isolation, prevent some types of permanent faults
    • checkpoints and rollbacks: copy of application state in backup; often used in databases

2.6 OS Design Considerations for Multiprocessor and Multicore

  • design issues for SMP include:
    • simultaneous concurrent procs/threads: kernel data structures need to account for possibility of multiple instances of same service call running at once
    • scheduling: each processor needs to have scheduling, can interfere
    • synchronization: some processes might depend on other procs, also need to mutex shared resources
    • memory management: need to exploit hardware parallelism; paging mechanisms need to be consistent
    • reliability and fault tolerance: graceful degradation if one processor fails
  • design issues for multicore:
    • same issues as SMP, but multicore is EVEN MORE PARALLEL so it's just extra hard
  • parallelism within application: write applications to take advantage of parallel execution
    • Grand Central Dispatch on OSX/iOS is thread pool mechanism, makes this easier
  • virtual machine approach: give process entire cores, let it run as if it is only program on system
    • OS just acts like hypervisor, process handles own resources for most part

2.7 Microsoft Windows Overview

  • originally extension to MS-DOS, replaced by Windows NT
  • organizes system into:
    • executive: core OS services
      • I/O manager: implements all the APIs for I/O
      • cache manager: for file-based I/O
      • object manager: manages Windows Executive objects
      • plug-and-play manager: automatic device driver installation
      • power manager: shutting down idle devices, sleeping, writing all memory to disk and shutting off
      • security reference monitor: manage secure access to all Objects
      • virtual memory manager: virtual address management, physical memory, paging
      • process/thread management: create, manage, delete procs and threads
      • configuration manager: system registry
      • advanced local procedure call (ALPC) facility: efficient cross-process procedure call mechanism; pretty much like RPC
    • kernel: scheduling, processor management
    • hardware abstraction layer (HAL): maps between generic hardware commands and platform-specific stuff
    • device drivers: extend functionality of executive, mostly to do translation to HAL commands
    • windowing and graphics system: GUI functions
  • four types of user-mode processes:
    • special system process: user-mode OS service, can manage system in more privileged ways; session manager, auth subsystem, service manager, logon proc
    • service processes: printer spooler, event logger, device driver components, network services; background user-mode activity
    • environment subsystems: provide different OS "personalities"; either Win32 or POSIX; converts system calls
    • user applications: EXEs and DLLs
  • client/server model is used on Windows, which makes it easier to use in distributed computing; used even on single system
    • kernel-based services are invoked through RPC; Executive routes request to appropriate server
    • simplifies the Executive, easy to add new APIs
    • improves reliability since more things are running outside the kernel
    • uniform means for applications to communicate with services
    • suitable base for distributed computing, just route RPC over network as required
  • Windows is OO, Objects are important feature; they provide:
    • encapsulation: data of object protected by object's services (methods)
    • object class and instance: standard OS stuff, makes creating objects easier
    • inheritance: hand coded implementation, but still good because inheritance is good
    • polymorphism: common set of API functions to manipulate any object; however, not fully polymorphic, many APIs are specific to specific type
  • Object Security Descriptor (SD) used to restrict access to specific user
  • Objects can be named so that other processes can get a handle to them and share them
  • two categories of objects used to manage processor:
    • Dispatcher object: threads can wait on these
    • Control objects: manage operation of processor not managed by normal thread scheduling

2.8 Traditional UNIX Systems

  • one of first OSes to be written in C rather than assembly, took advantage of high-level language features and ease of development
  • UNIX commands/libraries interact with kernel through system call interface, and kernel interacts with hardware
  • all interaction with hardware is done through file subsystem, which can use either character or block interface with hardware control system

2.9 Modern UNIX Systems

  • System V Release 4, BSD, and Solaris 10 are all more modern versions of UNIX
  • SVR4 is very important, objective was to be uniform platform for commercial UNIX, incorporates most important features
  • BSD had more academic focus, widely used, responsible for popularity of UNIX
    • Mac OS X based on BSD
  • Solaris 10 was Sun's version; fully preemptable, multithreaded kernel, full support for SMP, OO interface to file systems
    • most successful commercial UNIX implementation

2.10 Linux

  • originally project of Linus Torvalds, lots of collaboration over internet
  • success is largely due to free software packages, FSF's GNU project
  • highly modular, easily configured, maximum rice
  • loadable modules are object files which can be linked and unlinked to kernel at runtime
    • even though kernel is monolithic, modules make it feel less monolithic
  • lots of components:
    • signals: used to call into processes (notify of faults, e.g.)
    • system calls: used to request kernel services
    • scheduler: manages and schedules processes
    • virtual memory: allocates and manages virtual memory for procs
    • file systems: global namespace for files, directories, etc.
    • network protocols: Sockets interface to TCP/IP
    • character device drivers: manage devices like terminals, printers, modems
    • block device drivers: manage secondary memory, mostly
    • network device drivers: manage network interface cards, bridges, routers
    • traps and faults: handle traps and faults generated by processor
    • physical memory: manages pool of page frames in real memory
    • interrupts: handle interrupts

2.11 Android

Not included on exam.

3 Process Description and Control

  • OS must interleave process execution, allocate resources to processes, and support interprocess communication

3.1 What Is a Process?

  • program in execution, instance of program, entity that can execute on processor
  • process can be uniquely characterized by:
    • identifier
    • state: e.g. running state
    • priority
    • program counter: address of next instruction in program to execute
    • memory pointers: pointers to code, data associated with process, shared memory blocks
    • context data: data in registers
    • I/O status information: I/O requests, assigned I/O devices, files in use
    • accounting information: amount of processor time, clock time, time limits, etc.
  • all that information is stored in process control block (PCB)
  • PCB enables interrupting and later resuming execution as if no interruption occurred

3.2 Process States

  • trace of process is sequence of instructions executed for that process
  • two-state model has only running and not running; processes handled in round-robin manner
  • process creation involves creation of PCB and allocation of memory space for program/data
  • processes can be created (spawned) by:
    • new batch job
    • interactive log-on (spawn user process when they log on)
    • created by OS to provide service (e.g. automatically creating printing process)
    • spawned by existing process: fork! this is a child process
  • processes can be terminated for these reasons:
    • normal completion
    • time limit exceeded (total time limit)
    • memory unavailable
    • bounds violation (memory)
    • protection error (resources, e.g. files)
    • arithmetic error (divide by zero)
    • time overrun (blocked for too long)
    • I/O failure
    • invalid instruction (usually a result of branching to a data area)
    • privileged instruction
    • data misuse (type error)
    • OS intervention (deadlock management, maybe)
    • parent termination (parent was terminated)
    • parent request (parent does the terminating)
  • five state model has ready, running, blocked, new, and exit; they work as expected
    • new and exit are useful for management; gives OS time to allocate/free resources
    • processes in ready state still handled round-robin
    • can have multiple ready queues to support different priorities
    • can have blocked queues for each resource
  • swapping is useful because you can accommodate more processes at once; add suspend state
    • can also add ready/suspend and blocked/suspend states if even procs in ready state can be suspended
  • suspension is defined by these properties:
    • process not immediately available for execution
    • may or may not be waiting for event (wtf textbook); can become unblocked without unsuspending
    • process was placed in suspend state by agent; itself, parent, or even OS (wtf textbook)
    • process will remain suspended until agent explicitly orders unsuspension (something something Newton, also wtf textbook)
  • can suspend a program to inspect its state

3.3 Process Description

  • memory tables track both real and secondary memory
    • some reserved for OS, rest used by procs
    • tracks allocation of memory to processes, protection attributes, virtual memory data
  • I/O tables manage I/O devices and channels
    • also tracks individual I/O requests so it can route responses
  • OS may also have file tables, which are like I/O tables but for files
  • process tables are the list of PCBs, pretty much
  • all tables may be interlinked
  • tables subject to memory management as well
  • collection of resources for process referred to as process image; at least some of image must be in main memory for execution
  • paging can be used to make the image seem contiguous even if it's not, or even if not all of it is in main memory
  • process control block information can be grouped into three categories:
    • identification (number)
    • state (registers, stack pointers)
    • control (scheduling and state, OS data structure stuff (mp_next), IPC, privileges, memory management info, resource ownership)
    • can also have accounting, which is like control but separate
  • problems in PCB are big problems, so protect PCBs by putting them behind handler routines

3.4 Process Control

  • processes can execute in user (unprivileged) mode, or kernel (privileged) mode
    • mode determines allowed instructions
    • mode determined by bit in PSW, changed in response to service calls and returns
  • process creation steps:
    • assign unique PID
    • allocate space for process, including its PCB
    • initialize PCB
    • set linkages (i.e. put in queue)
    • create/expand other data structures, like accounting
  • three reasons to do process switch:
    • interrupt (clock, I/O, memory fault)
    • trap (error or exception condition)
    • supervisor call (service call by process)
  • process state change steps:
    • save context of processor
    • update PCB
    • move to appropriate queue
    • select another process
    • update PCB of new process
    • update memory management structures (address translation)
    • restore context of new process

3.5 Execution of the Operating System

  • OS functions can execute in three ways:
    • nonprocess kernel: call into OS, which has own memory and own system stack, can do whatever
    • execution within user procs: process image also includes space for OS data structs, OS stack; OS functions executed within process; only process switching is external
    • process-based OS: OS functions are in their own processes, only IPC and process switching is external

3.6 UNIX SVR4 Process Management

  • nine process states:
    • user running
    • kernel running (move to this state from ready/in memory, then to user running)
    • ready/in memory
    • asleep/in memory (blocked)
    • ready/swapped
    • sleeping/swapped (wtf UNIX, why is it sleeping here but asleep in memory?)
    • preempted (basically same as ready/in memory, but reached by being preempted while exiting kernel running)
    • created (new)
    • zombie (exit)
  • proc 0 is swapper process, it spawns proc 1 which is init process, all other procs have proc 1 as ancestor
  • divides context into three parts:
    • user-level context: program, process data, stack data, shared memory
    • register context: PC, PSW, stack pointer, registers
    • system-level context: process table entry, U area, region table, kernel stack (for when proc executes in kernel mode)
  • PCB has state, pointers, process size, user IDs, process IDs, even descriptor (for blocking), priority, signals, timers, P_link, memory status
  • U area has process table pointer, user IDs, timers, signals, login terminal, error field, return value, I/O params, file params, file descriptor table, limit fields, permission modes
  • fork is used to create process
    • copies process image, except for shared memory
    • returns ID of child to parent, returns 0 to child (see the sketch below)
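
A minimal sketch of the fork semantics just described; the printed messages are only for illustration:

```c
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();    /* copies the process image (shared memory excepted) */
    if (pid < 0) {
        perror("fork");    /* creation failed */
        return 1;
    } else if (pid == 0) {
        printf("child: my pid is %d\n", getpid()); /* fork returned 0 to the child */
    } else {
        printf("parent: child is %d\n", pid);      /* fork returned the child's ID */
        wait(NULL);        /* reap the child */
    }
    return 0;
}
```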

4 Threads

4.1 Processes and Threads

  • separate resource ownership and scheduling/execution; thread/lightweight process is executable, process holds resources
  • multithreading is multiple threads per process
  • threads have own stacks, thread control block, access to memory/resources of process
  • takes less time to create/terminate/switch between threads
  • more efficient interthread communication because don't have to invoke kernel for IPC
  • threads can be used for:
    • foreground vs background work
    • asynchronous processing
    • speed of execution (some threads can get blocked, but other work can still occur)
    • modular program structure
  • suspension and termination affect all threads
  • thread synchronization is a concern since shared memory, addressed in chapter 5 and 6

4.2 Types of Threads

  • threads can be user-level or kernel-level
  • user-level threads managed by library in user space
    • blocking on messages from other threads can be handled in user space
    • light weight, doesn't require switch to kernel mode to manage threads
    • scheduling can be application specific
    • OS independent
    • if one thread blocks on system resource, kernel will block entire process
      • can solve this problem with jacketing, wrap kernel requests with nonblocking versions so that kernel doesn't block whole process
  • kernel-level threads are handled by kernel
    • kernel scheduler manages scheduling, switching; processes request threads through API
    • kernel can schedule multiple threads on multiple processors
    • kernel routines can be multithreaded
    • operations are more costly though
  • can combine approaches, where some ULTs are mapped to KLTs but some are just in user space
  • TODO: read about 1:M and M:N threads, where they can migrate between domains

4.3 Multicore and Multithreading

  • speedup from multicore is 1/((1-f) + f/N), where fraction f of execution time is parallelizable and there are N cores (see the sketch after this list)
  • overheads mean that sometimes multicore is actually worse
  • some applications can scale really well with more cores:
    • multithreaded native applications
    • multiprocess applications (databases)
    • Java applications (JVM is super multithreaded)
    • multiinstance applications
  • Valve reprogrammed Source engine to take advantage of multithreading
    • found that coarse threading (major modules separate, occasionally synchronize with timeline thread) offered speedup of ~1.2
    • found that fine-grained threading (similar/identical tasks spread across processors) was hard to program
    • used hybrid threading, where both types of threading are used
    • assign sound mixing to own processor, e.g.
    • scene rendering split up into multiple threads in hierarchy
    • use single-writer-multiple-readers to allow threads to continue to execute, not block each other
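
The speedup formula from the top of this section, sketched in C; f = 0.9 is an arbitrary example value:

```c
#include <stdio.h>

/* Speedup on N cores when fraction f of execution time is parallelizable. */
double speedup(double f, int n) {
    return 1.0 / ((1.0 - f) + f / n);
}

int main(void) {
    /* with f = 0.9, 8 cores give ~4.7x and no number of cores can beat 10x */
    for (int n = 1; n <= 64; n *= 2)
        printf("f=0.9, N=%2d -> %.2fx\n", n, speedup(0.9, n));
    return 0;
}
```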

4.4 Windows 8 Process and Thread Management

  • application is one or more processes
  • processes hold resources, have one or more threads
  • thread can be scheduled for execution
  • job object is collection of processes that can be managed as a group
  • thread pool is collection of worker threads
  • fiber is unit of execution manually scheduled by application; run in context of thread that schedules them
    • like threads within threads
  • user-mode scheduling (UMS) allows applications to schedule their own threads
    • more efficient for short-duration work items with few system calls
  • Windows 8 changes a bunch of things:
    • only one Store app can run at once
    • apps can be terminated randomly, must save entire relevant state to disk each time app is suspended
    • background task API lets apps perform small tasks in background, like handling push notifications
  • Windows process are objects
  • process can be new, or copy of existing process
  • process and thread both have synchronization capabilities
  • logged-on users are given security ID; this is shared with literally all processes run by user, which is very secure
    • how is this different than UNIX, where we just record user who ran proc? calling it a security ID doesn't make it more secure
  • process points to linked list of virtual memory blocks, used by memory manager
  • process has object table with related objects, like threads and files
  • thread processor affinity is set of processors that are allowed to run a thread; process processor affinity is same thing but for process as a whole
  • Windows process states are:
    • ready
    • standby (intermediate between ready and running)
    • running
    • waiting (blocked)
    • transition (intermediate between waiting and ready; e.g. some pages are paged out)
    • terminated (exit)
  • complicated stuff happens to support both Win32 and POSIX
    • Windows doesn't automatically create thread for new process
    • Win32 always has initial thread, so it makes a second request for an initial thread
    • POSIX doesn't support threads, so it makes a second request for an initial thread, and returns only process information to creator

4.5 Solaris Thread and SMP Management

  • Solaris has four thread-related concepts:
    • process
    • user-level threads
    • lightweight processes (mapping between ULTs and kernel threads)
    • kernel threads (fundamental entities that can be scheduled)
    • always exactly one kernel thread for each LWP, one LWP for each ULT
    • can also have kernel threads that execute specific system functions, not bound to LWP
  • can swap out ULT library, just have to fit LWP interface to be mapped to kernel threads
  • replaces traditional UNIX processor state block with list of data blocks for LWPs; LWP block has:
    • LWP identifier
    • priority
    • signal mask
    • saved values of user-level registers
    • kernel stack
    • resource usage and profiling data
    • pointer to kernel thread
    • pointer back to process structure
  • thread states are:
    • IDL (new?)
    • RUN (ready)
    • ONPROC (running)
    • PINNED (interrupted by interrupt thread)
    • SLEEP (blocked)
    • STOP (stopped, for debugging purposes)
    • ZOMBIE (exit)
    • FREE (resources released, waiting for thread data structure to be removed)
  • interrupts are implemented as threads as well
    • set of kernel threads dedicated to handling interrupts
    • kernel controls access to data structures, syncs among interrupts using mutexes
    • have highest priority
    • pin current thread when running interrupt thread

4.6 Linux Process and Thread Management

  • process or task is represented by task_struct data structure with:
    • state
    • scheduling information (priorities, normal/realtime, time quantum)
    • identifiers
    • IPC data
    • links (to parent, siblings, children)
    • times and timers (creation time, processor time consumed)
    • file system (owned files, current/root directories of process)
    • address space (virtual memory)
    • processor-specific context (registers, stack info)
  • process states are:
    • running (consists of both ready and executing)
    • interruptible (blocked, waiting for event)
    • uninterruptible (blocked, waiting on hardware conditions, will not handle signals)
    • stopped (halted, probably for debugging purposes)
    • zombie (exit)
  • no thread support, one thread per process
  • instead, use pthread (POSIX thread) library, which creates kernel-level process for each ULT
  • kernel-level processes can share group ID to share resources like files and memory, less need for context switching
  • clone instead of fork preserves resources
    • fork is just clone with clone flags cleared
  • six types of namespaces (mnt, pid, net, ipc, uts, and user); namespaces provide processes with different view of system, like a virtual machine
    • mnt (mount): different mount namespaces provide different views of the filesystem; can only access files in visible part of filesystem
    • uts (UNIX timesharing): isolates hostname and NIS domain name, so each namespace can have its own
    • ipc: isolates certain IPC resources
    • pid: processes in different PID namespaces can have same PID
    • net (network): isolation of resources associated with networking
    • user: container with own set of UIDs, separate from parent

4.7+

Not included on exam.

5 Concurrency: Mutual Exclusion and Synchronization

  • multiprogramming is running multiple processes on a single processor
  • multiprocessing is multiple processors
  • distributed programming is running processes on multiple, distributed computer systems
  • concurrency can occur within:
    • multiple applications
    • structured application (single app programmed as multiple concurrent processes)
    • operating system structure (OS implemented as set of processes/threads)

5.1 Principles of Concurrency

  • even if only multiprogramming, can still have concurrency problems e.g. race conditions
  • bugs can be hard to find because of nondeterminism
  • race conditions can be prevented by disallowing more than one process from being in its critical section at the same time
  • functioning of process (incl output) must be independent of speed of execution and other concurrent processes
  • processes can have different degrees of relation to each other
    • unaware: interact by competition
      • use of critical resource is a critical section, needs mutual exclusion
      • deadlock and starvation can happen
    • indirectly aware: cooperate by sharing
      • shared resources are now shared data
      • reading is fine, but writing still needs mutual exclusion, deadlock and starvation still possible
    • directly aware: cooperate by communication
      • don't need mutual exclusion for this sort of interaction
      • deadlock and starvation still possible; e.g. can have loop of processes waiting for messages from each other
  • requirements for supporting mutual exclusion:
    • mutual exclusion is enforced; only one proc at once is in critical section
    • process that halts in noncritical section doesn't matter
    • no deadlock or starvation
    • if nothing in critical section, things should be able to enter critical section
    • no assumptions about relative speeds
    • process remains in critical section for finite time

5.2 Mutual Exclusion: Hardware Support

  • possible to do mutual exclusion with only software, but error prone
    • can (almost?) always find race condition that makes the software solution break
  • can disable interrupts, which prevents any other process from running
    • does not work in multiprocessor environment
  • special machine instructions can do two operations as single atomic operation
  • compare&swap checks whether *word is equal to testval; if so, it sets *word to newval; always returns old value (spinlock sketch after this list)
    • shared bolt initialized to 0; when entering critical section, try to swap a 1 into bolt
    • afterwards, clear bolt
  • exchange instruction just swaps two variables
    • shared bolt initialized to 0; exchange with key, initialized to 1
    • when swap goes through (key is 0), do critical section
    • afterwards, set bolt back to 0 (equivalently, swap key with bolt again)
  • hardware instructions guarantee mutual exclusion
  • however, busy waiting is employed
  • starvation and deadlock still possible
  • want higher-level tools
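
A sketch of the bolt protocol above, using C11's atomic compare-exchange in place of a specific machine instruction; the spin loop is exactly the busy waiting being criticized:

```c
#include <stdatomic.h>

atomic_int bolt = 0;  /* shared; 0 = free, 1 = held */

void enter_critical(void) {
    int expected = 0;
    /* atomically: if bolt == 0, set it to 1 and proceed; otherwise spin */
    while (!atomic_compare_exchange_weak(&bolt, &expected, 1))
        expected = 0;  /* the failed exchange wrote bolt's value here; reset it */
}

void leave_critical(void) {
    atomic_store(&bolt, 0);  /* clear bolt so another process can enter */
}
```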

5.3 Semaphores

  • semaphore has internal count
  • semWait decrements semaphore value; if result is negative, block the process
  • semSignal increments semaphore value; if result is not positive, unblock one of the blocked processes (see the sketch below)
  • initial value of semaphore is number of processes that may be in critical section at once
  • binary semaphore is variant
    • count is always 1 or 0
    • instead of going below 0, just keep at 0
    • set back to 1 if no processes remain in critical section
  • mutual exclusion lock is called mutex, basically just a binary semaphore
    • lock before using resource, unlock afterwards
    • process that locks mutex must be one that unlocks it
  • strong semaphore guarantees that process that has been blocked for the longest will be first to be unblocked
  • weak semaphore does not specify order of unblocking
  • typically assume strong semaphores; honestly you have to work pretty hard to make your semaphore weak
  • strong semaphores prevent starvation
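
The semWait/semSignal semantics above, sketched after the book's pseudocode; the queue operations are left as comments because they need scheduler support, so this shows the counting logic only:

```c
typedef struct {
    int count;  /* when negative, |count| = number of blocked processes */
    /* queue of blocked processes omitted */
} semaphore;

void semWait(semaphore *s) {
    s->count--;
    if (s->count < 0) {
        /* place this process in s's queue and block it */
    }
}

void semSignal(semaphore *s) {
    s->count++;
    if (s->count <= 0) {
        /* remove one process from s's queue and unblock it */
    }
}
```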

Producer/Consumer Problem

  • 1+ producers generating some type of data, putting in buffer
  • 1+ consumers taking items out of buffer
  • only one agent can access buffer at once
  • want to prevent taking from empty buffer or adding to full buffer
  • if you use a binary semaphore, you have to use two semaphores and be careful about when you do your signals
    • save local copy of count after decrementing, consume resource, and then wait for new item if consumed last item
    • may not actually need to wait if more items have already been added to buffer in meantime
  • general semaphores are much easier; have one for lock on buffer, another (n) to track number of elements in buffer
    • producer signals n after appending, consumers wait on n before taking
  • with finite buffer, have additional general semaphore (e) that tracks number of free spaces in buffer
    • producers wait on e before appending, consumers signal e after taking (see the sketch below)
  • semaphores implemented by OS using hardware mechanism already discussed (or even software, maybe)
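
A runnable sketch of the three-semaphore bounded buffer just described, with POSIX semaphores and threads; buffer size and item count are arbitrary demo values:

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

#define SIZE 4
#define ITEMS 10

int buffer[SIZE], in = 0, out = 0;
sem_t s, n, e;  /* s: buffer lock, n: items in buffer, e: free slots */

void *producer(void *arg) {
    for (int i = 0; i < ITEMS; i++) {
        sem_wait(&e);               /* wait for a free slot */
        sem_wait(&s);               /* lock the buffer */
        buffer[in] = i;
        in = (in + 1) % SIZE;
        sem_post(&s);
        sem_post(&n);               /* signal: one more item */
    }
    return NULL;
}

void *consumer(void *arg) {
    for (int i = 0; i < ITEMS; i++) {
        sem_wait(&n);               /* wait for an item */
        sem_wait(&s);
        int item = buffer[out];
        out = (out + 1) % SIZE;
        sem_post(&s);
        sem_post(&e);               /* signal: one more free slot */
        printf("consumed %d\n", item);
    }
    return NULL;
}

int main(void) {
    pthread_t p, c;
    sem_init(&s, 0, 1);
    sem_init(&n, 0, 0);
    sem_init(&e, 0, SIZE);
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```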

5.4 Monitors

  • problem with semaphores is that they are scattered throughout code; want even higher-level tool
  • monitor has one or more procedures, initialization sequence, and internal data
  • local data variables only accessible by monitor procedures
  • process enters monitor by invoking a procedure
  • only one process may execute in monitor at once; other processes that try to enter are blocked
  • monitors also include synchronization tool in form of conditions
  • cwait(c) blocks on condition c; csignal(c) unblocks a process blocked on c, if any, otherwise does nothing
  • for producer and consumer problem, have local variables for buffer, buffer indices, and count, and conditions notfull and notempty
    • append method waits on notfull if currently full, signals notempty after appending
    • take method waits on notempty if currently empty, signals notfull after taking
  • csignal is usually last thing in procedure; if it isn't, signalling proc is blocked and put in special urgent queue until monitor becomes free again
    • or be like Concurrent Pascal and forbid csignal from not being last operation in procedure
  • instead of Hoare's model (described above), can use Mesa monitor instead
    • csignal replaced with cnotify, which unblocks a process waiting on that condition but does not immediately run it; it will run at some later convenient time
    • fewer process switches, no constraints on when cnotify is run
    • must be careful to use while(blah) { wait } instead of if, since e.g. queue may have emptied again since notempty was signalled (see the pthreads sketch below)
    • can also add cbroadcast, which notifies all waiting procedures
      • useful if doing e.g. producer/consumer with variable length blocks, or otherwise uncertain about which/how many procs to unblock
    • finally, Mesa monitor is more modular, because waiters can determine their own conditions for resuming, and signallers only need to know that some waiters might want to resume
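
A sketch of Mesa-style monitor behaviour with pthreads, whose condition variables have cnotify-like semantics; the while loops are the point:

```c
#include <pthread.h>

#define SIZE 4
int buffer[SIZE];
int count = 0, in = 0, out = 0;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;  /* "only one in the monitor" */
pthread_cond_t notfull  = PTHREAD_COND_INITIALIZER;
pthread_cond_t notempty = PTHREAD_COND_INITIALIZER;

void append(int x) {
    pthread_mutex_lock(&lock);
    while (count == SIZE)                  /* while, not if: re-test on wakeup */
        pthread_cond_wait(&notfull, &lock);
    buffer[in] = x;
    in = (in + 1) % SIZE;
    count++;
    pthread_cond_signal(&notempty);        /* notify a consumer; it runs later */
    pthread_mutex_unlock(&lock);
}

int take(void) {
    pthread_mutex_lock(&lock);
    while (count == 0)
        pthread_cond_wait(&notempty, &lock);
    int x = buffer[out];
    out = (out + 1) % SIZE;
    count--;
    pthread_cond_signal(&notfull);
    pthread_mutex_unlock(&lock);
    return x;
}
```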

5.5 Message Passing

  • can send and receive messages to other processes
  • three possibilities for blocking/nonblocking:
    • blocking send and receive: rendezvous, tight synchronization
    • nonblocking send, blocking receive: sender continues on, receiver must wait for message; most natural; don't know if message was received
    • nonblocking send and receive: neither need to wait; receive returns null if no message waiting
  • addressing can be:
    • direct: target destination process specifically
      • can require that receiver request to receive from sender explicitly (explicit address) or just accept from anyone (implicit addressing)
    • indirect: send to and receive from shared data structure consisting of message queue called mailbox
      • allows (one|many)-to-(one|many); many-to-one is called port; one-to-many is broadcast; one-to-one is private
      • mailboxes can be owned by receiving process, or by creating process, or by OS
  • messages have header (message type, destination, source, length, control info) and data
  • FIFO is most common for queue, could allow receiver to look through message queue
  • can do mutual exclusion by sending null message to mailbox; processes take message before entering critical section, put back afterwards
  • can solve finite-buffer producer/consumer with mayproduce and mayconsume mailboxes
    • initialize mayproduce with number of messages equal to capacity
    • producer takes a message from mayproduce, produces, and sends produced content to mayconsume
    • consumer takes content from mayconsume, consumes, and sends null message to mayproduce

5.6 Readers/Writers Problem

  • any number of readers may simultaneously read, but only one writer may write at once, and while writer is writing, no reading
  • readers or writers may have priority
  • annoying to do with binary semaphores
  • easier to do with message passing
  • see examples in text, p. 241, 243-4

6 Concurrency: Deadlock and Starvation

6.1 Principles of Deadlock

  • deadlocks involve conflicting needs for resources by two or more processes
  • can draw joint progress diagram to see when deadlock can occur
    • axes are progress of processes, with lines for lock A, release B, etc.
    • shade areas of resulting grid where single resource is needed by both processes; can't enter those regions
    • if stuck in concave region (can't go up or right without hitting shaded area), deadlock is inevitable
    • finding clear lattice path to top right means avoiding deadlock is possible
  • reusable resources can be released and reused; things like processor, I/O channels, memory, devices, files, etc.
  • consumable resources are produced and consumed; things like interrupts, signals, messages, info in I/O buffers
  • resource allocation graph represents processes as circles, resources as square with dot
    • arrow from proc to resource is a request
    • arrow from resource dot to proc is held resource
    • for consumable resources, holding becomes producing
  • for deadlock, need three conditions for possibility and then one more for existence
    • mutual exclusion: only one process may use a resource at once
    • hold and wait: a process may hold resources while waiting for other resources
    • no preemption: can't forcibly take resources from processes
    • circular wait (for existence): closed chain of processes such that each process holds at least one resource needed by next process in chain (i.e. cycle in resource allocation graph)

6.2 Deadlock Prevention

  • prevent deadlocks by eliminating one of the conditions required for deadlock
  • mutual exclusion: can't really prevent, but for something like reads, deadlock can't occur because multiple readers are allowed
  • hold and wait: require that procs allocate all resources simultaneously
    • impractical because programs are modular
    • can't know in advance how many resources will be needed
    • inefficient because a lot of waiting happens
  • no preemption: allow preemption of resources
    • processes can be suspended to reclaim memory (and sometimes other resources too)
  • circular wait: define linear ordering of resources
    • all processes must allocate resources in order specified by linear ordering
    • inefficient like hold and wait

6.3 Deadlock Avoidance

  • dynamically prevent deadlock by disallowing process initialization or resource allocation if deadlock would become possible
  • define following symbols:
    • resource vector R (total amount of resource)
    • available vector V (current free amount of resource)
    • claim matrix C: max resource requirements for each process
    • allocation matrix A: current resource allocation for each process
  • most conservative algorithm would only allow process initialization if system could meet all max resource requests simultaneously
  • banker's algorithm tries to keep system in safe state
    • safe state is one in which there is a sequence of resource allocations that does not lead to deadlock
    • unsafe state is state that is not safe
    • basically, find a process whose remaining claim can be met, let it run to completion and free all its resources, and repeat until everything is done (sketch below)
    • requires assumptions: maximum resource req stated in advance, processes are independent (no messages), fixed number of resources, completed processes hold no resources
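
A sketch of the safety check at the heart of the banker's algorithm, using the C/A/V names from above; the example numbers are mine:

```c
#include <stdbool.h>
#include <stdio.h>

#define NPROC 3
#define NRES  2

int C[NPROC][NRES] = {{7, 5}, {3, 2}, {9, 2}};  /* claim matrix */
int A[NPROC][NRES] = {{0, 1}, {2, 0}, {3, 0}};  /* allocation matrix */
int V[NRES] = {5, 4};                           /* available vector */

bool is_safe(void) {
    int work[NRES];
    bool done[NPROC] = {false};
    for (int r = 0; r < NRES; r++) work[r] = V[r];

    for (int finished = 0; finished < NPROC; ) {
        bool progress = false;
        for (int p = 0; p < NPROC; p++) {
            if (done[p]) continue;
            bool fits = true;
            for (int r = 0; r < NRES; r++)
                if (C[p][r] - A[p][r] > work[r]) { fits = false; break; }
            if (fits) {  /* p can run to completion; reclaim its allocation */
                for (int r = 0; r < NRES; r++) work[r] += A[p][r];
                done[p] = true;
                finished++;
                progress = true;
            }
        }
        if (!progress) return false;  /* no runnable process left: unsafe */
    }
    return true;
}

int main(void) {
    printf("state is %s\n", is_safe() ? "safe" : "unsafe");
    return 0;
}
```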

6.4 Deadlock Detection

  • detect deadlock and then resolve
  • use modified version of banker's algorithm
    • replace claim matrix C with request matrix Q (what is currently requested)
    • perform banker's algorithm, marking processes that complete
    • at end, unmarked processes are deadlocked
  • recover by:
    • aborting all deadlocked processes; actually very common
    • back up deadlocked processes to checkpoints, then restart
    • successively abort deadlocked processes until deadlock goes away
    • successively preempt resources until deadlock goes away; roll back preempted processes to before they acquired those resources
  • for latter two recoveries, can select process with:
    • least amount of processor time so far
    • least output so far
    • most time remaining
    • least total resources allocated
    • lowest priority

6.5 An Integrated Deadlock Strategy

  • group resources into resource classes, use linear ordering strategy between classes, use class-specific strategy within each class
  • best strategy for specific resource depends on resource:
    • swappable space: prevent with hold-and-wait, or maybe deadlock avoidance
    • process resources (tape drives, files): avoidance usually effective
    • main memory: prevent with preemption (by swapping)
    • internal resources (I/O channels): prevent with resource ordering

6.6 Dining Philosophers Problem

  • circle of (five) philosophers, forks between philosophers, must claim fork on both sides to eat
  • lots of solutions
  • use semaphore, prevent more than four philosophers from entering room at once (sketch after this list)
  • check that both forks are free before grabbing
  • put down left fork if right fork is not available, then block on right fork
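
A pthreads sketch of the room-limiting solution from the bullet above; the meal count is an arbitrary demo value (with at most four competing for five forks, someone can always eat, so no deadlock):

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

#define N 5

sem_t fork_sem[N];
sem_t room;                               /* counting semaphore, init N - 1 */

void *philosopher(void *arg) {
    long i = (long)arg;
    for (int meals = 0; meals < 3; meals++) {
        sem_wait(&room);                  /* enter the room */
        sem_wait(&fork_sem[i]);           /* left fork */
        sem_wait(&fork_sem[(i + 1) % N]); /* right fork */
        printf("philosopher %ld eats\n", i);
        sem_post(&fork_sem[(i + 1) % N]);
        sem_post(&fork_sem[i]);
        sem_post(&room);                  /* leave the room */
    }
    return NULL;
}

int main(void) {
    pthread_t t[N];
    sem_init(&room, 0, N - 1);
    for (int i = 0; i < N; i++) sem_init(&fork_sem[i], 0, 1);
    for (long i = 0; i < N; i++) pthread_create(&t[i], NULL, philosopher, (void *)i);
    for (int i = 0; i < N; i++) pthread_join(t[i], NULL);
    return 0;
}
```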

6.7 UNIX Concurrency Mechanisms

  • pipes: circular buffer that allows processes to communicate using producer-consumer model
  • messages: block of bytes with type, msgsnd and msgrcv
    • each process has a mailbox
    • receive by FIFO or by type
    • block when sending to full queue or reading from empty queue (or queue without requested type)
  • shared memory: common block of virtual memory
    • each process can have read-only or read-write permissions on block
    • must enforce own mutual exclusion constraints
  • semaphores: super-general semaphores, can increment and decrement by values greater than 1
    • kernel does all operations atomically
    • semaphores are actually sets of semaphores
    • single sem_op operation, takes operation for each semaphore in set, operation is just value
      • if positive, increment value of semaphore and awaken all procs waiting for increase
      • if 0: if semaphore value is 0, continue; otherwise, wait on value equaling 0
      • if negative and abs value lte current semaphore value, add arg to value; if result is 0, awaken procs waiting for 0
      • if negative and abs value gt current semaphore value, wait on value increasing
  • signals: informs proc of async event; like an interrupt with no priority
    • each signal is single bit
    • procs send signals to each other, or sent by kernel
    • procs can respond by performing default operation, invoking signal-handler function, or ignoring signal (sketch below)
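
A minimal sketch of the middle response above, installing a signal-handler function via the standard sigaction interface:

```c
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

volatile sig_atomic_t got_signal = 0;

void handler(int sig) {
    got_signal = sig;          /* keep handlers minimal: just record the event */
}

int main(void) {
    struct sigaction sa = {0};
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);  /* block nothing extra while handling */
    sigaction(SIGINT, &sa, NULL);

    pause();                   /* sleep until a signal arrives (try Ctrl-C) */
    printf("caught signal %d\n", (int)got_signal);
    return 0;
}
```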

6.8 Linux Kernel Concurrency Mechanisms

  • all concurrency mechanisms in UNIX plus more
  • also has real-time signals:
    • can deliver by priority
    • multiple signals can queue
    • can send value along with signal
  • also has atomic operations on variables, both integer and bitmap
    • must be implemented on architecture that implements Linux
    • on multiprocessor systems, variable is locked until operation completes
  • spinlocks are common mechanism for protecting critical sections
    • problem is busy-waiting, but okay if resources usually freed quickly
    • different spinlocks depending on interrupt status:
      • plain: critical section not executed by interrupt handlers, or if interrupts are disabled
      • _irq: use if interrupts are always enabled
      • _irqsave: use if not known whether interrupts will be enabled
      • _bh: something with bottom halves (TODO understand this maybe)
    • implemented differently on multiprocessor systems
    • reader-writer spinlock is more complicated spinlock
      • multiple readers can acquire lock, but only one writer
      • counter counts number of readers, flag indicates whether spinlock is released by all readers/writers
      • prioritizes readers over writers
  • Linux has binary and counting semaphores
  • also reader-writer semaphores: allow multiple readers but only a single writer
    • counting semaphore for readers but binary semaphore (mutex) for writers
  • some compilers and/or hardware reorder memory accesses
    • algorithms prevent problems for non-concurrent systems, but concurrent systems can have problems
    • rmb barrier prevents reads from occurring across the barrier, wmb prevents writes, mb does both
    • dictate behaviour of compiler and processor; the barrier itself performs no operation when executed
    • variants for compiler-only barriers and barriers that apply to hardware only if on SMP system

6.9+

Not included on exam.

7 Memory Management

  • frame: fixed-length block of main memory
  • page: fixed-length block of data in secondary memory; can be copied to frame
  • segment: variable-length block of data in secondary memory

7.1 Memory Management Requirements

  • requirements: relocation, protection, sharing, logical organization, physical organization (RP PLS)
  • relocation: to unload and reload processes, must be able to load to different locations
    • branch and data references need to be translated to physical memory addresses
  • protection: memory must be protected from other processes; must be handled by hardware, not OS
  • sharing: processes need to be able to share memory
  • logical organization: dividing user programs/data into modules (e.g. segmentation) allows:
    • independent writing and compilation of modules, resolve refs at runtime
    • different degrees of protection for different modules
    • sharing of modules
  • physical organization: moving data between main and secondary memory should be the job of the system, not the user programmer

7.2 Memory Partitioning

  • virtual memory uses segmentation and/or paging, but first, talk about partitioning
  • internal fragmentation is unused space within partitions
  • external fragmentation is unused space outside of partitions
  • fixed partitioning partitions memory into blocks of fixed size
    • equal-size partitions: simple, but programs can be too big, utilization is inefficient, limited max #
    • variable-size partitions: can get better usage, fit large programs; need to consider assignment strategy:
      • assign-to-smallest-that-fits, with queue for each partition: okay, but we can use larger partition for smaller process if needed, so use only a single queue
  • dynamic partitioning: partitions resized to fit program exactly; no internal fragmentation, but inefficient due to external fragmentation and compaction
    • best-fit algorithm puts process in free space closest to needed size; generally sucks
    • first-fit algorithm puts process in first space large enough; generally best and fastest
    • next-fit algorithm puts process in next space large enough after previous allocation; similar to first fit, may require more frequent compaction
    • replacement algorithms also needed; see later notes
  • buddy system: partitions of size 2^K, break up and recombine as needed
    • better than normal fixed and dynamic partitioning, but not as good as virtual memory; used in some parallel systems
  • relocation is needed for things that aren't fixed partitioning
    • store logical addresses in code (e.g. relative addresses, delta from known point), translate to physical address for access
    • check bounds during address translation (protection)

7.3 Paging

  • copy pages from secondary memory to frames in main memory
  • maintain page table for each process
    • logical addresses contain page number and offset
    • translated using page table lookup to frame number and offset
    • hardware must know how to do address translation with page table (see the sketch below)
  • when swapping in a process, load pages into whichever free frames are available, create page table
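
A sketch of the translation step, assuming (for illustration) a 16-bit logical address split into a 6-bit page number and 10-bit offset, i.e. 1 KB pages:

```c
#include <stdint.h>
#include <stdio.h>

#define OFFSET_BITS 10
#define OFFSET_MASK ((1u << OFFSET_BITS) - 1)

uint16_t page_table[64];  /* page number -> frame number, one table per process */

uint32_t translate(uint16_t logical) {
    uint16_t page   = logical >> OFFSET_BITS;  /* top bits: page number */
    uint16_t offset = logical & OFFSET_MASK;   /* low bits pass through unchanged */
    return ((uint32_t)page_table[page] << OFFSET_BITS) | offset;
}

int main(void) {
    page_table[3] = 7;                         /* say page 3 lives in frame 7 */
    uint16_t logical = (3 << OFFSET_BITS) | 42;
    printf("logical 0x%04x -> physical 0x%05x\n", logical, translate(logical));
    return 0;
}
```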

7.4 Segmentation

  • segments are like unequal-size pages; there is a maximum size
  • segmentation usually visible to programmer, unlike paging
  • fixed number of bits at beginning of logical address used to identify segment, like with paging
    • however, some logical addresses will be invalid if segment is not maximum size
    • segment table must store size to know whether addresses are invalid
  • can combine segmentation and paging (how this is done is not described in this chapter)

7A Loading and Linking

TODO

8 Virtual Memory

  • virtual memory: a scheme by which secondary memory can be addressed as though it were in main memory
    • limited by address length and secondary memory size, but not by main memory size
  • virtual address: logical address for virtual memory, basically
  • virtual address space: virtual storage space for a given process
  • address space: range of memory addresses available to process
  • real address: physical address in main memory, basically

8.1 Hardware and Control Structures

  • central idea of virtual memory: if memory references are all logical addresses and process is broken into pieces (pages/segments), then not all pages need to be loaded into memory at once
  • portion of process that is actually in main memory at one time is the resident set
  • attempts to access memory outside of resident set generate memory access fault; OS loads appropriate piece
  • more processes can be executing in main memory at once, and single process can be larger than main memory
  • virtual memory isn't too inefficient because of locality
    • if too much time is spent loading pieces, it's called thrashing and it's bad
    • however, locality lets OS make predictions about which pieces will be needed soon when doing replacement
  • page table must now indicate whether page is actually present in main memory, and whether it's been changed since being loaded
  • page table can also be huge because of amount of memory managed, so it usually can't fit in main memory
    • solution: put it in virtual memory, use a smaller "root page table" to manage pages of user page table
  • inverted page table: rather than using a page table for each process, hash page number into single large page table with chaining
    • rows in table have page number, process identifier, control bits, and chain pointer
  • translation lookaside buffer: cache for address translation; avoid having to do additional main memory access to look at page table every time
    • lookup in TLB can be really fast because of hardware parallelization (associative mapping)
    • CPU cache of main memory is also in this pipeline
  • page size is important
    • large pages have more internal fragmentation, but smaller page fault rate and page table size
      • also more suitable for rotational secondary memory (transferring larger blocks at once)
    • small pages are reverse of large pages
    • there is an optimal middle ground for page size (measuring optimality by page fault rate)
    • generally between 512 bytes and 8 Kbytes; can be as large as a Gbyte

Segmentation

  • dynamic segment sizes make lots of things better
    • can handle growing data structures
    • programs can be altered/recompiled independently; each component requests segment of required size
    • sharing is better (because shared segment can be exactly right size?)
    • protection is more flexible (protect segments, rather than pages which are arbitrarily divided)
  • unique segment table for each process
    • rows contain present flag, change flag, other control bits
    • also segment length, process identifier, etc.
    • segment table can be big so it needs to be in main memory, like the page table
  • segmentation can be combined with paging
    • variable length segments that are broken into a number of fixed pages
    • virtual address has segment #, page # within that segment, and offset
  • protection and sharing are better with segments!
    • segment can be in segment table of more than one process
    • common scheme is ring-protection scheme: rings can access data of higher-number rings, but can only access services provided by lower-number rings
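
A sketch of the segment-table lookup with the length check from above; the entry layout and field names are invented for illustration:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical segment-table entry. */
struct seg_entry {
    bool     present;  /* segment loaded in main memory? */
    uint32_t base;     /* physical base address          */
    uint32_t length;   /* segment size in bytes          */
};

/* (segment #, offset) -> physical address; false = fault.
 * Unlike paging, an in-range segment number can still give an
 * invalid address, which is why the length must be stored. */
static bool seg_translate(const struct seg_entry *tab, unsigned nsegs,
                          unsigned seg, uint32_t off, uint32_t *phys) {
    if (seg >= nsegs || !tab[seg].present || off >= tab[seg].length)
        return false;
    *phys = tab[seg].base + off;
    return true;
}

int main(void) {
    struct seg_entry table[] = {{true, 0x8000, 0x1000}, {false, 0, 0}};
    uint32_t pa;
    printf("seg 0 + 0x200:  %s\n",
           seg_translate(table, 2, 0, 0x200, &pa) ? "ok" : "fault");
    printf("seg 0 + 0x2000: %s\n",
           seg_translate(table, 2, 0, 0x2000, &pa) ? "ok" : "fault");
    return 0;
}
```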

8.2 Operating System Software

  • OS must decide which algorithms to use; also whether to use paging, but everyone uses paging (usually in combination with segmentation)
  • fetch policy determines when a page should be brought into main memory
    • can just fetch only when page fault occurs
    • prepaging is when additional pages are brought in (to exploit spin disks that make it more efficient to load many pages at once)
    • when a process is resumed, all pages that were in main memory when it was suspended are restored
  • placement policy determines where in main memory to put a process piece
    • important in pure segmentation, but if pages are involved the physical location doesn't really matter at all
    • some research has been done on nonuniform memory access, where farther memory takes longer to access
  • replacement policy determines which pages to replace when loading new pages
    • distinct from resident set management (later section)
    • some frames may be locked (e.g. kernel, key control structures) and will never be replaced
    • optimal policy (theoretical) has omniscience and can predict exactly which page is best to remove
    • LRU replaces the page unused for the longest time; expensive to implement exactly, but almost optimal
    • FIFO treats page frames as circular buffer; very simple but pretty bad
    • clock policy is like FIFO, but you skip a frame if it has been used since the last time the clock went around; gets functionality close to LRU (sketched at the end of this section)
      • can use more bits, e.g. to prefer replacing pages that haven't been modified, but at a performance cost because the hand might have to cycle multiple times
    • page buffering keeps a free page list and modified page list; pages still available for reading, but they are essentially queued for replacement when needed
      • pages spend some time in buffer before being removed so that if they are used again in that time, they can be saved
      • overhead is like FIFO, but performance closer to optimal
  • resident set management determines how many pages per process should be kept in main memory
    • too many pages are superfluous; fewer pages means that more processes can be in memory, but also increases page fault rate
    • fixed-allocation gives each process a fixed number of frames, determined at process creation time; may be determined by application type
      • replacements always occur within the frames assigned to that process
    • variable-allocation allows the allocated frame count to vary over time; high page fault rate indicates that more are needed, low means that fewer are needed
      • much more overhead on part of OS
    • a local replacement strategy selects frames from within those assigned to process causing fault; global chooses from anywhere
      • fixed-allocation means local, variable-allocation usually means global, but...
    • variable allocation with local scope can use the working set strategy
      • working set is set of pages that process has referenced in past interval of time (window size)
      • strategy monitors working set, periodically removes pages not in working set, only executes process if working set is in memory
      • graph of working set size over time is periods of flat with big bumps between them (transient periods)
      • optimal window size is hard to determine
    • another strategy is the page fault frequency algorithm
      • on page fault, check number of page references since last page fault; if less than F, add frame; else, remove all pages that haven't been used since last time pages were removed
      • can be refined with two thresholds
      • performs poorly during transient periods
    • another strategy is variable-interval sampled working set
      • has minimum and maximum duration of sampling interval (M, L) in virtual time
      • if Q page faults occur between M and L, sample early, otherwise suspend and sample after L
  • cleaning policy is opposite of fetch policy; determines when dirty pages are written back to secondary memory
    • demand cleaning writes pages to secondary memory only when they are selected for replacement
    • precleaning writes pages out before they need to be written out; batching the writes is more efficient
    • only a Sith deals in one cleaning policy exclusively
      • when doing precleaning, write out only pages that are replaceable under replacement policy
  • load control determines how many processes should reside in main memory
    • processor utilization as a function of multiprogramming level rises to a peak, then falls off steeply as thrashing sets in
    • maximum utilization occurs (experimentally) when mean time between faults is equal to mean time spent handling faults
    • can monitor rate and increase or decrease this value at runtime as well
    • process to be replaced when swapping in can be decided by various strategies:
      • lowest-priority process
      • faulting process (whichever process just caused the fault)
      • last process activated (least likely to have its working set in memory)
      • process with smallest resident set (least effort to remove, but penalizes strong locality)
      • largest process (obtain most free frames)
      • largest remaining execution window (approximates shortest-processing-time-first scheduling)
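
A toy version of the clock replacement policy described above, assuming a single use bit per frame and a tiny fixed frame count:

```c
#include <stdbool.h>
#include <stdio.h>

#define NFRAMES 4

static int  frames[NFRAMES];   /* page held by each frame, -1 = empty */
static bool used[NFRAMES];     /* use bit, set on every reference     */
static int  hand = 0;          /* the clock pointer                   */

/* Sweep past frames with the use bit set (clearing it, i.e. giving
 * them a second chance); evict at the first frame with use bit 0. */
static int clock_replace(void) {
    for (;;) {
        if (!used[hand]) {
            int victim = hand;
            hand = (hand + 1) % NFRAMES;
            return victim;
        }
        used[hand] = false;            /* second chance: clear, move on */
        hand = (hand + 1) % NFRAMES;
    }
}

static void reference(int page) {
    for (int i = 0; i < NFRAMES; i++)
        if (frames[i] == page) { used[i] = true; return; }  /* hit  */
    int f = clock_replace();                                /* miss */
    frames[f] = page;
    used[f] = true;
    printf("page %d -> frame %d\n", page, f);
}

int main(void) {
    for (int i = 0; i < NFRAMES; i++) frames[i] = -1;
    int refs[] = {1, 2, 3, 4, 1, 5, 2};   /* made-up reference string */
    for (unsigned i = 0; i < sizeof refs / sizeof *refs; i++)
        reference(refs[i]);
    return 0;
}
```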

8.3 UNIX and Solaris Memory Management

  • UNIX used to use variable partitioning with no virtual memory, but now UNIX and Solaris use paged virtual memory
  • Solaris (and SVR4) separate paging system (memory management for user procs) and kernel memory allocator (which does memory management for kernel)
  • paging system has very-important-to-remember data structures:
    • page table is per process, rows contain page frame number, age, and various control bits
    • disk block descriptor is associated with each page, which describes which swap device and block in that device contains the page
    • page frame data table contains information about each frame in real memory, used by replacement algorithm
    • swap-use table exists per swap device, has entry for each page on device, contains page number and reference count
    • uses modified clock policy with two hands; fronthand clears reference bit, backhand moves unreferenced pages to list of pages to be paged out
      • scanrate is rate at which hands scan through page list
      • handspread is gap between fronthand and backhand
      • hands speed up when there are fewer free pages (in order to free more)
  • kernel memory allocator must support dynamic allocation for multiple kernel functions
    • however, size of blocks is much smaller than typical machine page size, so full paging is inefficient
    • instead, use variant of buddy system called lazy buddy system
      • essentially, don't recombine blocks because block of same size might be needed soon; maintain pool of "locally free" blocks

8.4 Linux Memory Management

  • Linux uses a three level page table
    • page directory exists per active process, is the size of a single page, rows point to pages in page middle directory
    • page middle directory has entries pointing to pages in page table
    • page table has entries pointing to individual virtual pages of a process
    • virtual address therefore consists of four components: global directory, middle directory, page table, and offset
    • 64-bit Alpha processor provides hardware support for three levels of paging
    • Linux handles two-level hardware paging by defining the size of the middle directory to be 1, with optimizations done at compile time
  • Linux used to use clock policy with 8-bit age variable; accesses increase age, hand sweeps decrease age
    • now uses "split LRU algorithm"; splits memory into two zones, maintains active and inactive list for each zone
    • daemon kswapd periodically performs page reclamation
    • first reference to page sets PG_referenced flag, second moves it to active list
    • PG_referenced reset if second reference doesn't happen before daemon runs
    • active pages require two timeouts before being moved back to inactive list
  • within kernel, Linux uses slab allocation to allocate smaller chunks within pages (this is complex and is not covered in text)

8.5 Windows Memory Management

Not included on exam.

  • 32-bit processes in Windows each have a separate 32-bit address space, allowing 4 Gbytes of virtual memory (though half is used by the OS)
    • on 64-bit platforms, 8 Tbytes of user address space is available (Windows 7)
  • a guard page at each end of user virtual space is set aside to catch errors; remainder is divided into pages
  • pages can be available, reserved (not used but reserved for future use, e.g. growing stack), or committed (actually in use, possibly in main memory)
  • Windows uses variable allocation local scope
    • when memory is plentiful, resident sets are allowed to grow (with some beatdown on rapidly growing resident sets)
    • when memory is scarce, memory manager recovers memory by removing less-used pages
  • Windows 8 has a new virtual memory system to handle Windows Store apps
    • paging holds items that haven't been accessed in a while, whereas swapping holds items that were recently taken out of memory
    • essentially two levels of cache
    • only Store apps can use the swap space
    • whole application gets swapped

8.6 Android Memory Management

Not included on exam.

  • extension to normal Linux with extra features
  • ASHMem provides anonymous shared memory as file descriptors
  • Pmem allocates virtual memory so that it is physically contiguous; useful for hardware without virtual memory support
  • Low Memory Killer kills apps to free memory, since phones don't usually have swap capability

9 Uniprocessor Scheduling

9.1 Types of Processor Scheduling

  • long term scheduling: decision to add to pool of processes to be executed
    • decision driven by desired degree of multiprogramming
    • which job can be decided by FCFS, or maybe take into account priority, expected execution time, I/O reqs, etc.
    • tries to keep balanced usage of different resources (processor, different I/O devices)
  • medium term scheduling: decision to add to the number of processes that are in main memory
    • also manages multiprogramming degree, but more dynamically
    • more motivated by memory concerns though
  • short term scheduling: deciding which available process will be executed by processor
    • also known as dispatcher
    • invoked by timer, interrupts, system calls, signals
    • lots of algorithms, described in next section

9.2 Scheduling Algorithms

  • main objective is to allocate processor time to optimize some set of criteria
    • criteria can be user-oriented:
      • response time: time from submission of request until response begins to be received
      • turnaround time: time from submission of process until its completion
      • deadlines: number of processes that finish by their deadline
      • predictability: jobs are consistent when run multiple times (in terms of completion time)
    • or system-oriented:
      • efficient utilization of resources (balance)
      • throughput (number of processes completed per unit of time)
      • fairness to multiple users (no user/process gets starved)
  • processes can have priorities, higher priority always chosen over lower priority
  • not worth writing out details of all the algorithms, look at table 9.3 on page 405
  • FCFS selects process with longest waiting time to execute; does not preempt
    • process can still change if it blocks on I/O
    • favours processor-bound processes over I/O-bound
  • Round Robin uses time slicing to cycle through processes with preemption
    • fair, no starvation, but still favours processor-bound over I/O-bound
    • virtual round robin avoids unfairness with respect to processor- versus I/O-bound
      • when something blocks on I/O, it gets put in separate queue; prioritized for resumption, then run for the time that remained in its slice when it blocked
  • shortest process next uses estimated completion times and runs the shortest process next; no preemption
    • completion time is estimated from past completion time
    • can use normal average, but exponential average (where more recent results are weighted higher) is better (sketched at the end of this section)
    • risk of starving longer processes
  • shortest remaining time is like SPN, but can preempt when new process is scheduled
    • takes into account time already spent running a process when estimating remaining duration
  • highest response ratio next chooses process with maximum (w + s) / s, where w is time spent waiting and s is expected service time
    • similar to SPN or SRT, but doesn't starve long processes because they'll eventually have waited long enough
  • feedback uses multiple queues
    • processes are admitted into highest priority queue
    • each time a process is preempted, it gets put into a lower priority queue
    • lowest queue is treated in round robin fashion
    • can avoid major turnaround time penalties by allowing longer time quantums on lower priority levels
  • fair-share scheduling is designed to fairly distribute resources among sets of processes (e.g. processes belonging to different users)
    • track CPU usage by each group of processes
    • use complicated formula involving process priorities, intended distribution of processor time, and CPU usage for specific process
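
A quick sketch of the exponential average used by SPN/SRT and of HRRN's selection ratio; the burst lengths and a = 0.8 are made-up numbers:

```c
#include <stdio.h>

/* S[n+1] = a*T[n] + (1-a)*S[n]: recent bursts weighted by a. */
static double update_estimate(double est, double observed, double a) {
    return a * observed + (1.0 - a) * est;
}

/* HRRN's selection key: (waiting + service) / service. */
static double response_ratio(double wait, double service) {
    return (wait + service) / service;
}

int main(void) {
    double est = 10.0;                        /* initial guess       */
    double bursts[] = {6.0, 4.0, 6.0, 13.0};  /* invented burst data */
    for (int i = 0; i < 4; i++) {
        est = update_estimate(est, bursts[i], 0.8);
        printf("after burst %4.1f: estimate %.2f\n", bursts[i], est);
    }
    printf("HRRN ratio for w=8, s=4: %.2f\n", response_ratio(8, 4));
    return 0;
}
```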

9.3 Traditional UNIX Scheduling

  • traditional UNIX uses multilevel feedback within each priority level with one second preemption
  • has adjustable "nice" priority, but cannot adjust to be outside its "band", which is base priority assigned by process type
    • from highest to lowest, swapper, block I/O, file manipulation, character I/O, and user processes
  • suited to general purpose time sharing and provides efficient use of I/O devices

10 Multiprocessor, Multicore, and Real-Time Scheduling

10.1 Multiprocessor and Multicore Scheduling

  • multiprocessor systems can be:
    • loosely coupled/distributed: relatively autonomous systems
    • functionally specialized: specific processors for specific tasks
    • tightly coupled multiprocessor: shared main memory, integrated control of OS
  • granularity is frequency of synchronization between processes
    • fine granularity is sub 20 instructions
    • very coarse is 2000-1M
    • independent is never, e.g. time sharing
  • design issues include:
    • assignment of procs to processors
      • static allocation assigns proc to processor, never changes; short-term queue maintained for each processor
      • dynamic load balancing prevents processors from being idle
      • master/slave architecture has kernel running on only one processor; master responsible for all scheduling
      • peer architecture allows all processors to self-schedule; OS must resolve conflicts, e.g. two processors trying to claim same proc
    • use of multiprogramming on individual processors
      • processes can still get blocked, should use multiprocessing for same reason it's normally used
      • sometimes, if many processors are available, it may be better for performance to not do multiprogramming; might want all threads of a proc to be running at once
    • dispatching of procs
      • simpler scheduling algorithms may be acceptable since processor resources more plentiful
  • most systems have a single queue feeding into common pool of processors
  • thread scheduling raises additional concerns
    • medium-grain to fine-grain synchronization; significant interaction
  • four strategies for handling multiprocessor thread scheduling
    • load sharing: global queue of ready threads is maintained, not grouped by process
      • distinct from load balancing, which is more long-term
      • work distributed evenly across processors
      • no centralized scheduler required
      • global queue can be organized using priority etc., as in chapter 9
      • FCFS actually does best on multiprocessing systems
      • load sharing has disadvantage of central queue under mutex, worse caching (since procs switch processors), all threads of proc rarely run at same time
    • gang scheduling: set of related threads scheduled to run on set of processors all at once
      • designed to handle threads requiring finer-grained synchronization
      • can weight run time for each proc by number of threads to minimize waste
    • dedicated processor assignment: assign all proc threads to processors, run until complete
      • can be viewed as extreme form of gang scheduling
      • good for highly parallel system, processor utilization isn't as important as normal
      • completely avoids process switching
      • too many threads can be a problem, causes more preemption; limit number of active threads to number of processors
      • process allocation more closely resembles memory allocation; can adapt ideas
        • activity working set, which is minimum number of activities (threads) that must be scheduled simultaneously before application is allowed to run
        • process fragmentation is when there are unused processors left over, but not enough to do useful work
    • dynamic scheduling: number of threads in process can change during execution
      • OS is responsible for partitioning processors among jobs; jobs use processors currently assigned to execute subset of its runnable tasks/threads
      • when jobs request processors, use idle processors to satisfy request
        • if none available, allocate single processor if the job doesn't already have one
        • otherwise, wait until processors become available
        • when another job releases processor, assign processors to jobs that are waiting
          • one loop of round robin, then FCFS (i.e. dump remaining to first job in list)
  • on multicore systems, reducing off-chip memory access becomes more important than processor utilization
    • adjacent cores may share some caches, additional factor to consider
    • beyond scope of text

10.2 Real-Time Scheduling

  • correctness depends not only on logical result, but also time at which results are produced
    • hard real-time task must meet its deadline; unacceptable damage or fatal error if fail
    • soft real-time task has associated deadline that is desirable but not mandatory
    • aperiodic task can have deadline associated with start and/or finish time
    • periodic task must be run "once per period T" or "exactly T units apart"
  • real-time systems have five unique requirements:
    • determinism: operations must be performed at fixed, predetermined times
      • good measure is maximum delay from arrival of high-priority interrupt until servicing begins
    • responsiveness: measures how long it takes to service interrupt (not start service)
    • user control: user must have fine-grained control over task priority
      • distinguish between hard and soft tasks, deadlines, etc.
      • what algorithms to use for paging, swapping, keeping procs in main memory, etc.
    • reliability: transient failures far more costly in real-time, must avoid
    • fail-soft operation: ability to fail in a way that preserves capability and data
      • a system is stable if, when it is impossible to meet all deadlines, it can still meet deadlines of most critical tasks
  • short term scheduler is key
    • enforce stricter priorities
    • minimize interrupt latency
    • provide more precise timing characteristics
    • clock-based preemption to ensure long-running low-priority tasks don't run during critical times
  • real-time scheduling can be classified based on...
    • whether it does schedulability analysis
    • whether that analysis is static or dynamic
    • whether the result of analysis produces a schedule or a plan
  • classes of algorithms are therefore:
    • static table-driven: static analysis of feasible schedules, creates a schedule that determines when tasks must begin executing
      • good for periodic tasks/schedules
      • earliest-deadline-first schedulers for periodic tasks usually fall in this category
    • static priority-driven preemptive: as above, but not schedule created; instead, priorities are assigned
      • example is rate monotonic algorithm (see later)
    • dynamic planning-based: feasibility determine at run time, arriving task accepted only if feasible; creates schedule or plan
      • attempt to create new schedule when new task arrives, reject if not possible
    • dynamic best effort: no feasibility analysis, tries to meet all deadlines and aborts processes that miss deadline
      • assign priority dynamically based on characteristics of tasks
      • used when tasks are aperiodic, no scheduling possible
      • can't know in advance whether deadlines will be met
  • deadline scheduling requires lots of information about processes
    • ready time: when it is available for execution
    • starting deadline: when it must start by
    • completion deadline: when it must finish by
    • processing time: time required to execute to completion
    • resource requirements: resources other than the processor that the task needs while running
    • priority: relative importance of task (hard vs soft)
    • subtask structure: task may be decomposed into hard and soft subtask(s)
  • scheduling task with earliest deadline minimizes number of missed deadlines
  • when starting deadline is specified, nonpreemptive scheduling makes sense; ending deadline => preemptive
    • with ending deadlines, earliest-deadline-first makes most sense, implement via preemption
    • with starting deadlines, best strategy is to schedule the earliest deadline to run, regardless of whether it's ready or not; once it has started, let it run to completion; no preemption needed
  • rate monotonic scheduling is used for periodic tasks; highest priority goes to lowest period task
    • for a given process, C/T is computation time over period, i.e. fraction of usage required
    • RMS requires that sum of fractions is less than n*(2^(1/n) - 1) (which converges to ln(2)) to guarantee that all tasks will be scheduled successfully (checked in the sketch at the end of this section)
      • however, earliest-deadline scheduling only requires that sum of fractions is less than 1, so it's better
    • RMS adopted in practice because ln(2) is only a conservative bound; can actually get 90% utilization
    • soft real-time components can absorb processor time not used by RMS
    • stability easier to achieve with RMS because it involves priorities and priorities are good for stability
  • priority inversion is when a high-priority task is forced to wait on a low-priority task
    • e.g. low-priority task has locked a resource
    • can cause problems if low-priority task never runs because medium-priority task is not blocked
    • solution is priority inheritance
    • can also use priority ceiling; priority of resource is one plus highest priority user, and any proc that locks the resource gets that priority
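
The RMS schedulability bound from above as a tiny check; the task parameters are invented (link with -lm):

```c
#include <math.h>
#include <stdio.h>

/* Liu & Layland bound for rate monotonic scheduling: the task set
 * is guaranteed schedulable if sum(C_i/T_i) <= n*(2^(1/n) - 1);
 * the bound tends to ln 2 ~= 0.693 as n grows. EDF only needs the
 * utilization to stay below 1.                                    */
int main(void) {
    double C[] = {20, 40, 100};    /* computation times (made up) */
    double T[] = {100, 150, 350};  /* periods                     */
    int n = 3;

    double u = 0.0;
    for (int i = 0; i < n; i++) u += C[i] / T[i];

    double bound = n * (pow(2.0, 1.0 / n) - 1.0);
    printf("utilization %.3f vs RMS bound %.3f -> %s\n", u, bound,
           u <= bound ? "guaranteed schedulable" : "no RMS guarantee");
    return 0;
}
```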

10.3 Linux Scheduling

  • three queues: SCHED_FIFO, SCHED_RR, and SCHED_OTHER; first two are real-time
    • non-real-time procs cannot execute if real-time proc is ready
    • FIFO and RR are the same, but round robin does timeslicing within the same priority level
  • SCHED_OTHER scheduler did not scale very well; used single runqueue for all processors with single lock, no preemption
  • new scheduler for non-real-time tasks is O(1) scheduler
    • maintains number of tasks, priority bitmap (which queues are nonempty; see the sketch at the end of this section), and [active/expired] priority queues for each processor
    • schedule by selecting highest nonempty priority, scheduling round-robin within that priority
      • if a process is interrupted during timeslice, it is put back in active queue; procs move to expired queue after complete timeslice
      • when active queue is empty for a priority, swap active and expired queues
    • time slices are between 10-200ms
  • initial priorities assigned by user, dynamically adjusted over time
    • favour I/O bound tasks because this produces good interactive response
  • real time scheduler is a bit different
    • real-time tasks do not have dynamic priorities
    • SCHED_FIFO tasks do not have assigned timeslices
    • SCHED_RR tasks are never moved to expired queue list; just returned to same priority queue
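
A sketch of the priority-bitmap idea behind the O(1) scheduler, assuming priority 0 is highest and using POSIX ffs() to find the lowest set bit:

```c
#include <stdio.h>
#include <strings.h>   /* POSIX ffs() */

/* Bit i set => priority queue i has ready tasks. Picking the next
 * queue is one bit-scan, independent of how many tasks exist.    */
int main(void) {
    unsigned bitmap = 0;
    bitmap |= 1u << 3;   /* a task becomes ready at priority 3 */
    bitmap |= 1u << 9;   /* another at priority 9              */

    int best = ffs(bitmap) - 1;   /* lowest set bit = best priority */
    printf("run queue %d next\n", best);

    bitmap &= ~(1u << best);      /* queue drained: clear its bit */
    printf("then queue %d\n", ffs(bitmap) - 1);
    return 0;
}
```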

10.4 UNIX SVR4 Scheduling

  • 160 priority levels, top 60 for real-time (ensuring they run first), next 40 for kernel, last 60 for user (time-shared)
    • dispatch queue associated with each priority, execute round-robin
    • has bitmap for nonempty queues, just like Linux; it's called dqactmap
  • insertion of preemption points, safe place for preemption to occur
    • when preemption point is reached, check kprunrun flag, preempt if a higher-priority real-time process is ready
  • dynamic priority reassignment within time-shared class
  • time quantum for time-shared varies based on priority, longest for lowest-priority
  • real-time procs have fixed priority

10.5 UNIX FreeBSD Scheduling

  • 256 priority levels divided into five priority classes:
    • top 64 for bottom-half kernel, scheduled by interrupts
    • next 64 for top-half kernel, which run until blocked or done
    • next 32 for real-time user; run until blocked or preempted by higher-priority thread
    • next 64 for time-sharing user; adjusts priorities based on processor usage
    • last 32 for idle user, run only when no time-sharing or real-time threads to run
  • latest version of FreeBSD scheduler addresses SMP/multicore support
    • supports processor affinity (only migrate thread when necessary)
    • better support for multithreading on multicore
    • improve performance of scheduling, no longer function of number of threads
  • queue structure split up; three queues for each processor
    • each queue has 32 priority classes, internal queue for each
    • two queues used for kernel, real-time, time-sharing; last queue just used for idle class
    • the two queues are designated current and next
      • exactly like active and expired for Linux scheduling
    • kernel and real-time threads always added to current queue
    • time-sharing thread assigned to current queue iff interactive
  • interactivity scoring determines whether a thread is interactive
    • ratio of run time versus voluntary sleep time is below a certain threshold => interactive
    • scaling factor of max interactivity score / 2 is used to normalize results
    • score is actually scaling factor times (run < sleep ? run / sleep : 1 + sleep / run)
  • thread migration i.e. processor affinity
    • take advantage of local caches
    • don't move threads by default
    • pull mechanism allows an idle processor to pull a thread from a nonidle processor
    • push mechanism is a periodic task; twice per second, equalizes most-loaded and least-loaded processors in system

10.6 Windows Scheduling

Not included on exam.

11 I/O Management and Disk Scheduling

11.1 I/O Devices

  • group into human readable (printers, video display, keyboard, mouse), machine readable (disk drives, USB keys, sensors, controllers), and communication (digital line drivers, modems)
  • can also classify by properties:
    • data rate
    • application: what device is being used for influences software choices
    • complexity of control: complexity of required interface (simple for printer, complex for disk)
    • unit of transfer: byte stream or larger blocks
    • data representation: encoding schemes; character code, parity
    • error conditions: types of errors, correction, reporting

11.2 Organization of the I/O Function

  • I/O can be performed in three different ways
    • programmed I/O: process busy waits for I/O operation to complete
    • interrupt-driven I/O: processor issues I/O command for process; can either block process until operation completes, or allow it to continue running and interrupt when operation returns
    • direct memory access: separate DMA module controls exchange of data, interrupts only after entire block has been transferred
  • evolved from direct control, to programmed I/O, to DMA, to separate I/O processor, to separate I/O computer
  • I/O channel is another term for I/O module
  • to use DMA, it is given read/write bit, address of I/O device, starting location in memory, and number of words to read/write
    • DMA transfers entire block and sends interrupt
    • processor then handles data result
  • various DMA architecture supports
    • can be directly on main bus, separate from I/O devices
    • can be integrated into I/O devices
    • can act as interface to separate I/O bus; can only access I/O through DMA

11.3 Operating System Design Issues

  • efficiency very important since I/O operations are often bottleneck
  • generality is other important consideration; want to handle all devices in a uniform manner
  • divide operation into uniform logical structure:
    • logical I/O: deal with device as logical resource, not concerned with device details
      • manage general I/O functions; read, write, open, close
    • device I/O: convert requested operations into I/O instructions, channel commands
      • also deals with buffering
    • scheduling and control: active queueing/scheduling, handling interrupts, actually interacts with hardware
  • for file system, there are additional layers:
    • directory management: converts symbolic file names into references to files themselves
    • file system: deals with logical structure of of files; open, close, read, write
    • physical organization: convert logical references to files into secondary storage addresses, accounting for track and sector structure

11.4 I/O Buffering

  • when transferring a block to main memory, the page containing target addresses must be locked into main memory, which interferes with swapping considerations
  • same thing happens with output
  • to avoid problems, perform input in advance of requests and output some time after requests; known as buffering
  • simplest buffer is single buffer
    • buffer in main memory is used for I/O operations
    • when transfer is complete, block is moved to user space and next block of memory is requested
      • usually next block will be needed
    • process can be swapped out while buffering is happening because buffering happens in system memory, not user memory
      • have to be careful; if swapping process out requires use of same I/O device that is currently buffering, cannot swap out until operation completes, when it may no longer be appropriate
  • double buffer has two buffers; load into one, swap pointers rather than transfer to user space; results in speedup (sketched at the end of this section)
  • circular buffer aims to smooth out flow of bursty I/O device; list of buffers treated like circular queue, essentially just double buffer with more buffers
  • buffering in general smooths out peaks in demand, but does not allow I/O to keep pace with processor in long term
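
A toy double-buffering loop: a fake fill() stands in for the device, and pointer swaps stand in for the "no copy to user space" trick:

```c
#include <stdio.h>

#define BUFSZ   8
#define NBLOCKS 5

/* Fake device read: produces blocks blk000..blk004. */
static int fill(char *buf, int chunk) {
    if (chunk >= NBLOCKS) return 0;           /* end of input */
    snprintf(buf, BUFSZ, "blk%03d", chunk);
    return 1;
}

int main(void) {
    char a[BUFSZ], b[BUFSZ];
    char *devbuf = a, *userbuf = b;

    int chunk = 0;
    int more = fill(devbuf, chunk++);         /* prime the first buffer */
    while (more) {
        char *tmp = devbuf;                   /* swap pointers instead  */
        devbuf = userbuf; userbuf = tmp;      /* of copying the data    */
        more = fill(devbuf, chunk++);         /* device refills one...  */
        printf("consume %s\n", userbuf);      /* ...process uses other  */
    }
    return 0;
}
```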

11.5 Disk Scheduling

  • processing is orders of magnitude faster than disk speed; makes sense to use some computation to optimize disk access
  • three main delays associated with disk transfer:
    • seek time: time to move head into correct position (select track)
      • mainly optimized by reducing disk size (tracks closer together, less physical distance to move)
      • typical modern seek time is under 10ms
    • rotational delay: time for disk to rotate so that head reaches beginning of sector
      • sum of seek and rotational is access time
      • average rotational delay is 1/(2r), where r is revolutions per second
    • transfer time: time required to read the sector as it moves under head
      • T = b / (rN), where T is transfer time, b is bytes to transfer, N is bytes per track, and r is revolutions per second
      • Ta = Ts + 1/(2r) + b/(rN) gives total access time, where Ts is seek time
  • there are also the queueing delays
    • need to wait for device to become available
    • need to wait for channel to device to become available, if I/O channel is shared
  • some systems do seek, and then estimate when rotation will complete and suspend and resume communication accordingly; sometimes ends up missing the sector, called RPS miss
  • huge delays are incurred if file is very fragmented and cannot be read in continuous transfer
  • can improve disk efficiency by not doing accesses in random order
    • FIFO is essentially random
    • priority doesn't optimize disk access, but usually requests from high priority will be from same proc, which will be close together
    • LIFO actually does pretty well, since last request probably requires least movement from previously finished request; can still be pretty random
    • if track position is known, can use shortest service time first; choose request closest to current head position
    • SCAN allows arm to move in one direction only, satisfying all requests as it moves (sketched at the end of this section)
      • reverse direction at end (either at last track, or with LOOK, when no more requests in that direction)
      • prevents starvation that can occur in SSTF
    • C-SCAN is like SCAN, but moves to beginning after last track instead of reversing
      • reduces maximum delay experienced by new requests
    • N-step-SCAN and FSCAN prevent "arm stickiness"
      • N-step-SCAN processes requests in batches of N; for large N, equivalent to SCAN basically
      • FSCAN uses two subqueues; during scan of one queue, new requests added to other queue; process all old requests before new requests
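
A sketch of SCAN, assuming the head starts at track 100 moving upward (the request list is invented); C-SCAN would jump back to the lowest request instead of reversing:

```c
#include <stdio.h>
#include <stdlib.h>

static int cmp(const void *a, const void *b) {
    return *(const int *)a - *(const int *)b;
}

/* SCAN ("elevator"): service everything in the current direction,
 * then reverse and sweep back.                                   */
int main(void) {
    int req[] = {55, 58, 39, 18, 90, 160, 150, 38, 184};
    int n = sizeof req / sizeof *req, head = 100;

    qsort(req, n, sizeof *req, cmp);

    int first_up = 0;                 /* first request >= head */
    while (first_up < n && req[first_up] < head) first_up++;

    printf("upward: ");
    for (int i = first_up; i < n; i++) printf("%d ", req[i]);
    printf("\nthen downward: ");
    for (int i = first_up - 1; i >= 0; i--) printf("%d ", req[i]);
    printf("\n");
    return 0;
}
```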

11.6 RAID

  • seven levels of RAID, but all share:
    • set of physical disks viewed as single logical drive
    • data distributed across disks
    • redundant disk capacity is used to store parity information, can recover data if a disk fails
      • RAID 0 and 1 don't support this
  • RAID 0 has no redundancy, but data is distributed using striping (consecutive blocks on consecutive disks)
    • supports high data transfer capacity and high I/O request rate
  • RAID 1 just mirrors data using 2N disks
    • read request handled by either copy, write requests done in parallel
  • RAID 2 and 3 read/write all disks simultaneously at same position
    • RAID 2 uses Hamming code for error correction; not usually needed
    • RAID 3 uses single extra disk with parity bit, can restore data if any individual drive fails
  • RAID 4 does parity like RAID 3, but on a block level
  • RAID 5 does distributed parity, so the parity blocks are distributed throughout the drives
  • RAID 6 does dual redundancy, so there are two parity blocks for each stripe
    • up to two disks can fail while still allowing full recovery of data
    • 30% drop in overall write performance, but read comparable to RAID 5

11.7 Disk Cache

  • can cache disk sectors in main memory
  • when cached block is requested, can do transfer from within main memory, or just pass back pointer to shared memory location
  • replacement policy must also be considered; LRU is best, LFU may be more appropriate
  • frequency-based replacement is also a thing
    • blocks are organized into stack, as in LRU; top part of stack is designated as new section
    • rereferencing in new section does not increase access count
    • replacement is done by selecting lowest count in old section
    • can refine by having three sections instead of two; reference counts incremented on both middle and old, but only replace from old

11.8 UNIX SVR4 I/O

  • I/O devices associated with a special file managed by file system, read and written as files
  • buffer cache is essentially disk cache
    • block transfers handled through buffer cache
    • maintain free list of slots in cache, device list (list of buffers associated with each disk), and driver I/O queue (list of buffers that are waiting for I/O on particular device)
    • index into buffer cache using free list or hash table
    • block replacement done using LRU; free list preserves LRU order
  • separate buffering (character queue) used for character-oriented devices; producer/consumer model is used; reading character destroys it
  • unbuffered I/O goes directly to the device using DMA, to maximize speed
  • category of device (disk drive, tape drive, terminal, communication line, printer) determines which strategy is used

11.9 Linux I/O

  • similar to UNIX
  • disk scheduler is called Linux Elevator
    • essentially just LOOK, which is SCAN that reverses when no more requests in current direction
      • if new request is on same sector or immediately adjacent to existing request, merge requests
      • if requests currently in request queue are old, insert at tail of queue
      • if suitable location, insert in sorted order
      • otherwise, tail of queue
    • enhanced by deadline scheduler:
      • read requests are more urgent than write requests because they block
      • for each request, there is an expiration time, which is shorter for reads
      • normally requests are taken from sorted queue, but taken from FIFO queue when head of queue is expired (along with next few requests)
      • solves starvation problem, also read vs write problem
    • enhanced by anticipatory I/O scheduler
      • successive reads are likely, so delay slightly after serving a read to see if next read request comes in
  • maintains page cache for particular system files and virtual memory, separate buffer cache for block I/O
    • page cache has benefits of allowing writing in bulk, and caches things because caches are good
      • do we have to repeat these benefits every time we talk about any sort of cache?
    • dirty pages written out when free memory falls below threshold, or when the dirty pages get older than a threshold

11.10 Windows I/O

Not included on exam.

  • four I/O-related kernel components:
    • cache manager: file caching for all file systems
      • dynamically adjusts size of cache for particular file based on physical memory size
      • lazy writer periodically writes things to disk, which allows it to be more efficient because batch writing is efficient
        • like seriously, every time we talk about caches
    • file system drivers: file system driver treated like any other driver, route file requests through this system, which sends to hardware device adapter
    • network drivers: integrated networking capabilities for accessing remote file systems
      • implemented as software drivers rather than Windows Executive
    • hardware device drivers: software drivers that access hardware registers of peripheral devices
  • Windows I/O can be asynchronous or synchronous
    • async is more efficient for caller, but needs to be signalled
    • signal using event on file object: proc continues and stops once it actually needs data from I/O operation, then checks file object
    • signal event object: allows multiple simultaneous I/O requests against single device/file; event object created for each request, can wait on some subset of requests
    • async procedure call: requests are made with user-mode routines to call when I/O completes
    • I/O completion ports: used on server to optimize thread use; thread waits on completion port, kernel wakes a waiting thread to handle I/O completion
    • polling: async I/O requests write status and transfer count into memory, proc just checks these values
  • Windows also has software support for RAID
    • hardware RAID is what was discussed earlier; software RAID is non-contiguous disk space combined into logical partition(s)
    • implements RAID 1 and RAID 5
  • volume shadow copies are efficient ways of making snapshots of volumes for backup; implemented by software driver that makes copies before data is overwritten
  • uses BitLocker to encrypt entire partition, supports multiple methods of supplying cryptographic key

12 File Management

  • input and output almost always done to/from files (except for real-time)
  • files have life outside individual applications, need dedicated management system

12.1 Overview

  • files stored on disk
  • files sharable between processes, access permissions permitting
  • operations are: create, delete, open, close, read, write
  • attributes maintained with metadata about permissions, owner, creation time, access time, etc.
  • file structure can be organized into hierarchy
    • field is basic element of data
    • record is collection of related fields
    • file is collection of similar records
    • database is collection of related data, probably in multiple files/tables
      • database should support sensible operations
  • sometimes files are just bytes
  • objectives for file management system
    • meet data management requirements of user; storage and operations
    • guarantee that data in file is valid
    • optimize performance
    • provide I/O support for various storage devices
    • minimize or eliminate potential for lost data
    • provide standardized set of I/O interface routines to user procs
    • provide I/O support for multiple users at once
  • reiterate layers from previous chapter, but it's changed a bit
    • device driver: interacts with device directly
    • basic file system: deals with blocks of data exchanged with driver; concerned with placement
    • basic I/O supervisor: responsible for I/O init and termination
      • selects device, schedules disk accesses
      • is part of OS
    • logical I/O: enables user access to records; general purpose record I/O capabilities
    • access method is main difference between systems; see 12.2

12.2 File Organization and Access

  • file organization is logical structuring of records; influenced by blocking strategy, allocation strategy
  • important criteria:
    • short access time
    • ease of update
    • economy of storage
    • simple maintenance
    • reliability
  • five organizations: pile, sequential, indexed sequential, indexed, direct (hashed)
  • pile file: variable-length records, variable set of fields in each record
    • have to search through entire file to find desired record
    • uses space well though
  • sequential file: fixed-length fields in fixed-length records
    • only field values need to be stored
    • key field uniquely identifies record; records sorted by key so you can binary search (sketched at the end of this section)
  • indexed sequential file: add index and overflow file
    • index allows searching of key without so many cache misses etc. because index is smaller than entire file
    • new records are inserted into the overflow file, and the preceding record in the main file gets a pointer to them
      • periodically merge overflow back into main file
  • indexed file: employ multiple indexes to search on different keys
    • don't bother sorting records, only access via index
    • can also have variable-length records
  • direct (hashed) file: hash key and go directly to storage location
    • space penalties because of empty space
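
A sketch of keyed lookup in a sequential file of fixed-length records via binary search; the record layout is invented:

```c
#include <stdio.h>

/* Fixed-length records sorted by key, so lookup can binary
 * search on the key field.                                  */
struct record {
    int  key;
    char data[12];
};

static const struct record *find(const struct record *f, int n, int key) {
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (f[mid].key == key) return &f[mid];
        if (f[mid].key < key)  lo = mid + 1;
        else                   hi = mid - 1;
    }
    return NULL;   /* an indexed sequential file would check overflow next */
}

int main(void) {
    struct record file[] = {{3, "ada"}, {17, "grace"},
                            {42, "edsger"}, {99, "alan"}};
    const struct record *r = find(file, 4, 42);
    printf("%s\n", r ? r->data : "not found");
    return 0;
}
```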

12.3 B-Trees

Not included on exam.

12.4 File Directories

  • file directory contains information about files including attributes, location, ownership, etc.
  • directory itself is a file, though generally accessed indirectly via system routines
  • must support search, create/delete file, list directory, update directory (update file attributes)
  • simple list not well suited; unique naming problems, size problems, also want to conceal parts from users that may not have access
  • two level scheme has master directory and subdirectory for each user
  • more flexible approach is to allow nesting of subdirectories, which is what we would expect as people who have seen file systems before
  • symbolic name consists of directory names and file name; this is a pathname
  • working directory means you don't need to retype full path each time

12.5 File Sharing

  • access rights to a file can be:
    • none: none
    • knowledge: know that the file exists
    • execution: can execute file but not copy it (proprietary program)
    • reading: can read file, copy; also allows execution since you could execute your own copy
    • appending: can append, but can't modify or delete content
    • updating: all writing
    • changing protection: change access rights for other users
    • deletion: deletion
  • rights can be granted to specific users, groups, or everyone
  • can handle simultaneous access by locking whole file, or just specific records during an update

12.6 Record Blocking

  • records are unit of structured file, blocks are unit of I/O with secondary storage
  • larger blocks means more records can be transferred at once, but this is less efficient if records are accessed randomly because increased overhead
  • three methods of blocking:
    • fixed blocking: fixed-length records, fixed integral number stored in each block
      • may be unused space at end of block
    • variable-length spanned blocking: variable-length records, records can span blocks
      • no unused space
    • variable-length unspanned blocking: variable length records, records cannot span blocks
      • usually some unused space because not every gap can be filled
    • always some unused space because of how blocks fit into tracks and because of hardware-enforced gaps

12.7 Secondary Storage Management

  • allocation of blocks to files on physical disk is kind of like allocating blocks of memory in main memory
  • files can grow over time; a portion is a contiguous set of blocks allocated to a file
  • need data structure to keep track of portions assigned to file; example is FAT
  • preallocation policy requires that maximum file size be declared at creation; dynamic allocation allocates portions as needed, which is more common because it's more flexible
  • portion size is important consideration:
    • contiguity of space is good for performance; larger portions do that
    • too many portions increases size of management tables
    • fixed-size portions simplifies reallocation
    • variable or small-fixed size portions minimizes wasted space
    • as a result, choose variable and large contiguous portions, or small fixed size portions (blocks)
    • with variable size, can do first-fit, best-fit, nearest-fit (nearest to other parts of same file), mostly like chapter 7
  • three methods for actual file allocation methods
    • contiguous allocation finds contiguous set of blocks; requires preallocation
      • easy to tell where desired block of file will be
      • external fragmentation occurs
    • chained allocation stores each file as a linked list
      • have to iterate through file blocks to find required block; not necessarily big loss if files accessed sequentially anyway
      • easy to add blocks as needed (dynamic allocation)
    • indexed allocation uses an index
      • index kept in separate block, points to all blocks in file
      • most popular because it's literally just the best
  • free space is managed by disk allocation table
    • can be bitmap, which is simplest and least overhead; also makes it easy to find contiguous blocks (searched in the sketch at the end of this section)
    • bitmap can be too large to search through though; instead chain together free blocks
      • often chain of free blocks is very fragmented, so new allocations will be fragmented too
    • indexing treats free space as a file and uses index table
      • index table should be based on variable-size free portions so it doesn't have to index every block
    • free block list assigns number to each block and maintains list of numbers
      • this can get really big, too large to store in main memory; treat as stack and keep only top of stack in memory
      • could also treat as queue, keeping only head of queue in main memory
  • volume is logical disk
    • often, one volume is one disk, but sometimes divide disk into partitions that each appear as a volume
    • can also treat multiple disks as a single volume
  • stuff can get corrupted if system crashes; have to be careful about write order to prevent problems
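
A sketch of scanning a free-space bitmap for a run of contiguous free blocks (the bitmap contents are an arbitrary example):

```c
#include <stdio.h>

/* Free-space bitmap: bit i set => block i is free. To place a file
 * portion we look for a run of `want` contiguous free blocks.     */
static int find_run(const unsigned char *bm, int nblocks, int want) {
    int run = 0;
    for (int i = 0; i < nblocks; i++) {
        int is_free = (bm[i / 8] >> (i % 8)) & 1;
        run = is_free ? run + 1 : 0;
        if (run == want) return i - want + 1;  /* first block of run */
    }
    return -1;   /* no run long enough */
}

int main(void) {
    unsigned char bitmap[2] = {0xB0, 0x07};  /* blocks 4,5,7,8,9,10 free */
    printf("3 contiguous blocks at %d\n", find_run(bitmap, 16, 3));
    return 0;
}
```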

12.8 UNIX File Management

  • six types of files on UNIX:
    • regular/ordinary: arbitrary data in zero or more data blocks
    • directory: list of file names plus pointers to inodes
    • special: no data, provide mechanism to map to physical devices
    • named pipes: interprocess communications facility
    • links: alternative name for existing file
    • symbolic links: data file that contains name of linked file
  • all files administered by inodes (index nodes), which contain key information about file:
    • type and access mode
    • file owner and group access identifiers
    • creation time, read time, write time, inode update time
    • size of file in bytes
    • sequence of block pointers
    • number of physical blocks
    • number of directory entries that reference file
    • kernel and user-settable flags
    • generation number of file (random number assigned to inode; used to detect deleted files)
    • blocksize (sometimes larger than file system blocksize)
    • plus additional attribute entries
  • there is inode table/list on disk that contains inodes of all files in file system
  • file allocation is multi-level; 12 direct data blocks, then single-, double-, triple- indirect layers which work as expected (max file size worked out at the end of this section)
    • keeps inode fixed size
    • small files accessed with little indirection
    • very large maximum file size
  • directories are normal directories, references to contained inodes
  • UNIX file system has single partition with boot block, superblock (info about file system), inode table, and data blocks
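
Working out the maximum file size this layout allows, assuming 4-KB blocks and 4-byte block pointers (the text may use different numbers):

```c
#include <stdio.h>

/* Classic UNIX inode: 12 direct pointers, then single-, double-,
 * and triple-indirect blocks. With 4-KB blocks and 4-byte
 * pointers, each indirect block holds 1024 pointers.           */
int main(void) {
    unsigned long long bs = 4096, ptrs = bs / 4;
    unsigned long long blocks =
        12 + ptrs + ptrs * ptrs + ptrs * ptrs * ptrs;
    printf("max file size: %llu bytes (~%llu GB)\n",
           blocks * bs, (blocks * bs) >> 30);
    return 0;
}
```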

12.9 Linux Virtual File System

Not included on exam.

  • Linux has virtual file system, which present single uniform interface to whatever file system is used (so you can swap ext4 out for, like, eCryptfs-GCM)
  • files have symbolic names, owner, protection, and other properties
  • need to map logical operations to operation specific to the OS
  • make system call for file operation, which is passed to mapping function for file system
    • usually mapping function is pretty simple (1:1), but can be more complex
    • for example, FAT directories are not files, so directory access requests have to be translated
  • VFS is object-oriented but written in C, so objects are just structs
    • superblock object: mounted file system
      • has device, block size, dirty flag, file system type, flags, pointer to root directory
    • inode object: specific file
      • holds all the information about named file except for name and actual contents
      • also has inode operations object which describes available operations
    • dentry object: directory entry, either directory or file name
      • pointer to inode and superblock, as well as parent and child dentries
    • file object: open file associated with process
      • associated dentry object, file system, usage counter, user id, group id, file pointer
      • also includes inode operations object
  • three caches are used to improve performance:
    • inode cache: stores recently visited inodes
    • directory cache: caches mapping between directory names and inode numbers; speeds up ls
    • buffer cache: independent of file systems, integrated into data buffer mechanisms
      • basically cache closer to drive level

12.10 Windows File System

Not included on exam.

12.11 Android File Management

Not included on exam.

15 Operating System Security

15.1 Intruders and Malicious Software

  • processes have associated privileges regarding memory, files, instructions
  • root access is highest level of privilege
  • intruders can be:
    • masquerader: not authorized to use system, but penetrates access controls to exploit legitimate user
    • misfeasor: legitimate user who accesses unauthorized materials, or who misuses authorization
    • clandestine user: seizes control of system and uses control to evade auditing and access controls, suppress audit collection
    • can exploit software bugs, or acquire protected information e.g. passwords
  • malicious software is programs that exploit vulnerabilities
    • parasitic malware is a fragment of some legit program; virus, logic bomb, backdoor
    • malware can also be independent, scheduled to run by OS; worm, bot program
  • malware can be activated by trigger (logic bomb, backdoor, bot program); others replicate themselves to be activated later (virus, worm)
  • intrusion detection is a countermeasure
    • can be host-based IDS (monitor host and events on host for suspicious activity) or network-based (monitor network traffic)
    • three logical components:
      • sensor: collects data (from network packets, log files, system call traces)
      • analyzer: analyzes data from sensors, determines if intrusion occurred
      • user interface: view output from system, configure behaviour
  • authentication is another countermeasure
    • consists of identification step and verification step
    • four ways to verify a user's identity:
      • something they know (password, PIN, answers to questions)
      • something they possess (keycard, smart cards, keys, all referred to as tokens)
      • something they are (static biometrics: fingerprint, retina, face)
      • something they do (dynamic biometrics: voice pattern, handwriting, typing rhythm)
    • can do two-factor authentication
  • access control is another countermeasure
    • mediate between user and resources, determine what they can and cannot access
    • must authenticate first
    • can audit to see who accessed what, who is authorized to access what
  • firewalls are another countermeasure
    • act as chokepoint between system and outside connection
    • enforces local security policy, defines what types of traffic can pass
    • firewall itself is secure against attacks

15.2 Buffer Overflow

  • memory protection protects a process' memory from other processes
  • buffer overflow is writing more to a buffer than there is space, which can overwrite other memory locations
  • can corrupt data, cause unexpected transfers of control and memory access violations, and probably terminate the process
  • can even execute arbitrary code
  • stack overflow is most common type of buffer overflow:
    • gets() does not include checking on amount of data copied; can overwrite memory locations beyond the target buffer (see the sketch at the end of this section)
  • to exploit buffer overflow, need to:
    • identify buffer overflow vulnerability that can be triggered by external input
    • understand how buffer is stored in memory, potential adjacent memory that can be corrupted
  • fuzzing can be used to identify vulnerabilities
  • compilers can be used to defend against buffer overflow
    • prevent vulnerabilities at compile time
    • modern high-level languages with types and checks on permissible operations are immune to buffer overflow
    • safe coding techniques (manually identify potential vulnerabilities) can also help
    • language extensions can perform additional checks on operations
      • Libsafe is dynamic library that can load before other code and prevent buffer overflows without even recompiling program
    • stack protection mechanisms check stack frame for evidence of corruption
      • Stackguard is GCC compiler extension; adds (random) canary value below old frame pointer; if canary is overwritten, program is aborted
  • runtime defenses aim to prevent buffer overflows at runtime
    • executable address space protection tags some pages as non-executable (hardware)
      • if stack and heap are non-executable, hard to execute custom code
      • however, some languages need executable code on the stack (e.g. nested functions in C, Linux signal handlers)
    • address space randomization randomly moves key data structures
      • if stack is moved around, can't predict target buffer's address
    • guard pages are pages between critical sections of memory
      • if anything attempts to write to guard page, illegal address exception is raised, program aborts
      • can also put guard pages between stack frames, or between allocations on heap
      • can be costly in terms of memory
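
The gets()/fgets() contrast from the bullets above; gets() was removed in C11 for exactly this reason, so modern toolchains warn about (or reject) the unsafe version:

```c
#include <stdio.h>

char *gets(char *);   /* removed from C11; declared here so this compiles */

/* The classic stack overflow: gets() keeps copying input past the
 * end of buf, clobbering whatever sits above it on the stack
 * (saved registers, the return address...).                       */
void unsafe(void) {
    char buf[16];
    gets(buf);                      /* no bounds check: never use */
    printf("%s\n", buf);
}

/* Bounded replacement: fgets() writes at most sizeof buf bytes,
 * so adjacent memory can't be corrupted.                          */
void safe(void) {
    char buf[16];
    if (fgets(buf, sizeof buf, stdin))
        printf("%s", buf);
}

int main(void) { safe(); return 0; }
```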

15.3 Access Control

  • easy to allow/disallow access to entire files based on user profile
  • might want to apply permissions on level of record, or even part of record
  • access matrix is model of permissions
    • subjects on left, objects on top, access rights in cells
    • access rights can be read, write, execute, plus functions in software objects
    • matrix usually sparse, decompose by columns and give users access control lists
    • can decompose by rows instead to get capability tickets, which list authorized users and actions for specific resource
    • have to make sure that access matrix info is not accessible to users
  • three access control policies:
    • discretionary (DAC): check identity of user and access rules for resource
      • basically just access matrix as above
      • objects can be processes, devices, memory locations/regions, or subjects (subjects are users)
      • controller for object checks whether the requested operation is allowed for that subject on that object
      • access matrix controller controls access to the access matrix
      • copy flag on permission allows transfer of right (with or without copy flag) to another user
      • owner can grant any right to any user
      • controller of subject or owner of resource can revoke a right, read rights of subject
      • creating an object requires no rights, creator becomes owner
      • destroying an object requires owner
      • subjects can be created and destroyed like objects
    • mandatory (MAC): check security labels (how sensitive/critical resource is) with security clearances
      • designed for military systems, not covered by book
    • role-based (RBAC): check role of user and access rules for role
      • users can have multiple roles
      • one matrix for mapping users to roles, and access control matrix with roles as subjects
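
A minimal sketch of the column decomposition: each object carries an ACL of (subject, rights) entries and a controller checks it with default deny. The rights encoding, struct names, and the payroll example are assumptions for illustration:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Access rights as a bitmask: one cell of the access matrix. */
enum { R_READ = 1 << 0, R_WRITE = 1 << 1, R_EXEC = 1 << 2 };

/* One ACL entry: a non-empty cell from one column of the matrix. */
struct acl_entry {
    const char *subject;
    unsigned rights;
};

/* One object with its ACL (the sparse matrix, decomposed by columns). */
struct object {
    const char *name;
    struct acl_entry acl[8];
    int acl_len;
};

/* The controller's check: is `right` granted to `subject` on `obj`? */
bool check_access(const struct object *obj, const char *subject,
                  unsigned right) {
    for (int i = 0; i < obj->acl_len; i++)
        if (strcmp(obj->acl[i].subject, subject) == 0)
            return (obj->acl[i].rights & right) != 0;
    return false;   /* no entry: default deny */
}

int main(void) {
    struct object payroll = {
        .name = "payroll.db",
        .acl = { { "alice", R_READ | R_WRITE }, { "bob", R_READ } },
        .acl_len = 2,
    };
    printf("bob write: %d\n", check_access(&payroll, "bob", R_WRITE));   /* 0 */
    printf("alice read: %d\n", check_access(&payroll, "alice", R_READ)); /* 1 */
    return 0;
}
```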

15.4 UNIX Access Control

  • users have unique user ID and can be a member of primary group and possibly other groups with group IDs
  • files are owned by user and belong to a group
  • each file has 12 access bits (decoded in the sketch after this list)
    • (read/write/execute) for (user/group/other)
    • SetUID and SetGID: when file is executed, it executes with permissions of user/group
    • on directory, SetGID indicates that newly created files should inherit group of directory
    • Sticky bit originally indicated that file contents should be retained in memory following execution; no longer used
    • Sticky bit on directory indicates that only owner of file in directory can rename, move, delete that file
      • useful for shared temp directories
  • one user ID is the superuser who can do anything
    • be careful when setting SetUID on file owned by superuser
  • UNIX also has access control lists
    • assign list of user and group IDs to file, each with three protection bits
    • 9-bit permission field has same meaning as normal
    • group class entry acts as mask for all other permissions granted by list
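
A small sketch that decodes the 12 access bits with stat(2) (POSIX); the output format is made up, but the S_I* mode-bit macros are the standard ones. Try it on /tmp, which normally has the sticky bit set:

```c
#include <stdio.h>
#include <sys/stat.h>

/* Print the 12 permission bits of a file: rwx for user/group/other,
 * plus setuid/setgid/sticky, as described in 15.4. */
int main(int argc, char **argv) {
    struct stat st;
    const char *path = argc > 1 ? argv[1] : "/tmp";

    if (stat(path, &st) != 0) { perror("stat"); return 1; }

    mode_t m = st.st_mode;
    printf("%s: %c%c%c %c%c%c %c%c%c%s%s%s\n", path,
           (m & S_IRUSR) ? 'r' : '-', (m & S_IWUSR) ? 'w' : '-',
           (m & S_IXUSR) ? 'x' : '-',
           (m & S_IRGRP) ? 'r' : '-', (m & S_IWGRP) ? 'w' : '-',
           (m & S_IXGRP) ? 'x' : '-',
           (m & S_IROTH) ? 'r' : '-', (m & S_IWOTH) ? 'w' : '-',
           (m & S_IXOTH) ? 'x' : '-',
           (m & S_ISUID) ? " setuid" : "",
           (m & S_ISGID) ? " setgid" : "",
           (m & S_ISVTX) ? " sticky" : "");
    return 0;
}
```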

15.5 Operating Systems Hardening

  • install and patch OS
    • unpatched system is vulnerable, don't connect to network while patching
    • initial installation should be minimal
    • boot process must be secure, limit which media system can boot from, require password for changes to BIOS
      • cryptographic file system helps secure system against malicious boot process
    • driver installation very critical, since those execute in kernel mode
    • automatically install updates and patches
      • on critical systems, don't do it automatically, stage and validate patches on test systems
  • remove unnecessary services, apps, protocols
    • fewer packages means fewer potential vulnerabilities
    • default configuration usually optimized for ease of use, not security; don't use defaults
    • uninstall scripts may fail to remove all components, so don't install in first place
  • configure users, groups, permissions
    • restrict elevated privileges to only those who need them
    • only access elevated privileges when needed (i.e. don't log in as root, use sudo)
    • decide on authentication system (local vs central)
    • change default passwords, remove or secure default accounts
    • allowable length, complexity, age of passwords can be set
  • configure resource controls
    • set permissions on data and resources
    • yeah that's pretty much it
  • install security controls (antivirus, firewalls, IDS)
    • whitelist for applications can prevent malware, but not always possible to implement depending on system use
  • test security
    • follow checklist in security-hardening guide (apparently this is not a security-hardening guide, it's just a guide to security-hardening guides)

15.6 Security Maintenance

  • monitor and analyze logs
    • informs you of bad things that have already happened
    • ensure sufficient space is allocated for logs
    • need automated analysis of logs
  • perform regular backups
    • may be legal requirements for backups
    • backup is process of making copies of data, archive is process of retaining copies of data for long periods of time
    • decide whether online or offline, local or remote
      • if attackers compromise/destroy data, don't want them to also compromise/destroy backups
  • recover from security compromises
  • regularly test system security
  • patch and update critical software, monitor and revise configuration

15.7 Windows Security

Not included on exam.
