- processor: controls operation of computer, data processing functions
- main memory: stores data and programs, typically volatile
- I/O modules: move data between computer and external environment, e.g. disks, internet, terminals
- system bus: communication between three systems above
- microprocessors used to be slower than multi-chip processors, but everything changed when the fire nation attacked: they got really small and physics got involved; a single chip can now contain multiple processors
- GPUs are more efficient for calculations on arrays of data, some modern processors have vector operations too
- Digital Signal Processors handle streaming signals specifically
- other specialized processing units (encryption, audio/video codec stuff, etc.) also exist
- mobile devices have System on a Chip (SoC), where main memory, I/O devices, radios, etc. are all on same chip as main processor
- fetch instruction, execute instruction, repeat
- instructions can be in four categories:
- processor-memory: transfer data between processor and memory
- processor-I/O: transfer data between processor and I/O
- data processing: arithmetic or logic op
- control: branches
- combinations also possible
- it's like ECE 222 but less detailed
- interrupts improve processor utilization by interrupting normal sequencing
- I/O devices very slow, so run other processes (or same process, if nonblocking) in meantime and wait for interrupt
- when starting instruction, processor checks for interrupt requests; if any, branches to handler instead of next instruction
- after interrupt, previously running process is resumed
- nine steps, five in hardware, four in software, for handling interrupts:
- device issues interrupt signal
- finish previous instruction
- test for interrupt requests, send ack signal
- save context (PSW, PC)
- load PC with entry location of interrupt handler
- save additional state in software (e.g. registers, stack pointer, etc.)
- process interrupt (examine status info, send commands, acks, etc.)
- restore saved state
- restore PSW, PC
- interrupts often disabled during interrupt handling, though can have multiple priority levels for handlers
- access time is inversely proportional to cost, so can't make entire memory out of fast memory
- put most frequently accessed data in more expensive but faster memory
- this is possible because of locality of reference; data accessed recently will often be accessed again soon
- use various swapping strategies to move things between layers of cache based on recent references
- levels are usually processor registers, then cache (often in processor as well), then main memory (volatile), then secondary memory (disks, SSD)
- every instruction cycle requires at least one main memory access to fetch the instruction, so cache very important to prevent slowdown
- cache maintains copy of fixed length blocks from main memory
- each block has a tag indicating which main-memory block it corresponds to
- look for word in cache before resorting to main memory lookup
- need mapping function, replacement algorithm, and write policy for cache; more details in memory chapter
- LRU usually performs best
- multiple levels of cache (L1, L2, often L3) can be used
- I/O can be:
- programmed, i.e. polling
- interrupt-driven, which doesn't block the processor while doing I/O
- DMA!
- DMA module is delegated task of transfer of complete block to/from memory or I/O device
- far more efficient for multiple-word transfers
- symmetric multiprocessors (SMP) defined as:
- two or more similar processors
- share main memory and I/O facilities, interconnected by e.g. bus
- share access to I/O devices
- processors can perform same functions
- system controlled by integrated OS
- get better performance, availability, growth, scaling
- processors can have their own caches
- multicore aka chip multiprocessor has multiple cores on single chip
- basically SMP but on single chip
- locality happens because:
- aside from branches, execution is sequential
- usually some time is spent localized in single process, not always jumping around
- loops involve small numbers of instructions repeated a lot
- array elements are usually referenced in sequence
- spatial locality is tendency to access memory locations that are clustered
- temporal locality is tendency to access same memory location again soon
- can calculate average access time using times for each memory and hit ratio
- average cost per bit depends on costs per bit in each memory and sizes of memory
- access efficiency is access time for M1 over average access time; function of hit ratio
- hit ratio is much higher than S1/S2 when there is lots of locality
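- written out as formulas (a sketch using the usual two-level notation, not from the notes: T1/T2 access times, C1/C2 costs per bit, S1/S2 sizes, H hit ratio):

```latex
T_s = H\,T_1 + (1 - H)(T_1 + T_2)          % average access time
C_s = \frac{C_1 S_1 + C_2 S_2}{S_1 + S_2}  % average cost per bit
E = \frac{T_1}{T_s} = \frac{1}{1 + (1 - H)\,T_2/T_1}  % access efficiency
```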
- objectives of OS are (ACE):
- convenience
- efficiency
- ability to evolve
- convenience is enhanced by following services:
- libraries for I/O, file management, etc.
- utilities for launching and managing tasks
- handling I/O devices
- controlling file access, system access
- error detection
- accounting
- key interfaces in system are:
- ISA: machine language instructions
- ABI: standard for binary portability across platforms; system call interface
- API: program access to hardware resources and services; high-level language library calls
- efficiency is enhanced by managing resources
- OS kernel does scheduling, allocation, etc. to efficiently manage resources
- ease of evolution enhanced by allowing:
- hardware upgrades via drivers
- new services via OS updates
- bug fixes via OS updates
- serial processing: manually run batch jobs, one after another
- simple batch systems: monitor automatically runs next job after job completes
- can do compilation of user programs by loading compiler
- can also support memory protection, timer, privileged instructions, interrupts
- programs execute in user mode, monitor executes in kernel mode
- multiprogrammed batch system: interleave processes for greater efficiency
- because I/O takes a lot of time
- requires more advanced memory management and I/O management
- time-sharing systems: handle multiple interactive jobs
- minimize response time and accept commands from terminal
- require memory protection between interactive jobs
- process:
- job in execution, instance of program running, entity that can be executed on processor
- multiprogramming requires ability to switch between processes
- time-sharing also requires this
- real-time systems also require this
- lots of problems result from multiple processes:
- improper synchronization
- failed mutual exclusion
- nondeterminate program operation (things can execute in ~random order)
- deadlocks
- process is systematic way to manage information about task
- store program, data needed by program, execution context
- threads are extension of this concept, discussed later
- memory management:
- five responsibilities:
- process isolation required when multiple processes in system
- automatic allocation/management: program shouldn't have to do its own memory allocation
- support modular programming: create, destroy, alter size of modules dynamically
- protection/access control: some memory is shared, but some is not shared
- long-term storage: manage memory on hard drives
- use virtual memory with virtual/logical addresses that get translated to real addresses
- information protection and security:
- availability: protect system from interruption
- confidentiality: can't read unauthorized data
- data integrity: can't write unauthorized data
- authenticity: verify user identity, validity of messages/data
- scheduling and resource management:
- especially important for time-sharing
- must consider fairness, differential responsiveness, efficiency
- round robin plus interrupts is pretty good
- microkernel architecture:
- OS was traditionally large monolithic kernel with all the features
- shift towards microkernel, which has only essential functions in kernel, rest in user space
- more customizable, better ability to evolve
- multithreading:
- separate executable work and process data
- can parallelize work, especially on SMP or multicore systems
- SMP: better performance, availability, incremental growth, scaling (I think this list was in Chapter 1 as well)
- distributed OS: provide illusion of single main memory, secondary memory, but distribute computation across multiple computers
- object-oriented design: write OS with OOP, eases development
- reliability R(t) is probability of correct operation up to time t given that operation was correct at t=0
- mean time to failure (MTTF) is ∫₀^∞ R(t) dt
- mean time to repair (MTTR) is how long it takes to repair or replace a faulty element
- availability is MTTF / (MTTF + MTTR)
- faults can be:
- permanent: after it occurs, always present; disk hard crashes, software bugs
- temporary: not permanent
- transient: occurs only once
- intermittent: multiple, unpredictable times
- three types of redundancy for managing faults:
- spatial (physical) redundancy: multiple components that do the same thing
- temporal redundancy: repeat a function when an error occurs; effective for temporary faults
- information redundancy: error correction codes
- OS can provide:
- process isolation: more hermetic environment, because interference from other procs can cause faults
- concurrency controls: because concurrency can cause problems
- virtual machines: even more isolation, prevent some types of permanent faults
- checkpoints and rollbacks: copy of application state in backup; often used in databases
- design issues for SMP include:
- simultaneous concurrent procs/threads: kernel data structures need to account for possibility of multiple instances of same service call running at once
- scheduling: each processor needs to have scheduling, can interfere
- synchronization: some processes might depend on other procs, also need to mutex shared resources
- memory management: need to exploit hardware parallelism; paging mechanisms need to be consistent
- reliability and fault tolerance: graceful degradation if one processor fails
- design issues for multicore:
- same issues as SMP, but multicore is EVEN MORE PARALLEL so it's just extra hard
- parallelism within application: write applications to take advantage of parallel execution
- Grand Central Dispatch on OSX/iOS is thread pool mechanism, makes this easier
- virtual machine approach: give process entire cores, let it run as if it is only program on system
- OS just acts like hypervisor, process handles own resources for most part
- originally extension to MS-DOS, replaced by Windows NT
- organizes system into:
- executive: core OS services
- I/O manager: implements all the APIs for I/O
- cache manager: for file-based I/O
- object manager: manages Windows Executive objects
- plug-and-play manager: automatic device driver installation
- power manager: shutting down idle devices, sleeping, writing all memory to disk and shutting off
- security reference monitor: manage secure access to all Objects
- virtual memory manager: virtual address management, physical memory, paging
- process/thread management: create, manage, delete procs and threads
- configuration manager: system registry
- advanced local procedure call (ALPC) facility: efficient cross-process procedure call mechanism; pretty much like RPC
- kernel: scheduling, processor management
- hardware abstraction layer (HAL): maps between generic hardware commands and platform-specific stuff
- device drivers: extend functionality of executive, mostly to do translation to HAL commands
- windowing and graphics system: GUI functions
- four types of user-mode processes:
- special system process: user-mode OS service, can manage system in more privileged ways; session manager, auth subsystem, service manager, logon proc
- service processes: printer spooler, event logger, device driver components, network services; background user-mode activity
- environment subsystems: provide different OS "personalities"; either Win32 or POSIX; converts system calls
- user applications: EXEs and DLLs
- client/server model is used on Windows, which makes it easier to use in distributed computing; used even on single system
- kernel-based services are invoked through RPC; Executive routes request to appropriate server
- simplifies the Executive, easy to add new APIs
- improves reliability since more things are running outside the kernel
- uniform means for applications to communicate with services
- suitable base for distributed computing, just route RPC over network as required
- Windows is OO, Objects are important feature; they provide:
- encapsulation: data of object protected by object's services (methods)
- object class and instance: standard OS stuff, makes creating objects easier
- inheritance: hand coded implementation, but still good because inheritance is good
- polymorphism: common set of API functions to manipulate any object; however, not fully polymorphic, many APIs are specific to specific type
- Object Security Descriptor (SD) used to restrict access to specific user
- Objects can be named so that other processes can get a handle to them and share them
- two categories of objects used to manage processor:
- Dispatcher object: threads can wait on these
- Control objects: manage operation of processor not managed by normal thread scheduling
- one of first OSes to be written in C rather than assembly, took advantage of high-level language features and ease of development
- UNIX commands/libraries interact with kernel through system call interface, and kernel interacts with hardware
- all interaction with hardware is done through file subsystem, which can use either character or block interface with hardware control system
- System V Release 4, BSD, and Solaris 10 are all more modern versions of UNIX
- SVR4 is very important, objective was to be uniform platform for commercial UNIX, incorporates most important features
- BSD had more academic focus, widely used, responsible for popularity of UNIX
- Mac OS X based on BSD
- Solaris 10 was Sun's version; fully preemptable, multithreaded kernel, full support for SMP, OO interface to file systems
- most successful commercial UNIX implementation
- originally project of Linus Torvalds, lots of collaboration over internet
- success is largely due to free software packages, FSF's GNU project
- highly modular, easily configured, maximum rice
- loadable modules are object files which can be linked and unlinked to kernel at runtime
- even though kernel is monolithic, modules make it feel less monolithic
- lots of components:
- signals: used to call into processes (notify of faults, e.g.)
- system calls: used to request kernel services
- scheduler: manages and schedules processes
- virtual memory: allocates and manages virtual memory for procs
- file systems: global namespace for files, directories, etc.
- network protocols: Sockets interface to TCP/IP
- character device drivers: manage devices like terminals, printers, modems
- block device drivers: manage secondary memory, mostly
- network device drivers: manage network interface cards, bridges, routers
- traps and faults: handle traps and faults generated by processor
- physical memory: manages pool of page frames in real memory
- interrupts: handle interrupts
Not included on exam.
- OS must interleave process execution, allocate resources to processes, and support interprocess communication
- program in execution, instance of program, entity that can execute on processor
- process can be uniquely characterized by:
- identifier
- state: e.g. running state
- priority
- program counter: address of next instruction in program to execute
- memory pointers: pointers to code, data associated with process, shared memory blocks
- context data: data in registers
- I/O status information: I/O requests, assigned I/O devices, files in use
- accounting information: amount of processor time, clock time, time limits, etc.
- all that information is stored in process control block (PCB)
- PCB enables interrupting and later resuming execution as if no interruption occurred
- trace of process is sequence of instructions executed for that process
- two-state model has only running and not running; processes handled in round-robin manner
- process creation involves creation of PCB and allocation of memory space for program/data
- processes can be created (spawned) by:
- new batch job
- interactive log-on (spawn user process when they log on)
- created by OS to provide service (e.g. automatically creating printing process)
- spawned by existing process: fork! this is a child process
- processes can be terminated for these reasons:
- normal completion
- time limit exceeded (total time limit)
- memory unavailable
- bounds violation (memory)
- protection error (resources, e.g. files)
- arithmetic error (divide by zero)
- time overrun (blocked for too long)
- I/O failure
- invalid instruction (usually a result of branching to a data area)
- privileged instruction
- data misuse (type error)
- OS intervention (deadlock management, maybe)
- parent termination (parent was terminated)
- parent request (parent does the terminating)
- five state model has ready, running, blocked, new, and exit; they work as expected
- new and exit are useful for management; gives OS time to allocate/free resources
- processes in ready state still handled round-robin
- can have multiple ready queues to support different priorities
- can have blocked queues for each resource
- swapping is useful because you can accommodate more processes at once; add suspend state
- can also add ready/suspend and blocked/suspend states if even procs in ready state can be suspended
- suspension is defined by these properties:
- process not immediately available for execution
- may or may not be waiting for event (wtf textbook); can become unblocked without unsuspending
- process was placed in suspend state by agent; itself, parent, or even OS (wtf textbook)
- process will remain suspended until agent explicitly orders unsuspension (something something Newton, also wtf textbook)
- can suspend a program to inspect its state
- memory tables track both real and secondary memory
- some reserved for OS, rest used by procs
- tracks allocation of memory to processes, protection attributes, virtual memory data
- I/O tables manage I/O devices and channels
- also tracks individual I/O requests so it can route responses
- OS may also have file tables, which are like I/O tables but for files
- process tables are the list of PCBs, pretty much
- all tables may be interlinked
- tables subject to memory management as well
- collection of resources for process referred to as process image; at least some of image must be in main memory for execution
- paging can be used to make the image seem contiguous even if it's not, or even if not all of it is in main memory
- process control block information can be grouped into three categories:
- identification (number)
- state (registers, stack pointers)
- control (scheduling and state, OS data structure stuff (`mp_next`), IPC, privileges, memory management info, resource ownership)
- can also have accounting, which is like control but separate
- problems in PCB are big problems, so protect PCBs by putting them behind handler routines
- processes can execute in user (unprivileged) mode, or kernel (privileged) mode
- mode determines allowed instructions
- mode determined by bit in PSW, changed in response to service calls and returns
- process creation steps:
- assign unique PID
- allocate space for process, including its PCB
- initialize PCB
- set linkages (i.e. put in queue)
- create/expand other data structures, like accounting
- three reasons to do process switch:
- interrupt (clock, I/O, memory fault)
- trap (error or exception condition)
- supervisor call (service call by process)
- process state change steps:
- save context of processor
- update PCB
- move to appropriate queue
- select another process
- update PCB of new process
- update memory management structures (address translation)
- restore context of new process
- OS functions can execute in three ways:
- nonprocess kernel: call into OS, which has own memory and own system stack, can do whatever
- execution within user procs: process image also includes space for OS data structs, OS stack; OS functions executed within process; only process switching is external
- process-based OS: OS functions are in their own processes, only IPC and process switching is external
- nine process states:
- user running
- kernel running (move to this state from ready/in memory, then to user running)
- ready/in memory
- asleep/in memory (blocked)
- ready/swapped
- sleeping/swapped (wtf UNIX, why is it sleeping here but asleep in memory?)
- preempted (basically same as ready/in memory, but reached by being preempted when exiting kernel running)
- created (new)
- zombie (exit)
- proc 0 is swapper process, it spawns proc 1 which is init process, all other procs have proc 1 as ancestor
- divides context into three parts:
- user-level context: program, process data, stack data, shared memory
- register context: PC, PSW, stack pointer, registers
- system-level context: process table entry, U area, region table, kernel stack (for when proc executes in kernel mode)
- PCB has state, pointers, process size, user IDs, process IDs, event descriptor (for blocking), priority, signals, timers, `P_link`, memory status
- U area has process table pointer, user IDs, timers, signals, login terminal, error field, return value, I/O params, file params, file descriptor table, limit fields, permission modes
- fork is used to create process
- copies process image, except for shared memory
- returns ID of child to parent, returns 0 to child
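- a minimal C sketch of that return-value convention, using the standard POSIX calls:

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();            /* copies the process image; both processes continue here */
    if (pid < 0) {
        perror("fork");
        exit(1);
    } else if (pid == 0) {
        /* fork returned 0: this is the child */
        printf("child %d (parent %d)\n", getpid(), getppid());
    } else {
        /* fork returned the child's PID: this is the parent */
        printf("parent %d spawned child %d\n", getpid(), pid);
        wait(NULL);                /* reap the child so it doesn't stay a zombie */
    }
    return 0;
}
```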
- separate resource ownership and scheduling/execution; thread/lightweight process is executable, process holds resources
- multithreading is multiple threads per process
- threads have own stacks, thread control block, access to memory/resources of process
- takes less time to create/terminate/switch between threads
- more efficient interthread communication because don't have to invoke kernel to IPC
- threads can be used for:
- foreground vs background work
- asynchronous processing
- speed of execution (some threads can get blocked, but other work can still occur)
- modular program structure
- suspension and termination of a process affect all its threads
- thread synchronization is a concern since threads share memory, addressed in chapters 5 and 6
- threads can be user-level or kernel-level
- user-level threads managed by library in user space
- blocking on messages from other threads can be handled in user space
- light weight, doesn't require switch to kernel mode to manage threads
- scheduling can be application specific
- OS independent
- if one thread blocks on system resource, kernel will block entire process
- can solve this problem with jacketing, wrap kernel requests with nonblocking versions so that kernel doesn't block whole process
- kernel-level threads are handled by kernel
- kernel scheduler manages scheduling, switching; processes request threads through API
- kernel can schedule multiple threads on multiple processors
- kernel routines can be multithreaded
- operations are more costly though
- can combine approaches, where some ULTs are mapped to KLTs but some are just in user space
- TODO: read about 1:M and M:N threads, where they can migrate between domains
- speedup from multicore is 1/((1-f) + f/N), where f of execution time is parallelizable and there are N cores
- overheads mean that sometimes multicore is actually worse
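- plugging assumed numbers into the formula to see the shape:

```latex
\text{speedup} = \frac{1}{(1-f) + f/N},
\qquad f = 0.9,\ N = 8:\quad \frac{1}{0.1 + 0.9/8} \approx 4.7
```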
- some applications can scale really well with more cores:
- multithreaded native applications
- multiprocess applications (databases)
- Java applications (JVM is super multithreaded)
- multiinstance applications
- Valve reprogrammed Source engine to take advantage of multithreading
- found that coarse threading (major modules separate, occasionally synchronize with timeline thread) offered speedup of around 1.2
- found that fine-grained threading (similar/identical tasks spread across processors) was hard to program
- used hybrid threading, where both types of threading are used
- assign sound mixing to own processor, e.g.
- scene rendering split up into multiple threads in hierarchy
- use single-writer-multiple-readers to allow threads to continue to execute, not block each other
- application is one or more processes
- processes hold resources, have one or more threads
- thread can be scheduled for execution
- job object is collection of processes that can be managed as a group
- thread pool is collection of worker threads
- fiber is unit of execution manually scheduled by application; run in context of thread that schedules them
- like threads within threads
- user-mode scheduling (UMS) allows applications to schedule their own threads
- more efficient for short-duration work items with few system calls
- Windows 8 changes a bunch of things:
- only one Store app can run at once
- apps can be terminated randomly, must save entire relevant state to disk each time app is suspended
- background task API lets apps perform small tasks in background, like handling push notifications
- Windows process are objects
- process can be new, or copy of existing process
- process and thread both have synchronization capabilities
- logged-on users are given security ID; this is shared with literally all processes run by user, which is very secure
- how is this different than UNIX, where we just record user who ran proc? calling it a security ID doesn't make it more secure
- process points to linked list of virtual memory blocks, used by memory manager
- process has object table with related objects, like threads and files
- thread processor affinity is set of processors that are allowed to run a thread; process processor affinity is same thing but for process as a whole
- Windows process states are:
- ready
- standby (intermediate between ready and running)
- running
- waiting (blocked)
- transition (intermediate between waiting and ready; e.g. some pages are paged out)
- terminated (exit)
- complicated stuff happens to support both Win32 and POSIX
- Windows doesn't automatically create thread for new process
- Win32 always has initial thread, so it makes a second request for an initial thread
- POSIX doesn't support threads, so it makes a second request for an initial thread, and returns only process information to creator
- Solaris has four thread-related concepts:
- process
- user-level threads
- lightweight processes (mapping between ULTs and kernel threads)
- kernel threads (fundamental entities that can be scheduled)
- always exactly one kernel thread for each LWP, one LWP for each ULT
- can also have kernel threads that execute specific system functions, not bound to LWP
- can swap out ULT library, just have to fit LWP interface to be mapped to kernel threads
- replaces traditional UNIX processor state block with list of data blocks for LWPs; LWP block has:
- LWP identifier
- priority
- signal mask
- saved values of user-level registers
- kernel stack
- resource usage and profiling data
- pointer to kernel thread
- pointer back to process structure
- thread states are:
- IDL (new?)
- RUN (ready)
- ONPROC (running)
- PINNED (interrupted by interrupt thread)
- SLEEP (blocked)
- STOP (stopped, for debugging purposes)
- ZOMBIE (exit)
- FREE (resources released, waiting for thread data structure to be removed)
- interrupts are implemented as threads as well
- set of kernel threads dedicated to handling interrupts
- kernel controls access to data structures, syncs among interrupts using mutexes
- have highest priority
- pin current thread when running interrupt thread
- process or task is represented by `task_struct` data structure with:
- state
- scheduling information (priorities, normal/realtime, time quantum)
- identifiers
- IPC data
- links (to parent, siblings, children)
- times and timers (creation time, processor time consumed)
- file system (owned files, current/root directories of process)
- address space (virtual memory)
- processor-specific context (registers, stack info)
- process states are:
- running (consists of both ready and executing)
- interruptible (blocked, waiting for event)
- uninterruptible (blocked, waiting on hardware conditions, will not handle signals)
- stopped (halted, probably for debugging purposes)
- zombie (exit)
- no thread support, one thread per process
- instead, use pthread (POSIX thread) library, which creates kernel-level process for each ULT
- kernel-level processes can share group ID to share resources like files and memory, less need for context switching
- clone instead of fork preserves resources
- fork is just clone with clone flags cleared
- six types of namespaces (mnt, pid, net, ipc, uts, and user); namespaces provide processes with different view of system, like a virtual machine
- mnt (mount): different mount namespaces provide different views of filesystem; can only access files in visible part of filesystem
- uts (UNIX timesharing): allows group of machines within NIS domain to share config files
- ipc: isolates certain IPC resources
- pid: processes in different PID namespaces can have same PID
- net (network): isolation of resources associated with networking
- user: container with own set of UIDs, separate from parent
Not included on exam.
- multiprogramming is running multiple processes on a single processor
- multiprocessing is multiple processors
- distributed programming is running processes on multiple, distributed computer systems
- concurrency can occur within:
- multiple applications
- structured application (single app programmed as multiple concurrent processes)
- operating system structure (OS implemented as set of processes/threads)
- even if only multiprogramming, can still have concurrency problems e.g. race conditions
- bugs can be hard to find because of nondeterminism
- race conditions can be prevented by disallowing multiple processes from being in a critical section at the same time
- functioning of process (incl output) must be independent of speed of execution and other concurrent processes
- processes can have different degrees of relation to each other
- unaware: interact by competition
- use of critical resource is a critical section, needs mutual exclusion
- deadlock and starvation can happen
- indirectly aware: cooperate by sharing
- shared resources are now shared data
- reading is fine, but writing still needs mutual exclusion, deadlock and starvation still possible
- directly aware: cooperate by communication
- don't need mutual exclusion for this sort of interaction
- deadlock and starvation still possible; e.g. can have loop of processes waiting for messages from each other
- requirements for supporting mutual exclusion:
- mutual exclusion is enforced; only one proc at once is in critical section
- process that halts in its noncritical section must not interfere with other processes
- no deadlock or starvation
- if nothing in critical section, things should be able to enter critical section
- no assumptions about relative speeds
- process remains in critical section for finite time
- possible to do mutual exclusion with only software, but error prone
- can (almost?) always find race condition that makes the software solution break
- can disable interrupts, which prevents any other process from running
- does not work in multiprocessor environment
- special machine instructions can do two operations as single atomic operation
- `compare&swap` checks whether `*word` is equal to `testval`; if so, it sets `*word` to `newval`; always returns old value
- shared bolt initialized to 0; when entering critical section, try to swap a 1 into bolt
- afterwards, clear bolt
- `exchange` instruction just swaps two variables
- shared bolt initialized to 0; exchange with key, initialized to 1
- when swap goes through (key is 0), do critical section
- afterwards, set bolt back to 0 (equivalently, swap key with bolt again)
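- both entry protocols sketched with C11 atomics standing in for the machine instructions (bolt as above; function names made up):

```c
#include <stdatomic.h>

atomic_int bolt = 0;                     /* shared; 0 = critical section free */

void enter_compare_and_swap(void) {
    int expected = 0;
    /* atomically: if bolt == 0, set it to 1; otherwise loop (busy wait) */
    while (!atomic_compare_exchange_weak(&bolt, &expected, 1))
        expected = 0;                    /* a failed CAS overwrites expected; reset it */
}

void enter_exchange(void) {
    /* keep swapping a 1 into bolt; reading back a 0 means we got in */
    while (atomic_exchange(&bolt, 1) != 0)
        ;                                /* busy wait */
}

void leave(void) {
    atomic_store(&bolt, 0);              /* clear bolt */
}
```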
- hardware instructions guarantee mutual exclusion
- however, busy waiting is employed
- starvation and deadlock still possible
- want higher-level tools
- semaphore has internal count
- semWait decreases semaphore value; if value is negative, block process
- semSignal increments semaphore value; if value is not positive, unblock one of the blocked processes
- initial value of semaphore is number of processes that may be in critical section at once
- binary semaphore is variant
- count is always 1 or 0
- instead of going below 0, just keep at 0
- set back to 1 if no processes remain in critical section
- mutual exclusion lock is called mutex, basically just a binary semaphore
- lock before using resource, unlock afterwards
- process that locks mutex must be one that unlocks it
- strong semaphore guarantees that process that has been blocked for the longest will be first to be unblocked
- weak semaphore does not specify order of unblocking
- typically assume strong semaphores; honestly you have to work pretty hard to make your semaphore weak
- strong semaphores prevent starvation
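- a sketch (not how an OS actually implements them) of semWait/semSignal on top of pthreads; the wakeups counter guards against spurious wakeups, and since pthread_cond_signal wakes an arbitrary waiter this is a weak semaphore:

```c
#include <pthread.h>

/* count < 0 means |count| processes are blocked; initialize count to the
   number of processes allowed in the critical section at once, wakeups to 0 */
typedef struct {
    int count;
    int wakeups;                        /* pending signals; guards spurious wakeups */
    pthread_mutex_t lock;
    pthread_cond_t  cond;
} semaphore;

void semWait(semaphore *s) {
    pthread_mutex_lock(&s->lock);
    s->count--;
    if (s->count < 0) {                 /* value went negative: block */
        do {
            pthread_cond_wait(&s->cond, &s->lock);
        } while (s->wakeups == 0);
        s->wakeups--;
    }
    pthread_mutex_unlock(&s->lock);
}

void semSignal(semaphore *s) {
    pthread_mutex_lock(&s->lock);
    s->count++;
    if (s->count <= 0) {                /* someone is blocked: unblock one */
        s->wakeups++;
        pthread_cond_signal(&s->cond);  /* arbitrary waiter: a weak semaphore */
    }
    pthread_mutex_unlock(&s->lock);
}
```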
- 1+ producers generating some type of data, putting in buffer
- 1+ consumers taking items out of buffer
- only one agent can access buffer at once
- want to prevent taking from empty buffer or adding to full buffer
- if you use a binary semaphore, you have to use two semaphores and be careful about when you do your signals
- save local copy of count after decrementing, consume resource, and then wait for new item if consumed last item
- may not actually need to wait if more items have already been added to buffer in meantime
- general semaphores are much easier; have one for lock on buffer, another (n) to track number of elements in buffer
- producer signals n after appending, consumers wait on n before taking
- with finite buffer, have additional general semaphore (e) that tracks number of free spaces in buffer
- producers wait on e before appending, consumers signals e after taking
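- the finite-buffer scheme above, sketched with POSIX semaphores (s = buffer lock, n = items, e = empty slots; capacity arbitrary):

```c
#include <semaphore.h>

#define N 16                     /* buffer capacity, arbitrary for the sketch */

int buffer[N];
int in = 0, out = 0;

sem_t s;                         /* binary: mutual exclusion on the buffer */
sem_t n;                         /* counts items in the buffer */
sem_t e;                         /* counts empty slots */

void producer(int item) {
    sem_wait(&e);                /* wait for a free slot */
    sem_wait(&s);                /* lock the buffer */
    buffer[in] = item;
    in = (in + 1) % N;
    sem_post(&s);
    sem_post(&n);                /* one more item available */
}

int consumer(void) {
    sem_wait(&n);                /* wait for an item */
    sem_wait(&s);
    int item = buffer[out];
    out = (out + 1) % N;
    sem_post(&s);
    sem_post(&e);                /* one more free slot */
    return item;
}

/* initialization somewhere in main:
   sem_init(&s, 0, 1); sem_init(&n, 0, 0); sem_init(&e, 0, N); */
```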
- semaphores implemented by OS using hardware mechanism already discussed (or even software, maybe)
- problem with semaphores is that they are scattered throughout code; want even higher-level tool
- monitor has one or more procedures, initialization sequence, and internal data
- local data variables only accessible by monitor procedures
- process enters monitor by invoking a procedure
- only one process may execute in monitor at once; other processes that try to enter are blocked
- monitors also include synchronization tool in form of conditions
- cwait(c) blocks on condition c; csignal(c) unblocks a process blocked on c, if any, otherwise does nothing
- for producer and consumer problem, have local variables for buffer, buffer indices, and count, and conditions notfull and notempty
- append method waits on notfull if currently full, signals notempty after appending
- take method waits on notempty if currently empty, signals notfull after taking
- csignal is usually last thing in procedure; if it isn't, can block signalling proc into special urgent queue, wait until monitor becomes free again
- or be like Concurrent Pascal and forbid csignal from not being last operation in procedure
- instead of Hoare's model (described above), can use Mesa monitor instead
- csignal replaced with cnotify, which unblocks a process waiting on that condition but does not immediately run it; it will run at some later convenient time
- fewer process switches, no constraints on when cnotify is run
- must be careful to use while(blah) { wait } instead of if, since e.g. queue may have emptied again since notempty was signalled
- can also add cbroadcast, which notifies all waiting procedures
- useful if doing e.g. producer/consumer with variable length blocks, or otherwise uncertain about which/how many procs to unblock
- finally, Mesa monitor is more modular, because waiters can determine their own conditions for resuming, and signallers only need to know that some waiters might want to resume
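- a sketch of the bounded-buffer monitor with pthreads, whose condition variables already behave like Mesa's cnotify (hence the while loops, as advised above):

```c
#include <pthread.h>

#define N 16

/* the mutex plays the monitor lock; condvars play notfull/notempty */
static int buffer[N], in = 0, out = 0, count = 0;
static pthread_mutex_t monitor = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t notfull  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t notempty = PTHREAD_COND_INITIALIZER;

void append(int item) {
    pthread_mutex_lock(&monitor);
    while (count == N)                     /* cnotify semantics: must recheck */
        pthread_cond_wait(&notfull, &monitor);
    buffer[in] = item;
    in = (in + 1) % N;
    count++;
    pthread_cond_signal(&notempty);        /* cnotify(notempty) */
    pthread_mutex_unlock(&monitor);
}

int take(void) {
    pthread_mutex_lock(&monitor);
    while (count == 0)
        pthread_cond_wait(&notempty, &monitor);
    int item = buffer[out];
    out = (out + 1) % N;
    count--;
    pthread_cond_signal(&notfull);         /* cnotify(notfull) */
    pthread_mutex_unlock(&monitor);
    return item;
}
```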
- can send and receive messages to other processes
- three possibilities for blocking/nonblocking:
- blocking send and receive: rendezvous, tight synchronization
- nonblocking send, blocking receive: sender continues on, receiver must wait for message; most natural; don't know if message was received
- nonblocking send and receive: neither need to wait; receive returns null if no message waiting
- addressing can be:
- direct: target destination process specifically
- can require that receiver request to receive from sender explicitly (explicit addressing) or just accept from anyone (implicit addressing)
- indirect: send to and receive from shared data structure consisting of message queue called mailbox
- allows (one|many)-to-(one|many); many-to-one is called port; one-to-many is broadcast; one-to-one is private
- mailboxes can be owned by receiving process, or by creating process, or by OS
- messages have header (message type, destination, source, length, control info) and data
- FIFO is most common for queue, could allow receiver to look through message queue
- can do mutual exclusion by sending null message to mailbox; processes take message before entering critical section, put back afterwards
- can solve finite-buffer producer/consumer with mayproduce and mayconsume mailboxes
- initialize mayproduce with number of messages equal to capacity
- producer takes a message from mayproduce, produces, and sends produced content to mayconsume
- consumer takes content from mayconsume, consumes, and sends null message to mayproduce
- any number of readers may simultaneously read, but only one writer may write at once, and while writer is writing, no reading
- readers or writers may have priority
- annoying to do with binary semaphores
- easier to do with message passing
- see examples in text, p. 241, 243-4
- deadlocks involve conflicting needs for resources by two or more processes
- can draw joint progress diagram to see when deadlock can occur
- axes are progress of processes, with lines for lock A, release B, etc.
- shade areas of resulting grid where single resource is needed by both processes; can't enter those regions
- if stuck in concave region (can't go up or right without hitting shaded area), deadlock is inevitable
- finding clear lattice path to top right means avoiding deadlock is possible
- reusable resources can be released and reused; things like processor, I/O channels, memory, devices, files, etc.
- consumable resources are produced and consumed; things like interrupts, signals, messages, info in I/O buffers
- resource allocation graph represents processes as circles, resources as square with dot
- arrow from proc to resource is a request
- arrow from resource dot to proc is held resource
- for consumable resources, holding becomes producing
- for deadlock, need three conditions for possibility and then one more for existence
- mutual exclusion: only one process may use a resource at once
- hold and wait: a process may hold resources while waiting for other resources
- no preemption: can't forcibly take resources from processes
- circular wait (for existence): closed chain of processes such that each process holds at least one resource needed by next process in chain (i.e. cycle in resource allocation graph)
- prevent deadlocks by eliminating one of the conditions required for deadlock
- mutual exclusion: can't really prevent, but for something like reads, deadlock can't occur because multiple readers are allowed
- hold and wait: require that procs allocate all resources simultaneously
- impractical because programs are modular
- can't know in advance how many resources will be needed
- inefficient because a lot of waiting happens
- no preemption: allow preemption of resources
- processes can be suspended to reclaim memory (and sometimes other resources too)
- circular wait: define linear ordering of resources
- all processes must allocate resources in order specified by linear ordering
- inefficient like hold and wait
- dynamically prevent deadlock by disallowing process initialization or resource allocation if deadlock would become possible
- define following symbols:
- resource vector R (total amount of resource)
- available vector V (current free amount of resource)
- claim matrix C: max resource requirements for each process
- allocation matrix A: current resource allocation for each process
- most conservative algorithm would only allow process initialization if system could meet all max resource requests simultaneously
- banker's algorithm tries to keep system in safe state
- safe state is one in which there is a sequence of resource allocations that does not lead to deadlock
- unsafe state is state that is not safe
- basically, find process that can run to completion, and then free all its resources, and complete until everything is done
- requires assumptions: maximum resource req stated in advance, processes are independent (no messages), fixed number of resources, completed processes hold no resources
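- the safe-state test at the heart of the banker's algorithm, sketched in C using the C/A/V definitions above (dimensions are arbitrary):

```c
#include <stdbool.h>
#include <string.h>

#define NPROC 4
#define NRES  3

/* given claim C, allocation A, and available V, is there an order in
   which every process can run to completion? */
bool is_safe(int C[NPROC][NRES], int A[NPROC][NRES], int V[NRES]) {
    int work[NRES];
    bool done[NPROC] = { false };
    memcpy(work, V, sizeof work);

    for (int finished = 0; finished < NPROC; ) {
        bool progress = false;
        for (int p = 0; p < NPROC; p++) {
            if (done[p]) continue;
            bool can_run = true;
            for (int r = 0; r < NRES; r++)
                if (C[p][r] - A[p][r] > work[r]) { can_run = false; break; }
            if (can_run) {                 /* p can finish: release its resources */
                for (int r = 0; r < NRES; r++) work[r] += A[p][r];
                done[p] = true;
                finished++;
                progress = true;
            }
        }
        if (!progress) return false;       /* nobody can finish: unsafe state */
    }
    return true;
}
```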
- detect deadlock and then resolve
- use modified version of banker's algorithm
- replace claim matrix C with request matrix Q (what is currently requested)
- perform banker's algorithm, marking processes that complete
- at end, unmarked processes are deadlocked
- recover by:
- aborting all deadlocked processes; actually very common
- back up deadlocked processes to checkpoints, then restart
- successively abort deadlocked processes until deadlock goes away
- successively preempt resources until deadlock goes away; roll back preempted processes to before they acquired those resources
- for latter two recoveries, can select process with:
- least amount of processor time so far
- least output so far
- most time remaining
- least total resources allocated
- lowest priority
- group resources into resource classes, use linear ordering strategy on classes, use class-specific strategy within each class
- best strategy for specific resource depends on resource:
- swappable space: prevent with hold-and-wait, or maybe deadlock avoidance
- process resources (tape drives, files): avoidance usually effective
- main memory: prevent with preemption (by swapping)
- internal resources (I/O channels): prevent with resource ordering
- circle of (five) philosophers, forks between philosophers, must claim fork on both sides to eat
- lots of solutions
- use semaphore, prevent more than four philosophers from entering room at once
- check that both forks are free before grabbing
- put down left fork if right fork is not available, then block on right fork
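- the room-semaphore solution from above, sketched with POSIX semaphores (at most four philosophers may try to pick up forks at once, so the circular wait can't close):

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdint.h>

#define NPHIL 5

sem_t fork_[NPHIL];          /* one binary semaphore per fork */
sem_t room;                  /* at most NPHIL - 1 philosophers at once */

void *philosopher(void *arg) {
    int i = (int)(intptr_t)arg;
    for (;;) {
        /* think(); */
        sem_wait(&room);                     /* prevents the all-hold-left-fork deadlock */
        sem_wait(&fork_[i]);                 /* left fork */
        sem_wait(&fork_[(i + 1) % NPHIL]);   /* right fork */
        /* eat(); */
        sem_post(&fork_[(i + 1) % NPHIL]);
        sem_post(&fork_[i]);
        sem_post(&room);
    }
    return NULL;
}

/* init: sem_init(&room, 0, NPHIL - 1); sem_init(&fork_[i], 0, 1) for each i */
```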
- pipes: circular buffer that allows processes to communicate using producer-consumer model
- messages: block of bytes with type, msgsnd and msgrcv
- each process has a mailbox
- receive by FIFO or by type
- block when sending to full queue or reading from empty queue (or queue without requested type)
- shared memory: common block of virtual memory
- each process can have read-only or read-write permissions on block
- must enforce own mutual exclusion constraints
- semaphores: super-general semaphores, can increment and decrement by values greater than 1
- kernel does all operations atomically
- semaphores are actually sets of semaphores
- single `sem_op` operation, takes operation for each semaphore in set, operation is just a value
- if positive, increment value of semaphore and awaken all procs waiting for increase
- if 0: if semaphore value is 0, continue; otherwise, wait on value equaling 0
- if negative and abs value lte current semaphore value, add arg to value; if result is 0, awaken procs waiting for 0
- if negative and abs value gt current semaphore value, wait on value increasing
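- a sketch of the real semget/semop interface for the mutual-exclusion case (error handling omitted):

```c
#include <sys/ipc.h>
#include <sys/sem.h>

int semid;   /* e.g. semget(IPC_PRIVATE, 1, IPC_CREAT | 0600),
                then semctl(semid, 0, SETVAL, 1) to initialize the value */

void enter(void) {
    struct sembuf op = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };
    semop(semid, &op, 1);        /* negative op: blocks until it can subtract */
}

void leave(void) {
    struct sembuf op = { .sem_num = 0, .sem_op = +1, .sem_flg = 0 };
    semop(semid, &op, 1);        /* positive op: increments, wakes waiters */
}
```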
- signals: informs proc of async event; like an interrupt with no priority
- each signal is single bit
- procs send signals to each other, or sent by kernel
- procs can respond by performing default operation, invoking signal-handler function, or ignoring signal
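- the handler option, sketched with the standard sigaction call:

```c
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t got = 0;

static void handler(int signo) {
    got = signo;                 /* async-signal-safe: just record it */
}

int main(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;     /* alternatives: SIG_DFL (default), SIG_IGN (ignore) */
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    pause();                     /* sleep until a signal is delivered */
    printf("caught signal %d\n", (int)got);
    return 0;
}
```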
- all concurrency mechanisms in UNIX plus more
- also has real-time signals:
- can deliver by priority
- multiple signals can queue
- can send value along with signal
- also has atomic operations on variables, both integer and bitmap
- must be implemented on architecture that implements Linux
- on multiprocessor systems, variable is locked until operation completes
- spinlocks are common mechanism for protecting critical sections
- problem is busy-waiting, but okay if resources usually freed quickly
- different spinlocks depending on interrupt status:
- plain: critical section not executed by interrupt handlers, or if interrupts are disabled
- `_irq`: use if interrupts are always enabled
- `_irqsave`: use if not known whether interrupts will be enabled (saves and restores interrupt state)
- `_bh`: disables/enables bottom halves (softirqs, i.e. deferred interrupt work) so the lock can also be taken by bottom-half code without deadlock
- implemented differently on multiprocessor systems
- reader-writer spinlock is more complicated spinlock
- multiple readers can acquire lock, but only one writer
- counter counts number of readers, flag indicates whether spinlock is released by all readers/writers
- prioritizes readers over writers
- Linux has binary and counting semaphores
- also reader-writer semaphores: allow multiple readers but only single writers
- counting semaphore for readers but binary semaphore (mutex) for writers
- some compilers and/or hardware reorder memory accesses
- reordering doesn't change results for sequential code, but can break concurrent algorithms
- `rmb` barrier prevents reads from moving across the barrier, `wmb` prevents writes, `mb` does both
- dictate behaviour of compiler and processor, but execute no operation of their own at runtime
- variants for compiler-only barriers and barriers that apply to hardware only if on SMP system
Not included on exam.
- frame: fixed-length block of main memory
- page: fixed-length block of data in secondary memory; can be copied to frame
- segment: variable-length block of data in secondary memory
- requirements: relocation, protection, sharing, logical organization, physical organization (RP PLS)
- relocation: to unload and reload processes, must be able to load to different locations
- branch and data references need to be translated to physical memory addresses
- protection: memory must be protected from other processes; must be handled by hardware, not OS
- sharing: processes need to be able to share memory
- logical organization: dividing user programs/data into modules (e.g. segmentation) allows:
- independent writing and compilation of modules, resolve refs at runtime
- different degrees of protection for different modules
- sharing of modules
- physical organization: cache management should be the job of the OS, not user programmer
- virtual memory uses segmentation and/or paging, but first, talk about partitioning
- internal fragmentation is unused space within partitions
- external fragmentation is unused space outside of partitions
- fixed partitioning partitions memory into blocks of fixed size
- equal-size partitions: simple, but programs can be too big, utilization is inefficient, limited max #
- variable-size partitions: can get better usage, fit large programs; need to consider assignment strategy:
- assign-to-smallest-that-fits, with queue for each partition: okay, but we can use larger partition for smaller process if needed, so use only a single queue
- dynamic partitioning: partitions resized to fit program exactly; no internal fragmentation, but inefficient due to external fragmentation and compaction
- best-fit algorithm puts process in free space closest to needed size; generally sucks
- first-fit algorithm puts process in first space large enough; generally best and fastest
- next-fit algorithm puts process in next space large enough after previous allocation; similar to first fit, may require more frequent compaction
- replacement algorithms also needed; see later notes
- buddy system: partitions of size 2^K, break up and recombine as needed
- better than normal fixed and dynamic partitioning, but not as good as virtual memory; used in some parallel systems
- relocation is needed for things that aren't fixed partitioning
- store logical addresses in code (e.g. relative addresses, delta from known point), translate to physical address for access
- check bounds during address translation (protection)
- copy pages from secondary memory to frames in main memory
- maintain page table for each process
- logical addresses contain page number and offset
- translated using page table lookup to frame number and offset
- hardware must know how to do address translation with page table
- when swapping in a process, load pages into whichever free frames are available, create page table
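- an illustrative sketch of the page-number/offset split (assumes a one-level page table and 4 KB pages; real hardware does the walk itself):

```c
#include <stdint.h>

#define OFFSET_BITS 12                     /* 4 KB pages, just for illustration */
#define PAGE_SIZE   (1u << OFFSET_BITS)

/* split a logical address into (page number, offset), look the page up,
   and rebuild the real address from the frame number */
uint32_t translate(uint32_t logical, const uint32_t *page_table) {
    uint32_t page   = logical >> OFFSET_BITS;
    uint32_t offset = logical & (PAGE_SIZE - 1);
    uint32_t frame  = page_table[page];    /* in hardware this is the table lookup */
    return (frame << OFFSET_BITS) | offset;
}
```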
- segments are like unequal-size pages; there is a maximum size
- segmentation usually visible to programmer, unlike paging
- fixed number of bits at beginning of logical address used to identify segment, like with paging
- however, some logical addresses will be invalid if segment is not maximum size
- segment table must store size to know whether addresses are invalid
- can combine segmentation and paging (how this is done is not described in this chapter)
TODO
- virtual memory: a scheme by which secondary memory can be addressed as though it were in main memory
- limited by address length and secondary memory size, but not by main memory size
- virtual address: logical address for virtual memory, basically
- virtual address space: virtual storage space for a given process
- address space: range of memory addresses available to process
- real address: physical address in main memory, basically
- central idea of virtual memory: if memory references are all logical addresses and process is broken into pieces (pages/segments), then not all pages need to be loaded into memory at once
- portion of process that is actually in main memory at one time is the resident set
- attempts to access memory outside of resident set generate memory access fault; OS loads appropriate piece
- more processes can be executing in main memory at once, and single process can be larger than main memory
- virtual memory isn't too inefficient because of locality
- if too much time is spent loading pieces, it's called thrashing and it's bad
- however, locality lets OS make predictions about which pieces will be needed soon when doing replacement
- page table must now indicate whether page is actually present in main memory, and whether it's been changed since being loaded
- page table can also be huge because of amount of memory managed, so it usually can't fit in main memory
- solution: put it in virtual memory, use a smaller "root page table" to manage pages of user page table
- inverted page table: rather than using a page table for each process, hash page number into single large page table with chaining
- rows in table have page number, process identifier, control bits, and chain pointer
- translation lookaside buffer: cache for address translation; avoid having to do additional main memory access to look at page table every time
- lookup in TLB can be really fast because of hardware parallelization (associative mapping)
- CPU cache of main memory is also in this pipeline
- page size is important
- large pages have more internal fragmentation, but smaller page fault rate and page table size
- also more suitable for rotational secondary memory (transferring larger blocks at once)
- small pages are reverse of large pages
- there is an optimal middle ground for page size (measuring optimality by page fault rate)
- generally between 512 bytes and 8 Kbytes; can be as large as a Gbyte
- dynamic segment sizes make lots of things better
- can handle growing data structures
- programs can be altered/recompiled independently; each component requests segment of required size
- sharing is better (because shared segment can be exactly right size?)
- protection is more flexible (protect segments, rather than pages which are arbitrarily divided)
- unique segment table for each process
- rows contain present flag, change flag, other control bits
- also segment length, process identifier, etc.
- segment table can be big so it needs to be in main memory, like the page table
- segmentation can be combined with paging
- variable length segments that are broken into a number of fixed pages
- virtual address has segment #, page # within that segment, and offset
- protection and sharing are better with segments!
- segment can be in segment table of more than one process
- common scheme is ring-protection scheme: rings can access data of higher-number rings, but can only access services provided by lower-number rings
- OS must decide which algorithms to use; also whether to use paging, but everyone uses paging (usually in combination with segmentation)
- fetch policy determines when a page should be brought into main memory
- can just fetch only when page fault occurs
- prepaging is when additional pages are brought in (to exploit spin disks that make it more efficient to load many pages at once)
- when processes are resumed, all pages that were in main memory when it was suspended are restored
- placement policy determines where in main memory to put a process piece
- important in pure segmentation, but if pages are involved the physical location doesn't really matter at all
- some research has been done on nonuniform memory access, where further memory takes longer to access
- replacement policy determines which pages to replace when loading new pages
- distinct from resident set management (later section)
- some frames may be locked (e.g. kernel, key control structures) and will never be replaced
- optimal policy (theoretical) has omniscience and can predict exactly which page is best to remove
- LRU is LRU; expensive but almost optimal
- FIFO treats page frames as circular buffer; very simple but pretty bad
- clock policy is like FIFO, but you skip a frame if it has been used since the last time the clock went around; gets functionality close to LRU (see the sketch after this list)
- can use more bits, e.g. to prefer replacing pages that haven't been modified, but costs performance because the hand might have to cycle multiple times
- page buffering keeps a free page list and modified page list; pages still available for reading, but they are essentially queued for replacement when needed
- pages spend some time in buffer before being removed so that if they are used again in that time, they can be saved
- overhead is like FIFO, but performance closer to optimal
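- a sketch of the clock policy from this list (one use bit per frame; sizes arbitrary):

```c
#include <stdbool.h>

#define NFRAMES 64                     /* arbitrary for the sketch */

static bool use_bit[NFRAMES];          /* set on every reference to the frame's page */
static int  hand = 0;

/* choose a victim frame: frames used since the last sweep get a second
   chance (use bit cleared, frame skipped); first unused frame is replaced */
int clock_select(void) {
    for (;;) {
        if (!use_bit[hand]) {
            int victim = hand;
            hand = (hand + 1) % NFRAMES;
            return victim;
        }
        use_bit[hand] = false;
        hand = (hand + 1) % NFRAMES;
    }
}
```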
- resident set management addresses how many pages per process should be kept in main memory
- beyond a point, extra pages are superfluous; fewer pages per process means more processes can be in memory, but too few increases page fault rate
- fixed-allocation gives each process a fixed number of frames, determined at process creation time; may be determined by application type
- replacements always occur within the frames assigned to that process
- variable-allocation allows variation over time of allocated frame count; high page fault indicates that more are needed, low means that less are needed
- much more overhead on part of OS
- a local replacement strategy selects frames from within those assigned to process causing fault; global chooses from anywhere
- fixed-allocation means local, variable-allocation usually means global, but...
- variable allocation with local scope can use the working set strategy
- working set is set of pages that process has referenced in past interval of time (window size)
- strategy monitors working set, periodically removes pages not in working set, only executes process if working set is in memory
- graph of working set size over time is periods of flat with big bumps between them (transient periods)
- optimal window size is hard to determine
- another strategy is the page fault frequency algorithm
- on page fault, check number of page references since last page fault; if less than F, add frame; else, remove all pages that haven't been used since last time pages were removed
- can be refined with two thresholds
- performs poorly during transient periods
- another strategy is variable-interval sampled working set
- has minimum and maximum duration of sampling interval (M, L) in virtual time
- if Q page faults occur between M and L, sample early, otherwise suspend and sample after L
- cleaning policy is opposite of fetch policy; determines when dirty pages get written back to secondary memory
- demand cleaning writes pages to secondary memory only when they are selected for replacement
- precleaning writes pages out before they need to be written out; batching is more efficient
- only a Sith deals in one cleaning policy exclusively
- when doing precleaning, write out only pages that are replaceable under replacement policy
- load control determines how many processes should reside in main memory
- processor utilization versus multiprogramming level rises to a peak, then falls off steeply as thrashing sets in
- maximum utilization occurs (experimentally) when mean time between faults is equal to mean time spent handling faults
- can monitor the fault rate and increase or decrease the multiprogramming level at runtime as well
- process to be replaced when swapping in can be decided by various strategies:
- lowest-priority process
- faulting process (whichever process just caused the fault)
- last process activated (least likely to have its working set in memory)
- process with smallest resident set (least effort to remove, but penalizes strong locality)
- largest process (obtain most free frames)
- largest remaining execution window (approximates shortest-processing-time-first scheduling)
- UNIX used to use variable partitioning with no virtual memory, but now UNIX and Solaris use paged virtual memory
- Solaris (and SVR4) separate the paging system (memory management for user procs) and the kernel memory allocator (which does memory management for the kernel)
- paging system has very-important-to-remember data structures:
- page table is per process, rows contain page frame number, age, and various control bits
- disk block descriptor is associated with each page, which describes which swap device and block in that device contains the page
- page frame data table contains information about each frame in real memory, used by replacement algorithm
- swap-use table exists per swap device, has entry for each page on device, contains page number and reference count
- uses modified clock policy with two hands; fronthand clears reference bit, backhand moves unreferenced pages to list of pages to be paged out
- scanrate is rate at which hands scan through page list
- handspread is gap between fronthand and backhand
- hands speed up when there are fewer free pages (in order to free more)
- kernel memory allocator must support dynamic allocation for multiple kernel functions
- however, size of blocks is much smaller than typical machine page size, so full paging is inefficient
- instead, use variant of buddy system called lazy buddy system
- essentially, don't recombine blocks because block of same size might be needed soon; maintain pool of "locally free" blocks
- Linux uses a three level page table
- page directory exists per active process, is the size of a single page, rows point to pages in page middle directory
- page middle directory has entries pointing to pages in page table
- page table has entries pointing to individual virtual pages of a process
- virtual address therefore consists of four components: global directory, middle directory, page table, and offset (split sketched after this sub-list)
- 64-bit Alpha processor provides hardware support for three levels of paging
- Linux handles two-level hardware paging by defining the size of the middle directory to be 1, with optimizations done at compile time
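A sketch of the four-way split; the field widths are illustrative, not the actual kernel constants (on two-level hardware the middle field would simply be 0 bits wide):

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative widths: 12-bit offset, 9-bit page table index,
 * 9-bit middle directory index, remainder for the global directory. */
#define OFFSET_BITS 12
#define TABLE_BITS   9
#define MIDDLE_BITS  9

void split_vaddr(uint32_t va)
{
    uint32_t offset = va & ((1u << OFFSET_BITS) - 1);
    uint32_t table  = (va >> OFFSET_BITS) & ((1u << TABLE_BITS) - 1);
    uint32_t middle = (va >> (OFFSET_BITS + TABLE_BITS))
                          & ((1u << MIDDLE_BITS) - 1);
    uint32_t global = va >> (OFFSET_BITS + TABLE_BITS + MIDDLE_BITS);

    printf("pgd=%u pmd=%u pte=%u offset=%u\n",
           global, middle, table, offset);
}
```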
- Linux used to use clock policy with 8-bit age variable; accesses increase age, hand sweeps decrease age
- now uses "split LRU algorithm"; splits memory into two zones, maintains active and inactive list for each zone
- daemon kswapd periodically performs page reclamation
- first reference to a page sets the `PG_referenced` flag; a second reference moves it to the active list
- `PG_referenced` is reset if the second reference doesn't happen before the daemon runs
- active pages require two timeouts before being moved back to the inactive list
- within kernel, Linux uses slab allocation to allocate smaller chunks within pages (this is complex and is not covered in text)
Not included on exam.
- 32-bit processes in Windows each have a separate 32-bit address space, allowing 4 Gbytes of virtual memory (though half is used by the OS)
- on 64-bit platforms, 8 Tbytes of user address space is available (Windows 7)
- a guard page at each end of user virtual space is set aside to catch errors; remainder is divided into pages
- pages can be available, reserved (not used but reserved for future use, e.g. growing stack), or committed (actually in use, possibly in main memory)
- Windows uses variable allocation local scope
- when memory is plentiful, resident sets are allowed to grow (with some beatdown on rapidly growing resident sets)
- when memory is scarce, memory manager recovers memory by removing less-used pages
- Windows 8 has a new virtual memory system to handle Windows Store apps
- paging holds items that haven't been accessed in a while, whereas swapping holds items that were recently taken out of memory
- essentially two levels of cache
- only Store apps can use the swap space
- whole application gets swapped
Not included on exam.
- Android is an extension to normal Linux with extra features
- ASHMem provides anonymous shared memory as file descriptors
- Pmem allocates virtual memory so that it is physically contiguous; useful for hardware without virtual memory support
- Low Memory Killer kills apps to free memory, since phones don't usually have swap capability
- long term scheduling: decision to add to pool of processes to be executed
- decision driven by desired degree of multiprogramming
- which job can be decided by FCFS, or maybe take into account priority, expected execution time, I/O reqs, etc.
- tries to keep balanced usage of different resources (processor, different I/O devices)
- medium term scheduling: decision to add to the number of processes that are in main memory
- also manages multiprogramming degree, but more dynamically
- more motivated by memory concerns though
- short term scheduling: deciding which available process will be executed by processor
- also known as dispatcher
- invoked by timer, interrupts, system calls, signals
- lots of algorithms, described in next section
- main objective is to allocate processor time to optimize some set of criteria
- criteria can be user-oriented:
- response time: time from submission of request until response begins to be received
- turnaround time: time from submission of process until its completion
- deadlines: number of processes that finish by their deadline
- predictability: jobs are consistent when run multiple times (in terms of completion time)
- or system-oriented:
- efficient utilization of resources (balance)
- throughput (number of processes completed per unit of time)
- fairness to multiple users (no user/process gets starved)
- processes can have priorities, higher priority always chosen over lower priority
- not worth writing out details of all the algorithms, look at table 9.3 on page 405
- FCFS selects process with longest waiting time to execute; does not preempt
- process can still change if it blocks on I/O
- favours processor-bound processes over I/O-bound
- Round Robin uses time slicing to cycle through processes with preemption
- fair, no starvation, but still favours processor-bound over I/O-bound
- virtual round robin avoids unfairness with respect to processor- versus I/O-bound
- when something blocks on I/O, it gets put in a separate queue; prioritized for resumption, then run for the time remaining from its quantum when it blocked
- shortest process next uses estimated completion times and runs the shortest process next; no preemption
- completion time is estimated from past execution times
- can use a normal average, but an exponential average (where more recent results are weighted higher) is better; see the sketch after this sub-list
- risk of starving longer processes
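The exponential average as code; the weight `a` below is an illustrative choice, not a prescribed value:

```c
/* Exponential average of observed burst times:
 *   S[n+1] = a * T[n] + (1 - a) * S[n]
 * Larger `a` weights recent observations more heavily. */
double exp_avg(double prev_estimate, double observed, double a)
{
    return a * observed + (1.0 - a) * prev_estimate;
}

/* usage: estimate = exp_avg(estimate, last_burst, 0.8); */
```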
- shortest remaining time is like SPN, but can preempt when new process is scheduled
- takes into account time already spent running a process when estimating remaining duration
- highest response ratio next chooses process with maximum (w + s) / s, where w is time spent waiting and s is expected service time
- similar to SPN or SRT, but doesn't starve long processes because they'll eventually have waited long enough
- feedback uses multiple queues
- processes are admitted into highest priority queue
- each time a process is preempted, it gets put into a lower priority queue
- lowest queue is treated in round robin fashion
- can avoid major turnaround time penalties by allowing longer time quanta on lower priority levels
- fair-share scheduling is designed to fairly distribute resources among sets of processes (e.g. processes belonging to different users)
- track CPU usage by each group of processes
- use complicated formula involving process priorities, intended distribution of processor time, and CPU usage for specific process
- traditional UNIX uses multilevel feedback within each priority level with one second preemption
- has adjustable "nice" priority, but cannot adjust to be outside its "band", which is base priority assigned by process type
- from highest to lowest, swapper, block I/O, file manipulation, character I/O, and user processes
- suited to general purpose time sharing and provide efficient use of I/O devices
- multiprocessor systems can be:
- loosely coupled/distributed: relatively autonomous systems
- functionally specialized: specific processors for specific tasks
- tightly coupled multiprocessor: shared main memory, integrated control of OS
- granularity is frequency of synchronization between processes
- fine granularity is under 20 instructions between synchronizations
- very coarse is 2000 to 1M instructions
- independent is never, e.g. time sharing
- design issues include:
- assignment of procs to processors
- static allocation assigns proc to processor, never changes; short-term queue maintained for each processor
- dynamic load balancing prevents processors from being idle
- master/slave architecture has kernel running on only one processor; master responsible for all scheduling
- peer architecture allows all processors to self-schedule; OS must resolve conflicts, e.g. two processors trying to claim same proc
- use of multiprogramming on individual processors
- processes can still get blocked, so should use multiprogramming for the same reason it's normally used
- sometimes, if many processors are available, it may be better for performance to not do multiprogramming; might want all threads of a proc to be running at once
- dispatching of procs
- simpler scheduling algorithms may be acceptable since processor resources more plentiful
- most systems have a single queue feeding into common pool of processors
- thread scheduling raises additional concerns
- medium-grain to fine-grain synchronization; significant interaction
- four strategies for handling multiprocessor thread scheduling
- load sharing: global queue of ready threads is maintained, not grouped by process
- distinct from load balancing, which is more long-term
- work distributed evenly across processors
- no centralized scheduler required
- global queue can be organized using priority etc., as in chapter 9
- FCFS actually does best on multiprocessing systems
- load sharing has disadvantage of central queue under mutex, worse caching (since procs switch processors), all threads of proc rarely run at same time
- gang scheduling: set of related threads scheduled to run on set of processors all at once
- designed to handle threads requiring finer-grained synchronization
- can weight run time for each proc by number of threads to minimize waste
- dedicated processor assignment: assign all proc threads to processors, run until complete
- can be viewed as extreme form of gang scheduling
- good for highly parallel system, processor utilization isn't as important as normal
- completely avoids process switching
- too many threads can be a problem, causes more preemption; limit number of active threads to number of processors
- processor allocation more closely resembles memory allocation; can adapt ideas
- activity working set is the minimum number of activities (threads) that must be simultaneously scheduled before the application is allowed to run
- processor fragmentation is when there are unused processors left over, but not enough to do useful work
- dynamic scheduling: number of threads in process can change during execution
- OS is responsible for partitioning processors among jobs; jobs use their currently assigned processors to execute a subset of their runnable tasks/threads
- when jobs request processors, use idle processors to satisfy the request
- if none available, allocate single processor if the job doesn't already have one
- otherwise, wait until processors become available
- when another job releases processor, assign processors to jobs that are waiting
- one loop of round robin, then FCFS (i.e. dump remaining to first job in list)
- on multicore systems, reducing off-chip memory access becomes more important than processor utilization
- adjacent cores may share some caches, additional factor to consider
- beyond scope of text
- correctness depends not only on logical result, but also time at which results are produced
- hard real-time task must meet its deadline; unacceptable damage or fatal error if fail
- soft real-time task has associated deadline that is desirable but not mandatory
- aperiodic task can have deadline associated with start and/or finish time
- periodic task must be run "once per period T" or "exactly T units apart"
- real-time systems have five unique requirements:
- determinism: operations must be performed at fixed, predetermined times
- good measure is maximum delay from arrival of high-priority interrupt until servicing begins
- responsiveness: measures how long it takes to service interrupt (not start service)
- user control: user must have fine-grained control over task priority
- distinguish between hard and soft tasks, deadlines, etc.
- what algorithms to use for paging, swapping, keeping procs in main memory, etc.
- reliability: transient failures far more costly in real-time, must avoid
- fail-soft operation: ability to fail in a way that preserves capability and data
- a system is stable if, when it is impossible to meet all deadlines, it can still meet deadlines of most critical tasks
- short term scheduler is key
- enforce stricter priorities
- minimize interrupt latency
- provide more precise timing characteristics
- clock-based preemption to ensure long-running low-priority tasks don't run during critical times
- real-time scheduling can be classified based on...
- whether it does schedulability analysis
- whether that analysis is static or dynamic
- whether the result of analysis produces a schedule or a plan
- classes of algorithms are therefore:
- static table-driven: static analysis of feasible schedules, creates a schedule that determines when tasks must begin executing
- good for periodic tasks/schedules
- earliest-deadline-first schedulers are usually in this category
- static priority-driven preemptive: as above, but no schedule is created; instead, priorities are assigned
- example is rate monotonic algorithm (see later)
- dynamic planning-based: feasibility determined at run time, arriving task accepted only if feasible; creates a schedule or plan
- attempt to create new schedule when new task arrives, reject if not possible
- dynamic best effort: no feasibility analysis, tries to meet all deadlines and aborts processes that miss deadline
- assign priority dynamically based on characteristics of tasks
- used when tasks are aperiodic, no scheduling possible
- can't know in advance whether deadlines will be met
- deadline scheduling requires lots of information about processes
- ready time: when it is available for execution
- starting deadline: when it must start by
- completion deadline: when it must finish by
- processing time: time required to execute to completion
- resource requirements: resources (other than the processor) needed while executing
- priority: relative importance of task (hard vs soft)
- subtask structure: task may be decomposed into hard and soft subtask(s)
- scheduling task with earliest deadline minimizes number of missed deadlines
- when starting deadline is specified, nonpreemptive scheduling makes sense; ending deadline => preemptive
- with ending deadlines, earliest-deadline-first makes most sense, implement via preemption
- with starting deadlines, best strategy is to schedule the earliest deadline to run, regardless of whether it's ready or not; once it has started, let it run to completion; no preemption needed
- rate monotonic scheduling is used for periodic tasks; highest priority goes to lowest period task
- for a given process, C/T is computation time over period, i.e. fraction of usage required
- RMS requires that the sum of fractions is less than n * (2^(1/n) - 1) (which converges to ln(2)) to guarantee that all tasks will be scheduled successfully
- however, earliest-deadline scheduling only requires that the sum of fractions is less than 1, so it's better
- RMS adopted in practice because ln(2) is only a conservative bound; can actually get ~90% utilization (bound check sketched below)
- soft real-time components can absorb processor time not used by RMS
- stability easier to achieve with RMS: static priorities make it straightforward to guarantee that the most critical tasks meet their deadlines
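A quick check of the bound above (illustrative helper, not from the text; needs -lm):

```c
#include <math.h>
#include <stdio.h>

/* Tasks are guaranteed schedulable under RMS if
 *   sum(C_i / T_i) <= n * (2^(1/n) - 1);
 * under EDF the sum only has to be <= 1. */
int rms_schedulable(const double *C, const double *T, int n)
{
    double u = 0.0;
    for (int i = 0; i < n; i++)
        u += C[i] / T[i];
    double bound = n * (pow(2.0, 1.0 / n) - 1.0);
    printf("utilization %.3f, RMS bound %.3f\n", u, bound);
    return u <= bound;
}
```

For three tasks the bound is 3 * (2^(1/3) - 1) ≈ 0.780, well below the EDF bound of 1.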
- priority inversion is when a high-priority task is forced to wait on a low-priority task
- e.g. low-priority task has locked a resource
- can cause problems if low-priority task never runs because medium-priority task is not blocked
- solution is priority inheritance
- can also use priority ceiling; priority of resource is one plus highest priority user, and any proc that locks the resource gets that priority
- Linux has three scheduling queues: `SCHED_FIFO`, `SCHED_RR`, and `SCHED_OTHER`; first two are real-time
- non-real-time procs cannot execute if a real-time proc is ready
- `SCHED_FIFO` and `SCHED_RR` are the same, except round robin does timeslicing within the same priority level
- old `SCHED_OTHER` scheduler did not scale very well; used a single runqueue for all processors with a single lock, no preemption
- new scheduler for non-real-time tasks is the O(1) scheduler
- maintains number of tasks, a priority bitmap (which queues are nonempty), and active/expired priority queue arrays for each processor
- schedule by selecting highest nonempty priority, scheduling round-robin within that priority (bitmap lookup and array swap sketched after this sub-list)
- if a process is interrupted during timeslice, it is put back in active queue; procs move to expired queue after complete timeslice
- when active queue is empty for a priority, swap active and expired queues
- time slices are between 10-200ms
- initial priorities assigned by user, dynamically adjusted over time
- favour I/O bound tasks because this produces good interactive response
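A sketch of the two O(1) pieces: finding the highest nonempty priority via the bitmap, and swapping the active/expired arrays. Sizes and names are illustrative, not the actual kernel's (the builtin is GCC/Clang):

```c
#include <stdint.h>

#define NPRIO 140   /* illustrative priority-level count */
#define NWORDS ((NPRIO + 63) / 64)

struct prio_array {
    uint64_t bitmap[NWORDS];  /* bit set => that queue is nonempty */
    /* per-priority task lists omitted */
};

struct cpu_rq {
    struct prio_array *active, *expired;
    struct prio_array arrays[2];
};

/* Highest-priority nonempty queue (lower index = higher priority),
 * or -1 if the array is empty. */
int find_first_prio(const struct prio_array *a)
{
    for (int w = 0; w < NWORDS; w++)
        if (a->bitmap[w])
            return w * 64 + __builtin_ctzll(a->bitmap[w]);
    return -1;
}

/* When the active array drains, swap the two pointers in O(1). */
void swap_arrays(struct cpu_rq *rq)
{
    struct prio_array *tmp = rq->active;
    rq->active = rq->expired;
    rq->expired = tmp;
}
```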
- real time scheduler is a bit different
- real-time tasks do not have dynamic priorities
- `SCHED_FIFO` tasks do not have assigned timeslices
- `SCHED_RR` tasks are never moved to the expired queue; just returned to the same priority queue
- 160 priority levels, top 60 for real-time (ensuring they run first), next 40 for kernel, last 60 for user (time-shared)
- dispatch queue associated with each priority, execute round-robin
- has bitmap for nonempty queues, just like Linux; it's called `dqactmap`
- insertion of preemption points, safe place for preemption to occur
- when a preemption point is reached, check the `kprunrun` flag; preempt if a higher-priority real-time process is ready
- dynamic priority reassignment within time-shared class
- time quantum for time-shared varies based on priority, longest for lowest-priority
- real-time procs have fixed priority
- 256 priority levels into five priority classes:
- top 64 for bottom-half kernel, scheduled by interrupts
- next 64 for top-half kernel, which run until blocked or done
- next 32 for real-time user; run until blocked or preempted by higher-priority thread
- next 64 for time-sharing user; adjusts priorities based on processor usage
- last 32 for idle user, run only when no time-sharing or real-time threads to run
- latest version of FreeBSD scheduler addresses SMP/multicore support
- supports processor affinity (only migrate thread when necessary)
- better support for multithreading on multicore
- improves scheduling performance; cost is no longer a function of the number of threads
- queue structure split up; three queues for each processor
- each queue has 32 priority classes, internal queue for each
- two queues used for kernel, real-time, time-sharing; last queue just used for idle class
- the two queues are designated `current` and `next`
- exactly like active and expired for Linux scheduling
- kernel and real-time threads always added to `current` queue
- time-sharing threads assigned to `current` queue iff interactive
- interactivity scoring determines whether a thread is interactive
- ratio of run time versus voluntary sleep time is below a certain threshold => interactive
- scaling factor of max interactivity score / 2 is used to normalize results
- score is actually the scaling factor times (run < sleep ? run / sleep : 1 + sleep / run); sketched below
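That formula written out directly (the maximum score and names are illustrative):

```c
#define MAX_INTERACT 100              /* illustrative maximum score */
#define SCALING (MAX_INTERACT / 2)    /* the "max / 2" scaling factor */

/* Low score = interactive: the thread voluntarily sleeps more than
 * it runs. `run` and `sleep_t` are accumulated times in any
 * consistent unit. */
int interact_score(double run, double sleep_t)
{
    if (run <= 0.0 && sleep_t <= 0.0)
        return 0;                     /* no history yet */
    if (run < sleep_t)
        return (int)(SCALING * (run / sleep_t));
    return (int)(SCALING * (1.0 + sleep_t / run));
}
```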
- thread migration i.e. processor affinity
- take advantage of local caches
- don't move threads by default
- pull mechanism allows an idle processor to pull a thread from a nonidle processor
- push mechanism is a periodic task; twice per second, equalizes most-loaded and least-loaded processors in system
Not included on exam.
- group into human readable (printers, video display, keyboard, mouse), machine readable (disk drives, USB keys, sensors, controllers), and communication (digital line drivers, modems)
- can also classify by properties:
- data rate
- application: what device is being used for influences software choices
- complexity of control: complexity of required interface (simple for printer, complex for disk)
- unit of transfer: byte stream or larger blocks
- data representation: encoding schemes; character code, parity
- error conditions: types of errors, correction, reporting
- I/O can be performed in three different ways
- programmed I/O: process busy waits for I/O operation to complete
- interrupt-driven I/O: processor issues I/O command for process; can either block process until operation completes, or allow it to continue running and interrupt when the operation returns
- direct memory access: separate DMA module controls exchange of data, interrupts only after entire block has been transferred
- evolved from direct control, to programmed I/O, to DMA, to separate I/O processor, to separate I/O computer
- I/O channel is another term for I/O module
- to use DMA, it is given read/write bit, address of I/O device, starting location in memory, and number of words to read/write
- DMA transfers entire block and sends interrupt
- processor then handles data result
- various DMA architectures are supported:
- can be directly on main bus, separate from I/O devices
- can be integrated into I/O devices
- can act as interface to separate I/O bus; can only access I/O through DMA
- efficiency very important since I/O operations are often bottleneck
- generality is other important consideration; want to handle all devices in a uniform manner
- divide operation into uniform logical structure:
- logical I/O: deal with device as logical resource, not concerned with device details
- manage general I/O functions; read, write, open, close
- device I/O: convert requested operations into I/O instructions, channel commands
- also deals with buffering
- scheduling and control: actual queueing and scheduling of I/O operations, handling interrupts; the layer that interacts with the hardware
- for file system, there are additional layers:
- directory management: converts symbolic file names into references to files themselves
- file system: deals with logical structure of files; open, close, read, write
- physical organization: convert logical references to files into secondary storage addresses, accounting for track and sector structure
- when transferring a block to main memory, the page containing target addresses must be locked into main memory, which interferes with swapping considerations
- same thing happens with output
- to avoid problems, perform input in advance of requests and output some time after requests; known as buffering
- simplest buffer is single buffer
- buffer in main memory is used for I/O operations
- when transfer is complete, block is moved to user space and next block of memory is requested
- usually next block will be needed
- process can be swapped out while buffering is happening because buffering happens in system memory, not user memory
- have to be careful; if swapping process out requires use of same I/O device that is currently buffering, cannot swap out until operation completes, when it may no longer be appropriate
- double buffer has two buffers; load into one, swap pointers rather than transfer to user space; results in speedup
- circular buffer aims to smooth out flow of bursty I/O device; list of buffers treated like circular queue, essentially just double buffer with more buffers
- buffering in general smooths out peaks in demand, but does not allow I/O to keep pace with processor in long term
- processing is orders of magnitude faster than disk speed; makes sense to use some computation to optimize disk access
- three main delays associated with disk transfer:
- seek time: time to move head into correct position (select track)
- mainly optimized by reducing disk size (tracks closer together, less physical distance to move)
- typical modern seek time is under 10ms
- rotational delay: time for disk to rotate so that head reaches beginning of sector
- sum of seek and rotational is access time
- average rotational delay is 1/(2r), where r is revolutions per second
- transfer time: time required to read the sector as it moves under head
- T = b / (rN), where T is transfer time, b is bytes to transfer, N is bytes per track, and r is revolutions per second
- Ta = Ts + 1/(2r) + b/(rN) gives total average access time, where Ts is average seek time (worked example after this list)
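A worked example with made-up but plausible drive parameters:

```c
#include <stdio.h>

/* Ta = Ts + 1/(2r) + b/(rN): 4 ms seek, 7500 rpm (r = 125 rev/s),
 * 512,000 bytes per track, one 4096-byte sector transferred. */
int main(void)
{
    double Ts = 0.004;      /* average seek time, seconds */
    double r  = 125.0;      /* revolutions per second */
    double N  = 512000.0;   /* bytes per track */
    double b  = 4096.0;     /* bytes to transfer */

    double Ta = Ts + 1.0 / (2.0 * r) + b / (r * N);
    /* approx 0.004 + 0.004 + 0.000064 s = 8.064 ms: seek and
       rotation dominate; the transfer itself is tiny */
    printf("average access time = %.3f ms\n", Ta * 1000.0);
    return 0;
}
```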
- there are also the queueing delays
- need to wait for device to become available
- need to wait for channel to device to become available, if I/O channel is shared
- some systems do seek, and then estimate when rotation will complete and suspend and resume communication accordingly; sometimes ends up missing the sector, called RPS miss
- huge delays are incurred if file is very fragmented and cannot be read in continuous transfer
- can improve disk efficiency by not doing accesses in random order
- FIFO is essentially random
- priority doesn't optimize disk access, but usually requests from high priority will be from same proc, which will be close together
- LIFO actually does pretty well, since last request probably requires least movement from previously finished request; can still be pretty random
- if track position is known, can use shortest service time first (SSTF); choose request closest to current head position (sketch after this list)
- SCAN allows arm to move in one direction only, satisfying all requests as it moves
- reverse direction at end (either at last track, or with LOOK, when no more requests in that direction)
- prevents starvation that can occur in SSTF
- C-SCAN is like SCAN, but moves to beginning after last track instead of reversing
- reduces maximum delay experienced by new requests
- N-step-SCAN and FSCAN prevent "arm stickiness"
- N-step-SCAN processes requests in batches of N; for large N, essentially equivalent to SCAN
- FSCAN uses two subqueues; during scan of one queue, new requests added to other queue; process all old requests before new requests
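A sketch of SSTF selection over a pending-request array (illustrative helper, not from the text):

```c
#include <stdlib.h>

/* Pick the pending track closest to the current head position and
 * remove it from the array (swap with the last element). Repeating
 * this greedily is exactly what makes starvation possible. */
int sstf_next(int *pending, int *count, int head)
{
    int best = 0;
    for (int i = 1; i < *count; i++)
        if (abs(pending[i] - head) < abs(pending[best] - head))
            best = i;
    int track = pending[best];
    pending[best] = pending[--*count];
    return track;
}
```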
- seven levels of RAID, but all share:
- set of physical disks viewed as single logical drive
- data distributed across disks
- redundant disk capacity is used to store parity information, can recover data if a disk fails
- RAID 0 and 1 don't support this
- RAID 0 has no redundancy, but data is distributed using striping (consecutive blocks on consecutive disks)
- supports high data transfer capacity and high I/O request rate
- RAID 1 just mirrors data using 2N disks
- read request handled by either copy, write requests done in parallel
- RAID 2 and 3 read/write all disks simultaneously at same position
- RAID 2 uses Hamming code for error correction; not usually needed
- RAID 3 uses single extra disk with parity bit, can restore data if any individual drive fails
- RAID 4 does parity like RAID 3, but on a block level
- RAID 5 does distributed parity, so the parity blocks are distributed throughout the drives (XOR parity sketched after this list)
- RAID 6 does dual redundancy, so there are two parity blocks for each stripe
- up to two disks can fail while still allowing full recovery of data
- 30% drop in overall write performance, but read comparable to RAID 5
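A sketch of the XOR parity behind the parity-based levels: the parity strip is the XOR of the data strips, so XOR-ing the parity with the survivors rebuilds a lost strip (disk count and strip size are illustrative):

```c
#include <string.h>

#define NDATA  3     /* data disks, illustrative */
#define STRIPE 4096  /* bytes per strip */

void compute_parity(unsigned char data[NDATA][STRIPE],
                    unsigned char parity[STRIPE])
{
    memset(parity, 0, STRIPE);
    for (int d = 0; d < NDATA; d++)
        for (int i = 0; i < STRIPE; i++)
            parity[i] ^= data[d][i];
}

/* Rebuild disk `lost` in place from the parity and the other strips. */
void rebuild(unsigned char data[NDATA][STRIPE],
             const unsigned char parity[STRIPE], int lost)
{
    memcpy(data[lost], parity, STRIPE);
    for (int d = 0; d < NDATA; d++)
        if (d != lost)
            for (int i = 0; i < STRIPE; i++)
                data[lost][i] ^= data[d][i];
}
```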
- can cache disk sectors in main memory
- when cached block is requested, can do transfer from within main memory, or just pass back pointer to shared memory location
- replacement policy must also be considered; LRU is best, LFU may be more appropriate
- frequency-based replacement is also a thing
- blocks are organized into stack, as in LRU; top part of stack is designated as new section
- rereferencing in new section does not increase access count
- replacement is done by selecting lowest count in old section
- can refine by having three sections instead of two; reference counts incremented on both middle and old, but only replace from old
- I/O devices associated with a special file managed by file system, read and written as files
- buffer cache is essentially disk cache
- block transfers handled through buffer cache
- maintain free list of slots in cache, device list (list of buffers associated with each disk), and driver I/O queue (list of buffers waiting for I/O on a particular device)
- index into buffer cache using free list or hash table
- block replacement done using LRU; free list preserves LRU order
- separate buffering (character queue) used for character-oriented devices; producer/consumer model is used; reading character destroys it
- unbuffered I/O using DMA is used to maximize speed
- category of device (disk drive, tape drive, terminal, communication line, printer) determines which strategy is used
- similar to UNIX
- disk scheduler is called Linux Elevator
- essentially just LOOK, which is SCAN with reversing when no more records in current direction
- if new request is on same sector or immediately adjacent to existing request, merge requests
- if requests currently in request queue are old, insert at tail of queue
- if suitable location, insert in sorted order
- otherwise, tail of queue
- enhanced by deadline scheduler:
- read requests are more urgent than write requests because they block
- for each request, there is an expiration time, which is shorter for reads
- normally requests are taken from sorted queue, but taken from FIFO queue when head of queue is expired (along with next few requests)
- solves starvation problem, also read vs write problem
- enhanced by anticipatory I/O scheduler
- successive reads are likely, so delay slightly after serving a read to see if next read request comes in
- maintains page cache for particular system files and virtual memory, separate buffer cache for block I/O
- page cache has benefits of allowing writing in bulk, and caches things because caches are good
- do we have to repeat these benefits every time we talk about any sort of cache?
- dirty pages written out when free memory falls below a threshold, or when the dirty pages get older than a threshold
Not included on exam.
- four I/O-related kernel components:
- cache manager: file caching for all file systems
- dynamically adjusts size of cache for particular file based on physical memory size
- lazy writer periodically writes things to disk, which allows it to be more efficient because batch writing is efficient
- like seriously, every time we talk about caches
- file system drivers: file system driver treated like any other driver, route file requests through this system, which sends to hardware device adapter
- network drivers: integrated networking capabilities for accessing remote file systems
- implemented as software drivers rather than part of the Windows Executive
- hardware device drivers: software drivers that access hardware registers of peripheral devices
- Windows I/O can be asynchronous or synchronous
- async is more efficient for caller, but needs to be signalled
- signal using event on file object: proc continues and stops once it actually needs data from I/O operation, then checks file object
- signal event object: allows multiple simultaneous I/O requests against single device/file; event object created for each request, can wait on some subset of requests
- async procedure call: requests are made with user-mode routines to call when I/O completes
- I/O completion ports: used on servers to optimize thread use; threads wait on the completion port, kernel wakes a waiting thread to handle each I/O completion
- polling: async I/O requests write status and transfer count into memory, proc just checks these values
- Windows also has software support for RAID
- hardware RAID is what was discussed earlier; software RAID is non-contiguous disk space combined into logical partition(s)
- implements RAID 1 and RAID 5
- volume shadow copies are efficient ways of making snapshots of volumes for backup; implemented by software driver that makes copies before data is overwritten
- uses BitLocker to encrypt entire partition, supports multiple methods of supplying cryptographic key
- input and output almost always done to/from files (except for real-time)
- files have life outside individual applications, need dedicated management system
- files stored on disk
- files sharable between processes, access permissions permitting
- operations are: create, delete, open, close, read, write
- attributes maintained with metadata about permissions, owner, creation time, access time, etc.
- file structure can be organized into hierarchy
- field is basic element of data
- record is collection of related fields
- file is collection of similar records
- database is collection of related data, probably in multiple files/tables
- database should support sensible operations
- sometimes files are just bytes
- objectives for file management system
- meet data management requirements of user; storage and operations
- guarantee that data in file is valid
- optimize performance
- provide I/O support for various storage devices
- minimize or eliminate potential for lost data
- provide standardized set of I/O interface routines to user procs
- provide I/O support for multiple users at once
- reiterate layers from previous chapter, but it's changed a bit
- device driver: interacts with device directly
- basic file system: deals with blocks of data exchanged with driver; concerned with placement
- basic I/O supervisor: responsible for I/O init and termination
- selects device, schedules disk accesses
- is part of OS
- logical I/O: enables user access to records; general purpose record I/O capabilities
- access method is main difference between systems; see 12.2
- file organization is logical structuring of records; influenced by blocking strategy, allocation strategy
- important criteria:
- short access time
- ease of update
- economy of storage
- simple maintenance
- reliability
- five organizations: pile, sequential, indexed sequential, indexed, direct (hashed)
- pile file: variable-length records, variable set of fields in each record
- have to search through entire file to find desired record
- uses space well though
- sequential file: fixed-length fields in fixed-length records
- only field values need to be stored
- key field uniquely identifies record; records sorted by key so you can binary search
- indexed sequential file: add index and overflow file
- index allows searching of key without so many cache misses etc. because index is smaller than entire file
- when a record is inserted, it goes into the overflow file, and the pointer of the preceding record in the main file is updated to point to it
- periodically merge overflow back into main file
- indexed file: employ multiple indexes to search on different keys
- don't bother sorting records, only access via index
- can also have variable-length records
- direct (hashed) file: hash key and go directly to storage location
- space penalties because of empty space
Not included on exam.
- file directory contains information about files including attributes, location, ownership, etc.
- directory itself is a file, though generally accessed indirectly via system routines
- must support search, create/delete file, list directory, update directory (update file attributes)
- simple list not well suited; unique naming problems, size problems, also want to conceal parts from users that may not have access
- two level scheme has master directory and subdirectory for each user
- more flexible approach is to allow nesting of subdirectories, which is what we would expect as people who have seen file systems before
- symbolic name consists of directory names and file name; this is a pathname
- working directory means you don't need to retype full path each time
- access rights to a file can be:
- none: none
- knowledge: know that the file exists
- execution: can execute file but not copy it (proprietary program)
- reading: can read file, copy; also allows execution since you could execute your own copy
- appending: can append, but can't modify or delete content
- updating: all writing
- changing protection: change access rights for other users
- deletion: deletion
- rights can be granted to specific users, groups, or everyone
- can handle simultaneous access by locking whole file, or just specific records during an update
- records are unit of structured file, blocks are unit of I/O with secondary storage
- larger blocks means more records can be transferred at once, but this is less efficient if records are accessed randomly because increased overhead
- three methods of blocking:
- fixed blocking: fixed-length records, fixed integral number stored in each block
- may be unused space at end of block
- variable-length spanned blocking: variable-length records, records can span blocks
- no unused space
- variable-length unspanned blocking: variable length records, records cannot span blocks
- usually some unused space because not every gap can be filled
- always some unused space because of how blocks fit into tracks and because of hardware-enforced gaps
- allocation of blocks to files on physical disk is kind of like allocating blocks of memory in main memory
- files can grow over time; a portion is a contiguous set of blocks allocated to a file
- need data structure to keep track of portions assigned to file; example is FAT
- preallocation policy requires that maximum file size be declared at creation; dynamic allocation allocates portions as needed, which is more common because it's more flexible
- portion size is important consideration:
- contiguity of space is good for performance; larger portions do that
- too many portions increases size of management tables
- fixed-size portions simplifies reallocation
- variable or small-fixed size portions minimizes wasted space
- as a result, choose variable and large contiguous portions, or small fixed size portions (blocks)
- with variable size, can do first-fit, best-fit, nearest-fit (nearest to other parts of same file), mostly like chapter 7
- three methods for actual file allocation:
- contiguous allocation finds contiguous set of blocks; requires preallocation
- easy to tell where desired block of file will be
- external fragmentation occurs
- chained allocation stores each file as a linked list
- have to iterate through file blocks to find required block; not necessarily big loss if files accessed sequentially anyway
- easy to add blocks as needed (dynamic allocation)
- indexed allocation uses an index
- index kept in separate block, points to all blocks in file
- most popular because it's literally just the best
- free space is managed by disk allocation table
- can be a bitmap, which is simplest and least overhead; also makes it easy to find contiguous blocks (run scan sketched after this sub-list)
- bitmap can be too large to search through though; instead chain together free blocks
- often chain of free blocks is very fragmented, so new allocations will be fragmented too
- indexing treats free space as a file and uses index table
- index table should be based on variable-size free portions so it doesn't have to index every block
- free block list assigns number to each block and maintains list of numbers
- this can get really big, too large to store in main memory; treat as stack and keep only top of stack in memory
- could also treat as queue, keeping only head of queue in main memory
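A sketch of scanning a bitmap for a run of contiguous free blocks (sizes and the 0-means-free convention are illustrative):

```c
#include <stdint.h>

#define NBLOCKS 8192

static uint8_t bitmap[NBLOCKS / 8];  /* bit clear = block free */

/* Return the first block of a run of n contiguous free blocks,
 * or -1 if no such run exists. */
int find_free_run(int n)
{
    int run = 0;
    for (int b = 0; b < NBLOCKS; b++) {
        int used = bitmap[b / 8] & (1 << (b % 8));
        run = used ? 0 : run + 1;
        if (run == n)
            return b - n + 1;
    }
    return -1;
}
```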
- volume is logical disk
- often, one volume is one disk, but sometimes divide disk into partitions that each appear as a volume
- can also treat multiple disks as a single volume
- stuff can get corrupted if system crashes; have to be careful about write order to prevent problems
- six types of files on UNIX:
- regular/ordinary: arbitrary data in zero or more data blocks
- directory: list of file names plus pointers to inodes
- special: no data, provide mechanism to map to physical devices
- named pipes: interprocess communications facility
- links: alternative name for existing file
- symbolic links: data file that contains name of linked file
- all files administered by inodes (index nodes), which contain key information about file:
- type and access mode
- file owner and group access identifiers
- creation time, read time, write time, inode update time
- size of file in bytes
- sequence of block pointers
- number of physical blocks
- number of directory entries that reference file
- kernel and user-settable flags
- generation number of file (random number assigned to inode; used to detect deleted files)
- blocksize (sometimes larger than file system blocksize)
- plus additional attribute entries
- there is inode table/list on disk that contains inodes of all files in file system
- file allocation is multi-level: 12 direct data blocks, then single-, double-, triple-indirect layers which work as expected (level lookup sketched below)
- keeps inode fixed size
- small files accessed with little indirection
- very large maximum file size
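A sketch of picking the indirection level for a logical block number, assuming 12 direct pointers and `ppb` pointers per indirect block (e.g. 4096-byte blocks with 4-byte pointers give ppb = 1024; values illustrative):

```c
#define NDIRECT 12

/* 0 = direct, 1/2/3 = single/double/triple indirect,
 * -1 = beyond the maximum file size. */
int block_level(long lbn, long ppb)
{
    if (lbn < NDIRECT)               return 0;
    lbn -= NDIRECT;
    if (lbn < ppb)                   return 1;
    lbn -= ppb;
    if (lbn < ppb * ppb)             return 2;
    lbn -= ppb * ppb;
    if (lbn < ppb * ppb * ppb)       return 3;
    return -1;
}
```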
- directories are normal directories, references to contained inodes
- UNIX file system has single partition with boot block, superblock (info about file system), inode table, and data blocks
Not included on exam.
- Linux has virtual file system, which present single uniform interface to whatever file system is used (so you can swap ext4 out for, like, eCryptfs-GCM)
- files have symbolic names, owner, protection, and other properties
- need to map logical operations to operation specific to the OS
- make system call for file operation, which is passed to mapping function for file system
- usually mapping function is pretty simple (1:1), but can be more complex
- for example, FAT directories are not files, so directory access requests have to be translated
- VFS is object-oriented but written in C, so objects are just structs
- superblock object: mounted file system
- has device, block size, dirty flag, file system type, flags, pointer to root directory
- inode object: specific file
- holds all the information about named file except for name and actual contents
- also has inode operations object which describes available operations
- dentry object: directory entry, either directory or file name
- pointer to inode and superblock, as well as parent and child dentries
- file object: open file associated with process
- associated dentry object, file system, usage counter, user id, group id, file pointer
- also includes inode operations object
- three caches are used to improve performance:
- inode cache: stores recently visited inodes
- directory cache: caches mapping between directory names and inode numbers; speeds up ls
- buffer cache: independent of file systems, integrated into data buffer mechanisms
- basically cache closer to drive level
Not included on exam.
Not included on exam.
- processes have associated privileges regarding memory, files, instructions
- root access is highest level of privilege
- intruders can be:
- masquerader: not authorized to use system, but penetrates access controls to exploit legitimate user
- misfeasor: legitimate user who accesses unauthorized materials, or who misuses authorization
- clandestine user: seizes control of system and uses control to evade auditing and access controls, suppress audit collection
- can exploit software bugs, or acquire protected information e.g. passwords
- malicious software is programs that exploit vulnerabilities
- parasitic malware is a fragment of some legit program; virus, logic bomb, backdoor
- malware can also be independent, scheduled to run by OS; worm, bot program
- malware can be activated by trigger (logic bomb, backdoor, bot program), others replicates itself to be activated later (virus, worm)
- intrusion detection is a countermeasure
- can be host-based IDS (monitor host and events on host for suspicious activity) or network-based (monitor network traffic)
- three logical components:
- sensor: collects data (from network packets, log files, system call traces)
- analyzer: analyzes data from sensors, determines if intrusion occurred
- user interface: view output from system, configure behaviour
- authentication is another countermeasure
- consists of identification step and verification step
- four ways to verify user's identity:
- something they know (password, PIN, answers to questions)
- something they possess (keycard, smart cards, keys, all referred to as tokens)
- something they are (static biometrics: fingerprint, retina, face)
- something they do (dynamic biometrics: voice pattern, handwriting, typing rhythm)
- can do two-factor authentication
- access control is another countermeasure
- mediate between user and resources, determine what they can and cannot access
- must authenticate first
- can audit to see who accessed what, who is authorized to access what
- firewalls are another countermeasure
- act as chokepoint between system and outside connection
- enforces local security policy, defines what types of traffic can pass
- firewall itself is secure against attacks
- memory protection protects a process' memory from other processes
- buffer overflow is writing more to a buffer than there is space, which can overwrite other memory locations
- can corrupt data, unexpected transfer of control, memory access violations, and probably termination
- can even execute arbitrary code
- stack overflow is most common type of buffer overflow; e.g. `gets()` does not check the amount of data copied, so input can overwrite memory locations beyond the target buffer (sketch below)
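The classic case, sketched: an unbounded gets() versus the bounded fgets():

```c
#include <stdio.h>
#include <string.h>

int main(void)
{
    char buf[16];

    /* gets(buf);  -- unsafe: copies until newline regardless of
     * buf's size, so long input overwrites whatever sits above
     * buf on the stack (saved frame pointer, return address). */

    /* fgets bounds the copy to sizeof(buf) - 1 bytes plus a NUL. */
    if (fgets(buf, sizeof buf, stdin) != NULL) {
        buf[strcspn(buf, "\n")] = '\0';  /* strip trailing newline */
        printf("read: %s\n", buf);
    }
    return 0;
}
```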
- to exploit buffer overflow, need to:
- identify buffer overflow vulnerability that can be triggered by external input
- understand how buffer is stored in memory, potential adjacent memory that can be corrupted
- fuzzing can be used to identify vulnerabilities
- compilers can be used to defend against buffer overflow
- prevent vulnerabilities at compile time
- modern high-level languages with types and checks on permissible operations are immune to buffer overflow
- safe coding techniques (manually identify potential vulnerabilities) can also help
- language extensions can perform additional checks on operations
- Libsafe is dynamic library that can load before other code and prevent buffer overflows without even recompiling program
- stack protection mechanisms check stack frame for evidence of corruption
- Stackguard is GCC compiler extension; adds (random) canary value below old frame pointer; if canary is overwritten, program is aborted
- runtime defenses aim to prevent buffer overflows at runtime
- executable address space protection tags some pages as non-executable (hardware)
- if stack and heap are non-executable, hard to execute custom code
- however, some languages need executable code on the stack (e.g. nested functions in C, Linux signal handlers)
- address space randomization randomly moves key data structures
- if stack is moved around, can't predict target buffer's address
- guard pages are pages between critical sections of memory
- if anything attempts to write to guard page, illegal address exception is raised, program aborts
- can also put guard pages between stack frames, or between allocations on heap
- can be costly in terms of memory
- executable address space protection tags some pages as non-executable (hardware)
- easy to allow/disallow access to entire files based on user profile
- might want to apply permissions on level of record, or even part of record
- access matrix is model of permissions
- subjects on left, objects on top, access rights in cells
- access rights can be read, write, execute, plus functions in software objects
- matrix usually sparse, decompose by columns and give users access control lists
- can decompose by rows instead to get capability tickets, which list the objects and access rights authorized for a specific user
- have to make sure that access matrix info is not accessible to users
- three access control policies:
- discretionary (DAC): check identity of user and access rules for resource
- basically just access matrix as above
- objects can be processes, devices, memory locations/regions, or subjects (subjects are users)
- controller for object checks whether the requested operation is allowed for that subject on that object
- access matrix controller controls access to the access matrix
- copy flag on permission allows transfer of right (with or without copy flag) to another user
- owner can grant any right to any user
- controller of subject or owner of resource can revoke a right, read rights of subject
- creating an object requires no rights, creator becomes owner
- destroying an object requires owner
- subjects can be created and destroyed like objects
- mandatory (MAC): check security labels (how sensitive/critical resource is) with security clearances
- designed for military systems, not covered by book
- role-based (RBAC): check role of user and access rules for role
- users can have multiple roles
- one matrix for mapping users to roles, and access control matrix with roles as subjects
- users have unique user ID and can be a member of primary group and possibly other groups with group IDs
- files are owned by user and belong to a group
- each file has 12 access bits (mode-bit sketch after this sub-list)
- (read/write/execute) for (user/group/other)
- SetUID and SetGID: when file is executed, it executes with permissions of user/group
- on directory, SetGID indicates that newly created files should inherit group of directory
- Sticky bit originally indicated that file contents should be retained in memory following execution; no longer used
- Sticky bit on directory indicates that only owner of file in directory can rename, move, delete that file
- useful for shared temp directories
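A sketch of reading the 12 mode bits with stat(); /usr/bin/passwd is the usual setuid example, though the path varies by system:

```c
#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
    struct stat st;
    if (stat("/usr/bin/passwd", &st) != 0)
        return 1;

    /* Low 12 bits of st_mode: setuid, setgid, sticky, then rwx for
     * user/group/other; passwd is typically 4755 (setuid root). */
    printf("mode:   %04o\n", (unsigned)(st.st_mode & 07777));
    printf("setuid: %s\n", (st.st_mode & S_ISUID) ? "yes" : "no");
    printf("sticky: %s\n", (st.st_mode & S_ISVTX) ? "yes" : "no");
    return 0;
}
```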
- one user ID is the superuser who can do anything
- be careful when setting SetUID on file owned by superuser
- UNIX also has access control lists
- assign list of user and group IDs to file, each with three protection bits
- 9-bit permission field has same meaning as normal
- group class entry acts as mask for all other permissions granted by list
- install and patch OS
- unpatched system is vulnerable, don't connect to network while patching
- initial installation should be minimal
- boot process must be secure; limit which media system can boot from, require password for changes to BIOS
- cryptographic file system helps secure system against malicious boot process
- driver installation very critical, since those execute in kernel mode
- automatically install updates and patches
- on critical systems, don't do it automatically, stage and validate patches on test systems
- remove unnecessary services, apps, protocols
- fewer packages means fewer potential vulnerabilities
- default configuration usually optimized for ease of use, not security; don't use default
- uninstall scripts may fail to remove all components, so don't install in first place
- configure users, groups, permissions
- restrict elevated privileges to only those who need them
- only access elevated privileges when needed (i.e. don't log in as root, use sudo)
- decide on authentication system (local vs central)
- change default passwords, remove or secure default accounts
- allowable length, complexity, age of passwords can be set
- configure resource controls
- set permissions on data and resources
- yeah that's pretty much it
- install security controls (antivirus, firewalls, IDS)
- whitelist for applications can prevent malware, but not always possible to implement depending on system use
- test security
- follow checklist in security-hardening guide (apparently this is not a security-hardening guide, it's just a guide to security-hardening guides)
- monitor and analyze logs
- informs you of bad things that have already happened
- ensure sufficient space is allocated for logs
- need automated analysis of logs
- perform regular backups
- may be legal requirements for backups
- backup is process of making copies of data, archive is process of retaining copies of data for long periods of time
- decide whether online or offline, local or remote
- if attackers compromise/destroy data, don't want them to also compromise/destroy backups
- recover from security compromises
- regularly test system security
- patch and update critical software, monitor and revise configuration
Not included on exam.