Need[i, j] = k ; i is the process id, j is the resource id, k is the number of instances
e.g. if Need[i, j] = k, then Pi needs k more instances of Rj to complete its work
* When process Pi's request is granted (allocated), the following state must be updated:
Available = Available - Request_i
Allocation_i = Allocation_i + Request_i
Need_i = Need_i - Request_i
Example
5 processes P0 to P4, with 3 resource types: A (10 instances), B (5 instances), C (7 instances)
TO-DO:
First compute the Need array (Need = Max - Allocation)
Starting from P0, check whether Available >= Need
If not, that process must wait for resources to be released
If yes: Available = Available + Allocation (the process can finish and release its allocation)
Continue with the next process, cycling through in circular-queue fashion
At time T0:
        Allocation    Max       Available    Need
        A  B  C       A  B  C   A  B  C      A  B  C
P0      0  1  0       7  5  3   3  3  2      7  4  3
P1      2  0  0       3  2  2                1  2  2
P2      3  0  2       9  0  2                6  0  0
P3      2  1  1       2  2  2                0  1  1
P4      0  0  2       4  3  3                4  3  1
(Available is system-wide, so it is listed only once.)
... After a few iterations ...
...
After all processes finish:
        Allocation    Max       Available         Need
        A  B  C       A  B  C   A  B  C           A  B  C
P0      0  1  0       7  5  3   3(7) 3(5) 2(5)    7  4  3
P1      2  0  0       3  2  2   5  3  2           1  2  2
P2      3  0  2       9  0  2   10 5  7           6  0  0
P3      2  1  1       2  2  2   7  4  3           0  1  1
P4      0  0  2       4  3  3   7  4  5           4  3  1
(Each row's Available column shows Work after that process finishes and releases
its allocation; P0's row shows initial(final) values.)
Safe sequence: P1 | P3 | P4 | P0 | P2
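The safety check walked through above can be sketched in C. This is a minimal sketch, not the full banker implementation; array and function names are illustrative, but the data in the test matches the example tables:

```c
#include <stdbool.h>
#include <string.h>

#define N 5  /* processes P0..P4 */
#define M 3  /* resource types A, B, C */

/* Returns true and fills seq[] with a safe sequence if one exists. */
bool is_safe(int available[M], int max[N][M], int alloc[N][M], int seq[N]) {
    int need[N][M], work[M];
    bool finish[N] = { false };
    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++)
            need[i][j] = max[i][j] - alloc[i][j];   /* Need = Max - Allocation */
    memcpy(work, available, sizeof work);

    int done = 0;
    while (done < N) {
        bool progressed = false;
        for (int i = 0; i < N; i++) {               /* circular scan over processes */
            if (finish[i]) continue;
            bool ok = true;
            for (int j = 0; j < M; j++)
                if (need[i][j] > work[j]) { ok = false; break; }
            if (ok) {                               /* Pi can finish and release */
                for (int j = 0; j < M; j++) work[j] += alloc[i][j];
                finish[i] = true;
                seq[done++] = i;
                progressed = true;
            }
        }
        if (!progressed) return false;              /* nobody can proceed -> unsafe */
    }
    return true;
}
```

Running it on the T0 snapshot reproduces the safe sequence P1, P3, P4, P0, P2.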
Example: P1 requests (1, 0, 2) more
Apply the * updates:
Available = Available - Request_1
Allocation_1 = Allocation_1 + Request_1
Need_1 = Need_1 - Request_1  // don't forget to update the Need array
At time T0:
        Allocation         Max       Available        Need
        A    B    C        A  B  C   A    B    C      A    B    C
P0      0    1    0        7  5  3   3-1  3-0  2-2    7    4    3
P1      2+1  0+0  0+2      3  2  2                    1-1  2-0  2-2
P2      3    0    2        9  0  2                    6    0    0
P3      2    1    1        2  2  2                    0    1    1
P4      0    0    2        4  3  3                    4    3    1
Becomes:
        Allocation    Max       Available    Need
        A  B  C       A  B  C   A  B  C      A  B  C
P0      0  1  0       7  5  3   2  3  0      7  4  3
P1      3  0  2       3  2  2                0  2  0
P2      3  0  2       9  0  2                6  0  0
P3      2  1  1       2  2  2                0  1  1
P4      0  0  2       4  3  3                4  3  1
...
After a few iterations ...
...
        Allocation    Max       Available         Need
        A  B  C       A  B  C   A  B  C           A  B  C
P0      0  1  0       7  5  3   2(7) 3(5) 0(5)    7  4  3
P1      3  0  2       3  2  2   5  3  2           0  2  0
P2      3  0  2       9  0  2   10 5  7           6  0  0
P3      2  1  1       2  2  2   7  4  3           0  1  1
P4      0  0  2       4  3  3   7  4  5           4  3  1
(The Available column again shows Work after each process releases; the state is
still safe, so P1's request can be granted.)
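The request-granting step (the * updates) can be sketched as follows. This sketch only performs the validity check and the state updates; a full banker would also re-run the safety algorithm on the tentative state and roll back if it is unsafe. Names are illustrative; the test data matches the P1-requests-(1,0,2) example:

```c
#include <stdbool.h>

#define M 3  /* resource types A, B, C */

/* Apply the "*" state updates for one request if it is valid. */
bool try_request(int available[M], int alloc_i[M], int need_i[M],
                 const int request[M]) {
    for (int j = 0; j < M; j++)
        if (request[j] > need_i[j] || request[j] > available[j])
            return false;               /* exceeds the claim, or not enough free */
    for (int j = 0; j < M; j++) {
        available[j] -= request[j];     /* Available = Available - Request_i */
        alloc_i[j]   += request[j];     /* Allocation_i = Allocation_i + Request_i */
        need_i[j]    -= request[j];     /* Need_i = Need_i - Request_i */
    }
    return true;
}
```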
Allocate disk blocks to files, utilizing space effectively.
3 common ways: contiguous, linked, indexed.
Contiguous
Each file owns a set of contiguous blocks on disk
Good performance for sequential/direct access
Simple
When a file grows, it has problems finding space (dynamic storage-allocation problem -> external fragmentation)
Best-fit / First-fit > Worst-fit
File size must be known at file creation -> overestimation causes a large amount of internal fragmentation
Extent
Allocate disk blocks contiguously, but in units of extents
One file consists of multiple extents
Each extent records:
location
block count
pointer to the first block of the next extent
Internal fragmentation is still a problem (if extents are too large)
External fragmentation occurs (if extents of varying sizes are allocated and deallocated)
Linked (List)
Uses a linked-list data structure; the blocks of one file can be scattered anywhere on the disk
The directory knows the HEAD and TAIL
The file ends with a NULL pointer
Each node (block) contains a pointer to the next node (block)
No compaction needed - no external fragmentation
Grows easily
Bad for direct access (only good for sequential access)
Extra disk space for pointers - a 4-byte pointer per 512-byte block ≈ 0.78%
Not reliable: a pointer can be lost (bad block)
FAT (File Allocation Table)
A variation of linked allocation
The beginning of the volume holds the FAT section
The table has 1 entry for every disk block, indexed by block id
The directory entry holds the block id of the first block of the file
The table entry indexed by that block id contains the block id of the next block, and so on until the last block is reached.
An unused block has the value 0 in its table entry (scan for 0 to allocate blocks for a new file).
The FAT can be cached, which improves direct-access time.
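Following a file's chain through the FAT can be sketched with a toy in-memory table. As in the notes, 0 marks a free entry; the end-of-chain marker (-1 here) and the names are illustrative assumptions, not the on-disk FAT encoding:

```c
#define FAT_SIZE 16
#define FAT_EOF  (-1)   /* hypothetical end-of-chain marker for this sketch */
#define FAT_FREE 0      /* unused block, as in the notes */

/* Walk the chain starting at block `first`, storing visited block ids in out[].
 * Returns the number of blocks in the file. */
int fat_chain(const int fat[FAT_SIZE], int first, int out[FAT_SIZE]) {
    int n = 0;
    for (int b = first; b != FAT_EOF && n < FAT_SIZE; b = fat[b])
        out[n++] = b;   /* fat[b] holds the id of the next block of the file */
    return n;
}
```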
Indexed
Brings all the pointers together into an Index Block
Each file has its own index block, containing a vector of pointers to its data blocks / disk block addresses
Direct access
No external fragmentation (any free block can satisfy a request for space allocation)
Extra space needed - pointer overhead is greater than in linked allocation
The index block is usually one disk block (512 bytes)
If the file is too large and the index block has no room for all the pointers, we can use:
Linked scheme - link multiple index blocks together (forming a linked list)
Multilevel scheme - a 1st-level index block points to a set of 2nd-level index blocks, which then point to the data blocks.
On a 32-bit system with 4K blocks, one index block can store 1,024 4-byte (32-bit) pointers to data blocks. With 2-level indexing this allows 1024 × 1024 = 1,048,576 data blocks, i.e. a file of up to 4 GB.
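The 2-level arithmetic above can be made concrete: a logical block number splits into a first-level slot and a second-level slot. A minimal sketch (names are illustrative):

```c
#define PTRS_PER_BLOCK 1024   /* 4 KB index block / 4-byte pointers */

/* Split a logical block number into (level-1 slot, level-2 slot). */
void two_level_index(long logical_block, int *l1, int *l2) {
    *l1 = (int)(logical_block / PTRS_PER_BLOCK);  /* which 2nd-level index block */
    *l2 = (int)(logical_block % PTRS_PER_BLOCK);  /* which pointer inside it */
}
```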
Combined scheme
Direct blocks for small files
Indirect blocks for large files (means multilevel)
Performance
Declare the file access type at creation -> select either contiguous allocation for a direct-access file or linked allocation for a sequential-access file
Indexed
If the index block is already in memory, access can be made directly. However, keeping index blocks in memory requires considerable space.
If the index block is not in memory, we have to read the index block first and then the desired data block.
Free-Space Management
The OS keeps a free-space list to track available disk blocks.
Bit vector
requires extra space:
if block size = 4 KB (2^12 bytes)
and disk size = 1 TB (2^40 bytes)
then n = disk_size / block_size = 2^28 bits = 32 MB
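Scanning the bit vector for a free block can be sketched as follows (a common convention where 1 = free is assumed; names are illustrative):

```c
#include <stdint.h>

/* Return the id of the first free block (bit set to 1), or -1 if none.
 * nwords is the number of 32-bit words in the bitmap. */
int first_free_block(const uint32_t *bitmap, int nwords) {
    for (int w = 0; w < nwords; w++) {
        if (bitmap[w] == 0) continue;          /* all 32 blocks in this word in use */
        for (int b = 0; b < 32; b++)
            if (bitmap[w] & (1u << b))
                return w * 32 + b;             /* block id = word offset + bit offset */
    }
    return -1;
}
```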
Linked (Free Space) List
Same logic as linked allocation: connect the free blocks with a linked list
Grouping
Basically a linked list, but the difference is...
N free-block addresses are grouped into 1 node: the first N-1 addresses point to actually-free blocks, and the Nth contains the address of another node of free blocks.
Counting
If contiguous allocation or extents are used, we can use this method.
Keep the address of the first free block and a count of the following contiguous free blocks.
e.g. 0xACE268AE 15 means: starting at that address, we have 15 contiguous free blocks
Efficiency
Disk-allocation and directory algorithms
Types of data kept in file’s directory entry
Pre-allocation or as-needed allocation of metadata structures
Fixed-size or varying-size data structures
Performance
keep data and metadata close together
buffer cache
asynchronous writes
Free-behind and read-ahead – techniques to optimize sequential access
Reads are frequently slower than writes, because written contents are usually buffered in the cache and the disk sorts its output queue by disk address.
Remove all non-essential components from the kernel -> move them to user level
Makes the kernel smaller: easier to maintain the kernel codebase, easier to port
Performance hit: IPC and switching between user mode & kernel mode
Adding a new service requires no kernel modification
Secure: most services run in user mode (restricted privilege)
Simpler, improved reliability
IPC method: message passing
Process scheduler
Long-term scheduler (aka job scheduler):
selects the processes to be enqueued into the ready queue.
Short-term scheduler (aka CPU scheduler):
selects the process to execute next from the ready queue (dequeue) & allocates the CPU to run it.
Degree of multiprogramming:
the number of processes a system can run concurrently.
Medium-term scheduler:
the swapping scheduler; swap-in & swap-out.
Type of processes:
I/O bound process & CPU bound process.
Process Creation
fork()
Creates a new child process that duplicates the entire address space of the parent.
-> returns 0 to the child process;
-> returns the pid (always > 0) of that child process to the parent process;
-> returns -1: no child was created, and errno is set;
exec()
Replaces the process's memory space with a new program (typically used after fork()).
exit()
Terminates the current process.
wait()
Suspends execution of the current process until one of its children terminates; wait(NULL) discards the child's exit code; wait(&status) retrieves the child's exit code;
abort()
Terminates the whole program, causing all children to terminate (cascading termination); raises the SIGABRT signal.
Things that happen in process creation
New process control block
Memory allocation
Complete copy of parent memory and I/O state (stdout, stdin, file handles)
Same
Global & local variables
Process resources
Address space
Diff
PID
Running time
Running State
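The fork()/wait() return-value behavior described above can be sketched as a small helper. This is a minimal sketch for POSIX systems; the function name and the exit code used in the test are illustrative:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code`; the parent reaps it and returns
 * the exit code it observed (or -1 if fork failed). */
int fork_and_reap(int code) {
    pid_t pid = fork();           /* duplicate the parent's address space */
    if (pid < 0)                  /* -1: no child created, errno is set */
        return -1;
    if (pid == 0)                 /* 0 is returned in the child */
        _exit(code);              /* terminate the child with a status */
    int status;                   /* in the parent, pid is the child's pid */
    waitpid(pid, &status, 0);     /* suspend until the child terminates */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```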
Zombie
If a terminated process's parent never issues wait(), the PCB is not released even though the child process has finished.
Unix does not clean up zombie processes by default, unless:
- the user issues a 'kill' system call
- the system reboots
Orphans
If the parent terminates without wait()ing for its children to finish, all its child processes become orphans.
The init process becomes the new parent of those orphans and calls wait() periodically to reclaim the orphans' resources.
File Descriptor (fd)
fd(0) stdin
fd(1) stdout
fd(2) stderr
Pipes
One of the IPC (inter-process communication) mechanisms.
Allows two processes to communicate.
Unidirectional & bidirectional
Unidirectional: data streams in one direction only
Bidirectional: data can stream in both directions
Half-duplex & full-duplex
The first can only read or write at any one time; the second can read and write at the same time;
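A minimal sketch of the POSIX pipe described above, writing into one end and reading from the other in a single process (the function name is illustrative; fd[0] is always the read end and fd[1] the write end):

```c
#include <string.h>
#include <unistd.h>

/* Write msg into a pipe and read it back; returns the number of bytes read.
 * Data flows one way only: fd[1] -> fd[0] (the pipe is half-duplex). */
ssize_t pipe_roundtrip(const char *msg, char *buf, size_t buflen) {
    int fd[2];
    if (pipe(fd) == -1) return -1;
    write(fd[1], msg, strlen(msg));    /* write end */
    close(fd[1]);                      /* closing the write end gives EOF after the data */
    ssize_t n = read(fd[0], buf, buflen - 1);
    if (n >= 0) buf[n] = '\0';
    close(fd[0]);
    return n;
}
```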
Many to One
When a thread blocks, the entire process (all its other threads) is blocked. Only one thread can access the kernel at a time, so multiple threads are unable to run in parallel on multiprocessors.
One to One
A blocking system call blocks only its own thread - more concurrency
Overhead of creating a kernel thread for every user thread
Many to Many
A combination of parallelism and efficiency
Blocking the kernel system calls does not block the entire process
The number of kernel threads may be specific to either an application or a particular machine
Two-level Model
Many to Many + One to One
Thread Cancellation
Asynchronous cancellation
terminates the target thread at once
no matter what stage it is in (it may be in the middle of a shared-data update, which would corrupt the data)
Deferred cancellation
the target thread keeps polling (at cancellation points) to find a suitable time to terminate itself
A process has affinity for the processor on which it is currently running, i.e. data remaining in that processor's caches from last time can be taken advantage of (cache hits) if the process is scheduled onto the same processor again.
Soft affinity: the OS attempts to keep a process running on the same processor, but does not guarantee it.
Hard affinity: allows a process to specify a subset of processors on which it may run.
Race condition
Multiple processes modify shared data concurrently; the result depends on the order in which the processes run.
This causes data inconsistency.
Critical Section
A segment of code in which:
a process may be changing shared variables
TO-DO: make sure no other process can be in its critical section while one process is running in its own critical section
Solutions
A solution must satisfy:
Mutual Exclusion
While one process occupies the critical section, no other process may enter it; there are never two processes working inside the critical section at the same time.
Progress
When no process is executing in the critical section, processes that want to enter must not be blocked from doing so; one of the candidate processes must be selected to enter the critical section, and this selection cannot be postponed indefinitely.
Preemptive: allows preemption of a process running in kernel mode; not free from race conditions, and increasingly difficult on SMP architectures.
Non-preemptive: a process runs until it exits kernel mode, blocks, or voluntarily yields the CPU; this is essentially free of race conditions in kernel mode.
int turn;     // indicates which process may enter the C.S.
bool flag[2]; // flag[n] == true means process n is ready and requests to enter the C.S.
Code:
// In process P_i
do {
    flag[i] = true;
    turn = j; // <-- a context switch may occur here; mutual exclusion still holds
    while (flag[j] == true && turn == j); // while P_j is accessing, P_i doesn't go in
    // critical section
    flag[i] = false; // after P_i is done with the C.S., reset it so P_j can enter
    // remainder section
} while (true);
...
...
...
// In process P_j
do {
    flag[j] = true;
    turn = i;
    while (flag[i] == true && turn == i); // while P_i is accessing, P_j doesn't go in
    // critical section
    flag[j] = false; // after P_j is done with the C.S., reset it so P_i can enter
    // remainder section
} while (true);
The variable lock is initialized to false, then we have the function definition:
volatile bool lock = false; // global variable

bool test_and_set(bool *target) {
    bool original_input = *target;
    *target = true; // always set to true (it keeps being set while spinning)
    return original_input;
}
and,
do {
    while (test_and_set(&lock)); /* can only get in when lock == false */
    /* critical section */
    lock = false; // release the lock, let another process get in
    /* remainder section */
} while (true);
Problems
Notice that this solution does not guarantee Bounded Waiting / fairness.
Any process may be unlucky enough to wait forever for its turn to enter the C.S.
The chance of acquiring the lock is not fair among the waiting processes.
High bus traffic: the lock keeps being set to true during the wait.
volatile int lock = 0; // 0 = unlocked (false), 1 = locked (true)

/**
 * If *input is equal to the expected value, set it to new_value.
 * It always returns the original value of *input.
 */
int compare_and_swap(int *input, int expected, int new_value) {
    int original_input = *input;
    if (*input == expected)
        *input = new_value;
    return original_input;
}
and,
do {
    /* only when lock was 0 (free) can we go into the C.S. */
    while (compare_and_swap(&lock, 0, 1) == 1);
    /* critical section */
    lock = 0; // release
    /* remainder section */
} while (true);
Actually CAS and TAS are very similar; the difference is:
TAS changes the contents of a memory location and returns its old value as a single atomic operation.
CAS atomically compares the contents of a memory location to a given value, and changes the contents of that memory location to a new value only if they are the same.
...which means that with CAS, the lock is not repeatedly rewritten to true during the wait.
Problems
It has the same problem as the previous solution: it is not bounded-waiting / fair.
To achieve that, the fix is exactly the same as the bounded-waiting TAS solution.
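The bounded-waiting TAS solution referred to here is not shown in the notes; the classic textbook version can be sketched as below. NPROC and the function names are illustrative, and the test_and_set here is an ordinary function standing in for the atomic hardware instruction:

```c
#include <stdbool.h>

#define NPROC 4
volatile bool waiting[NPROC];
volatile bool lock_flag = false;

/* non-atomic stand-in for the hardware test_and_set instruction */
bool test_and_set(volatile bool *target) {
    bool old = *target;
    *target = true;
    return old;
}

void enter_section(int i) {
    waiting[i] = true;
    bool key = true;
    while (waiting[i] && key)          /* spin until we grab the lock or it is handed to us */
        key = test_and_set(&lock_flag);
    waiting[i] = false;
}

void exit_section(int i) {
    int j = (i + 1) % NPROC;
    while (j != i && !waiting[j])      /* scan for the next waiting process */
        j = (j + 1) % NPROC;
    if (j == i)
        lock_flag = false;             /* nobody is waiting: free the lock */
    else
        waiting[j] = false;            /* hand the C.S. directly to P_j -> bounded waiting */
}
```

Because the exit scan hands the lock to the next waiter in cyclic order, no process waits more than NPROC - 1 turns.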
Spin locks are one of the synchronization mechanisms for mutual exclusion.
But in real life, the Win32 and Linux APIs do have mutex locks, and note that Mutex = binary semaphore.
In Linux API:
A mutex is extremely similar to a semaphore: when a process fails to acquire the lock, it calls sleep() and is enqueued on a waiting queue, which causes the kernel to context-switch to another process; after another process releases the lock, the previous process is woken up. It is also sleep-waiting.
Must: no two processes may execute wait() and signal() operations on the same semaphore at the same time (this is a critical-section problem by itself).
On a single-processor system, disabling interrupts would work.
Otherwise, wait() and signal() themselves must be put into a critical section.
Semaphore with busy-waiting
A semaphore's value can become negative (in the sleeping implementation below; the busy-waiting version never goes negative).
If it is negative, its magnitude is the number of processes currently waiting on the semaphore (e.g. -10 means 10 processes are waiting).
Counting Semaphore
Used to control access to a given resource consisting of a finite number of instances. The semaphore value S is initialized to the number of available resources.
Binary Semaphore
The semaphore value can only be '0' or '1'; behaves like a mutex lock (spin lock).
volatile int S = 0;

void wait() {
    while (S <= 0); // busy wait
    S--;
}

void signal() {
    S++;
}
Semaphore with no Busy waiting
Each semaphore is associated with a waiting queue; each node contains a value and a pointer to the next node (just a queue implemented as a linked list).
Operation
block() (a.k.a. sleep())
places the process invoking the operation on the appropriate waiting queue
wakeup()
removes one process from the waiting queue and puts it in the ready queue
/* semaphore object */
typedef struct {
    int value;
    struct process *list;
} semaphore;

void wait(semaphore *S) {
    S->value--;
    if (S->value < 0) {
        // add the current process to the waiting queue
        enqueue(this_process, S->list);
        block(); // sleep
    }
}

void signal(semaphore *S) {
    S->value++;
    if (S->value <= 0) {
        // remove the head process from the waiting queue
        process *P = dequeue(S->list);
        wakeup(P); // wake it up
    }
}
So, what is the difference between them?
Semaphore Vs Spin Lock Vs Mutex
A semaphore can allow a configurable number of acquisitions (letting multiple processes into the critical section, with deadlock as a possible side effect); a spin lock CANNOT.
A mutex lock can only be released by the process that acquired it; a semaphore has no such restriction - any process can release it.
A spin lock performs badly on a single-processor system: it wastes resources, slows down the owner of the lock, and can cause deadlock.
A spin lock is good for protecting a short code segment for a short time.
A semaphore is good for protecting large code segments for a long time.
A spin lock busy-waits (no context switch); semaphores and mutexes sleep-wait.
LP (low-priority process) requires and is accessing the C.S.
MP (medium-priority process)
HP (high-priority process) requires the C.S.
The scheduling problem in which a lower-priority process holds a lock on a C.S. resource that HP wants.
Normally, after LP finishes and leaves the C.S., HP can continue executing and enter the C.S.
But if MP comes in: MP requests neither the lock nor the C.S., yet MP still preempts LP's execution.
HP preempts MP, but HP wants the C.S. and the lock is held by LP, so HP can do nothing until LP releases the lock.
MP keeps running because MP has higher priority than LP; LP cannot release the lock until MP finishes, so HP stays blocked.
Finally MP finishes and yields to LP; LP finishes and releases the lock; then HP executes.
Resulting execution order: MP -> LP -> HP
Solution - Priority-inheritance protocol
Every process accessing a C.S. that is needed by a higher-priority process inherits that higher-priority process's priority number.
Once it has finished accessing the C.S., its priority reverts to the original.
God Damn Problem 1 - Bounded-Buffer Problem (a.k.a Producer-consumer problem)
Assume the pool consists of n buffers, each capable of holding one item.
Producer and consumer share the following:
int n;
semaphore mutex = 1; // mutual exclusion for accesses to the buffer pool
semaphore empty = n; // counts the number of empty buffers
semaphore full = 0;  // counts the number of full buffers
// RECALL the semaphore object and the wait()/signal() implementations above :D
/* producer process */
do {
    // ...
    /* produce an item in next_produced */
    // ...
    wait(empty);
    wait(mutex);
    // ...
    /* add next_produced to the buffer */
    // ...
    signal(mutex);
    signal(full);
} while (true);

/* consumer process */
do {
    wait(full);
    wait(mutex);
    // ...
    /* remove an item from the buffer into next_consumed */
    // ...
    signal(mutex);
    signal(empty);
    // ...
    /* consume the item in next_consumed */
    // ...
} while (true);
No reader is kept waiting unless a writer has already gained access to the shared object. This can cause starvation for writers and delay updates to the object.
Second variation
When a writer is ready, it performs its update as soon as possible. If a writer is waiting to access the object (i.e. a reader is reading), NO new reader may start reading; new readers must wait until the writer has updated the object. After the reader that is reading finishes, the writer gets its turn; after the writer finishes, the new readers get theirs.
Visualize the 2nd Variation:
reader(reading) <-- writer want (wait)
...Here, new reader comes.
reader(reading) <-- writer want(wait) <--- new reader (wait)
reader(finish) <-- writer want(wait) <--- new reader (wait)
writer want(writing) <--- new reader (wait)
new reader (reading)
This can always cause starvation (of readers this time)
It can be solved by kernel-provided Reader-Writer Locks:
Multiple processes are permitted to concurrently acquire a reader-writer lock in READ mode.
Only 1 process can acquire the reader-writer lock in WRITE mode.
God Damn Problem 3: Dining (Philosophers) Sucker Problem
One of the most interesting problem abstractions I have ever seen LOL
Philosophers
thinking
eating
do not interact with each other
require 2 chopsticks to eat (can only take the left and right ones)
If philosophers (processes) = 5
... and bowls of rice (shared data set) = 5
then the number of chopsticks (semaphores) = 5
How Deadlock occur?
The original problematic Philosopher Process (Deadlock)
// i is the index of the current philosopher
do {
    // a philosopher needs 2 chopsticks to eat
    wait(chopstick[i]);           // pick up the left one first
    wait(chopstick[(i+1) % 5]);   // then the right one
    /** eat la eat la ... **/
    // eating finished, release the 2 chopsticks
    signal(chopstick[i]);         // release left
    signal(chopstick[(i+1) % 5]); // release right
    // they go back to thinking alone
} while (true);
What the problems are:
it guarantees that no 2 neighbors eat at the same time
among 5 philosophers, only 2 (non-neighbors) can eat at once
If all 5 philosophers want to eat at the same time,
each of them grabs the left chopstick and waits for the right one... (or the reverse)
but the right chopstick is the neighbor's left chopstick (which is already taken)
They all wait for the right chopstick until one of them finishes eating... but NO ONE can
DEADLOCK
Or... you can solve it by:
When a philosopher who is waiting for the second chopstick already has one in hand, make him release the chopstick he holds after 5 minutes
Deadlock is solved!!! but...
if the 5 of them enter the restaurant at the same time and pick up their left chopsticks at the same time, they will deadlock and then wait out the 5 minutes concurrently
After the wait they all release their own chopsticks at the same time and wait another 5 minutes together
... and then they pick up their left chopsticks again... LOL
An endless loop - no one ever eats rice (this is livelock)
A low-probability event, but it does happen!
Solution
1. Waiter
The waiter is an observer who knows which chopsticks are in use.
Philosophers need the waiter's permission to pick up chopsticks.
Example:
If philosophers A and C are eating
-> B cannot eat because neither chopstick next to B is free
-> D and E each have only 1 free chopstick next to them, so they cannot eat
If D just picked up a chopstick whenever he wanted to eat, deadlock could occur
So he must ask the waiter, and the waiter makes him wait until 2 chopsticks are free (guaranteeing he can eat) to prevent deadlock.
It is also called Monitor solution:
// The Monitor acts like an observer - the Waiter
Monitor DiningPhilosophers {
    enum {Thinking, Hungry, Eating} state[5];
    Condition self[5];

    void pickup(int i) {
        state[i] = Hungry;
        // check whether I am able to eat
        test(i);
        // I cannot eat, so I wait
        if (state[i] != Eating) self[i].wait();
    }

    void putdown(int i) {
        // I finished eating, so I think
        state[i] = Thinking;
        // ask the left & right neighbors:
        // do they want to, and are they able to, eat?
        test((i + 4) % 5);
        test((i + 1) % 5);
    }

    /** when my neighbors are not eating and I am hungry, I eat **/
    void test(int i) {
        if ((state[(i + 4) % 5] != Eating) &&
            (state[i] == Hungry) &&
            (state[(i + 1) % 5] != Eating))
            self[i].signal(); // don't wait, eat
    }

    void initialize() {
        for (int i = 0; i < 5; ++i)
            state[i] = Thinking;
    }
}
Note that the condition variable self here represents a philosopher, and its waiting queue (for suspension) holds at most 1 process.
Each philosopher (process) should run like:
DiningPhilosophers.pickup(i);
/** if able to eat, eat here **/
DiningPhilosophers.putdown(i);
2. Resource hierarchy
Label the chopsticks with IDs in order.
A philosopher may only pick up the smaller-ID chopstick of his pair first, then the larger one.
After finishing eating, he always releases the larger-ID chopstick first.
If we have 5 philosophers and all 5 want to eat, the last one cannot proceed because his smaller-ID chopstick has already been picked up, so he must wait for his neighbor to finish eating (deadlock is prevented perfectly)
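The ordering rule can be sketched as a tiny helper: each philosopher computes which of his two chopsticks has the smaller ID and always takes that one first (names are illustrative):

```c
#define N 5

/* Order the two chopstick ids for philosopher i so the smaller is taken first. */
void chopstick_order(int i, int *first, int *second) {
    int left = i, right = (i + 1) % N;
    if (left < right) { *first = left;  *second = right; }
    else              { *first = right; *second = left;  }
}
```

Philosophers 0..3 reach for their left chopstick first, but philosopher 4's pair is (4, 0), so he reaches for chopstick 0 first, the one philosopher 0 already holds, which breaks the circular wait.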
3. Chandy & Misra 's Solution
Philosophers who want to eat but have no chopsticks get a chopstick coupon.
A philosopher holding a coupon gives it to one who has finished eating, and that one hands his chopstick to the coupon holder who wants to eat.
NO ONE may hold TWO coupons.
This makes sure that whoever has the chopsticks can eat.
Monitor
Provides better management for synchronization
Schematic
The shared data defined inside a monitor is NOT the critical-section data itself, but things like states, status, condition variables, priority, id, etc...
Shared data may only be changed through the monitor's methods (functions)
Initialization code (like a constructor)
Only one process can be active within the monitor at a time
An entry queue is attached (for incoming processes)
Can hold multiple condition variables
Condition Variables
Each condition variable represents a condition (similar to a semaphore), and it holds a queue of processes.
Condition x, y, z ...;
So, each condition variable also has these 2 functions:
x.wait()
Moves the currently active process into the suspend list; it must release the monitor to the other processes.
x.signal()
Resumes one of the processes that invoked x.wait() earlier
Does nothing if no wait() was called before
Different from a semaphore's signal() (a semaphore increments a number; this one only brings a process from the suspend queue back to the ready queue)
Case: Handle process switch in Monitor
If P calls x.signal() while Q is suspended in x.wait()...
Possible options
Q is resumed and activated, while P in turn goes into x.wait()
Signal & wait:
the moment P calls x.signal(), one process in x's waiting queue (in this example, process Q) is resumed immediately; even though P has not finished, it is forcibly suspended - P goes into a waiting queue and it becomes Q's turn to run
Signal & continue:
when P calls x.signal(), the monitor knows that process Q should be resumed, but it does not resume Q immediately; instead it waits until P has run to completion before resuming process Q
Using Semaphore to Create a Monitor?
Possible.
/** init = 1 **/
Semaphore mutex;

/** represents the entry queue; a signaling process also suspends itself and goes here; init = 0 **/
Semaphore next;

/** counts the number of processes suspended on next **/
int next_count = 0;
Information like this is kept in the directory structure. Part of it can be cached in RAM.
Open File Lock
Shared lock - reader lock - multiple processes can access concurrently
Exclusive lock - writer lock - only one process can write
Single-Level directory
Two-Level directory
Separate dir for each user
NO grouping capability
Tree structured directory
Efficient searching
Grouping capability
Acyclic-Graph directory
Uses links (symbolic links and hard links)
Difficult to ensure no cycles appear (the graph must stay acyclic)
General Graph directory
Allow links only to files, not to subdirectories
Every time a new link is added, use a cycle-detection algorithm to determine whether a cycle exists – time-consuming
Chapter 11 Implementing FS
Layered FS
application program > logical FS > file-organization module > basic FS > I/O control > device driver
I/O control, device driver
device drivers and interrupt handlers; issue commands to the disk hardware
Basic FS
manages the cache and buffers for block allocation, freeing, and replacement
File organization module
translates logical block addresses to physical block addresses
manages free disk space and disk-block allocation
Logical FS
metadata - the FS's structural data about the data
manages the directory structure
translates a file name -> file number / file handle by maintaining File Control Blocks
File Control Block
Inode - in the Unix file system
contains information about permissions, ownership, and the location of the file's contents on the disk (it holds the pointers to the data blocks)
Two things must be considered to implement a FS:
On-disk structures
boot loader, number of blocks, location of free blocks
Boot control block (per volume)
contains the boot loader; it is the first block, called the boot block (UFS) or partition boot sector (M$)
Volume control block (per volume)
contains volume/partition data, called the superblock (UFS) or master file table (M$)
Directory structure (per FS)
file names and their associated inode numbers
FCB (inode) (per file)
In-memory information
Used for performance improvement via caching. Includes data loaded at mount time and updated during FS operations.
mount table - contains data about each mounted volume
system-wide / per-process open-file tables
contain the inodes (File Control Blocks) of all open files / the files opened by that process
buffers hold FS blocks (including inodes) when a read/write is required
Directory implementation
Linear List
simple
bad time complexity due to linear search
can be improved by caching
can keep entries ordered via a linked list / B+ tree
e.g. Btrfs uses B-trees; Ext4 uses hash-tree (HTree) directories
Hash Table
O(1) search time
collision problems
only good if the size is fixed, or with a chained-overflow method
... Continue with FS Allocation Methods.md
Chapter 12 mass storage system
HDD
positioning time, a.k.a. random-access time = seek time + rotational latency
seek time = the time for the disk arm to move to the desired cylinder
rotational latency = the time for the desired sector to rotate under the disk head
Head crash = the disk head making contact with the disk surface. Then the disk is dead.
I/O bus
EIDE (IDE), SATA, USB, Fibre Channel, SCSI, SAS
Disk controller - built-in cache
Data transfer at the drive - between the cache and the disk surface
Data transfer to the host - between the cache and the host controller
Host controller - at the computer end of the bus (e.g. the SATA controller); uses memory-mapped I/O ports and sends commands as messages to the disk controller to perform disk I/O operations.
Performance
Average seek time
average seek distance ≈ 1/3 × the number of tracks
fast disks: 3 ms seek + 2 ms latency = 5 ms
slow disks: 9 ms seek + 5.56 ms latency = 14.56 ms
Latency based on spindle speed
Full-rotation time = 60/RPM seconds (RPM/60 = rotations per second)
Average latency = full-rotation time × 0.5 (unit: s)
Average I/O time
average access time + (amount to transfer / transfer rate) + controller overhead
e.g. 5 ms (average seek time) + 4.17 ms (7200 rpm disk) + 4 KB (data to transfer) / 1 Gb/sec (transfer rate) + 0.1 ms (controller overhead)
= 9.27 ms + 0.03 ms
≈ 9.3 ms
1 KB/s = 1024 × 8 = 8192 bps
1 Gbps = 2^30 bps = 1,073,741,824 bps
1,073,741,824 / 8192 = 131,072
==> 1 Gbps = 131,072 KB/s
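The average I/O time formula can be sketched as a small function (name and parameters are illustrative; the test plugs in the example numbers above, where the transfer term is 4 KB / 131,072 KB/s ≈ 0.03 ms):

```c
/* Average I/O time in ms: seek + rotational latency + transfer + controller. */
double avg_io_ms(double seek_ms, double rpm, double kbytes,
                 double rate_kb_per_s, double ctrl_ms) {
    double latency_ms  = 0.5 * (60.0 / rpm) * 1000.0;   /* half a rotation */
    double transfer_ms = kbytes / rate_kb_per_s * 1000.0;
    return seek_ms + latency_ms + transfer_ms + ctrl_ms;
}
```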
SSD
single-level cell (SLC), multilevel cell (MLC)
Requires a faster bus interface (SATA is too slow; PCIe is needed)
Pros
more reliable => no mechanical moving parts
much faster => no seek time and rotation latency
consumes less power
Cons
expensive per MB
less capacity
shorter life span
Magnetic Tape
extremely slow (random) access time
mostly used for data backup
large capacities
Disk Structure
Drives are addressed as a vector of logical blocks
logical block - the smallest unit of transfer
size is usually 512 bytes on traditional HDDs
modern HDDs/SSDs use 4K blocks
The vector of logical blocks is mapped onto the sectors of the disk sequentially
NAS (Network-Attached Storage)
Implemented via remote procedure calls (RPCs)
over network protocol: TCP/UDP based
e.g. NFS, CIFS
Performance is less efficient than direct-attached storage protocols, like the FC used by SANs
consumes bandwidth; increases latency
SAN (Storage Area Network)
FC (Fibre Channel) is the common SAN interconnect
an arbitrated loop (FC-AL) can address 126 devices, including drives and controllers
Provide flexibility
multiple hosts and multiple storage arrays can attach to the same SAN
Multiple I/O requests may arrive from different processes, but only one I/O request can be served at a time by the disk controller. The other I/O requests must wait in a queue and be scheduled.
Two or more requests may be far from each other, resulting in greater disk-arm movement.
Hard drives are among the slowest parts of a computer system and thus need to be accessed efficiently.
How to Calculate DISK ACCESS TIME?
Disk Access Time = Seek Time + Rotational Latency + Transfer Time
More details:
Seek Time: the time taken to move the disk arm to the track where the data is to be read or written. The disk-scheduling algorithm that gives the minimum average seek time is better.
Rotational Latency: the time taken by the desired sector of the disk to rotate into a position where the read/write heads can access it. The disk-scheduling algorithm that gives the minimum rotational latency is better.
Transfer Time: the time to transfer the data. It depends on the rotation speed of the disk and the number of bytes to be transferred.
FCFS (first-come-first-serve)
Just move the disk head in the order the requests arrive.
Pros:
Every request gets a fair chance
No indefinite postponement
Cons:
Seek time is NOT optimized
May not provide the best possible service
SSTF (shortest-seek-time-first)
Choose the pending request closest to the current head position, whether forward or backward.
Pros:
short seek times; disk access time is close to optimal.
Cons:
Possibility of causing starvation
SCAN Scheduling (Elevator algorithm)
The disk arm moves in a given direction and services the requests along its path until it reaches the end of the disk (then it reverses).
Characteristic :
Requests at the mid-range cylinders are serviced more often
Requests arriving just behind the disk arm must wait for a full sweep
Pros:
High throughput
Low variance of response time
Good average response time
Cons:
Long waiting time for requests at locations the disk arm has just passed
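Since SCAN always travels to the end of the disk, its total movement can be computed directly from the head position and the farthest pending request. A minimal sketch, assuming a 200-cylinder disk (cylinders 0–199) and the same hypothetical queue:

```python
def scan_total_movement(requests, head, direction="down", max_cyl=199):
    """SCAN (elevator): sweep to one end of the disk, then reverse."""
    lower = [r for r in requests if r < head]
    upper = [r for r in requests if r > head]
    if direction == "down":
        # head -> cylinder 0, then 0 -> highest pending request
        return head + (max(upper) if upper else 0)
    # head -> max_cyl, then max_cyl -> lowest pending request
    return (max_cyl - head) + (max_cyl - (min(lower) if lower else max_cyl))

reqs = [98, 183, 37, 122, 14, 124, 65, 67]
print(scan_total_movement(reqs, 53, "down"))  # 236
print(scan_total_movement(reqs, 53, "up"))    # 331
```

The initial direction matters: sweeping toward the nearer end first (down, here) costs less total movement.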
C-SCAN (Circular-SCAN Scheduling)
Same as SCAN, but when the disk head reaches one end it does NOT reverse direction; instead it returns to the other end of the disk and resumes servicing requests from there.
Pros:
More uniform waiting time compared to SCAN
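C-SCAN movement can be sketched the same way, counting the return sweep from the far end back to cylinder 0. Assumptions as before: 200-cylinder disk, hypothetical queue, head sweeping upward:

```python
def cscan_total_movement(requests, head, max_cyl=199):
    """C-SCAN: sweep up to the end, jump back to cylinder 0, keep sweeping up."""
    lower = [r for r in requests if r < head]
    # head -> max_cyl, then the wrap max_cyl -> 0, then 0 -> highest request below head
    movement = (max_cyl - head) + max_cyl
    if lower:
        movement += max(lower)
    return movement

print(cscan_total_movement([98, 183, 37, 122, 14, 124, 65, 67], 53))  # 382
```

Total movement is higher than SCAN's here, but every request waits at most one full circular sweep, which is where the "more uniform waiting time" comes from.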
LOOK and C-LOOK
Variants of SCAN and C-SCAN
Difference:
The disk head does not travel all the way to the end of the disk
...it only goes as far as the last pending request in that direction
For C-LOOK, when the head reaches the last request, it jumps back to the FIRST (lowest) pending request at the other end
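A sketch of C-LOOK under the same assumptions (hypothetical queue, head at 53, sweeping upward, and counting the jump back as head movement):

```python
def clook_total_movement(requests, head):
    """C-LOOK: sweep up only to the last pending request, then jump to the lowest one."""
    lower = sorted(r for r in requests if r < head)
    upper = sorted(r for r in requests if r > head)
    movement = 0
    if upper:
        movement += upper[-1] - head          # up to the highest pending request
    if lower:
        start = upper[-1] if upper else head
        movement += start - lower[0]          # jump back to the lowest pending request
        movement += lower[-1] - lower[0]      # finish the remaining low requests
    return movement

print(clook_total_movement([98, 183, 37, 122, 14, 124, 65, 67], 53))  # 322
```

The head stops at 183 rather than traveling to 199, and jumps to 14 rather than 0, which is exactly the saving LOOK-style variants buy over their SCAN counterparts.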
Partitioning: separate the disk into multiple groups of cylinders; each partition keeps its own data structures
Logical formatting
stores file-system data
Cluster
chunk of blocks
Increase efficiency
Disk I/O done in blocks
File I/O done in clusters
Raw Disk
A disk partition presented as a large sequential array of logical blocks, with NO file system.
For applications that want to do their own block management and keep the OS out of the way (databases, for example); raw I/O bypasses all file-system services such as space allocation, file names, and directories.
Sector sparing
invisible to OS
used to handle bad blocks
RAID (Redundant Array of Independent Disks)
many inexpensive disks combined together
for high reliability via redundancy & high data-transfer rates
mirroring
NVRAM (solid-state nonvolatile RAM) survives power failures, so it can be added to RAID (e.g., as a write cache)
Parallelism & striping
goals:
increase the throughput of many small accesses by load balancing
reduce the response time of large accesses
Bit-level striping:
With 8 disks, write bit i of each byte to disk i.
The array of 8 disks is treated as a single disk with sectors that are 8 times larger; the access rate goes up 8x.
The number of disks should be a multiple of 8 or a divisor of 8.
e.g. with 4 disks, bits i and 4+i of each byte are stored on the same disk i (say i = 0, then bits 0 and 4 go to the same disk)
Block-level striping:
blocks of a file are striped across multiple disks
with n disks, block i of a file goes to disk (i mod n) + 1
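The block-to-disk mapping above is just a round-robin formula; a one-line sketch (disks numbered 1..n, matching the formula in the notes):

```python
def block_to_disk(block, n_disks):
    """Block-level striping: block i of a file goes to disk (i mod n) + 1."""
    return (block % n_disks) + 1

# 4 disks: blocks 0..7 round-robin across disks 1..4
print([block_to_disk(i, 4) for i in range(8)])  # [1, 2, 3, 4, 1, 2, 3, 4]
```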
Common RAID Structure
RAID 0:
no redundancy
block-level striping, pure performance gain
RAID 1:
redundancy only, no striping
each disk is duplicated, called mirroring / shadowing
RAID 2:
Hamming code, Error correction code
RAID 3:
one disk store parity
bit-interleaved parity
RAID 4:
one disk store parity
block-interleaved parity
RAID 5:
Distributed Parity
block level stripping
RAID 6:
similar to RAID 5, but adds a second, independent parity block (tolerates two disk failures)
Hybrid RAID (0 + 1 or 1 + 0)
high performance & high redundancy
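The parity used by RAID 3/4/5 is the bitwise XOR of the data blocks: if any single disk fails, XOR-ing the surviving blocks with the parity reconstructs the lost one. A toy sketch (4-bit values, not a real controller implementation):

```python
# Three data blocks, one on each disk (toy values)
d0, d1, d2 = 0b1011, 0b0110, 0b1100
parity = d0 ^ d1 ^ d2          # stored on the parity disk (RAID 3/4) or rotated (RAID 5)

# Suppose the disk holding d1 fails: recover it from the survivors + parity
recovered = d0 ^ d2 ^ parity
print(recovered == d1)  # True
```

This also shows why RAID 5 rotates the parity across disks: every small write must update the parity block, so a dedicated parity disk (RAID 4) becomes a bottleneck.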
More feature
snapshot: a view of the file system before the last update, kept for recovery
replication: automatic duplication of writes between separate sites, for redundancy and disaster recovery
hot spare: on disk failure, a spare disk automatically replaces the failed one in the RAID set
Main memory, cache, and registers are the only storage the CPU can access directly
The CPU accesses secondary storage (e.g., HDD) via an I/O controller
Hardware Address Protection
Using Base registers and Limit registers
Define the address space (start & end) of a process
The CPU ensures every memory access is within the range
Legal iff the address is >= Base and < Base + Limit
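The base/limit check can be sketched in a few lines. The register values here are hypothetical example values:

```python
def check_access(addr, base, limit):
    """Hardware-style check: an access is legal iff base <= addr < base + limit."""
    if base <= addr < base + limit:
        return True
    return False  # real hardware would trap to the OS (addressing error)

base, limit = 300040, 120900   # hypothetical register contents
print(check_access(300040, base, limit))  # True  (first legal address)
print(check_access(420940, base, limit))  # False (base + limit itself is out of range)
```

Note the strict `<` on the upper bound: `base + limit` is the first address *outside* the process.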
Address Binding
Source-code addresses are symbolic
Compiled-code addresses are bound to relocatable addresses
e.g. "14 bytes from the start of this module"
The linker/loader binds relocatable addresses to physical addresses
e.g. 74000 + 14 = 74014
Each binding maps one address space to another (1-to-1 binding)
Program Address Binding Stage
Compile time:
If the target memory address is known in advance, absolute code can be generated (e.g., a boot loader); the code must be recompiled if the program's starting location in RAM changes.
Load time:
The compiler generates relocatable code when the memory location is not known at compile time (most programs); the loader just reloads the code to incorporate the changed base value (linking).
Execution time:
If the process can be moved during execution from one memory segment to another. Needs hardware support for the address map (base & limit registers); dynamic libraries (*.so) are an example.
Logical & Physical Address Space
Logical address - generated by the CPU; also called a virtual address
Physical address - the address seen by the memory unit; produced by the MMU (Memory-Management Unit)
Logical and physical addresses are the SAME in the compile-time and load-time address-binding schemes;
... but DIFFER in the execution-time address-binding scheme
The MMU translates every logical address to a physical address at run time, e.g., by adding a relocation (base) register.
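A minimal sketch of the simplest MMU scheme (relocation register only; the register value is a hypothetical example):

```python
# The relocation (base) register is added to every logical address
# generated by the CPU to form the physical address.
RELOCATION_REGISTER = 14000   # example value

def mmu_translate(logical_addr):
    return logical_addr + RELOCATION_REGISTER

print(mmu_translate(346))  # physical address 14346
```

The user program only ever sees logical addresses (0..max); the mapping to physical memory happens on every access, which is what makes execution-time relocation possible.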
Dynamic loading (loading a routine only when it is first called) provides better memory-space utilization; dynamic linking similarly postpones the linking of libraries until execution time.
Swap
Temporarily moves a process out of memory to a backing store (usually a disk drive)
Increase degree of multi-programming
Swap-in, Swap-out
A lower-priority process is swapped out so that a higher-priority process can be loaded (swapped in) and executed