Multi-Threaded Programming in C

pthreads

  1. Include the header #include <pthread.h>

  2. Compile with one of the following options to link the pthreads library (the -pthread form is generally preferred, since it also sets the needed compile-time flags):

     gcc -o main main.c -lpthread
     gcc -o main main.c -pthread

  3. Check the return values of the common pthread functions - they return 0 on success and an error number on failure, which helps when diagnosing issues in multithreaded programs

pthreads Example
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#define NUM_THREADS 4

void *hello(void *arg) {
    /* thread main */
    printf("Hello Thread\n");
    return NULL;
}

int main(void) {
    int i;
    pthread_t tid[NUM_THREADS];

    for(i = 0; i < NUM_THREADS; i++) {
        /* create/fork threads, checking the return value as advised above */
        if (pthread_create(&tid[i], NULL, hello, NULL) != 0) {
            fprintf(stderr, "pthread_create failed\n");
            exit(EXIT_FAILURE);
        }
    }

    for(i = 0; i < NUM_THREADS; i++) {
        /* wait/join threads */
        pthread_join(tid[i], NULL);
    }

    return 0;
}

 


General Information

Overview

  • one mechanism for concurrent programming
  • takes advantage of multi-processor and multi-core system resources
  • allows for multiple simultaneous points of execution in a program
  • useful when a slow process is underway and other things can be done while waiting (network or disk transfer, etc.)
  • uses fork to spawn a light-weight thread whose creation, existence, destruction and synchronization primitives are cheap
  • uses mutexes (mutual exclusion) to deal with the common pitfall of multiple threads accessing the same shared memory location at the same time, by locking access to that location
    • mutex has two states - locked and unlocked

 

Common Pitfalls

  • deadlocks (easy to recognize, preferable to other pitfalls)
  • performance penalties (insidious)

 

Mutexes

  • used to synchronize threading in a program
  • used to avoid some pitfalls
  • simplest type of mutex is a global variable
TYPE List = REF RECORD ch: CHAR; next: List END;
VAR m: Thread.Mutex;
VAR head: List;

LOCK m DO
    newElement^.next := head;
    head := newElement;
END;
  • at most one thread is executing inside the LOCK clause at any instant in the above
  • a mutex might otherwise be placed inside a data structure as an element; in this case, the LOCK clause would begin w/an expression selecting the mutex field of the data structure
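
The same guarded insertion in pthreads might look like this minimal sketch (the struct layout and the push_front name are illustrative, not from the source):

#include <pthread.h>
#include <stdlib.h>

struct list { char ch; struct list *next; };

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static struct list *head = NULL;

/* Equivalent of the LOCK clause: at most one thread at a time
   executes the section between lock and unlock. */
void push_front(char ch) {
    struct list *newElement = malloc(sizeof *newElement);
    if (newElement == NULL)
        return;
    newElement->ch = ch;

    pthread_mutex_lock(&m);
    newElement->next = head;
    head = newElement;
    pthread_mutex_unlock(&m);
}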

 

Condition Variables and Monitors

condition variables allow for more complex resource scheduling policies with mutexes by allowing a thread to block until some event happens

  • always associated with a particular mutex and data protected by that mutex

a monitor is the combination of data, a mutex and zero or more condition variables

  • "Wait" operation atomically unlocks the mutex and blocks the thread
  • "Signal" operation does nothing unless there's a thread blocked on the condition variable, in which case it awakens at least one such blocked thread (or more than one)
  • "Broadcast" operation is like "Signal", except it awakens all threads currently blocked on condition variable
  • when a thread is awoken inside "Wait" after blocking, it re-locks the mutex, then returns; note mutex might not be available, in which case thread blocks on mutex until available
TYPE Condition;
PROCEDURE Wait(m: Mutex; c: Condition);
PROCEDURE Signal(c: Condition);
PROCEDURE Broadcast(c: Condition);

Lock Clause and Releasing Mutexes

  • might want to unlock mutex in certain situations, e.g., before calling down to a lower level abstraction that'll block for a long time, to avoid provoking delays for other threads that want to lock the mutex
  • done using the raw Acquire(m) and Release(m) operations;
    • requires caution:

      1. must be sure operations are correctly bracketed, even in presence of exceptions
      2. must be prepared for possibility state of monitor's data might have changed while you had mutex unlocked
    • can be tricky if you called Release explicitly instead of just ending the LOCK clause, b/c the program counter might now depend on the previous state of the monitor's data, implicitly making a decision that might no longer be valid; hence this approach is discouraged

    • can be dangerous to use separate explicit calls to Acquire(m) and Release(m) in the context of forking, e.g., you might be executing with the mutex held and want to fork a new thread to continue working on the protected data while the original thread continues w/o further access to that data, i.e., transfer the holding of the mutex to the newly-forked thread atomically; it is difficult to verify correct functioning of the mutex here, and this approach will likely not work on systems whose threading facilities keep track of which thread is holding a mutex

Using Condition Variable to Schedule Shared Resources

condition variables are used when the programmer wants to schedule the way multiple threads access a shared resource and the simple one-at-a-time mutual exclusion provided by mutexes is insufficient

General pattern for condition variable usage: WHILE NOT expression DO Thread.Wait(m, c) END;

  • it's good to have a thread re-check the expression upon waking because the circumstances that awoke it may have changed, and more than one thread attempting the same action may have been awoken at the same time
  • this pattern is generally robust
  • it simplifies the programming of Signal and Broadcast calls by making extra wake-ups of other threads benign; careful coding to ensure that only the correct threads are awakened becomes purely a performance matter
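
The pthreads counterparts of Wait, Signal and Broadcast are pthread_cond_wait, pthread_cond_signal and pthread_cond_broadcast, and the same re-check pattern applies. A minimal sketch, where the flag and the function names are illustrative:

#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  c = PTHREAD_COND_INITIALIZER;
static bool expression = false;      /* stands in for the guarded predicate */

void wait_until_true(void) {
    pthread_mutex_lock(&m);
    while (!expression)              /* re-check after every wake-up */
        pthread_cond_wait(&c, &m);   /* atomically unlocks m and blocks;
                                        re-locks m before returning */
    /* ... use the protected data here ... */
    pthread_mutex_unlock(&m);
}

void make_true(void) {
    pthread_mutex_lock(&m);
    expression = true;
    pthread_mutex_unlock(&m);
    pthread_cond_signal(&c);         /* or pthread_cond_broadcast(&c) */
}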

 

Usefulness of Broadcast calls

  • Broadcast awakens all threads that have called Wait, whereas Signal awakens only one, so if you program in the recommended style above - that of re-checking an expression after returning from Wait - substituting Broadcast for Signal calls will not affect the correctness of your program

  • Broadcast can be used to simplify your program by awakening multiple threads even if you know not all can make progress, allowing less carefulness in separating different wait reasons into different condition variables. This use trades slightly poorer performance for greater simplicity

  • Broadcast can also be used when a resource just made available can be used by multiple threads

  • Broadcast is useful in the shared/exclusive locking (or readers/writers locking) scheduling policy

    • used when you have shared data being read and written by multiple threads
    • algorithm will perform better when allowing multiple threads to read data concurrently but writing should be limited to one thread at a time

Procedure:

  1. Any thread wanting to read data calls AcquireShared, reads data, then calls ReleaseShared
  2. Any thread wanting to write data calls AcquireExclusive, modifies data, then calls ReleaseExclusive
  3. Variable i counts the number of active readers when it is greater than zero; when negative, there is an active writer; when i is zero, no thread is using the data
  4. If a potential reader thread inside AcquireShared finds i < 0, it must block until writer calls ReleaseExclusive
  5. Using Broadcast is convenient in ReleaseExclusive b/c terminating writer doesn't care how many readers are able to proceed; otherwise, you could just use Signal and add a counter of number of readers waiting, calling Signal that same number of times in ReleaseExclusive
  6. This example exemplifies many of the problems that can occur when using condition variables
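
A pthreads sketch of steps 1-5 (a minimal version, without the refinements discussed in the sections below; the names m and c are assumptions, while i follows the convention above):

#include <pthread.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  c = PTHREAD_COND_INITIALIZER;
static int i = 0;   /* > 0: # of active readers; -1: active writer; 0: free */

void AcquireShared(void) {
    pthread_mutex_lock(&m);
    while (i < 0)                 /* step 4: block while a writer is active */
        pthread_cond_wait(&c, &m);
    i++;
    pthread_mutex_unlock(&m);
}

void ReleaseShared(void) {
    int wake;
    pthread_mutex_lock(&m);
    i--;
    wake = (i == 0);
    pthread_mutex_unlock(&m);
    if (wake)
        pthread_cond_signal(&c);  /* last reader lets a waiting writer in */
}

void AcquireExclusive(void) {
    pthread_mutex_lock(&m);
    while (i != 0)                /* wait until no readers and no writer */
        pthread_cond_wait(&c, &m);
    i = -1;
    pthread_mutex_unlock(&m);
}

void ReleaseExclusive(void) {
    pthread_mutex_lock(&m);
    i = 0;
    pthread_mutex_unlock(&m);
    pthread_cond_broadcast(&c);   /* step 5: wake however many readers wait */
}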

Spurious Wakeups

simple usage of condition variables might awaken threads that can't make useful progress; this can happen if you use Broadcast when Signal would suffice, or if you have threads waiting on a single condition variable for multiple different reasons

  • unless your scheduler is unusually efficient, you should separate the blocked threads onto two condition variables - one for readers and one for writers; a terminating reader then need only signal the writers' condition variable, while a terminating writer signals one of the two, depending on which is non-empty

Spurious Lock Conflicts

use of condition variables can lead to excessive scheduling overhead

  • when a terminating reader inside ReleaseShared calls Signal, it still has the mutex locked
  • on a multi-processor, the likely effect is that a potential writer is awakened inside Wait, executes a few instructions, then blocks trying to lock the mutex - which is still held by the terminating reader; the reader unlocks the mutex a few microseconds later, allowing the writer to continue, at the cost of two extra re-schedule operations
  • this situation is common and has a simple solution: move the Signal call to after the end of the LOCK clause

a second, more complicated case of spurious lock conflicts occurs when a terminating writer calls Broadcast w/the mutex held; only one waiting reader at a time can lock the mutex to re-check and increment i, so on a multi-processor the other awoken readers are likely to block trying to lock the mutex

  • can be fixed by awakening just one reader in ReleaseExclusive (calling Signal instead of Broadcast), and having each reader in turn awaken the next, as in:
PROCEDURE AcquireShared();
BEGIN
    LOCK m DO
        readWaiters := readWaiters+1;
        WHILE i < 0 DO Thread.Wait(m, cR) END;
        readWaiters := readWaiters-1;
        i := i + 1;
    END;
    
    Thread.Signal(cR);
END AcquireShared;

 

Starvation

there is a valid concern about the fairness of scheduling decisions - are all threads equal or are some more favored? w/mutexes, this is dealt w/for you by the threads implementation, typically w/a FIFO (first-in-first-out) rule for ea. priority level

  • this is mostly also true for condition variables, but sometimes the programmer must become involved
  • starvation is the most extreme form of unfairness, where some thread will never make progress
  • this can happen in the reader-writer locking example if the system is heavily loaded and there is always at least one thread wanting to be a reader: when there's always an active reader, a writer can never proceed and remains blocked forever - the writers are starved. The fix is to add a counter of blocked writers, as in:
VAR writeWaiters: INTEGER;

PROCEDURE AcquireShared();
BEGIN
    LOCK m DO
        readWaiters := readWaiters + 1;
        
        IF writeWaiters > 0 THEN
            Thread.Signal(cW);
            Thread.Wait(m, cR);
        END;
        
        WHILE i < 0 DO Thread.Wait(m, cR) END;
        readWaiters := readWaiters - 1;
        i := i + 1;
    END;
    
    Thread.Signal(cR);
END AcquireShared;

PROCEDURE AcquireExclusive();
BEGIN
    LOCK m DO
        writeWaiters := writeWaiters + 1;
        WHILE i # 0 DO Thread.Wait(m, cW) END;
        writeWaiters := writeWaiters - 1;
        i := -1;
    END;
END AcquireExclusive;
  • there's no limit to how complex this can become, w/ever more elaborate scheduling policies; the programmer must exercise restraint, adding such features only if they are really required by the actual load on the resource

 

Complexity

worrying about spurious wake-ups, lock conflicts and starvation makes a program more complicated

  • often, when the load is not high enough to warrant the rest, only moving the call of Signal beyond the end of the LOCK clause is needed
  • do nonetheless explicitly consider your particular situation before dismissing the other performance enhancements mentioned above

 

Deadlock

use of condition variables can introduce deadlocks, e.g., w/two resources, the following sequence produces a deadlock:

Thread A acquires resource (1);
Thread B acquires resource (2);
Thread A wants (2), so it waits on (2)'s condition variable;
Thread B wants (1), so it waits on (1)'s condition variable;
  • you should arrange that there's a partial order on the resources managed w/condition variables, and that ea. thread wishing to acquire multiple resources does so according to this order; for ex., you might order (1) before (2) - then thread B wouldn't be permitted to try to acquire (1) while holding (2), so the deadlock wouldn't occur (see the pthreads sketch at the end of this section)

  • an interaction between condition variables and mutexes is a subtle source of deadlock:

VAR a,b: Thread.Mutex;
VAR c: Thread.Condition;
VAR ready: BOOLEAN;

PROCEDURE Get();
BEGIN
    LOCK a DO
        LOCK b DO
            WHILE NOT ready DO Thread.Wait(b, c) END;
        END;
    END;
END Get;

PROCEDURE Give();
BEGIN
    LOCK a DO
        LOCK b DO
            ready := TRUE; Thread.Signal(c);
        END;
    END;
END Give;

Code Breakdown:

  • if ready is FALSE and thread A calls Get, it'll block on the call of Wait(b, c). This unlocks b but leaves a locked. If thread B then calls Give, intending to cause a call of Signal(c), it'll instead block trying to lock a, and a deadlock will ensue.

  • this problem, called the nested monitor problem, most often occurs when you lock a mutex at one abstraction level of your program and then call down to a lower level, which, unknown to the higher level, blocks. If this block can only be freed by a thread that's holding the higher level mutex, you'll deadlock.

  • It's generally risky to call into a lower level abstraction while holding one of your mutexes, unless you understand fully the circumstances under which the called procedure might block.

  • One solution is to explicitly unlock the mutex before calling the lower level abstraction, as discussed earlier, though that approach has its own dangers.

  • A better solution - arrange to end the LOCK clause before calling down
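
As an illustration of the partial-order rule from the first bullet above, a pthreads sketch in which every thread acquires the two mutexes in the same fixed order, so the circular wait can't arise (the mutexes stand for resources (1) and (2); names are illustrative):

#include <pthread.h>

static pthread_mutex_t r1 = PTHREAD_MUTEX_INITIALIZER;   /* resource (1) */
static pthread_mutex_t r2 = PTHREAD_MUTEX_INITIALIZER;   /* resource (2) */

/* Both threads honor the order (1) before (2), so neither can ever
   hold (2) while waiting for (1), and the deadlock sequence above
   becomes impossible. */
void thread_a_work(void) {
    pthread_mutex_lock(&r1);
    pthread_mutex_lock(&r2);
    /* ... use both resources ... */
    pthread_mutex_unlock(&r2);
    pthread_mutex_unlock(&r1);
}

void thread_b_work(void) {
    pthread_mutex_lock(&r1);   /* same order, even though B mainly wants (2) */
    pthread_mutex_lock(&r2);
    /* ... use both resources ... */
    pthread_mutex_unlock(&r2);
    pthread_mutex_unlock(&r1);
}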

Using Fork - Working in Parallel

situations where you want to fork a thread:

  1. to utilize a multiprocessor
  2. to do useful work while waiting for a slow device
  3. to satisfy human users by working on several actions at once
  4. to provide network service to multiple clients simultaneously
  5. to defer work until a less-busy time

ex. - 1 thread doing main computation, a second writing some output to file, a third waiting for (or responding to) interactive user input, a fourth running in the background to clean up your data structures (re-balancing a tree, for ex.)

  • need to exercise care in accessing data from interactive interface (ex. - value of current selection or contents of editable text areas) since these values might change once you start executing asynchronously. This is a difficult design issue and each system tackles it differently

  • often a single housekeeper thread overseeing requests is preferable; it can be programmed to maintain a data structure in an optimal form, to merge similar requests into a single action, or to restrict itself to running no more often than a chosen periodic interval

Pipelining

a technique for optimal use of resources on a multiprocessor system that utilizes a chain of producer-consumer relationships, called a pipeline

Example:

  1. when thread A initiates an action, it simply enqueues a request in a buffer
  2. thread B takes the action from the buffer, performing part of the work, then enqueueing it in a second buffer
  3. thread C takes it from there and does the rest of the work

This forms a 3-stage pipeline; the 3 threads operate in parallel except when synchronizing to access the buffers, so this pipeline is capable of utilizing up to 3 processors

At its best, pipelining can achieve almost-linear speed-up and can fully utilize a multi-processor

Example 3-Stage Pipeline
TYPE RasterList = REF RECORD ch: CHAR; next: RasterList END;
TYPE PaintList = REF RECORD bitmap: Bitmap; next: PaintList END;
VAR rasterTail: RasterList;
VAR paintTail: PaintList;
VAR m: Thread.Mutex;
VAR c1, c2: Thread.Condition;

PROCEDURE PaintChar(arg: CHAR);
    VAR this: RasterList;
BEGIN
    NEW(this);
    this^.ch := arg; this^.next := NIL;
    (* Enqueue request for Rasterize thread ... *)
    LOCK m DO
        rasterTail^.next := this; rasterTail := this;
    END;
    
    Thread.Signal(c1);
    (* Now return to our caller... *)
END PaintChar;

PROCEDURE Rasterize(init: RasterList);
    VAR last: RasterList;
    VAR this: PaintList;
BEGIN
    last := init;
    LOOP
        (* Wait for RasterList request and dequeue it... *)
        LOCK m DO
            WHILE last^.next = NIL DO Thread.Wait(m, c1); END;
            last := last^.next;
        END;
        
        (* Convert character to bitmap... *)
        NEW(this);
        this^.bitmap := Font.Map(last^.ch); this^.next := NIL;
        (* Enqueue request for painter thread *)
        LOCK m DO
            paintTail^.next := this; paintTail := this;
        END;
        
        Thread.Signal(c2);
    END;
END Rasterize;

PROCEDURE Painter(init: PaintList);
    VAR last: PaintList;
BEGIN
    last := init;
    LOOP
        (* Wait for PaintList request and dequeue it *)
        LOCK m DO
            WHILE last^.next = NIL DO Thread.Wait(m, c2); END;
            last := last^.next;
        END;
        
        (* Paint the bitmap *)
        Display.PaintBitmap(last^.bitmap);
    END;
END Painter;

NEW(rasterTail); rasterTail^.next := NIL;
NEW(paintTail); paintTail^.next := NIL;
Thread.Fork(Rasterize, rasterTail);
Thread.Fork(Painter, paintTail);

There are 2 problems with pipelining:

  1. need to be careful about how much of the work gets done in ea. stage; ideally the stages are equal, but achieving this ideal requires hand-tuning and re-tuning as the program changes
  2. the # of stages determines the amount of concurrency statically - if you know the # of processors you have and exactly where the real-time delays occur, this is fine; for more flexible or portable environments, it can be a problem

Pipelining is a powerful technique w/wide applicability despite these problems

Impact of Your Environment

the design of the OS and runtime libraries will affect the extent to which using forked threads is useful and desirable

  • you'll need to know some performance parameters of your system's threads implementation
    • What is the cost of creating a thread?
    • cost of keeping a blocked thread in existence?
    • cost of a context switch?
    • cost of a LOCK clause when the mutex is not locked?
  • knowing the above, you can decide the feasibility/usefulness of adding extra threads to your program
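
As a crude way to estimate the first of these parameters, a sketch that times pthread_create/pthread_join pairs (assumes a POSIX system with clock_gettime; the result is only a rough guide):

#include <stdio.h>
#include <time.h>
#include <pthread.h>

#define ITERATIONS 1000

static void *noop(void *arg) { (void)arg; return NULL; }

int main(void) {
    struct timespec start, end;
    pthread_t tid;
    int i;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < ITERATIONS; i++) {
        pthread_create(&tid, NULL, noop, NULL);
        pthread_join(tid, NULL);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed_ns = (end.tv_sec - start.tv_sec) * 1e9
                      + (end.tv_nsec - start.tv_nsec);
    printf("create+join: %.0f ns per thread\n", elapsed_ns / ITERATIONS);
    return 0;
}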

Potential Problems with Adding Threads

  • significantly more threads than processors degrades performance
    • this is b/c most thread schedulers are quite slow at making general re-scheduling decisions
    • adding threads just to improve the program's structure (driving slow devices, responding speedily to mouse clicks, handling RPC invocations) won't cause you to encounter this problem
    • when adding threads for performance purposes (multiple actions in parallel, deferring work, utilizing multi-processors), you need to worry about overloading the system

Using Alerts

Alerts are used to cause the termination of a long-running computation or a long-term wait; ex. - fork multiple competing algorithms to solve a problem and, when the first completes, abort the rest; or a long computation might be underway when the user clicks a cancel button

  • alerts must be used with great restraint or they'll make your program unreadable, unmaintainable and possibly incorrect
  • alternative options to using alerts include setting a boolean flag and signaling a condition variable
  • they're most useful when you don't know exactly what's going on
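
A sketch of the boolean-flag alternative mentioned above: the worker re-checks a cancel flag in its wait loop and between units of work, and the canceller sets the flag and broadcasts (all names are illustrative):

#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  c = PTHREAD_COND_INITIALIZER;
static bool cancelled = false;
static bool work_ready = false;

void *worker(void *arg) {
    (void)arg;
    pthread_mutex_lock(&m);
    for (;;) {
        while (!work_ready && !cancelled)
            pthread_cond_wait(&c, &m);
        if (cancelled)
            break;                     /* the long-term wait ends politely */
        work_ready = false;
        pthread_mutex_unlock(&m);
        /* ... perform one unit of the long-running computation ... */
        pthread_mutex_lock(&m);
    }
    pthread_mutex_unlock(&m);
    return NULL;
}

void cancel(void) {
    pthread_mutex_lock(&m);
    cancelled = true;
    pthread_mutex_unlock(&m);
    pthread_cond_broadcast(&c);        /* wake the waiter so it sees the flag */
}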

Additional Techniques for Using Threads

Up-calls: maintaining a strict top-down layering of abstractions, where lower levels only receive calls from above and never call upward, results in poor performance in certain contexts (like the receiving end of network communications) due to a context switch at each layer boundary

The alternative technique is up-calling: maintaining a pool of threads willing to receive incoming data link layer (ethernet) packets and call upward through the layers

  • the receiving thread dispatches on ethernet protocol type and calls up to the network layer (IP), where it dispatches again and calls up to the transport layer (TCP), where there's a final dispatch to the appropriate connection; on some systems this extends into the application. This is a high-performance structure b/c it eliminates context switches - all top-performing network implementations are built this way
  • this benefit is paid for by complexity - there is now an up-call interface in addition to the down-call interface and the synchronization problem is now more delicate
    • in purely top-down designs, it's ok to hold one layer's mutex while calling a lower layer (unless the lower layer might block on a condition variable and cause the sort of nested monitor deadlock discussed above), but w/up-calls this can easily provoke a deadlock involving just the mutexes - e.g., an up-calling thread holding a lower level mutex may need to lock the higher level one; i.e., the presence of up-calls makes it more likely you'll violate the partial order rule for locking mutexes
    • to avoid this, avoid holding a mutex while making up-calls

Version Stamps

concurrency can make it more difficult to use cached info., as when a separate thread executing at a lower level invalidates info. known to a thread currently executing at a higher level (e.g., info. about a disk volume)

  • can use up-calls to invalidate cache structures at the higher level, but this won't invalidate state held locally by a thread; between the time the info. comes from the cache and the call actually occurs, the info. might become invalid

  • version stamps are useful here -

  1. In the low level abstraction, maintain a counter assoc. w/the true data.
  2. Whenever the data changes, increment the counter.
  3. Any copy of the data issued to a higher level is accompanied by the current counter value; whenever you make a call back down to the lower level, and the call or its parameters depend on previously obtained data, include the associated counter value.
  4. When the lower level receives such a call, it compares the incoming counter value with the current one.
  5. If they differ, it returns an exception to the higher level, which then knows to re-consider its call. This technique is also useful when maintaining cached data across a distributed system.
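
A minimal sketch of the version-stamp idea in C, with a single counter guarded by a mutex (all names are illustrative):

#include <pthread.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static unsigned long version = 0;   /* step 2: incremented on every change */
static int data;                    /* the "true" low-level data */

/* Step 3: hand out a copy of the data together with the stamp. */
void read_data(int *copy, unsigned long *stamp) {
    pthread_mutex_lock(&m);
    *copy = data;
    *stamp = version;
    pthread_mutex_unlock(&m);
}

void write_data(int value) {
    pthread_mutex_lock(&m);
    data = value;
    version++;
    pthread_mutex_unlock(&m);
}

/* Steps 4-5: reject calls that depend on stale data; the caller
   must then re-read and re-consider. */
int call_down(unsigned long stamp) {
    int result = 0;
    pthread_mutex_lock(&m);
    if (stamp != version) {
        result = -1;   /* stale: higher level must retry */
    } else {
        /* ... act on data, still the version the caller saw ... */
    }
    pthread_mutex_unlock(&m);
    return result;
}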

Work Crews

dealing with the scenario where there are too many threads for the available system resources (overrunning memory and/or slowing performance) means either (1) being more restrained in your forking, or (2) using an abstraction that controls forking for you (see 'Workcrews: An Abstraction for Controlling Parallelism', Vandevoorde and Roberts, DSRC, 1989)

  • (2): the idea is to enqueue requests for asynchronous activity and have a fixed pool of threads perform them; the complexity comes in managing the requests, synchronizing between them and coordinating the results (see the sketch below)

An alternative proposal is to implement Fork in such a way that it defers actually creating the new thread until there's a processor available to run it
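
Not the Workcrews library itself, but a compact sketch of the idea: callers enqueue requests, and a fixed pool of threads performs them (shutdown, result collection and error handling are omitted; all names are illustrative):

#include <pthread.h>
#include <stdlib.h>

#define CREW_SIZE 4

struct request {
    void (*fn)(void *);        /* the deferred work */
    void *arg;
    struct request *next;
};

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  c = PTHREAD_COND_INITIALIZER;
static struct request *queue_head = NULL, *queue_tail = NULL;

/* Instead of forking a thread per activity, callers enqueue a request. */
void crew_submit(void (*fn)(void *), void *arg) {
    struct request *r = malloc(sizeof *r);
    if (r == NULL)
        return;
    r->fn = fn; r->arg = arg; r->next = NULL;
    pthread_mutex_lock(&m);
    if (queue_tail != NULL) queue_tail->next = r; else queue_head = r;
    queue_tail = r;
    pthread_mutex_unlock(&m);
    pthread_cond_signal(&c);
}

/* Each crew member loops forever, dequeuing and running requests. */
static void *crew_member(void *arg) {
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&m);
        while (queue_head == NULL)
            pthread_cond_wait(&c, &m);
        struct request *r = queue_head;
        queue_head = r->next;
        if (queue_head == NULL) queue_tail = NULL;
        pthread_mutex_unlock(&m);
        r->fn(r->arg);
        free(r);
    }
    return NULL;
}

void crew_start(void) {
    pthread_t tid;
    int i;
    for (i = 0; i < CREW_SIZE; i++)
        pthread_create(&tid, NULL, crew_member, NULL);
}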

Building Your Program

How do you tell whether you've built your program successfully?

  • concurrency affects the design of interfaces to library packages
    • you must design interfaces w/the assumption that your callers will be using multiple threads

    • must ensure all entry points are thread re-entrant (they can be called by multiple threads simultaneously), even if it means ea. procedure immediately locks a central mutex

    • must not return results in shared global variables, nor in global statically-allocated storage

    • calls should be synchronous, not returning until results are available

    • if the caller wants to do other work meanwhile, it can do so in other threads

    • even w/o multi-threaded clients of the interface, follow these guidelines to avoid future problems

    • must be fastidious about associating ea. piece of data w/one and only one mutex

      • use condition variables in recommended style - retesting boolean expression after returning from Wait
    • analyze deadlocks with symbolic debuggers

      • the debugger must provide at least minimal support for threads - enumerating existing threads and looking at each thread's stack, and ideally also filtering in thread enumeration (finding all threads blocked on a particular mutex or condition variable, for ex.)
    • a successful program will produce an answer quickly, making good use of the available resources

    • obtain stats on lock conflicts (ex. - how often threads have had to block on a given mutex and how long they've waited) and on concurrency levels (ex. - what was the avg. # of threads ready to execute in your program, and what % of the time were n threads ready)

    • Don't emphasize efficiency over correctness

Further Information

These notes draw on Andrew D. Birrell, 'An Introduction to Programming with Threads', DEC SRC Research Report 35, 1989, which covers all of the above topics in greater depth.
