In the below OpenMP code, a shared memory location counter
is being
incremented by two threads, each thread incrementing counter
100 times. The
incrementation process
counter += 1
is written out explicitly to emphasize the 3-step read-increment-write process which is subject to race conditions.
int counter = 0;
#pragma omp parallel shared(counter) num_threads(2)
{
// private variables: i, N, read
int i, N, read;
N = 100;
// each thread increments "counter" N=100 times. note that
// this is *not* an `omp for` block!
for (i=0; i<N; ++i)
{
// the following is equivalent to "counter += 1;"
read = counter; // read
read = read + 1; // increment
counter = read; // write
}
}
printf("counter = %d\n", counter);
What are theoretical maximum and minimum values that counter
can take
on at the end of this parallel code?
The theoretical maximum is counter == 200
. This occurs in any situation
where there are no race conditions on the reading and writing of counter
. For
example, in the case when Thread 0 performs all of its iterations before Thread
1 performs all of its iterations.
A "timeline plot" of this example behavior is given below. (Behold my beautiful ASCII art.)
--------
THREAD 0
--------
___ ___ _____
| | | | | | each "staple" represents a
| | | | ... | | <---- read, increment, write
c=0 c=1 c=1 c=2 c=99 c=100
------------------------------------------------------------
c=100 c=101 c=199 c=200
| | ... | |
|_____| |_____|
--------
THREAD 1
--------
As for determining the theoretical minimum (which is the hard part of this problem) consider the following worst-case thread scheduling scenario:
- Thread 0 read
c == 0
- Thread 1 read
c == 0
- Thread 1 increment and write such that
c == 1
- Thread 1 read, increment and write x98 more times ending up with
c == 99
. Note that there is one more iteration that Thread 1 needs to take before its work is complete. - Thread 0 increment and write resulting in
c == 1
. - Thread 1 perform last read
c == 1
- Thread 0 read, increment, and write remaining x99 times ending up with
c == 99
. Thread 1 has finished its iterations. - Thread 1 perform final increment and write resulting in
c == 2
.
A picture might help. (Again, marvel at this ASCII greatness.)
--------
THREAD 0
--------
(iteration 1) (iterations 2-100)
________________________ ____ ______
| | | | | |
| | | | ... | |
c=0 c=1 c=1 c=2 c=98 c=100
------------------------------------------------------------
c=0 c=1 c=98 c=99 c=1 c=2
| | ... | | | |
|___| |____| |___________________________|
(iterations 1-99) (iteration 100)
--------
THREAD 1
--------
Note that we cannot achieve c = 1
for the following reason: in order to have
c = 1
in the end there must have been a situation where a thread read c = 0
and wrote c = 1
. However, because each iteration increments c
this means
that, without loss of generality, Thread 0 read c = 0
at the very beginning
before Threads 0 or 1 performed any of their writes. However, even if Thread 0
overwrote Thread 1's iterations it still needs to perform 99 more iterations and
writes which leads to a contradiction. Therefore, c = 2
is the minimum.
The moral of the story: shared memory parallelism is hard.