Skip to content

Instantly share code, notes, and snippets.

@cswiercz
Last active June 8, 2016 16:20
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save cswiercz/94fcdc9049cbcd9622734414f2d13636 to your computer and use it in GitHub Desktop.
Save cswiercz/94fcdc9049cbcd9622734414f2d13636 to your computer and use it in GitHub Desktop.
A race condition problem and solution from the AMath 483/583 final examination I wrote this Spring.

Problem Statement

In the below OpenMP code, a shared memory location counter is being incremented by two threads, each thread incrementing counter 100 times. The incrementation process

counter += 1

is written out explicitly to emphasize the 3-step read-increment-write process which is subject to race conditions.

int counter = 0;

#pragma omp parallel shared(counter) num_threads(2)
{
  // private variables: i, N, read
  int i, N, read;
  N = 100;

  // each thread increments "counter" N=100 times. note that 
  // this is *not* an `omp for` block!
  for (i=0; i<N; ++i)
  {
    // the following is equivalent to "counter += 1;"
    read = counter;  // read
    read = read + 1; // increment
    counter = read;  // write
  }
}

printf("counter = %d\n", counter);

What are theoretical maximum and minimum values that counter can take on at the end of this parallel code?

Solution

The theoretical maximum is counter == 200. This occurs in any situation where there are no race conditions on the reading and writing of counter. For example, in the case when Thread 0 performs all of its iterations before Thread 1 performs all of its iterations.

A "timeline plot" of this example behavior is given below. (Behold my beautiful ASCII art.)

--------
THREAD 0
--------

  ___     ___          _____
 |   |   |   |        |     |             each "staple" represents a
 |   |   |   |   ...  |     |       <---- read, increment, write
c=0 c=1 c=1 c=2      c=99 c=100
------------------------------------------------------------
                                 c=100 c=101     c=199 c=200
                                   |     |   ...   |     |
                                   |_____|         |_____|
        
--------
THREAD 1
--------

As for determining the theoretical minimum (which is the hard part of this problem) consider the following worst-case thread scheduling scenario:

  • Thread 0 read c == 0
  • Thread 1 read c == 0
  • Thread 1 increment and write such that c == 1
  • Thread 1 read, increment and write x98 more times ending up with c == 99. Note that there is one more iteration that Thread 1 needs to take before its work is complete.
  • Thread 0 increment and write resulting in c == 1.
  • Thread 1 perform last read c == 1
  • Thread 0 read, increment, and write remaining x99 times ending up with c == 99. Thread 1 has finished its iterations.
  • Thread 1 perform final increment and write resulting in c == 2.

A picture might help. (Again, marvel at this ASCII greatness.)

--------
THREAD 0
--------

       (iteration 1)               (iterations 2-100)
  ________________________         ____        ______
 |                        |       |   |       |      |
 |                        |       |   |  ...  |      |
c=0                      c=1     c=1 c=2     c=98  c=100
------------------------------------------------------------
    c=0 c=1     c=98 c=99    c=1                         c=2
     |   |  ...  |    |       |                           |
     |___|       |____|       |___________________________|
     
      (iterations 1-99)              (iteration 100)
        
--------
THREAD 1
--------

Note that we cannot achieve c = 1 for the following reason: in order to have c = 1 in the end there must have been a situation where a thread read c = 0 and wrote c = 1. However, because each iteration increments c this means that, without loss of generality, Thread 0 read c = 0 at the very beginning before Threads 0 or 1 performed any of their writes. However, even if Thread 0 overwrote Thread 1's iterations it still needs to perform 99 more iterations and writes which leads to a contradiction. Therefore, c = 2 is the minimum.

The moral of the story: shared memory parallelism is hard.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment