@convict-git
Created August 8, 2021 13:31
Concurrency and Synchronization using Modern C++ - (Not so) Short Notes


Thread Management

Each running program has at least one thread, which executes its entry point. A single program can have many threads, and memory is shared among all of them. Objects owned by the main thread (and shared with other threads by reference) are destroyed as soon as they go out of scope. This can leave the remaining threads accessing corrupt memory through dangling references.

Hence, it's often very important to make sure that all threads are joined using std::thread::join() before the main thread terminates. Other times, you want a daemon thread running in the background even after the main thread terminates, which can be achieved by calling std::thread::detach(); note, however, that a detached thread can no longer be joined.

Resource Acquisition Is Initialization (RAII) is a famous programming idiom in C++ (a misnomer, actually), also (better) known and understood as Scope-Bound Resource Management. In C++, as soon as an object is constructed, the constructor of its class is invoked, which is generally responsible for resource allocation. Similarly, when the object goes out of scope, the destructor is called, which is generally responsible for freeing the acquired resources.

RAII is one good way of making sure that a thread is joined: write a wrapper class that holds the thread instance and calls join() in its destructor. Functors are similarly used to wrap functions by overloading the call operator operator(), which makes a class callable.

C++ has a robust library to support thread management. A thread instance can take any callable (function, functor, bind expression or lambda) and its parameters as arguments. It's important to note that the parameters are passed through the copy constructor, creating a copy in the thread's own scope. Hence, to actually 'share' memory between the caller and callee thread, the parameter objects must be wrapped with std::ref(), or passed the conventional way using pointers. To transfer ownership of an object, it can be wrapped with std::move(), which invokes the move constructor instead of the copy constructor at initialisation time.


Data Race and Mutex

When multiple threads share memory, the order of memory reads and writes matters a lot: the output of the program can come to depend entirely on the order in which the threads happen to execute. This is a data race.

A data race is one of the worst nightmares a programmer can face. A program with a race condition can be almost impossible to debug, because the output is inconsistent from run to run even for the same input. Hence we must avoid race conditions in our engine at all costs.

It's essential to work with the common resources synchronously. This is usually achieved quite well with mutex and fortunately enough, C++ has support for mutex in the standard library.

std::mutex gives us the bare std::mutex::lock() and std::mutex::unlock() to lock and unlock the mutex respectively. But doing this manually is not recommended, as it raises one common issue: what happens if a thread locks a mutex and then terminates with an exception? No other thread can unlock it, resulting in an infinite wait. RAII again comes to the rescue by providing the wrapper std::lock_guard. When using std::lock_guard with std::mutex, just keep the critical section in its scope, and as soon as the lock guard goes out of scope, std::mutex::unlock() is invoked automatically.

How can we avoid race conditions?

  • Make sure that the common resources are not accessible without locking a mutex associated with the resource.
  • Never return the protected resource or pass it as an argument to third-party functions.
  • Design the interface appropriately. Figure out the atomic operations and bind them.

Dead lock

op1() { if (x == 1) y += 1; }

op2() { if (y == 1) x += 1; }

Let op1() and op2() run on two separate threads, each locking the mutexes protecting x and y. If op1() locks x's mutex for its condition check while op2() locks y's mutex for its own, each thread then blocks waiting for the mutex the other holds: neither can execute its next statement, nor release the lock it already owns. Such a state is called a deadlock, and it arises because the mutexes were locked in different orders. Luckily, instead of manually working out a consistent locking order, we can use the deadlock-avoidance mechanism provided in the standard library as std::lock().

std::mutex m1, m2;
...
std::lock(m1, m2); // locks both mutexes while avoiding deadlock
...
// use std::adopt_lock (a tag object, not a function call), which makes
// sure the lock guard doesn't try to lock the already-locked mutex but
// will certainly unlock it when out of scope
{
   std::lock_guard<std::mutex> lock1(m1, std::adopt_lock);
   std::lock_guard<std::mutex> lock2(m2, std::adopt_lock);
   ...
   // here work on the resources corresponding to m1 and m2
}
// m1 and m2 get unlocked as lock1 and lock2 go out of scope

How to avoid deadlocks?

  • Prefer locking a single mutex at a time as much as possible.
  • Avoid locking a mutex and then calling a user-provided function, as it may lock further mutexes you have no control over, leading to deadlock scenarios.
  • Lock any given set of mutexes in the same order everywhere.

Fine grained locks can protect small amounts of data but are too vulnerable to deadlocks whereas coarse grained locks can protect large amounts of data at once but decreases the efficiency of parallel processing as other threads wait too much on the locks.

When choosing locking granularity, it's important to find the right balance.


Unique Lock instead of Lock guard?

In the previous example, we have seen the use of std::adopt_lock. The standard library also gives us std::try_to_lock and std::defer_lock, tags that select the locking strategy when constructing a std::unique_lock (std::lock_guard accepts only std::adopt_lock), as we will see now.

std::lock_guard is very restrictive: it locks the mutex (or adopts it) in the constructor, unlocks it in the destructor, and you have to carry the locked mutex until the scope ends. Also, std::lock_guard can only lock and unlock the mutex once. This is not always flexible enough. std::unique_lock is more flexible on the aforementioned points and also allows transfer of ownership (move construction), which std::lock_guard lacks.

So should we always prefer unique locks? It is important to note that the flexibility comes at a cost: std::lock_guard is much lighter and has less overhead than std::unique_lock. So the choice has to be made wisely.

Lazy initialisation

Assume a scenario where we have multiple threads (sharing memory) and some resource that may or may not be used by the threads; that is, for some inputs the execution path never actually acquires a resource that was set up in advance. So is this a problem?

Resource initialization is sometimes expensive and can bring a lot of overhead. Hence sometimes it is wise to delay the initialisation of the resource till at least one call (resource acquisition) happens, initialise it only then and use the resource thereafter.

The C++ standard library allows us to do this using std::call_once, which invokes a passed callable exactly once, no matter how many further calls are made, even from different threads.

std::once_flag f1; // available to the threads that may need the resource
...
// in all those threads, the invocation can be done as follows ..
std::call_once(f1, /* some callable which does the initialisation */);
...

Condition Variable

There may be instances where one of our threads is waiting for some event (or some condition to be fulfilled) by another thread. The easiest way to achieve this is busy-waiting for the event, but it is well known that busy-waiting burns a lot of CPU. The better alternative is signalling.

The C++ standard library gives us std::condition_variable to achieve signalling. The waiting thread calls std::condition_variable::wait(), which takes a std::unique_lock<std::mutex> as an argument and waits on it, while the other thread can use std::condition_variable::notify_one() or std::condition_variable::notify_all() to signal.

To make sure that a wake-up on a condition variable is not a spurious wake-up, we can pass std::condition_variable::wait() another parameter along with the lock: a callable that checks whether the condition actually holds.

So before waking up, the thread can test the condition using the callable and decide whether or not to wake up. It is worth noting that std::lock_guard isn't compatible with std::condition_variable as the condition variable needs control on the lock (to unlock) and lock/unlock possibly multiple times. Hence, we use std::unique_lock with condition variables.


Future

It may happen that a parent thread expects some result from a child thread. Once the parent spawns the child thread and reaches the execution point where it needs the result, it possibly waits for the child thread to evaluate it. There are many ways to do this since they share the same memory, but that again requires us to be cautious about synchronous access and data races.

The C++ standard library provides std::async(), a function that takes any callable (with a choice of launch strategies) and returns a std::future object. We can get the return value from the child thread later in the parent thread using std::future::get(), but only once, i.e. it's not allowed to invoke get() twice on the same future object.

std::async() by default may or may not launch a separate thread for the callable. We can pass either std::launch::deferred, which defers the execution to a later point (until get() is called, and doesn't create a separate thread), or std::launch::async, which necessarily launches another thread at that instant. The default policy is the bit-wise OR of std::launch::async and std::launch::deferred.

Promise

There might be instances where the parent thread wishes to send result to a child thread but the result isn't evaluated yet and isn't present with the parent thread at the time of evoking the child thread.

For that, the standard library has std::promise, the producing end of a std::future (you obtain the future from the promise).

We basically send the future object, fetched from the parent's promise via std::promise::get_future(), to the child thread (futures are move-only; no copy is allowed), and later the parent thread sets the value of the promise (keeps the promise) using std::promise::set_value().

If the promise object goes out of scope before the value is set, an exception (std::future_error with code std::future_errc::broken_promise) is thrown at the step where get() is invoked on the future object in the child thread. To avoid this, in case the parent isn't able to set a value, it can set a meaningful exception using std::promise::set_exception().

Shared future

Think about a broadcast kind of model, where the parent thread wants to send some value in the future to multiple child threads. This isn't possible with std::future, as the same object would have to be shared (no copies) but get() can be called only once.

For such situations, we have std::shared_future, which we get from std::future::share(). It allows copy construction, so each child thread can be given its own copy of the shared future and call get() on it independently.

Packaged Task

Packaged task is part of the <future> header in the C++ standard library. Its template argument is a function signature (e.g. int()), just as with std::function.

std::packaged_task is similar to std::async but can be invoked at the developer's will. Packaged tasks can also be moved to other threads. A std::future object is obtained from one via std::packaged_task::get_future(). Again, it's important to note that no copy constructor is available; only move semantics work with it.


Some example codes

A double ended queue for tasks to be handled by the worker thread

#include <condition_variable>
#include <cstdlib>
#include <deque>
#include <functional>
#include <future>
#include <iostream>
#include <mutex>
#include <thread>

std::deque <std::packaged_task <int()>> deq_t;
std::mutex mu_deq_t, mu_cout;
std::condition_variable cond_deq_t;

// safe cout makes synchronous cout possible
template <typename T>
void safe_cout (const T &x) {
   {
      std::lock_guard <std::mutex> lg_cout(mu_cout);
      std::cout << x << std::endl;
   }
}

// method to find the factorial of a given integer n
int factorial (int n)
{
   int x = 1;
   for (int i = 1; i <= n; ++i)
      x *= i;
   return x;
}

void worker_thread ()
{
   std::packaged_task<int()> t;
   { // lock the mutex for the deque
      std::unique_lock<std::mutex> ul_deq_t(mu_deq_t);
      cond_deq_t.wait(ul_deq_t, []{ return !deq_t.empty(); }); // avoid spurious wake-ups
      t = std::move(deq_t.front()); // packaged_task is move-only
      deq_t.pop_front();
   }
   t(); // call the callable
}

int main () // main thread
{
   std::thread t1(worker_thread);
   // bind the callable factorial with arg 5 to a new binded callable
   std::packaged_task <int()> t(std::bind(factorial, 5));
   // get a future object associated to the packaged_task
   std::future <int> f = t.get_future();

   { // locks the mutex for the deque
      std::lock_guard<std::mutex> lg_deq_t(mu_deq_t);
      deq_t.push_back(std::move(t));
   }
   cond_deq_t.notify_one(); // notify on the condition variable
   safe_cout(f.get()); // get the value using future
   t1.join();
   return EXIT_SUCCESS;
}

A Thread-safe Logger Framework

#include <cassert>
#include <chrono>
#include <ctime>
#include <iomanip>
#include <iostream>
#include <mutex>
#include <sstream>
#include <string>
#include <unordered_map>

namespace GameEngine {
   // map to bash colors for pretty looking
   std::unordered_map <std::string, std::string> colors_map = {
      {"ERROR", "\e[31m"},
      {"SUCCESS", "\e[32m"},
      {"WARNING", "\e[33m"},
      {"INFO", "\e[0m"}
   };
   std::string normal_color = "\e[0m";

   // Logger class
   class Logger {
      private:
         /* Method to get the time stamp for the logger */
         static std::string get_time_stamp ()
         {
            auto temp_time = std::chrono::system_clock::to_time_t (std::chrono::system_clock::now());
            std::stringstream ss;
            ss << std::put_time(std::localtime(&temp_time), "%Y-%m-%d %X");
            return ss.str();
         }

         std::ostream &os; /* stores a reference to ostream object */
         std::mutex mu_cout; /* Mutex to protect the os object */

         void print () { } /* Base condition of print */
         /* Variadic template to allow any number of prints in one go */
         template <typename T, typename ... Types>
            void print (const T& x, Types ... args)
            {
               os << x;
               if (sizeof...(args)) os << ", ";
               print (args...);
            }

      public:
         /* No default construction or copying of Logger instances */
         Logger() = delete;
         Logger (const Logger &) = delete;
         explicit Logger (std::ostream &_os) : os(_os) { }
         ~Logger () { os.flush(); }

         template <typename ... Types>
            void log(std::string msg_type, Types ... args)
            {
               assert(colors_map.find(msg_type) != colors_map.end() &&
                     "Incorrect message type for the logger");
               { // lock the mutex
                  std::lock_guard <std::mutex> lg_cout(mu_cout);
                  os << colors_map[msg_type] <<
                     "[" << msg_type << "] [ " << get_time_stamp() << " ] ";
                  print (args...);
                  os << normal_color << std::endl;
               }
            }
   };
   Logger logger (std::cout);
} // namespace GameEngine