@magnific0
Created February 27, 2014 14:06
Implementing cooperative multi-tasking for multi-threaded C++ Python calls

In response to pagmo issue #44

By default the GIL is not managed, which means that multiple threads can access the Python interpreter without acquiring a lock. This default behaviour saves the user the hassle of working with the GIL, and it works perfectly fine as long as only one thread accesses the Python interpreter, as is the case with most algorithms. However, if an algorithm spawns multiple threads (e.g. PaDe) that try to access the interpreter at the same time (because the problem is defined in Python), this will eventually cause segfaults.

There are three possible ways of managing this problem:

  1. do not spawn multiple threads in algorithms;
  2. discourage users from defining a PyGMO problem for these select few algorithms;
  3. use cooperative multitasking.

Below, an implementation of option 3 is discussed. In cooperative multitasking, each Python callback function is wrapped with the following lines (source):

PyGILState_STATE gstate;
gstate = PyGILState_Ensure();

/* Perform Python actions here. */
result = CallSomeFunction();
/* evaluate result or handle exception */

/* Release the thread. No Python API allowed beyond this point. */
PyGILState_Release(gstate);

This requires a thread to acquire the GIL (Global Interpreter Lock) before making a Python call and to release it afterwards. The GIL is explained in more detail in the PyCon presentation by Dave Beazley.

Using the GIL requires the main process (the C++ function called by Python) to first initialize threading and obtain the lock, using:

PyEval_InitThreads();

The main thread now holds the GIL. Any threads it spawns do not automatically obtain the GIL; if the main thread keeps holding the lock while waiting for the spawned threads to finish, the program hangs, which is known as a deadlock.

How the main process should release the GIL for the threads is subject to much debate (e.g. here and here); the common failure modes are, again, deadlocks and segfaults.

During the last test, with Python 2.7.6, the following approach worked best, where inside the algorithm the arch.evolve method is wrapped as follows:

PyEval_ReleaseLock();
arch.evolve(pop);
PyEval_AcquireLock();

In the future, however, the following strategy should be used, since PyEval_ReleaseLock() has been deprecated. (CPython's Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS macros expand to exactly this save/restore pair.)

PyThreadState *_save;
_save = PyEval_SaveThread();    
arch.evolve(pop);
PyEval_RestoreThread(_save);

Finally, it would be possible to simplify wrapping the callback functions by making use of the Boost.Python call policies, as described in this tutorial. A new policy takes care of acquiring and releasing the lock using its precall() and postcall() functions for every callback.

The definition of the policy would be:

#include <boost/python.hpp>

namespace boost { namespace python {

struct release_gil_policy
{
  // Ownership of this argument tuple will ultimately be adopted by
  // the caller.
  template <class ArgumentPackage>
  static bool precall(ArgumentPackage const&)
  {
    // Release the GIL and save the PyThreadState for this thread
    thread_state() = PyEval_SaveThread();
    return true;
  }

  // Pass the result through
  template <class ArgumentPackage>
  static PyObject* postcall(ArgumentPackage const&, PyObject* result)
  {
    // Reacquire the GIL using the PyThreadState saved by precall()
    PyEval_RestoreThread(thread_state());
    thread_state() = NULL;
    return result;
  }

  typedef default_result_converter result_converter;
  typedef PyObject* argument_package;

  template <class Sig>
  struct extract_return_type : mpl::front<Sig>
  {
  };

private:
  // Retain a pointer to the PyThreadState on a per-thread basis
  // (thread_local assumes C++11; boost::thread_specific_ptr also works)
  static PyThreadState*& thread_state()
  {
    static thread_local PyThreadState* state = NULL;
    return state;
  }
};

}}

The policy is then enforced on each binding simply by:

def("myFunction", make_function(&myFunction, release_gil_policy()));