
@jerith
Created April 20, 2017 12:07
Concurrency is hard.
# In addition, we monkey-patch Parallel::replace_worker to add a lock around it
# so that our builds in Travis don't occasionally hang forever.
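A minimal sketch of the lock pattern follows. It assumes nothing about Parallel's internals: `FakeParallel`, its `replace_worker(workers, index)` signature, and the `SerializedReplaceWorker` module name are hypothetical stand-ins. Prepending a module to the singleton class routes every call through one process-wide `Mutex` before it reaches the original method through `super`.

```ruby
# Hypothetical stand-in for the library. This is NOT the Parallel gem's real
# code or signature. It counts how many calls are inside replace_worker at once.
module FakeParallel
  @active = 0
  @max_active = 0
  @stats_lock = Mutex.new

  class << self
    attr_reader :max_active

    def replace_worker(workers, index)
      @stats_lock.synchronize do
        @active += 1
        @max_active = [@max_active, @active].max
      end
      snapshot = workers.dup # the list a new child would be told to close
      sleep 0.01             # widen the window in which calls could overlap
      workers[index] = "worker-#{index} (replaced, saw #{snapshot.size})"
    ensure
      @stats_lock.synchronize { @active -= 1 }
    end
  end
end

# The monkey-patch: one process-wide Mutex around every replace_worker call.
module SerializedReplaceWorker
  REPLACE_WORKER_LOCK = Mutex.new

  def replace_worker(*args, &blk)
    REPLACE_WORKER_LOCK.synchronize { super(*args, &blk) }
  end
end

FakeParallel.singleton_class.prepend(SerializedReplaceWorker)

# Usage: many threads replacing workers at the same time.
workers = Array.new(8) { |i| "worker-#{i}" }
threads = 8.times.map { |i| Thread.new { FakeParallel.replace_worker(workers, i) } }
threads.each(&:join)
puts FakeParallel.max_active # => 1, the calls never overlap
```

Forwarding `*args, &blk` keeps the wrapper independent of the real method's argument list. Ruby's `Mutex` is not reentrant, so this pattern assumes that `replace_worker` never calls itself while it holds the lock.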
#
# Some background for the hangs: Worker subprocesses communicate with the
# parent process through pipes, and are created with Process::fork. The new
# worker subprocess then reads from its pipe in a loop to receive tasks and
# exits when `read.eof?` indicates all write endpoints have been closed. This
# only works if the parent process holds the only open-for-writing file
# descriptor connected to a worker's pipe. In order for this to be true, new
# subprocesses need to close all inherited file descriptors connected to other
# subprocesses, and they do this by running through an array of
# `started_workers` (passed in from either `replace_worker` or
# `create_workers`) and closing all the pipes they find in it. However!
# `replace_worker` is called from multiple threads and builds the
# `started_workers` array from the set of workers it knows about when it's
# called. If the stars align just wrong and the level of concurrency is high
# enough (Travis builds get 32 shared cores to play with), a new subprocess
# may inherit write ends of pipes belonging to workers that are not in the
# `started_workers` array it was given. Such subprocesses can then keep each
# other alive forever while the parent waits for them to exit. This manifests
# as a set of catalog tests that never finishes, and whoever is waiting for
# that build has a bad day.
# Upstream bug report: https://github.com/grosser/parallel/issues/196
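The hang described above can be reproduced in isolation. This sketch uses only Ruby's core `IO.pipe`, `fork`, and `Process.waitpid`, so it needs a platform that supports `fork`. `spawn_worker` and `exited_within?` are hypothetical helpers, and their `close_in_child` argument plays the role of `started_workers`. Worker B is forked without being told about worker A's pipe, so A keeps waiting for EOF until B exits.

```ruby
# Hypothetical helpers illustrating the fork/pipe protocol, not Parallel's code.

# Fork a worker that reads tasks from its own pipe until EOF, then exits.
# `close_in_child` lists inherited write ends the child should close, which is
# the role played by `started_workers` in the comment above.
def spawn_worker(close_in_child)
  reader, writer = IO.pipe
  pid = fork do
    writer.close                  # the child must not hold its own write end
    close_in_child.each(&:close)  # drop write ends of other workers' pipes
    reader.gets until reader.eof? # task loop; eof? is true once all writers close
    exit!(0)                      # skip at_exit handlers inherited from the parent
  end
  reader.close                    # the parent only writes
  [pid, writer]
end

# Poll without blocking: true if `pid` exits within `seconds`, else false.
def exited_within?(pid, seconds)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + seconds
  loop do
    return true if Process.waitpid(pid, Process::WNOHANG)
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
    sleep 0.01
  end
end

# Leaky case: B is forked while the parent still holds A's write end, and B is
# not told to close it.
pid_a, w_a = spawn_worker([])
pid_b, w_b = spawn_worker([])
w_a.close                                   # the parent is done with A...
a_exited_early = exited_within?(pid_a, 0.3) # ...but A sees no EOF: B holds w_a
w_b.close                                   # B exits, dropping its copy of w_a
b_exited = exited_within?(pid_b, 5)
a_exited_late = exited_within?(pid_a, 5)    # now A finally sees EOF

# Correct case: D closes C's inherited write end, so C exits on its own.
pid_c, w_c = spawn_worker([])
pid_d, w_d = spawn_worker([w_c])
w_c.close
c_exited = exited_within?(pid_c, 5)
w_d.close
d_exited = exited_within?(pid_d, 5)

p [a_exited_early, b_exited, a_exited_late, c_exited, d_exited]
# => [false, true, true, true, true]
```

In the leaky case, A can only exit after B does. If B in turn waited on A, as can happen when concurrent `replace_worker` calls hand out incomplete `started_workers` lists, neither would ever exit.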