
@kyletaylored
Last active September 29, 2020 18:17

Asynchronous vs Parallel Bash Processes

Problem

Terminus is a process-driven Symfony console application that makes cURL requests from PHP to the Terminus API in the background. The current implementation has some performance bottlenecks:

  • Core (and contributed plugin) mass functions run commands serially, looping over a list of sites and waiting for each site's process to complete.
  • The Terminus API enforces request timeouts when too many requests are submitted at once (though it does provide automated retries).
  • Terminus doesn't have a good dependency management system for new plugins.
    • Plugins can't redeclare Symfony packages that Terminus already uses, or include packages that require newer versions of Terminus core's dependencies.

This currently makes it impossible to build an internal, Terminus-based parallel processing solution, so we need to wrap Terminus commands with an external process manager.

Parallel vs Interval Processing

When approaching this problem, the initial thought was to implement some kind of parallel processing technique. Parallel processing is defined as:

a mode of computer operation in which a process is split into parts that execute simultaneously on different processors attached to the same computer.

Applied to Terminus, this translates to running separate Terminus commands as independent processes so that multiple commands can run at the same time, which matters most when deploying a large number of sites. Strictly speaking, though, parallel processing refers to the number of available processors (CPU cores) and is predominantly meant for heavy computation, where a task needs a whole core to itself. In this context we're essentially sending requests and waiting for responses, so we only need to manage lightweight processes rather than dedicating a core to each task.

It is important to note that both parallel and interval-based management run jobs asynchronously; the difference is how the overall set of processes is managed.

Interval processes

Create a number of concurrent background processes on a fixed time interval. For example:

  1. Create a list of tasks.
  2. Cycle through the list, initiating each task with a delay (e.g., sleep 6 to start a new process every 6 seconds).
  3. Use an ampersand at the end of the shell command (./deploy.sh site-name-1 &) to run the task in the background.
  4. Depending on how long each task takes, earlier tasks will finish before too many new ones accumulate.
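These steps can be sketched with stand-in work (the sleep durations, job count, and log file here are arbitrary placeholders for real deploy tasks):

```shell
# Launch one background job per loop iteration, with a fixed delay
# between launches, then wait for all of them to finish.
LOG=$(mktemp)
for N in 1 2 3; do
  (sleep 0.2; echo "job $N done" >> "$LOG") &   # & sends the task to the background
  sleep 0.1                                      # launch interval
done
wait   # block until every background job has completed

cat "$LOG"   # three "job N done" lines, in completion order
```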

(Figure: asynchronous interval processing — staggered background tasks on a shared timeline)

While this model has a predictable minimum runtime, the main consideration is not overloading the system with so many background jobs that the actual job time increases. For example, with 300 sites and a 6-second delay, the minimum processing time is 30 minutes (300 sites × 6 seconds = 1,800 seconds). But because the deployment sequence embeds a number of tasks, a single deployment can take anywhere from 4 to 10 minutes when jobs compete for the same processor time. The example code below shows how to implement background jobs:

# Deployment sequence wrapper
function sequence() {
  local SITE=$1
  # Terminus targets environments as <site>.<env>
  local DEV="${SITE}.dev"
  local TEST="${SITE}.test"
  local LIVE="${SITE}.live"
  echo -e "Starting ${SITE}"

  # Check site upstream for updates, apply
  terminus site:upstream:clear-cache $SITE -q
  terminus upstream:updates:apply $DEV -q

  # Deploy code to test and live
  terminus env:deploy $TEST --cc --updatedb -n -q
  terminus env:deploy $LIVE --cc --updatedb -n -q
}

# Loop through all sites, initiating the deploy sequence.
for SITE in $SITES; do
  # The trailing ampersand (&) sends the task to the background
  sequence $SITE &
  sleep 6
done

# Block until all background jobs have finished
wait

Parallel processes

Allocate a set number of concurrent workers to run processes out of a job queue. For example:

  1. Put a list of tasks in a queue.
  2. Define a number of available workers.
  3. Each worker will take a task out of the queue and initiate a process.
  4. When a task completes, the worker picks up a new task until no more tasks are in the queue.
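The worker-pool model can be sketched with `xargs -P`, which behaves much like GNU Parallel's job queue: each worker pulls the next task as soon as its current one finishes. The task list and worker count here are stand-ins:

```shell
# Nine stand-in tasks, drained by 3 concurrent workers.
TASKS=$(printf 'task-%s\n' 1 2 3 4 5 6 7 8 9)
WORKERS=3

# -P runs up to $WORKERS processes at once; -I {} substitutes one
# input line per invocation.
RESULTS=$(echo "$TASKS" | xargs -P "$WORKERS" -I {} echo "finished {}")
echo "$RESULTS"
```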

(Figure: parallel processing — a fixed pool of workers draining a task queue)

The goal with either approach is to have multiple processes running concurrently in the background. The parallel processing method is simpler, and the number of available workers can be expanded dynamically based on the environment, but it can overwhelm Terminus when initiating background calls; the interval queue is static but requires more tuning based on how long jobs take.

Solution

Using GitHub Actions, we created a build process around a Pull Request workflow: on a successful merge to master, the build fetches all sites using this custom upstream, then starts a parallel process that initiates a deploy sequence for each site.

(Figure: GitHub Actions workflow — PR merge triggers the parallel deploy sequence)

The magic here is using GNU Parallel to manage the process handling. Essentially, we bundle the entire deployment sequence into a single script that takes a site ID as an argument. In this script you could implement additional error handling (for example, restoring a backup if the script doesn't exit 0), but this simplistic example does not.

# Get list of sites
SITES=$(terminus org:site:list ${ORG_UUID} --format list --upstream ${UPSTREAM_UUID} --field name | sort -V)

# Pass sites to the deployment script, run in 50 parallel processes.
# Quoting $SITES preserves the newlines so each site becomes its own job.
echo "$SITES" | parallel --jobs 50 ./timeout-sequence.sh {}

In the example above, we first get a list of all sites using this custom upstream, then pass each site ID as a task into a job queue managed by GNU Parallel. The processes run in the background with 50 workers at a time, each picking up a new task until the job queue is empty.

Finished site-demo-288 in 4.85 minutes
Finished site-demo-292 in 4.50 minutes
Finished site-demo-293 in 4.56 minutes
Finished site-demo-286 in 5.16 minutes
Finished site-demo-279 in 5.75 minutes
Finished site-demo-283 in 5.71 minutes
Finished site-demo-277 in 6.25 minutes
Finished site-demo-290 in 5.58 minutes
Finished site-demo-287 in 7.00 minutes

You can see in the output above that because jobs finish at different times, the deployments complete out of order. One important change we made was adding a timeout wrapper to each deployment so that a processing issue doesn't keep the build container running: the process is killed after a set amount of time.

# Timeout after 15 minutes.
timeout 15m ./deploy-sequence.sh $1
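`timeout` exits with status 124 when it kills the command for running too long, which lets a wrapper distinguish a hung deployment from a normal failure. A sketch with a stand-in command in place of the deploy script:

```shell
# A command that would run for 5 seconds, capped at 1 second.
STATUS=0
timeout 1s sleep 5 || STATUS=$?   # STATUS becomes 124 if killed by timeout
if [ "$STATUS" -eq 124 ]; then
  echo "deploy timed out"
fi
```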


A Pen by Kyle Taylor on CodePen.


<!-- Asynchronous Processing -->
<section class="container async">
<article>
<h1 class="text-center pb-4 pt-4">Asynchronous Processes</h1>
<hr class="col-sm-12">
<div class="processes">
</div>
<div class="time-wrapper">
<div class="time-grid">
<div class="split">&nbsp;</div>
<div class="split">&nbsp;</div>
<div class="split">&nbsp;</div>
<div class="split">&nbsp;</div>
<div class="split">&nbsp;</div>
<div class="split">&nbsp;</div>
<div class="split">&nbsp;</div>
<div class="split">&nbsp;</div>
<div class="split">&nbsp;</div>
<div class="split">&nbsp;</div>
</div>
<div class="row pt-3 pb-3">
<div class="col-sm-4 text-left pl-0"><strong>0 min</strong></div>
<div class="col-sm-4 text-center"><strong>5 min</strong></div>
<div class="col-sm-4 pr-0 text-right"><strong>10 min</strong></div>
</div>
</div>
</article>
</section>
<hr class="mt-5 mb-2 container bg-dark" style="height: 3px">
<!-- Parallel Processing -->
<section class="container parallel">
<article>
<h1 class="text-center pb-4 pt-4">Parallel Processes</h1>
<div class="processes">
<hr class="col-sm-12">
<div class="process row">
<div class="box-wrapper col-sm-3">
<div class="box bg-primary"></div>
</div>
<div class="box-wrapper col-sm-3">
<div class="box bg-primary"></div>
</div>
<div class="box-wrapper col-sm-3">
<div class="box bg-primary"></div>
</div>
<div class="box-wrapper col-sm-3">
<div class="box bg-primary"></div>
</div>
</div>
<hr class="col-sm-12">
<div class="process row">
<div class="box-wrapper col-sm-2">
<div class="box bg-warning"></div>
</div>
<div class="box-wrapper col-sm-4">
<div class="box bg-warning"></div>
</div>
<div class="box-wrapper col-sm-1">
<div class="box bg-warning"></div>
</div>
<div class="box-wrapper col-sm-5">
<div class="box bg-warning"></div>
</div>
</div>
<hr class="col-sm-12">
<div class="process row">
<div class="box-wrapper col-sm-5">
<div class="box bg-danger"></div>
</div>
<div class="box-wrapper col-sm-7">
<div class="box bg-danger"></div>
</div>
</div>
<hr class="col-sm-12">
<div class="process row">
<div class="box-wrapper col-sm-2">
<div class="box bg-secondary"></div>
</div>
<div class="box-wrapper col-sm-3">
<div class="box bg-secondary"></div>
</div>
<div class="box-wrapper col-sm-2">
<div class="box bg-secondary"></div>
</div>
<div class="box-wrapper col-sm-3">
<div class="box bg-secondary"></div>
</div>
<div class="box-wrapper col-sm-2">
<div class="box bg-secondary"></div>
</div>
</div>
<hr class="col-sm-12">
<div class="process row">
<div class="box-wrapper col-sm-1">
<div class="box bg-dark"></div>
</div>
<div class="box-wrapper col-sm-5">
<div class="box bg-dark"></div>
</div>
<div class="box-wrapper col-sm-4">
<div class="box bg-dark"></div>
</div>
<div class="box-wrapper col-sm-2">
<div class="box bg-dark"></div>
</div>
</div>
<hr class="col-sm-12">
<div class="time-wrapper">
<div class="time-grid">
<div class="split">&nbsp;</div>
<div class="split">&nbsp;</div>
<div class="split">&nbsp;</div>
<div class="split">&nbsp;</div>
<div class="split">&nbsp;</div>
<div class="split">&nbsp;</div>
<div class="split">&nbsp;</div>
<div class="split">&nbsp;</div>
<div class="split">&nbsp;</div>
<div class="split">&nbsp;</div>
</div>
<div class="row pt-3 pb-3">
<div class="col-sm-4 text-left pl-0"><strong>0 min</strong></div>
<div class="col-sm-4 text-center"><strong>5 min</strong></div>
<div class="col-sm-4 pr-0 text-right"><strong>10 min</strong></div>
</div>
</div>
</div>
</article>
</section>
const randomRange = (min = 1, max = 10) => {
  // Random integer between min and max (inclusive)
  return Math.floor(Math.random() * (max - min + 1)) + min;
};
/**
 * Convert a template string into HTML DOM nodes
 * @param {String} str The template string
 * @return {Node} The template HTML
 */
const stringToHTML = (str) => {
  var parser = new DOMParser();
  var doc = parser.parseFromString(str, "text/html");
  return doc.body;
};
let colors = [
  "primary",
  "secondary",
  "warning",
  "danger",
  "info",
  "success",
  "dark"
];
// Create async processes
const asyncMax = 15;
const asyncProcesses = document.querySelector("section.async .processes");
for (let m = 0; m < asyncMax + 1; m++) {
  let color = colors[m % colors.length];
  let width = randomRange(10, 60);
  let margin = m * 4;
  let markup = `<div class="process row box-wrapper"><div class="box bg-${color}" data-width="${width}" style="margin-left: ${margin}%"></div></div><hr class="col-sm-12">`;
  asyncProcesses.appendChild(stringToHTML(markup));
}
function main() {
  // Async: each box starts on a staggered delay and animates to its width.
  anime({
    targets: "section.async .box",
    delay: (el, index) => index * 1000,
    duration: (el) => el.dataset.width * 100,
    width: (el) => el.dataset.width + "%",
    easing: "linear"
  });
  // Parallel: within each process row, boxes run back to back.
  let parallelProcesses = document.querySelectorAll(
    "section.parallel .processes .process"
  );
  let processTimes = {};
  // Loop through each process row
  for (let i = 0; i < parallelProcesses.length; ++i) {
    let boxes = parallelProcesses[i].querySelectorAll(".box-wrapper .box");
    let hash = Math.random().toString(36).substring(2, 15);
    processTimes[hash] = [];
    anime({
      targets: boxes,
      // Each box starts after the accumulated duration of the boxes before it.
      delay: function () {
        return processTimes[hash].reduce(function (a, b) {
          return a + b;
        }, 0);
      },
      // Derive duration from the Bootstrap column class (e.g. "col-sm-3" -> 3000 ms).
      duration: function (el) {
        let duration = el.parentElement.classList[1].split("-").slice(-1) * 1000;
        processTimes[hash].push(duration);
        return duration;
      },
      width: "100%",
      easing: "linear"
    });
  }
}
let wait = () => {
  setTimeout(main, 1500);
};
window.addEventListener("DOMContentLoaded", wait);
<script src="https://cdnjs.cloudflare.com/ajax/libs/animejs/3.2.0/anime.min.js"></script>
.time-wrapper {
  margin-left: 0;
  margin-right: 0;
  padding: 0;
  clear: both;
  width: 100%;
  .time-grid {
    width: 100%;
    clear: both;
  }
  .split {
    border-right: 1px solid #dedede;
    width: 10%;
    float: left;
    &:first-child {
      border-left: 1px solid #dedede;
    }
  }
}
.box {
  width: 0px;
  height: 20px;
  border-radius: 5px;
}
.async {
  hr {
    padding: 0 0 5px;
    margin: 4px 0 0;
  }
  .box-wrapper {
    padding: 0;
    .box {
      margin-left: 0;
      margin-right: 0;
    }
  }
}
.parallel .processes {
  .box-wrapper {
    padding: 0 1px;
    float: left;
    border-radius: 5px;
    box-sizing: border-box;
  }
}
hr {
  margin: 10px 0;
}
<link href="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/4.5.0/css/bootstrap.min.css" rel="stylesheet" />