@pisto — blog.md, created August 2, 2021
Ansible parallel programming

In this blog post I will show you a technique that can be used to parallelize any task loop in Ansible.

Ansible is a great automation platform: (relatively) simple and very versatile. Unfortunately, it is quite lacking when it comes to parallelization. There are fundamentally two ways to parallelize operations in Ansible:

  • work on multiple hosts at the same time
  • launch (and poll) multiple tasks concurrently with the poll: 0 keyword.

Unfortunately, these standard solutions have drawbacks. To parallelize work across several hosts, you need to have multiple hosts in the first place: if you aim to parallelize a loop of tasks within a single host, you are out of luck. The second option requires significant modifications to the playbook, because concurrent tasks are launched but must then be polled explicitly with async_status, which is quite impractical. Most importantly, async execution is not supported for all tasks, and in particular not for tasks that include roles or other task lists. This means there is no way to run third-party roles concurrently, which largely limits the usability of this feature.
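For reference, the async approach looks roughly like this (a sketch; the command, timings, and variable names are illustrative):

```yaml
# Launch each command without waiting (fire and forget).
- name: Launch long-running commands concurrently
  ansible.builtin.command: "do-work --item {{ item }}"  # hypothetical command
  async: 600          # allow up to 600 seconds per task
  poll: 0             # do not wait here
  loop: "{{ work_items }}"
  register: launched

# Poll each background job explicitly until it finishes.
- name: Wait for all launched commands
  ansible.builtin.async_status:
    jid: "{{ item.ansible_job_id }}"
  loop: "{{ launched.results }}"
  register: job
  until: job.finished
  retries: 60
  delay: 10
```

Note that this only works for regular modules: there is no equivalent for include_role or include_tasks.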

The most reliable and effective way to parallelize computations in Ansible is indeed the implicit host loop, possibly paired with a large value of the --forks parameter and the free strategy. Can we (ab)use the host loop to parallelize tasks? It turns out that this is possible, with minimal modifications to the playbook file.

Virtual host slices

The idea is to create multiple "virtual" host copies from your real inventory hosts, and to distribute work among them. Each virtual host owns a slice of your original workload and runs just that slice, concurrently with the other virtual hosts, according to the number of forks and the strategy.

Fortunately, it is possible to add new hosts during the execution of a playbook! You just use the ansible.builtin.add_host task. The new hosts can then be selected in a specific play of the playbook.
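A minimal use of add_host looks like this (the host name and group are illustrative):

```yaml
- name: Register a new host at runtime
  ansible.builtin.add_host:
    name: "{{ inventory_hostname }}-copy"
    groups: virtual
    # carry over the connection address so the copy reaches the same machine
    ansible_host: "{{ ansible_host | default(inventory_hostname) }}"

# The new hosts can then be targeted by a later play:
# - hosts: virtual
#   tasks: ...
```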

Consider this example: your Ansible playbook is configuring some hosts. The configuration consists of running some jobs or workloads with the role run-job, and the job items are listed in a variable job_ids. Here is a playbook that implements this logic:

https://gist.github.com/c43c3b9deee1c66cb4afb520b4c69338

We want to parallelize the task Launch all job items, without editing the task files of role run-job. This requires creating some virtual copies of your hosts, each holding a part of the job_ids array. Then, you define a second play in your playbook, which targets the new host copies and runs the Launch all job items task unmodified.

To showcase the result, I have written a ready-to-use Ansible role that you can pull from Ansible Galaxy:

https://gist.github.com/8af9b33f5fdc2fb741c3efd9d1819738

This role takes two mandatory inputs:

  • payload_var: the variable holding the array that should be sliced
  • batch: the size of each slice. The number of slice hosts is implicitly determined by this parameter.

The new hosts become part of a new group, "virtual", which can be used as a host pattern to select the virtual hosts.
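An invocation could then look like this (a sketch; the role's exact defaults may differ):

```yaml
- name: Create virtual slice hosts
  ansible.builtin.include_role:
    name: pisto.virtual_slice_hosts
  vars:
    payload_var: job_ids   # slice the job_ids array of each host
    batch: 10              # at most 10 items per virtual host
```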

Here is how you would modify the playbook:

https://gist.github.com/c76eff0eaea30e314b08dfe46e9aff8e

The modifications are pretty simple. Now we can try to benchmark the playbooks.

Benchmarking

The code in this post is available here. The generate-jobs role simulates work IDs by creating a list of strings of length job_count. The run-job role simulates running a job by sleeping for a random delay, between job_duration_ms_min and job_duration_ms_max (milliseconds, default values respectively 10 and 200).

Let's write an inventory for three hosts, and let's generate 100 jobs for each, of a duration between 0 and 200 milliseconds:

https://gist.github.com/ec363eedb651e80ba5d955e796471d4c
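Such an inventory could be sketched in YAML form as follows (hostnames and connection settings are illustrative):

```yaml
all:
  hosts:
    host1:
    host2:
    host3:
  vars:
    ansible_connection: local  # run everything locally for the benchmark
    job_count: 100
    job_duration_ms_min: 0
    job_duration_ms_max: 200
```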

We can now time the execution of the first playbook, which naively includes the run-job role in a loop:

https://gist.github.com/29ac27bab53b0031e4156bda6efc1def

That is 113 seconds! And we also notice something interesting: Ansible runs 100 iterations of the loop on each host, and it runs all of them sequentially, disregarding host-level parallelization. This is because of a known bug (which is not going to be addressed anytime soon, I think...) that causes Ansible to break host parallelization when the loop items are not the same across hosts. Since the generate-jobs mock role creates distinct job items for each host, strings in the format {{ inventory_hostname }}-{{ job_id }}, you get this completely serialized execution.
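The serialized loop in question is essentially of this shape (a reconstruction of the benchmarked task):

```yaml
- name: Launch all job items
  ansible.builtin.include_role:
    name: run-job
  # items differ per host, e.g. host1-0, host1-1, ...,
  # which triggers the lockstep serialization described above
  loop: "{{ job_ids }}"
```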

Now, let's try to run the modified playbook with virtual hosts, with a job_batch_size of 10, and --forks 20:

https://gist.github.com/719bc0b226c7240d4e1ce3fb5cac818d

That is 21 seconds, an 80% reduction of wallclock execution time! And we can observe that the tasks run in parallel across all (virtual) hosts, because we can see interleaved task output from different hosts in the console.

Implementation

But what is under the hood of the pisto.virtual_slice_hosts role? It is rather simple:

https://gist.github.com/43dd402e2c41a5e0343a816578ff50dd

The role essentially loops over all hosts and, for each host, over all slices of the payload variable payload_var, creating a new host in the specified group_name that owns its own slice of the contents of payload_var.
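The core of that loop can be sketched as a single add_host task per slice (a simplified sketch; the real role is generic over payload_var and also copies connection variables):

```yaml
- name: Create one virtual host per slice (simplified sketch)
  ansible.builtin.add_host:
    name: "{{ inventory_hostname }}-slice-{{ item }}"
    groups: virtual
    # each virtual host receives only its own slice of the payload
    job_ids: "{{ job_ids[item * batch : (item + 1) * batch] }}"
  # one iteration per slice: ceil(len(job_ids) / batch)
  loop: "{{ range(((job_ids | length) / batch) | round(0, 'ceil') | int) | list }}"
```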

The important implementation detail is that in order to copy a host into a virtual slice host, one needs to know which variables should be applied to the new host. These obviously include connection and privilege escalation variables such as ansible_host or ansible_become. The role copies by default a set of known connection variables (specified in the role default variable implicit_copy_vars), plus all variables explicitly listed in the role variable copy_vars. For example, you may want to run a benchmark like the one above, but with different amounts of work for each host. To do that, you specify host-specific values for job_duration_ms_min and job_duration_ms_max in the inventory, and then modify the Create virtual slice hosts task into:

https://gist.github.com/9c8fbbe806015ac7ef39958c01f55c28
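That modification could be along these lines (a sketch consistent with the role interface described above):

```yaml
- name: Create virtual slice hosts
  ansible.builtin.include_role:
    name: pisto.virtual_slice_hosts
  vars:
    payload_var: job_ids
    batch: "{{ job_batch_size }}"
    copy_vars:             # host-specific variables to carry over
      - job_duration_ms_min
      - job_duration_ms_max
```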

Of course the list of variables depends on your original playbook. You may want to override copy_vars once and for all at the playbook level with group_vars.

Conclusion

In this post I showed how you can parallelize arbitrary sections and loops of your playbooks, with minimal modifications to the code of the playbook itself. In particular, loops over items that change on a per-host basis are particularly slow to execute with a naive implementation in Ansible. The trick is to generate new hosts during the playbook execution, using the add_host task, and then execute operations in parallel across these virtual copies of the original host.

The implementation logic is trivial, and a basic but versatile implementation is available as the role pisto.virtual_slice_hosts. The speedup over serial Ansible code is significant: in the synthetic benchmark shown here, there is an 80% reduction of wallclock execution time.

The only significant requirements for employing this technique are that the user must know which variables should be copied over to the virtual hosts, and that in most cases the ansible.builtin.free strategy must be used to unlock the performance gains, otherwise Ansible will most likely serialize your loops due to long-standing limitations (#30816 #36978).
