In this blog post I will show you a technique that can be used to parallelize any task loop in Ansible.
Ansible is a great automation platform: (relatively) simple, and very versatile. Unfortunately, it is quite lacking when it comes to parallelization. There are fundamentally two ways in which you can parallelize operations in Ansible:
- work on multiple hosts at the same time
- launch (and poll) multiple tasks concurrently with the `poll: 0` keyword
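For reference, the second approach looks roughly like this (the command, the task names, and the timing values are all illustrative):

```yaml
- name: Launch a long-running command in the background
  ansible.builtin.command: /usr/local/bin/long-job.sh   # illustrative command
  async: 600    # maximum allowed runtime, in seconds
  poll: 0       # fire and forget: return immediately
  register: job_handle

- name: Wait for the background task to finish
  ansible.builtin.async_status:
    jid: "{{ job_handle.ansible_job_id }}"
  register: job_result
  until: job_result.finished
  retries: 60
  delay: 10
```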
Unfortunately, these standard solutions have drawbacks. In order to parallelize work across several hosts, you need to have different hosts in the first place: this means that if you aim to parallelize a loop of tasks within a single host, you are out of luck. The second option requires significant modifications to the playbook, because concurrent tasks are launched but must be polled explicitly with `async_status`, which is very impractical. Most importantly, `async_status` is not supported for all tasks; in particular, it is not supported in tasks that include roles or other task lists. This means that there is no way to run third-party roles concurrently, which largely limits the usability of this feature.
The most reliable and effective way to parallelize computations in Ansible is indeed the implicit host loop, possibly paired with a large value of the `--forks` parameter and the free strategy.
Can we (ab)use the host loop to parallelize tasks? It turns out that it is possible, with minimal modifications to the playbook file.
The idea is to create multiple "virtual" host copies from your real inventory hosts, and to distribute work among them. Each virtual host owns a slice of your original workload and runs just that slice, concurrently with the other virtual hosts, according to the number of forks and the strategy.
Fortunately, it is possible to add new hosts during the execution of a playbook! You just use the `ansible.builtin.add_host` task.
The new hosts can be then selected in a specific play of the playbook.
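A minimal sketch of `add_host` usage (the host name, group, and address are illustrative):

```yaml
- name: Register a new in-memory host
  ansible.builtin.add_host:
    name: virtual-host-0       # illustrative name
    groups: virtual            # the group can later be used as a host pattern
    ansible_host: 192.0.2.10   # connection address for the new host
```

Note that the new host lives only in memory, for the duration of the playbook run.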
Consider this example: your Ansible playbook is configuring some hosts. The configuration consists of running some jobs or workloads with the role `run-job`, and the job items are listed in some variable `job_ids`. Here is a playbook that implements this logic:
https://gist.github.com/c43c3b9deee1c66cb4afb520b4c69338
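If you cannot open the gist, the playbook looks roughly like this (a sketch built from the role and variable names used in this post; the loop variable name is an assumption):

```yaml
- hosts: all
  tasks:
    - name: Generate the list of job IDs
      ansible.builtin.include_role:
        name: generate-jobs

    - name: Launch all job items
      ansible.builtin.include_role:
        name: run-job
      loop: "{{ job_ids }}"
      loop_control:
        loop_var: job_id    # assumed loop variable consumed by run-job
```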
We want to parallelize the task `Launch all job items`, without editing the task files of the `run-job` role. This requires creating some virtual copies of your hosts, each holding a part of the `job_ids` array. Then, you define a second play in your playbook, which targets the new host copies and runs the `Launch all job items` task unmodified.
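The second play would look roughly like this (a sketch; it assumes the virtual copies are in the group "virtual", and uses the `free` strategy so hosts can proceed independently):

```yaml
- hosts: virtual
  strategy: free    # let each virtual host run at its own pace
  tasks:
    - name: Launch all job items
      ansible.builtin.include_role:
        name: run-job
      loop: "{{ job_ids }}"
      loop_control:
        loop_var: job_id    # assumed loop variable consumed by run-job
```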
To showcase the result, I have written a ready-to-use Ansible role that you can pull from the Galaxy,
https://gist.github.com/8af9b33f5fdc2fb741c3efd9d1819738
This role takes two mandatory inputs:
- `payload_var`: the name of the variable holding the array that should be sliced
- `batch`: the size of each slice; the number of slice hosts is implicitly determined by this parameter

The new hosts become part of the new group "virtual", which can be used as a host pattern to target the virtual hosts.
Here is how you would modify the playbook:
https://gist.github.com/c76eff0eaea30e314b08dfe46e9aff8e
The modifications are pretty simple. Now we can try to benchmark the playbooks.
The code in this post is available here. The `generate-jobs` role simulates work IDs by creating a list of strings of length `job_count`. The `run-job` role simulates running a job by sleeping for a random delay, between `job_duration_ms_min` and `job_duration_ms_max` (milliseconds, default values respectively 10 and 200).
Let's write an inventory for three hosts, and let's generate 100 jobs for each, of a duration between 0 and 200 milliseconds:
https://gist.github.com/ec363eedb651e80ba5d955e796471d4c
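Such an inventory could look like this (host names and addresses are made up; the variable names are the ones used by the mock roles):

```ini
[workers]
host1 ansible_host=203.0.113.11
host2 ansible_host=203.0.113.12
host3 ansible_host=203.0.113.13

[workers:vars]
job_count=100
job_duration_ms_min=0
job_duration_ms_max=200
```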
We can now time the execution of the first playbook, which naively includes the `run-job` role in a loop:
https://gist.github.com/29ac27bab53b0031e4156bda6efc1def
That is 113 seconds! And we also notice something interesting: Ansible ran 100 iterations of the loop on each host, and it ran all of them sequentially, disregarding host-level parallelization. This is because of a known bug (which is not going to be addressed anytime soon, I think...) that causes Ansible to break host parallelization when the loop items are not the same across hosts. Since the `generate-jobs` mock role creates all different job items in the format of a string `{{ inventory_hostname }}-{{ job_id }}`, you get this completely serialized execution.
Now, let's try to run the modified playbook with virtual hosts, with a `job_batch_size` of 10, and `--forks 20`:
https://gist.github.com/719bc0b226c7240d4e1ce3fb5cac818d
That is 21 seconds, for an 80% reduction of wallclock execution time! And we can observe that all the tasks run in parallel across all (virtual) hosts, because we can see interleaved task output from different hosts in the console.
But what is under the hood of the `pisto.virtual_slice_hosts` role? It is rather simple:
https://gist.github.com/43dd402e2c41a5e0343a816578ff50dd
The role essentially loops over all hosts and, for each host, over all slices of the payload variable `payload_var`, and creates a new host in the specified `group_name` with its own slice of the content of `payload_var`.
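The core idea can be sketched for a single host as follows (a simplification, not the actual role contents: `add_host` bypasses the host loop and runs only once per play, so the real role iterates over all hosts explicitly, and the payload is hard-coded here to `job_ids` instead of being indirected through `payload_var`):

```yaml
- name: Create one virtual copy of this host per slice of job_ids
  ansible.builtin.add_host:
    name: "{{ inventory_hostname }}-slice-{{ idx }}"
    groups: virtual
    job_ids: "{{ item }}"    # the slice owned by this virtual copy
    ansible_host: "{{ ansible_host | default(inventory_hostname) }}"
  loop: "{{ job_ids | batch(job_batch_size) | list }}"
  loop_control:
    index_var: idx
```

The `batch` Jinja2 filter splits the list into chunks of the given size, so each virtual copy ends up with a `job_ids` variable containing only its own chunk.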
The important implementation detail here is that, in order to copy a host into a virtual slice host, one needs to know which variables should be applied to the new host. These obviously include several connection and privilege-escalation variables such as `ansible_host` or `ansible_become`. By default the role copies a set of known connection variables (specified in the role default variable `implicit_copy_vars`), plus all variables explicitly declared in the role variable `copy_vars`. For example, you may want to run a benchmark like the one above, but with a different amount of work for each host. To do that, you can specify host-specific values for `job_duration_ms_min` and `job_duration_ms_max` in the inventory; then you need to modify the `Create virtual slice hosts` task into:
https://gist.github.com/9c8fbbe806015ac7ef39958c01f55c28
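Concretely, the change amounts to passing the extra variable names through `copy_vars` (a sketch of what the modified task might look like; the exact invocation syntax is defined by the role):

```yaml
- name: Create virtual slice hosts
  ansible.builtin.include_role:
    name: pisto.virtual_slice_hosts
  vars:
    payload_var: job_ids
    batch: "{{ job_batch_size }}"
    copy_vars:                  # host-specific variables to copy to the virtual hosts
      - job_duration_ms_min
      - job_duration_ms_max
```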
Of course, the list of variables depends on your original playbook. You may want to override `copy_vars` once and for all at the playbook level with `group_vars`.
In this post I showed how you can parallelize arbitrary sections and loops of your playbooks, with minimal modifications to the code of the playbook itself. In particular, loops over items that change on a per-host basis are particularly slow to execute with a naive implementation in Ansible. The trick is to generate new hosts during the playbook execution, using the `add_host` task, and then execute operations in parallel across these virtual copies of the original hosts.
The implementation logic is trivial, and a basic but versatile implementation is available as the role `pisto.virtual_slice_hosts`. The speedup over serial Ansible code is significant: in the synthetic benchmark shown here there is an 80% reduction of the wallclock execution time.
The only significant requirements for employing this technique are that the user must know which variables should be copied over to the virtual hosts, and that in most cases the `ansible.builtin.free` strategy must be used to unlock the performance gains; otherwise, Ansible will most likely serialize your loops due to long-standing limitations of Ansible (#30816, #36978).