This is a brain dump of all of the Ansible performance related things that sivel knows

Perf things that I know

As you will note while reading this, I haven't provided numbers to back up these statements. We'll get there eventually; while I have direct experience with some of these, others are more "theoretical", based on my knowledge of how Ansible works.

Additionally, this probably isn't everything. I'll add more as I think of it.

Fact gathering

  • the default of gather_subset: [all] can consume a lot of RAM, and with a higher fork count causes CPU contention while processing results in the main process. The CPU penalty is lessened with deepdish in 2.7
  • [min] is largely all most people need, and is less impactful. It can be set as the default via ansible.cfg
  • Not gathering facts when they aren't needed is a boost
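A sketch of both options as play keywords (the host pattern and tasks are placeholders; `gather_facts` and `gather_subset` are standard play keywords, and the subset can also be set as a default in the `[defaults]` section of ansible.cfg):

```yaml
# Skip fact gathering entirely when no facts are used
- hosts: all
  gather_facts: false
  tasks:
    - ansible.builtin.ping:

# Gather only the minimal subset
- hosts: all
  gather_facts: true
  gather_subset:
    - min
  tasks:
    - ansible.builtin.debug:
        var: ansible_facts['distribution']
```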

imports

  • relatively little impact, other than disk I/O

includes

  • Memory impact is negligible since 2.5.2; however, the more hosts you have, the more memory is used when calculating vars
  • More hosts also create CPU overhead, as we deduplicate results and calculate vars
  • Loops are not as impactful as they once were on memory and CPU. 2.6 reduced the impact by caching vars
  • Choosing different files per host is a perf slowdown, purely because each distinct file is executed serially. Hosts matched to a single file still run in parallel up to the fork count
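As a sketch of that last point (the file names here are made up):

```yaml
# Slower: hosts resolve to different files, and each distinct
# file's tasks run serially per file
- ansible.builtin.include_tasks: "{{ ansible_facts['os_family'] | lower }}.yml"

# Faster: every host includes the same file, so tasks run in
# parallel up to the fork count
- ansible.builtin.include_tasks: common.yml
```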

callback plugins

  • can drastically reduce performance, as we execute all callback methods serially
  • If you have 4 callback plugins enabled that implement v2_runner_on_ok, each gets executed serially, waiting on the prior before running
  • If a callback method is slow, it delays spawning another worker
  • A callback that stores a lot of info can impact memory utilization
  • High fork counts can cause contention here
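One mitigation is simply enabling fewer callbacks. A sketch for ansible.cfg (in releases of this era the key is `callback_whitelist`; newer releases spell it `callbacks_enabled`):

```ini
[defaults]
# Every enabled plugin's v2_* methods run serially in the main
# process, so keep this list short
callback_whitelist = timer
```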

Module returns

  • The more data a module returns, the more impactful it is
  • Diff mode increases processing time and delays worker starts
  • High fork counts can cause contention here
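For example, diff collection can be disabled per task with the `diff` keyword (file paths here are placeholders):

```yaml
- name: Deploy a large template without computing or returning a diff
  ansible.builtin.template:
    src: app.conf.j2
    dest: /etc/app.conf
  diff: false
```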

Python versions

  • Python 3 shows big performance benefits, in both CPU and memory
  • Python 2.7.7 includes a fix to memory arenas that resolves a memory leak
  • RHEL backports the above patch in RHEL7.5 as python-2.7.5-63

Transports

  • the best is ssh+ControlPersist+pipelining
  • within the same network the above almost matches local actions
  • Paramiko is generally slow
  • Bastions by nature will slow things down due to extra network hops and potential contention for ssh login and resources (MaxStartups)
  • SFTP (the default) is slower than scp
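An ansible.cfg sketch of that fastest combination (these are standard `[ssh_connection]` keys; `scp_if_ssh` switches file transfer from SFTP to scp):

```ini
[ssh_connection]
# Reuse one SSH connection per host, and send module code over it
# instead of copying files first. Pipelining requires 'requiretty'
# to be disabled in sudoers on the targets.
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
scp_if_ssh = True
```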

Authentication

  • password auth is slow due to interacting with sshpass to provide passwords to prompts
  • Passworded sudo is slower than passwordless, because we have to do more inspection and provide passwords
  • Key based auth with passwordless become is fastest

Remote shell

  • configurations that run more profile code during non-interactive logins cause slowness

Dynamic inventories

  • Inventory scripts that don't return _meta will be slower, as Ansible calls the script repeatedly, once per host
  • Inventory scripts and plugins that do not use caching cause slowness
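A sketch of script output that includes top-level `_meta`, so Ansible never has to call the script with `--host` per host (the group, host, and var names are made up):

```json
{
  "webservers": {
    "hosts": ["web1", "web2"]
  },
  "_meta": {
    "hostvars": {
      "web1": {"http_port": 8080},
      "web2": {"http_port": 8081}
    }
  }
}
```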

Hashing

  • modules that calculate hashes can cause remote execution to seem slower than necessary
  • Any module that relies on the underlying copy functionality will perform a remote stat via the stat module including a sha1 hash. In some cases it may be beneficial to not copy large files.

Strategy

  • although free is "faster" it has the potential to create many more problems with CPU and memory usage
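For reference, strategy is a play keyword:

```yaml
- hosts: all
  strategy: free   # hosts proceed independently instead of in lockstep
  tasks:
    - ansible.builtin.ping:
```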

Variables

  • lazy loading can cause lookups to be re-evaluated many times
  • jinja2 native can be faster, as it avoids safe_eval
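Jinja2 native mode can be enabled via ansible.cfg (the `jinja2_native` key exists as of 2.7):

```ini
[defaults]
# Use jinja2's NativeEnvironment, avoiding safe_eval on templated results
jinja2_native = True
```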

Forks

  • All forks are managed by the initial process, this means that spawning, monitoring, and retrieving results is limited to a single core. A high fork count will cause CPU contention, and limit the number of forks that can be spawned and running at any time.
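Forks are set in ansible.cfg (or with `--forks` on the command line); raising the number past what the single controlling core can service just adds contention. The value `20` below is an arbitrary illustration, not a recommendation:

```ini
[defaults]
# One parent process spawns, monitors, and collects results from
# every worker, so very high values hit a single-core ceiling
forks = 20
```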

Host counts

  • A high host count in inventory will adversely affect performance, and fork count. This is due to state tracking, and variable calculation related to hosts. This has been made much better in 2.9 by caching host lists for the life of the play, and recalculating as few times as possible. However, some variables are always recalculated, and a high host count impacts the ability to do this in a timely manner.