This is a brain dump of all of the Ansible performance related things that sivel knows

Perf things that I know

As you will note while reading this, I haven't provided numbers to back up these statements. We'll get there eventually, so while I have experience with some of these, others are more "theoretical" based on my knowledge.

Fact gathering

  • the default of gather_subset: [all] can consume a lot of RAM, and with a higher fork count it causes CPU contention while processing results in the main process. The CPU penalty is lessened with deepdish in 2.7
  • gather_subset: [min] is largely what people need and is far less impactful. It can be set as a default via ansible.cfg
  • Not gathering facts when they aren't needed is a boost
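For example, a minimal ansible.cfg fragment applying the cheaper subset by default (section and key as in the standard config; the comment about skipping facts is the play-level alternative):

```ini
# ansible.cfg -- gather only the minimal fact subset by default
# instead of the RAM/CPU-heavy "all"
[defaults]
gather_subset = min
```

For plays that use no facts at all, `gather_facts: false` at the play level skips gathering entirely.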

imports

  • virtually no impact beyond disk I/O, since imports are resolved statically at parse time

includes

  • Memory impact is negligible since 2.5.2, however the more hosts you have the more memory is used when calculating vars
  • More hosts also means more CPU overhead, as results are deduplicated and vars calculated for each host
  • Loops are not as impactful as they once were on memory and CPU. 2.6 reduced the impact by caching vars
  • Choosing different files per host is a perf slowdown, purely because each distinct file is executed serially. Hosts matched to a single file still run in parallel up to the fork count
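A sketch of the per-host-file pattern described above (the filename pattern and task file names are hypothetical):

```yaml
# Hosts that resolve to the SAME file run in parallel (up to forks);
# each DISTINCT resolved file is processed serially after the last.
- name: Include OS-specific tasks (one file per OS family)
  include_tasks: "{{ ansible_os_family | lower }}.yml"
```

If all targeted hosts share one OS family, this behaves like a single include; a mixed inventory pays the serial per-file cost.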

callback plugins

  • can drastically reduce performance, since all callback methods execute serially
  • If you have 4 callback plugins enabled that each implement v2_runner_on_ok, each one runs in turn, waiting on the prior before running
  • If a callback method is slow, it delays spawning another worker
  • A callback that stores a lot of info can impact memory utilization
  • High fork counts can cause contention here
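One mitigation is to load only the callbacks you actually need; a minimal ansible.cfg fragment (the `timer` callback is just an example):

```ini
# ansible.cfg -- every whitelisted callback's methods
# (e.g. v2_runner_on_ok) run serially in the main process
# for every result, so keep this list short
[defaults]
callback_whitelist = timer
```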

Module returns

  • The more returned data the more impactful
  • Diff mode increases processing time and delays worker starts
  • High fork counts can cause contention here
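Diff generation can be turned off per task with the `diff` keyword, which avoids returning the diff payload even when the play runs with `--diff` (paths here are hypothetical):

```yaml
- name: Template a large file without computing/returning a diff
  template:
    src: big.conf.j2
    dest: /etc/app/big.conf
  diff: false   # skip diff generation and its processing cost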

Python versions

  • Python 3 shows big performance benefits, both CPU and memory
  • Python 2.7.7 includes a fix to memory arenas that resolves a memory leak
  • RHEL backports the above patch in RHEL7.5 as python-2.7.5-63

Transports

  • the best is ssh+ControlPersist+pipelining
  • within the same network the above almost matches local actions
  • Paramiko is generally slow
  • Bastions by nature will slow things down due to extra network hops and potential contention for ssh login and resources (MaxStartups)
  • SFTP (the default) is slower than scp
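The fast path above maps to a few ansible.cfg settings (standard `[ssh_connection]` keys; the 60s ControlPersist value is just a reasonable example):

```ini
# ansible.cfg -- reuse SSH connections and pipeline module execution
[ssh_connection]
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
scp_if_ssh = True   # prefer scp over the slower SFTP default
```

Note that pipelining requires the remote sudoers config to not set `requiretty`.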

Authentication

  • password auth is slow due to interacting with sshpass to provide passwords to prompts
  • Passworded sudo is slower than passwordless, because we have to do more inspection and provide passwords
  • Key based auth with passwordless become is fastest

Remote shell

  • configurations that run more profile code during non-interactive logins cause slowness

Dynamic inventories

  • Inventory scripts not returning _meta will be slower, as Ansible calls the script again (--host) for each individual host
  • Inventory scripts and plugins that do not use caching cause slowness
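A script that emits a top-level `_meta.hostvars` key in its `--list` output avoids the per-host `--host` calls entirely. Illustrative output shape (group, host names, and vars are made up):

```json
{
  "webservers": { "hosts": ["web1", "web2"] },
  "_meta": {
    "hostvars": {
      "web1": { "http_port": 80 },
      "web2": { "http_port": 8080 }
    }
  }
}
```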

Hashing

  • modules that calculate hashes can cause remote execution to seem slower than necessary
  • Any module that relies on the underlying copy functionality will perform a remote stat via the stat module including a sha1 hash. In some cases it may be beneficial to not copy large files.

Strategy

  • although free is "faster" it has the potential to create many more problems with CPU and memory usage
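The strategy is set per play; a sketch (the long-running command is hypothetical):

```yaml
- hosts: all
  strategy: free   # each host proceeds as fast as it can, no lock-step
  tasks:
    - name: Long task that would otherwise block faster hosts
      command: /usr/bin/some-long-job
```

Because every host can be running a different task at once, the controller juggles more concurrent state, which is where the extra CPU and memory pressure comes from.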

Variables

  • lazy loading can cause lookups to be re-evaluated many times
  • jinja2 native can be faster, as it avoids safe_eval
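Jinja2 native types are opt-in via ansible.cfg (available as of 2.7):

```ini
# ansible.cfg -- return native Python types from templating,
# avoiding the safe_eval round trip on results
[defaults]
jinja2_native = True
```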