Last active February 17, 2023 09:46
Explain how to think about ansible and how to use it


Understanding Ansible

Ansible is a powerful, simple, and easy to use tool for managing computers. It is most often used to update programs and configuration on dozens of servers at once, but the abstractions are the same whether you're managing one computer or a hundred. Ansible can even do "fun" things like change the desktop photo or backup personal files to the cloud. It can take a while to learn how to use Ansible because it has an extensive terminology, but once you understand the why and the how of Ansible, its power is readily apparent.

Ansible's power comes from its simplicity. Under the hood, Ansible is just a domain specific language (DSL) for a task runner for a secure shell (ssh). You write ansible yaml (.yml) files which describe the tasks which must run to turn plain old / virtualized / cloud computers into production ready server-beasts. These tasks, in turn, have easy to understand names like "copy", "file", "command", "ping", or "lineinfile". Each of these turns into shell commands which are run on the client server. For example, "copy" is essentially secure copy, also known as scp, and is used to move files from the ansible runner onto the client.

The output from these commands is collected and sent back to the ansible runner. This output can then affect the task execution flow. For example, if a "copy" command fails, it will by default stop task execution, but ansible can be instructed to instead ignore the failure, retry the operation, or even select a new source to copy from. In this way, ansible is very much like an imperative domain specific language. Tasks are run sequentially. If the first task copies a file to the computer, and the last removes it, it will still exist on the computer if the task pipeline fails somewhere in the middle.
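As a rough sketch of that failure handling, the two tasks below show one way to ignore a failed copy and one way to retry it. The file names and paths are hypothetical, and the `is succeeded` test assumes a reasonably recent Ansible.

```yaml
# Hypothetical source and destination paths, for illustration only.
- name: copy the app config, but carry on if the copy fails
  copy:
    src: files/app.conf
    dest: /etc/app.conf
  ignore_errors: true

- name: copy the app config, retrying a few times before giving up
  copy:
    src: files/app.conf
    dest: /etc/app.conf
  register: copy_result
  until: copy_result is succeeded
  retries: 3
  delay: 5
```

Either way, the tasks still run in order; these keywords only change what happens when one of them fails.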

However, a single task is generally declarative. It describes the state of the computer we want, and ansible ensures that the computer ends up in that state. Because ansible "gathers facts" about a system during setup and as it runs, it knows whether files it cares about exist on the computer. Instead of creating a directory with mkdir, you tell the "file" module to ensure that a certain path is set to directory mode, as everything is a file in Unix. The file module is smart enough to not do anything if the path is already a directory.
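A minimal sketch of that declarative style, assuming a hypothetical path, might look like this:

```yaml
# The path is a made-up example; the task describes the desired state.
- name: ensure the log directory exists
  file:
    path: /var/log/myapp
    state: directory
    mode: "0755"
```

Running this a second time should report "ok" rather than "changed", since the directory is already there.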

The glaring exceptions to this rule are the core modules "shell" and "command", though they can be treated declaratively with the creates option or the when task parameter: creates reports the no-change "OK" when its file already exists, and when skips the task if its condition is false.

Ansible attempts to be idempotent: when a playbook of tasks is run twice successively, or on two congruent computers, little should be different. There are many ways to subvert this in the imperative DSL, but for most ansible use cases, the same playbook should affect the computer in the same way every time. This presumption allows ansible to skip running tasks. For example, if the server already has the right node.js installed, or maybe just any version of node.js installed, the task will be "OK"'d and skipped. Note that "skipped" is a task end state for when a conditional isn't met, while "ok" is a task end state for when the computer was already in the end state.
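To make the difference between "ok" and "skipped" concrete, here is a small sketch. It assumes fact gathering is enabled and that node.js is installed through homebrew, which is only one of several ways to install it.

```yaml
# Reports "changed" if node had to be installed, "ok" if it was
# already present, and "skipped" on any machine that is not a Mac.
- name: install node.js with homebrew
  homebrew:
    name: node
    state: present
  when: ansible_os_family == "Darwin"
```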

This allows the ansible runner computer to not matter, as long as the runner has the correct files. This seemingly difficult task is fairly easy to ensure, as ansible encourages you to keep important configuration files along with ansible yaml files in source control, either as a configurable "template" or as a whole.

If every task were just a stateful function call, or a call to an object's method, then task includes statements are how you create your own function calls. A task list can include tasks which simply pass arguments to other task lists. In this way, you can compose functions of task lists, effectively giving us meta-tasks.
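A sketch of such a meta-task follows. The file name install_tool.yml and the variable tool_name are hypothetical, and include_tasks assumes a newer Ansible; older versions used a plain include for the same purpose.

```yaml
# tasks/install_tool.yml (hypothetical): the "function body"
- name: "install {{ tool_name }}"
  package:
    name: "{{ tool_name }}"
    state: present

# tasks/main.yml: "calling the function" with different arguments
- include_tasks: install_tool.yml
  vars:
    tool_name: git

- include_tasks: install_tool.yml
  vars:
    tool_name: curl
```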

Tasks and meta tasks can be included in either playbooks or roles. A "role" is a description of what a computer is: "mysql", "programmer", "youtube-streamer", etc. This is what makes ansible an idempotent task runner. Remember, ansible runs tasks in order to get a computer's software into some end state. A role describes the configuration needed to take a standard computer and transform it into a home media server. But what if you want your home media server to also be, perhaps, a SteamBox? You could use a new role, but this is a case for a playbook.

Playbooks are selections of roles which are applied to specific user logins and computer ip addresses. Your media serving home computer can also be a steam box, or a "bitcoin_miner", or whatever else you may want it to be. Of course, you can create conflicting roles, but that's what virtualization and containers help manage.
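For instance, a playbook along these lines would apply two roles to the same machine. The group name living_room, the login media, and both role names are made up for illustration.

```yaml
# A playbook sketch: one group of hosts, one login, two roles.
- hosts: living_room
  remote_user: media
  roles:
    - media_server
    - steam_box
```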

The inventory file provides a mapping between a group of computers and the login information for each computer. That's all that ansible needs in order to ssh into your "tumblr-scrapers" and get them ready for action, without touching your ever-ready "airBnB for iguanas" service server. One day the world will catch up.
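Inventories are often written in an INI style, but as a sketch, the same mapping can be expressed in YAML, assuming an Ansible version that supports YAML inventories. All host names and logins here are invented.

```yaml
# Two groups; a playbook that targets tumblr_scrapers never touches iguana_bnb.
all:
  children:
    tumblr_scrapers:
      hosts:
        scraper1.example.com:
          ansible_user: deploy
        scraper2.example.com:
          ansible_user: deploy
    iguana_bnb:
      hosts:
        iguanas.example.com:
          ansible_user: admin
```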

So, to recap, the inventory file provides logins for computers. A playbook maps groups of logins to specific computer roles: "wordpress" or "dev2" or "abc" for the cruel hearted. A role contains everything necessary to turn a computer into a server-beast, including task lists, configuration files, and templates, as well as meta data such as "this role needs this other role in order to work". Tasks describe specific pieces of state which must be true. And modules turn tasks into ssh commands!

Roles also have special "handler" tasks, which are "globally unique" and can be notified by any other task. They are best used to restart services such as apache servers or for triggering computer reboots.
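A rough sketch of a handler and a task that notifies it is below. The role name webserver, the template name, and the service name httpd are assumptions for the example.

```yaml
# roles/webserver/tasks/main.yml (hypothetical role)
- name: install the apache config
  template:
    src: httpd.conf.j2
    dest: /etc/httpd/conf/httpd.conf
  notify: restart apache
---
# roles/webserver/handlers/main.yml
- name: restart apache
  service:
    name: httpd
    state: restarted
```

The handler only runs if the notifying task reports "changed", so an unchanged config does not trigger a restart.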

The last key piece of ansible is the humble variable system. Ansible yaml files can contain variables which control their behavior. Often these variables instruct the computer to download a new or otherwise specific program version, such as OpenSSL version 1.0.1f. They are also often used for machine specific configuration, such as naming the machine specially on DNS so everyone knows not to touch "production-load-balancer-plz-no-fail".

Variable rules are pretty simple: you define default variable values, then later you can overwrite them. There's a straightforward (if confusing) precedence order that interested parties can find in the docs. It is similar to: command line variables always win, then shell environmental ansible variables, then multiple levels of ansible yaml file rules, then finally a role's defaults/main.yml.

Because variables can be set anywhere and are everywhere, this can lead to confusing and hard to debug situations with variable name clashes, until precedence rules are internalized.
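As a small illustration of defaults being overwritten, consider the sketch below. The role name openssl, the group load_balancers, and the variable openssl_version are hypothetical.

```yaml
# roles/openssl/defaults/main.yml: the lowest-precedence value
openssl_version: "1.0.1f"
---
# playbooks/site.yml: a value passed to the role wins over its default
- hosts: load_balancers
  roles:
    - role: openssl
      vars:
        openssl_version: "1.0.1g"
```

A value passed on the command line with --extra-vars would in turn win over both.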

A workflow for making a role

Let's walk through installing the bare essentials for any Mac OS X box: Google Chrome, Transmission torrent client, and VLC. You pay for HBO, but you want Game of Thrones anywhere, anytime, on any device.

It often makes sense to think at the role level of abstraction when writing ansible scripts. "This computer is a dev box configured with my settings, stored in environmental variables." You can use the ansible role manager (arm) application to scaffold new playbooks and roles with arm init -r {{ role_name }}. This will create the new role directory structure in the current working directory.

Once you've scaffolded the "media_mac" role, open the tasks/main.yml file (it may have the .arm suffix as well). Let's think about what needs to happen in order for the computer to be ready for use:

  1. Install Google Chrome
  2. Install VLC
  3. Install Transmission

Seems straightforward. Let's list these out:

# media_mac/tasks/main.yml
- name: Install Google Chrome
- name: Install VLC
- name: Install Transmission

How should we install these three apps? Why, the homebrew_cask module is perfect for this.

# media_mac/tasks/main.yml
- name: Install Google Chrome
  homebrew_cask: name=google-chrome state=present

Remember that we are declaring a state we want, in this case, please have google-chrome installed through homebrew_cask. We can also make the yaml more git line diff friendly by taking advantage of yaml syntax.

# media_mac/tasks/main.yml
- name: Install Google Chrome
  homebrew_cask: >
   name=google-chrome state=present

Now, we must test this role. Don't bother writing out the other two installations; there's no point if the google chrome one doesn't work. In order to imprint a role onto a computer, you need a playbook and a hosts file. Ansible can configure the computer it's run on, so your ansible_hosts file will look like this:

# IP         special host variable settings    ansible_connection=local
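For comparison, a YAML-style inventory that points a group called self at the local machine might look like the sketch below. This is an assumed equivalent, not the original hosts file, and it needs an Ansible version that supports YAML inventories.

```yaml
# The group name "self" matches the hosts: line used by the test playbook.
all:
  children:
    self:
      hosts:
        localhost:
          ansible_connection: local
```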

Now let's make a playbook, in playbooks/test.yml. Don't scaffold with arm yet, because we need to type this path often. This playbook is tiny:

- hosts: self
  roles:
    - role: media_mac

And now run ansible-playbook playbooks/test.yml... and the debugging starts. The command will work if you've installed homebrew, used homebrew to install the cask command, run the cask command once, and set up ansible and its dependencies; if ansible hasn't changed since this was written; if this tutorial has all the required steps; and if you're lucky.

Let's update the role yaml to prevent you in the future from running into the homebrew problem. We're going to check to see if homebrew exists on the media_mac already. If homebrew were more programmer friendly or I were smarter, we would simply ensure homebrew's existence or install it, but right now we're going to push the problem onto future you, using the ansible stat module.

The stat module lets you do light system fact checking at run time. You register the end result of the stat command, and then you can reference that result later. Here, we check to see if brew is installed, and choose to fail if it isn't.

# media_mac/tasks/main.yml
- name: check if homebrew is already installed
  stat: "path=/usr/local/bin/brew"
  register: brew_exists

- fail: msg="Please install homebrew with the ruby installer script, then cask, then run cask once for permissions reasons"
  when: brew_exists.stat.exists == False

- name: Install Google Chrome
  homebrew_cask: name=google-chrome state=present

We've already started debugging, before we ever even got "hello world" working. Welcome to devops. Let's move on and hope nothing else bad happens and forces us to adjust our engineering estimate again.

Search the available casks to discover that VLC and Transmission can also be installed with homebrew_cask. Other installations might require unzipping a tar archive somewhere, or running an installation script with the shell command. Luckily for us, these things all exist already.
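As a sketch of the tar archive case, the unarchive module can unpack a file that is already on the target machine. The archive name and both paths below are hypothetical, and the destination directory is assumed to exist.

```yaml
# remote_src tells ansible the archive is already on the target machine;
# creates lets the task report "ok" once the unpacked file exists.
- name: unpack a downloaded tool
  unarchive:
    src: /tmp/some-tool.tar.gz
    dest: /usr/local/some-tool
    remote_src: true
    creates: /usr/local/some-tool/bin/some-tool
```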

Now that you can install everything you need, let's do some configuration. Media Macs should be friendly to everyone, even the family dog. Let's add these apps to the dock. Normally, on a mac, that's a matter of messing around with an XML file called a preference list. Preference lists (plists) are similar to YAML, but look like HTML with all those <words> tags.

Instead let's use dockutil, a python program which can manage the dock more easily than we can. Let's use brew for this.

- name: install /usr/local/bin/dockutil to manage the dock
  homebrew: >
   name=dockutil state=present

Note the /usr/local/bin/dockutil. This is used by the shell module to run dockutil. Prefer absolute paths if possible. Let's use dockutil to add Google Chrome to the Dock.

- name: "add google chrome to the dock"
  shell: /usr/local/bin/dockutil --add "/opt/homebrew-cask/Caskroom/google-chrome/latest/Google"

Note that this task must run after the dockutil install command, otherwise it won't work on untouched computers. If you run this command again, there will be two Chromes. Oops. Let's fix that. First, let's collect the output of dockutil --list and then if "Google Chrome" is in that output, don't add another dock item.

- name: read defaults to know what to add to the dock
  shell: /usr/local/bin/dockutil --list
  register: dock_list

- name: "add google chrome to the dock"
  shell: /usr/local/bin/dockutil --add "/opt/homebrew-cask/Caskroom/google-chrome/latest/Google"
  when: dock_list.stdout.find("Google Chrome") == -1

Do that for the other two apps, and you're good to go. If you want to do more, check out the list of ansible modules and how to use them. Also check out the tips section below, as it illustrates how I develop with ansible.

Tips: to insure promptness

It takes a day or two to get used to ansible. This section should help you past most of the ansible humps.


  1. Use the debug and assert modules to assist in debugging
  2. Use the --step CLI flag to enable interactive mode
  3. Use the --start-at-task CLI directive to skip to the step you're currently debugging
  4. Run ps aux | grep ansible on the remote host to track the ansible process.
  5. Run ps aux | grep {{ task_underlying_command }} to track the amount of CPU time a long running task has taken.
  6. Understand ssh, privilege escalation, and ssh remote agents.
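As a sketch of the first tip, the debug and assert modules can inspect the results registered earlier in this walkthrough (dock_list and brew_exists). The messages are illustrative.

```yaml
# Print what dockutil reported, one dock entry per line.
- name: show the current dock contents
  debug:
    var: dock_list.stdout_lines

# Stop the play with a readable message if an expectation does not hold.
- name: confirm the homebrew check found brew
  assert:
    that:
      - brew_exists.stat.exists
    msg: "brew was not found at /usr/local/bin/brew"
```

These combine well with --step and --start-at-task, since you can jump straight to the task you are inspecting.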

Getting better at ansible

Also check out ansible galaxy, and read through some other roles to see what's possible. Favor an iterative approach when building playbooks, knocking out installation problems as you go along. Combine tasks into meta tasks, and use variables and loops to write less and do more. Favor actions which can be "OK"'d over "CHANGED", though this is not always necessary or possible.

Try starting specific, then becoming more abstract as the role grows. Knock one problem down at a time, and refactor and add variables once you know your patterns.

If you have the data you need to know whether or not to run a task, and just need to get that data into ansible, there's usually a way. Aside from computer fact gathering, you can offer a prompt to a user to ask for input. You can also share encrypted data (such as ssh keys?) with ansible vault. You can control ansible with anything, as lookups allow you to communicate with external APIs. If you need certain programs to be installed on the same server rack, use ansible tags to control deployment to inventories.
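A small sketch combining a prompt, a lookup, and a tag is below. The variable names ssh_key and home_dir and the tag keys are made up for the example; the env lookup reads an environment variable on the machine running ansible.

```yaml
- hosts: self
  # Ask the person running the playbook for a value.
  vars_prompt:
    - name: ssh_key
      prompt: "Which ssh key file should this playbook use?"
      private: false
  # Pull a value in from outside ansible with a lookup.
  vars:
    home_dir: "{{ lookup('env', 'HOME') }}"
  tasks:
    - name: report which key will be used
      debug:
        msg: "Using {{ home_dir }}/.ssh/{{ ssh_key }}"
      tags:
        - keys
```

Running the playbook with --tags keys would then limit the run to the tagged task.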

How to write a great Ansible role / playbook / task

I am by no means an ansible expert, but I'm working on getting there.


  • Ansible is a great tool
    • Fast to script / update
    • Easy to use and understand
    • Good abstractions
  • Since it's a DSL, there's a learning curve
    • Have to understand quite a bit before it "clicks"
    • Making new roles can be daunting, but it shouldn't be
  • High Level Overview
    • Everything boils down to idempotent module calls
    • Tasks call modules
    • Meta-tasks call tasks using includes (meta is my word)
    • Roles combine tasks with metadata to raise abstraction
      • tasks for the role
      • default variables if nothing else is set
      • files which must exist for the role to work
      • templates for files which must be created and configured
      • handlers are globally unique tasks which can be notified
        • "Handlers are best used to restart services and trigger reboots"
    • An inventory creates a mapping between SSH and human readable names
    • Playbooks combine hosts and roles
    • Variables can be set anywhere, and are everywhere
      • Because it's declarative, this isn't so bad
      • Can still cause debugging issues
  • Tools to aid development
    • ps aux | grep ansible
    • ps aux | grep {{ task_name }}
    • debug module
    • arm command line tool
    • A strong understanding of ssh, privilege escalation
    • Command line flag --step lets you interactively run a playbook
  • A workflow for making a new role
    • arm init -r {{ role_name }}
    • open {{ role_name }}/tasks/main.yml.arm, remove .arm from name
    • list every step you know you need with - name:
    • write out the first task, don't use variables
    • make shell command to run the playbook
      • if necessary, add your sudo pass to it w/ --ask-become-pass
    • run the playbook, ensure the task works, check with ssh session
      • This is "running the test suite" in TDD
    • write out the next test, repeat
    • add #TODO's for edge cases like "homebrew cask can't install vagrant"
    • when the script works perfectly, or all edge cases are discovered, you're done
    • find patterns, turn them into their own task file, use includes
    • find constants, add them as a default variable
    • adding a shell command? add a when: clause, maybe utilizing the stat module
    • need to do something conditionally? Inspect the stdout or stderr from a previously run task, using register
  • Tips: to insure promptness
    • Start Specific, Become Abstract
      • Avoid loops until you need them
    • When you use jinja templating, always add double quotes "{{ some_variable }}"
    • Use the greater than sign (>) for line-diffs in git
    • Use task includes to make meta-tasks
    • Always set a default value for a variable, because someone might use it in a conditional check and hell will break loose
    • Favor "OK" and "SKIPPED" over "CHANGED"
      • Use fact gathering and stat + register checking to your advantage
    • Write descriptive fail states to support the user self debugging
    • When developing or debugging, don't disable expensive or time consuming tasks, front load them and use --start-at-task to skip ahead of setup tasks
    • Follow the best practices around organization
    • If you have data somewhere, but don't know how to give it to ansible, check Special Topics:
      • Data in the user who is running the playbook: use prompt
      • Data must be encrypted: use vault
      • Data in external service: use lookups
      • Data describes machine instances: use tags
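Two of the tips above, quoting jinja expressions and always having a default value, can be sketched in a single task. The variable cask_name and its fallback value are assumptions for illustration; in a role the default would more usually live in defaults/main.yml.

```yaml
# The double quotes keep YAML from misreading the leading "{{",
# and the default filter covers the case where cask_name is unset.
- name: install the requested cask
  homebrew_cask:
    name: "{{ cask_name | default('vlc') }}"
    state: present
```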


Using > for git line diffs

- name: tap php cask
  homebrew_tap: >

Using creates to control shell command running

- name: install composer through php
  shell: curl -sS | php && mv composer.phar /usr/local/bin/composer
  args:
    creates: /usr/local/bin/composer

Using Check and Fail together

- name: check if vagrant is already installed
  stat: "path=/usr/local/bin/vagrant"
  register: vagrant_exists

- fail: msg="Please install vagrant with brew cask install vagrant"
  when: vagrant_exists.stat.exists == False

Descriptive failure states

- name: check if vagrant is already installed
  stat: "path=/usr/local/bin/vagrant"
  register: vagrant_exists

- fail: msg="Please install vagrant with brew cask install vagrant"
  when: vagrant_exists.stat.exists == False

This also works if the user needs to make a file, and can be used as a koan tool

# Prerequisite tasks to fail if there's nothing there
- name: ensure {{ ssh_key }} exists
  stat: "path=~/.ssh/{{ ssh_key }}"
  register: homestead_key

- fail: msg="Please create your {{ ssh_key }} or change your playbook variable ssh_key"
  when: homestead_key.stat.exists == False
sivabudh commented Oct 3, 2017

Have you tried SaltStack? Do you have any opinions on it? Being a functional aficionado, I'm hung up on how some other sites (specifically: categorize Ansible as "mutable and procedural" while tout SaltStack as "mutable and declarative."

I myself have just started using Ansible in a UAT environment, so just getting my feet wet. I think choosing Ansible is the "right choice" due to its sheer ubiquity and good networking automation support.

Anyway, if you have any thoughts on this, would love to hear.

PS: Thanks for writing such a good blog post on Ansible!

Good article
