Last active June 25, 2024 16:53
For when you need Ansible to give you a list of unreachable or failed hosts

How to get a list of failed or unreachable hosts

This is in response to this Stack Overflow question. The solution was posted by Vladimir Botka.

  • YouTube walkthrough here

When Ansible runs, it's great for setting up and configuring servers as per your playbooks and roles. But have you ever needed to collect a list of failed hosts where Ansible wasn't able to connect to them? In this demo I'm going to show you how to collect a list of failed and unreachable hosts.

My setup:

I'm simply using a list of servers that don't exist, along with one remote server that I can get to.

This is my inventory file:

[ec2-user@ip-172-31-16-55 conn_check]$ cat hosts.ini



Surely ping does it?

You might think that ping would give you a list, and in a sense it does. At the end of the playbook run, you do indeed see a list on the screen of hosts where Ansible failed to run. But without copying and pasting that into a file and messing around with awk or deleting text, you won't have that list in a file.

Register ping results!

Here is a simple ping playbook that registers the ping result:

- hosts: all
  gather_facts: true
  tasks:
  - name: ping hosts
    ping:
    register: ping_results

  - name: ping results
    debug:
      var: ping_results

And this is what you get back:

[ec2-user@ip-172-31-16-55 conn_check]$ ansible-playbook -i hosts.ini ping.yml

PLAY [all] ************************************************************************

TASK [Gathering Facts] ************************************************************
fatal: [server4]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname server4: Name or service not known", "unreachable": true}
fatal: [server2]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname server2: Name or service not known", "unreachable": true}
fatal: [server3]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname server3: Name or service not known", "unreachable": true}
fatal: [server5]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname server5: Name or service not known", "unreachable": true}
fatal: [server1]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname server1: Name or service not known", "unreachable": true}
ok: []

TASK [ping hosts] *****************************************************************
ok: []

TASK [ping results] ***************************************************************
ok: [] => {
    "ping_results": {
        "changed": false,
        "failed": false,
        "ping": "pong"

PLAY RECAP ************************************************************************
               : ok=3    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
server1                    : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0
server2                    : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0
server3                    : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0
server4                    : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0
server5                    : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0

As you can see, the only result the register was able to show you was the successful pong. That means you can't get unreachable nodes simply by pinging a host and registering the result.

This is DEFAULT Ansible behaviour.

You could take the output that Ansible gave you and copy and paste it around to make your own list, but Ansible doesn't make it easy to get that list out, because as soon as a host is unreachable, Ansible forgets it and moves on.

So how do I do it?

Ansible has this wonderful thing called magic or special variables. You can see them here

These two special variables are the key:

ansible_play_hosts: List of hosts in the current play run, not limited by the serial. Failed/Unreachable hosts are excluded from this list.

ansible_play_hosts_all: List of all the hosts that were targeted by the play

So you can write a playbook to work out the difference between these two values and produce a file containing the output.
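
To see the difference step in isolation, here is a minimal sketch that runs against localhost only. The two lists stand in for the magic variables; the variable names `targeted` and `still_active` and the host names in them are made up for illustration and are not part of the original solution.

```yaml
# Sketch only: hypothetical lists standing in for
# ansible_play_hosts_all (targeted) and ansible_play_hosts (still_active).
- hosts: localhost
  gather_facts: false
  vars:
    targeted:
      - server1
      - server2
      - server3
    still_active:
      - server2
  tasks:
    - name: Show hosts that were targeted but are no longer active
      debug:
        msg: "{{ targeted | difference(still_active) }}"
```

The `difference` filter returns the items of the first list that are not in the second, which is exactly the set of hosts that dropped out of the play.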

Show me the playbook

Here is an example playbook where we do exactly that.

The first playbook is a bare-bones playbook to just give you a list of unreachable hosts. This is probably all you need.

- hosts: all
  gather_facts: true
  tasks:
    - block:
        - debug:
            var: ansible_play_hosts_all
        - debug:
            var: ansible_play_hosts
        - set_fact:
            down: "{{ ansible_play_hosts_all|difference(ansible_play_hosts)|join('\n') }}"
        - copy:
            content: "{{ down }}"
            dest: "/opt/how2/conn_check/result.txt"
          delegate_to: localhost
      run_once: true

And the resulting file:

[ec2-user@ip-172-31-16-55 conn_check]$ cat result.txt

Update the playbook?

If you want to do a bit more with the information, for example put the output into a spreadsheet or report on it, you can add the following to the playbook and create a new file that adds ",UNREACHABLE" to every node. You may not need to do this, but in my use case I was adding this to a spreadsheet along with a load of other server facts. If you left this list off, you wouldn't have correct data in the spreadsheet and you wouldn't be giving the complete picture.

Here's the updated playbook:

- hosts: all
  gather_facts: true
  tasks:
    - block:
        - debug:
            var: ansible_play_hosts_all
        - debug:
            var: ansible_play_hosts
        - set_fact:
            down: "{{ ansible_play_hosts_all|difference(ansible_play_hosts)|join('\n') }}"
        - copy:
            content: "{{ down }}"
            dest: "/opt/how2/conn_check/result.txt"
          delegate_to: localhost
      run_once: true

    - name: read contents of a file
      set_fact:
        file_contents: "{{ lookup('file', 'result.txt') }}"
      delegate_to: localhost

    - name: Append ",UNREACHABLE" to every line
      blockinfile:
        path: "/opt/how2/conn_check/result1.txt"
        block: |
          {% for line in file_contents.split('\n') %}
          {{ line }},UNREACHABLE
          {% endfor %}
        create: yes
      with_items:
        - "result.txt"
      delegate_to: localhost

    - lineinfile:
        path: "/opt/how2/conn_check/result1.txt"
        regexp: '^# '
        state: absent
      delegate_to: localhost

And the results:

[ec2-user@ip-172-31-16-55 conn_check]$ cat result1.txt
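
If you don't need the intermediate result.txt, the same kind of file could probably be produced in a single task by looping over the difference inside the `copy` content. This is only a sketch and not the approach from the Stack Overflow answer. It assumes the same destination path as the playbook above and has not been run against the setup shown here.

```yaml
# Sketch only: write "<host>,UNREACHABLE" rows in one task,
# without the intermediate file or the blockinfile markers.
- hosts: all
  gather_facts: true
  tasks:
    - name: Write unreachable hosts as CSV-style rows
      copy:
        content: |
          {% for host in ansible_play_hosts_all | difference(ansible_play_hosts) %}
          {{ host }},UNREACHABLE
          {% endfor %}
        dest: "/opt/how2/conn_check/result1.txt"
      delegate_to: localhost
      run_once: true
```

Because nothing is written with `blockinfile` in this version, there are no "# BEGIN" and "# END" marker lines to clean up afterwards, so the `lineinfile` step is not needed.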
