@dmccuk
Last active April 17, 2024 13:39
For when you need Ansible to give you a list of unreachable or failed hosts

How to get a list of failed or unreachable hosts

This is in response to this Stack Overflow question. The solution was posted by Vladimir Botka.

  • YouTube walkthrough here

Ansible is great for setting up and configuring servers from your playbooks and roles. But have you ever needed a list of the hosts Ansible couldn't connect to? In this demo I'm going to show you how to collect a list of failed and unreachable hosts.

My setup:

I'm simply using a list of servers that don't exist, along with one remote server that I can get to.

This is my inventory file:

[ec2-user@ip-172-31-16-55 conn_check]$ cat hosts.ini
server1
server2
server3
server4
server5

[ubuntu]
13.42.56.230

[ubuntu:vars]
ansible_user=ubuntu
ansible_ssh_private_key_file=~/.ssh/other.pem

Surely ping does it?

You might think that ping would give you a list, and in a sense it does. At the end of the playbook run, you do indeed see a list on the screen of hosts where Ansible failed to run. But without copying and pasting that into a file and messing around with awk or deleting text, you won't have that list in a file.

Register ping results!

Here is a simple ping playbook that registers the ping result:

---
- hosts: all
  gather_facts: true
  tasks:
  - name: ping hosts
    ping:
    register: ping_results

  - name: ping results
    debug:
      var: ping_results

And this is what you get back:

[ec2-user@ip-172-31-16-55 conn_check]$ ansible-playbook -i hosts.ini ping.yml

PLAY [all] ************************************************************************

TASK [Gathering Facts] ************************************************************
fatal: [server4]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname server4: Name or service not known", "unreachable": true}
fatal: [server2]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname server2: Name or service not known", "unreachable": true}
fatal: [server3]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname server3: Name or service not known", "unreachable": true}
fatal: [server5]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname server5: Name or service not known", "unreachable": true}
fatal: [server1]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname server1: Name or service not known", "unreachable": true}
ok: [13.42.56.230]

TASK [ping hosts] *****************************************************************
ok: [13.42.56.230]

TASK [ping results] ***************************************************************
ok: [13.42.56.230] => {
    "ping_results": {
        "changed": false,
        "failed": false,
        "ping": "pong"
    }
}

PLAY RECAP ************************************************************************
13.42.56.230               : ok=3    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
server1                    : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0
server2                    : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0
server3                    : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0
server4                    : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0
server5                    : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0

As you can see, the only result the register was able to show you was the successful pong. That means you can't get the unreachable nodes simply by pinging a host and registering the result.

This is DEFAULT Ansible behaviour.

You could take the output that Ansible gave you and copy and paste it around to make your own list, but Ansible doesn't make it easy to get that list out, because as soon as a host is unreachable, Ansible drops it from the rest of the play and moves on.

So how do I do it?

Ansible has this wonderful thing called magic or special variables. You can see them here.

These two special variables are the key:

ansible_play_hosts
List of hosts in the current play run, not limited by the serial. Failed/Unreachable hosts are excluded from this list.

ansible_play_hosts_all
List of all the hosts that were targeted by the play

So you can write a playbook to work out the difference between these two values and produce a file containing the output.
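
As a rough illustration, for the inventory above the two variables would hold something like the following once fact gathering has failed on the five made-up servers. The values are inferred from the play recap shown earlier, the order of the list entries is an assumption, and the unreachable key is a hypothetical name used only to label the result of the difference.

```yaml
# Illustrative values only, inferred from the PLAY RECAP above.
# List order is an assumption and may differ on a real run.
ansible_play_hosts_all:
  - server1
  - server2
  - server3
  - server4
  - server5
  - 13.42.56.230

ansible_play_hosts:
  - 13.42.56.230

# Hypothetical key, shown only to label the result of:
#   ansible_play_hosts_all | difference(ansible_play_hosts)
unreachable:
  - server1
  - server2
  - server3
  - server4
  - server5
```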

Show me the playbook

Here is an example playbook where we do exactly that.

The first playbook is a bare-bones playbook that just gives you a list of unreachable hosts. This is probably all you need.

---
- hosts: all
  gather_facts: true
  tasks:
    - block:
        - debug:
            var: ansible_play_hosts_all
        - debug:
            var: ansible_play_hosts
        - set_fact:
            down: "{{ ansible_play_hosts_all|difference(ansible_play_hosts)|join('\n') }}"
        - copy:
            content: "{{ down }}"
            dest: "/opt/how2/conn_check/result.txt"
          delegate_to: localhost
      run_once: true

And the resulting file:

[ec2-user@ip-172-31-16-55 conn_check]$ cat result.txt
server1
server2
server3
server4
server5

Update the playbook?

If you want to do a bit more with the information, for example put the output into a spreadsheet or report on it, you can add the following to the playbook to create a new file that appends ",UNREACHABLE" to every node. You may not need to do this, but in my use case I was adding this to a spreadsheet along with a load of other server facts. If you left this list off, you wouldn't have correct data in the spreadsheet and you wouldn't be giving the complete picture.

Here's the updated playbook:

---
- hosts: all
  gather_facts: true
  tasks:
    - block:
        - debug:
            var: ansible_play_hosts_all
        - debug:
            var: ansible_play_hosts
        - set_fact:
            down: "{{ ansible_play_hosts_all|difference(ansible_play_hosts)|join('\n') }}"
        - copy:
            content: "{{ down }}"
            dest: "/opt/how2/conn_check/result.txt"
          delegate_to: localhost
      run_once: true

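    # The remaining tasks only touch files on the control node,
    # so running them once is enough.
    - name: read contents of a file
      set_fact:
        file_contents: "{{ lookup('file', 'result.txt') }}"
      delegate_to: localhost
      run_once: true

    - name: Append ",UNREACHABLE" to every line
      blockinfile:
        path: "/opt/how2/conn_check/result1.txt"
        block: |
          {% for line in file_contents.split('\n') %}
          {{ line }},UNREACHABLE
          {% endfor %}
        create: yes
      delegate_to: localhost
      run_once: true

    # blockinfile wraps the block in "# BEGIN/END ANSIBLE MANAGED BLOCK"
    # marker lines; remove them so only the host lines remain.
    - name: remove the blockinfile marker lines
      lineinfile:
        path: "/opt/how2/conn_check/result1.txt"
        regexp: '^# '
        state: absent
      delegate_to: localhost
      run_once: true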

And the results:

[ec2-user@ip-172-31-16-55 conn_check]$ cat result1.txt
server1,UNREACHABLE
server2,UNREACHABLE
server3,UNREACHABLE
server4,UNREACHABLE
server5,UNREACHABLE
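
If you only need the ",UNREACHABLE" file, the same lines could probably be written in one task by looping over the difference directly in the copy content, which avoids the blockinfile markers and the clean-up step. This is a minimal sketch rather than the playbook used in the demo. It reuses the path from the playbook above and relies on the Jinja loop tags leaving no blank lines behind, the same way the blockinfile loop does.

```yaml
---
- hosts: all
  gather_facts: true
  tasks:
    # Sketch only: builds one "<host>,UNREACHABLE" line per host that was
    # targeted by the play but is no longer in ansible_play_hosts.
    - name: write unreachable hosts straight to a CSV-style file
      copy:
        content: |
          {% for host in ansible_play_hosts_all | difference(ansible_play_hosts) %}
          {{ host }},UNREACHABLE
          {% endfor %}
        dest: "/opt/how2/conn_check/result1.txt"
      delegate_to: localhost
      run_once: true
```

Like the playbooks above, this still needs at least one reachable host in the play, because the task has to run for some host before it can be delegated to localhost.
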
@ansiblelove

Thank you so much

@dmccuk
Author

dmccuk commented May 19, 2023

Thank you so much

No problem. I don't get many people telling me if it helped or not so thanks for letting me know!

@ansiblelove

:) This helped me a lot in knowing 100s of servers out of 1700+ in my environment that were not reachable. GOD bless!

@Nikhil26112

Nikhil26112 commented Jun 12, 2023

I am getting this error. I have also tried ansible-galaxy collection install community.general, but it still gives me the same error:
FAILED! => {"msg": "template error while templating string: No filter named 'difference'.. String: {{ ansible_play_hosts_all | difference(ansible_play_hosts) }}"}
Thank you in advance

@sagarmitkari99

Can we also get the SSH error in, say, a third field?

@RadhikaDutt86

Thanks a lot , helped a lot. Broke my head for a few hours using ping module.

@Shantanugit

Shantanugit commented Jan 9, 2024

Can we not add a custom message here that specifies the reason the SSH request was declined? It cannot be "UNREACHABLE" all the time.

@pavanreddy997

Thanks,

I was trying all kinds of things to get the unreachable hosts list out.
