This is in response to this Stack Overflow question. The solution was posted by Vladimir Botka.
- YouTube walkthrough here
When Ansible runs, it's great for setting up and configuring servers as per your playbooks and roles. But have you ever needed to collect a list of failed hosts where Ansible wasn't able to connect to them? In this demo I'm going to show you how to collect a list of failed and unreachable hosts.
I'm simply using a list of servers that don't exist, along with one remote server that I can get to.
This is my inventory file:
[ec2-user@ip-172-31-16-55 conn_check]$ cat hosts.ini
server1
server2
server3
server4
server5
[ubuntu]
13.42.56.230
[ubuntu:vars]
ansible_user=ubuntu
ansible_ssh_private_key_file=~/.ssh/other.pem
You might think that ping would give you a list, and in a sense it does. At the end of the playbook run, you do indeed see a list on the screen of hosts where Ansible failed to run. But unless you cut and paste that into a file and mess around with awk or delete text by hand, you won't have that list in a file.
Here is a simple ping playbook that registers the ping result:
---
- hosts: all
  gather_facts: true
  tasks:
    - name: ping hosts
      ping:
      register: ping_results
    - name: ping results
      debug:
        var: ping_results
And this is what you get back:
[ec2-user@ip-172-31-16-55 conn_check]$ ansible-playbook -i hosts.ini ping.yml
PLAY [all] ************************************************************************
TASK [Gathering Facts] ************************************************************
fatal: [server4]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname server4: Name or service not known", "unreachable": true}
fatal: [server2]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname server2: Name or service not known", "unreachable": true}
fatal: [server3]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname server3: Name or service not known", "unreachable": true}
fatal: [server5]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname server5: Name or service not known", "unreachable": true}
fatal: [server1]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname server1: Name or service not known", "unreachable": true}
ok: [13.42.56.230]
TASK [ping hosts] *****************************************************************
ok: [13.42.56.230]
TASK [ping results] ***************************************************************
ok: [13.42.56.230] => {
    "ping_results": {
        "changed": false,
        "failed": false,
        "ping": "pong"
    }
}
PLAY RECAP ************************************************************************
13.42.56.230 : ok=3 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
server1 : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
server2 : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
server3 : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
server4 : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
server5 : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
As you can see, the only result that the register was able to show you was the successful pong. That means you can't get unreachable nodes simply by pinging a host and registering the result.
This is DEFAULT Ansible behaviour.
You could take the output that Ansible gave you and cut and paste it around to make your own list, but Ansible doesn't make it easy to get that list out, because as soon as a host is unreachable, Ansible forgets it and moves on.
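To make the manual route concrete, here is a rough Python sketch of what scraping the PLAY RECAP would involve. It is not part of the original solution: the function name `unreachable_from_recap` and the regular expression are my own assumptions, and the approach depends entirely on the on-screen format staying the same, which is exactly why it is fragile.

```python
import re

# Assumed shape of a recap line, taken from the output shown above:
#   server1 : ok=0 changed=0 unreachable=1 failed=0 ...
RECAP_LINE = re.compile(r"^(?P<host>\S+)\s*:\s.*\bunreachable=(?P<count>\d+)")


def unreachable_from_recap(output: str) -> list[str]:
    """Return hosts whose recap line reports unreachable > 0 (hypothetical helper)."""
    hosts = []
    for line in output.splitlines():
        match = RECAP_LINE.match(line.strip())
        if match and int(match.group("count")) > 0:
            hosts.append(match.group("host"))
    return hosts


# A shortened copy of the recap from the run above.
sample = """\
PLAY RECAP ************************************************************************
13.42.56.230 : ok=3 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
server1 : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
server2 : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
"""

print(unreachable_from_recap(sample))
```

You would still have to capture the output of every run first, so this is shown only for comparison with the approach that follows.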
Ansible has this wonderful thing called magic or special variables. You can see them listed in the Ansible documentation under special variables.
These two special variables are the key:
ansible_play_hosts
List of hosts in the current play run, not limited by the serial. Failed/Unreachable hosts are excluded from this list.
ansible_play_hosts_all
List of all the hosts that were targeted by the play
So you can write a playbook to work out the difference between these two values and produce a file containing the output.
Here is an example playbook where we do exactly that.
The first playbook is a bare-bones playbook that just gives you a list of unreachable hosts. This is probably all you need.
---
- hosts: all
  gather_facts: true
  tasks:
    - block:
        - debug:
            var: ansible_play_hosts_all
        - debug:
            var: ansible_play_hosts
        - set_fact:
            down: "{{ ansible_play_hosts_all|difference(ansible_play_hosts)|join('\n') }}"
        - copy:
            content: "{{ down }}"
            dest: "/opt/how2/conn_check/result.txt"
          delegate_to: localhost
      run_once: true
And the resulting file:
[ec2-user@ip-172-31-16-55 conn_check]$ cat result.txt
server1
server2
server3
server4
server5
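If the Jinja2 expression in the `set_fact` task looks dense, the same idea can be sketched in plain Python. This is only an illustration of the logic, not how Ansible implements the `difference` filter: the function name `unreachable_hosts` is hypothetical, and the two lists are hard-coded here to mirror the demo inventory, whereas in a real run Ansible supplies them.

```python
def unreachable_hosts(all_hosts, live_hosts):
    """Hosts that were targeted by the play but are no longer in the live list.

    Keeps the order of all_hosts (an assumption made for readable output).
    """
    live = set(live_hosts)
    return [host for host in all_hosts if host not in live]


# Stand-ins for the two special variables, using the demo inventory.
ansible_play_hosts_all = [
    "server1", "server2", "server3", "server4", "server5", "13.42.56.230",
]
ansible_play_hosts = ["13.42.56.230"]

# Same shape as the playbook: difference, then join with newlines.
down = "\n".join(unreachable_hosts(ansible_play_hosts_all, ansible_play_hosts))
print(down)
```

With these inputs the joined string contains server1 through server5, one per line, which matches the contents of result.txt above.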
If you want to do a bit more with the information, for example put the output into a spreadsheet or report on it, you can add the following to the playbook to create a new file that appends ",UNREACHABLE" to every node. You may not need to do this, but in my use case I was adding this to a spreadsheet along with a load of other server facts. If you left this list off, you wouldn't have correct data in the spreadsheet and you wouldn't be giving the complete picture.
Here's the updated playbook:
---
- hosts: all
  gather_facts: true
  tasks:
    - block:
        - debug:
            var: ansible_play_hosts_all
        - debug:
            var: ansible_play_hosts
        - set_fact:
            down: "{{ ansible_play_hosts_all|difference(ansible_play_hosts)|join('\n') }}"
        - copy:
            content: "{{ down }}"
            dest: "/opt/how2/conn_check/result.txt"
          delegate_to: localhost
      run_once: true
    # Load the list written above back into a variable
    - name: read contents of a file
      set_fact:
        file_contents: "{{ lookup('file', 'result.txt') }}"
      delegate_to: localhost
    # Write one "host,UNREACHABLE" row per line into a second file
    - name: Append ",UNREACHABLE" to every line
      blockinfile:
        path: "/opt/how2/conn_check/result1.txt"
        block: |
          {% for line in file_contents.split('\n') %}
          {{ line }},UNREACHABLE
          {% endfor %}
        create: yes
      with_file:
        - "result.txt"
      delegate_to: localhost
    # Remove the "# BEGIN/END ANSIBLE MANAGED BLOCK" marker lines that blockinfile adds
    - lineinfile:
        path: "/opt/how2/conn_check/result1.txt"
        regexp: '^# '
        state: absent
      delegate_to: localhost
And the results:
[ec2-user@ip-172-31-16-55 conn_check]$ cat result1.txt
server1,UNREACHABLE
server2,UNREACHABLE
server3,UNREACHABLE
server4,UNREACHABLE
server5,UNREACHABLE
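For anyone who would rather do this last step outside Ansible, the tagging can be sketched in a few lines of Python. This is an alternative illustration, not what the playbook runs: the function name `tag_hosts` is hypothetical, and unlike the Jinja2 loop above it skips blank lines, so an empty result.txt produces no rows at all.

```python
def tag_hosts(hosts_text: str, status: str = "UNREACHABLE") -> str:
    """Turn newline-separated host names into 'host,status' rows (hypothetical helper)."""
    hosts = [line.strip() for line in hosts_text.splitlines() if line.strip()]
    return "".join(f"{host},{status}\n" for host in hosts)


# Same content as result.txt from the demo run.
result_txt = "server1\nserver2\nserver3\nserver4\nserver5"
print(tag_hosts(result_txt), end="")
```

Reading result.txt and writing result1.txt around this function is left out to keep the sketch short.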
Thank you so much