In this example, we reboot the target machine, wait for it to come back up, reboot it again, and finally create the file we are looking for.
Lines starting with >>>>> indicate what was happening to the machine at that moment; they are not Ansible output.
PLAY [Try to survive and detect a reboot] *******************************************************************
TASK [include_tasks] *******************************************************************
included: run_check_test.yml for 192.168.0.31
>>>>> BEGAN MACHINE REBOOT <<<<<
TASK [Check for the file] *******************************************************************
FAILED - RETRYING: Check for the file (2 retries left).
fatal: [192.168.0.31]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host 192.168.0.31 port 22: Connection refused", "skip_reason": "Host 192.168.0.31 is unreachable", "unreachable": true}
TASK [Sleep if the host was unreachable] *******************************************************************
Pausing for 3 seconds
(ctrl+C then 'C' = continue early, ctrl+C then 'A' = abort)
ok: [192.168.0.31 -> localhost]
TASK [set_fact] *******************************************************************
ok: [192.168.0.31]
TASK [include_tasks] *******************************************************************
included: run_check_test.yml for 192.168.0.31
TASK [Check for the file] *******************************************************************
fatal: [192.168.0.31]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host 192.168.0.31 port 22: Operation timed out", "skip_reason": "Host 192.168.0.31 is unreachable", "unreachable": true}
TASK [Sleep if the host was unreachable] *******************************************************************
Pausing for 3 seconds
(ctrl+C then 'C' = continue early, ctrl+C then 'A' = abort)
ok: [192.168.0.31 -> localhost]
TASK [set_fact] *******************************************************************
ok: [192.168.0.31]
TASK [include_tasks] *******************************************************************
included: run_check_test.yml for 192.168.0.31
TASK [Check for the file] *******************************************************************
fatal: [192.168.0.31]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host 192.168.0.31 port 22: Operation timed out", "skip_reason": "Host 192.168.0.31 is unreachable", "unreachable": true}
TASK [Sleep if the host was unreachable] *******************************************************************
Pausing for 3 seconds
(ctrl+C then 'C' = continue early, ctrl+C then 'A' = abort)
ok: [192.168.0.31 -> localhost]
TASK [set_fact] *******************************************************************
ok: [192.168.0.31]
TASK [include_tasks] *******************************************************************
included: run_check_test.yml for 192.168.0.31
>>>>> MACHINE HAS BOOTED <<<<<
TASK [Check for the file] *******************************************************************
FAILED - RETRYING: Check for the file (2 retries left).
FAILED - RETRYING: Check for the file (1 retries left).
>>>>> BEGIN MACHINE REBOOT <<<<<
fatal: [192.168.0.31]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host 192.168.0.31 port 22: Connection refused", "skip_reason": "Host 192.168.0.31 is unreachable", "unreachable": true}
TASK [Sleep if the host was unreachable] *******************************************************************
Pausing for 3 seconds
(ctrl+C then 'C' = continue early, ctrl+C then 'A' = abort)
ok: [192.168.0.31 -> localhost]
TASK [set_fact] *******************************************************************
ok: [192.168.0.31]
TASK [include_tasks] *******************************************************************
included: /Users/jowestco/GIT/github/john-westcott-iv/ansible-survive-disconnect/run_check_test.yml for 192.168.0.31
TASK [Check for the file] *******************************************************************
fatal: [192.168.0.31]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host 192.168.0.31 port 22: Operation timed out", "skip_reason": "Host 192.168.0.31 is unreachable", "unreachable": true}
TASK [Sleep if the host was unreachable] *******************************************************************
Pausing for 3 seconds
(ctrl+C then 'C' = continue early, ctrl+C then 'A' = abort)
ok: [192.168.0.31 -> localhost]
TASK [set_fact] *******************************************************************
ok: [192.168.0.31]
TASK [include_tasks] *******************************************************************
included: run_check_test.yml for 192.168.0.31
TASK [Check for the file] *******************************************************************
FAILED - RETRYING: Check for the file (2 retries left).
>>>>> CREATE FILE BEING MONITORED FOR <<<<<
ok: [192.168.0.31]
TASK [Sleep if the host was unreachable] *******************************************************************
skipping: [192.168.0.31]
TASK [set_fact] *******************************************************************
ok: [192.168.0.31]
TASK [include_tasks] *******************************************************************
skipping: [192.168.0.31]
TASK [Fail if we didn't get the file] *******************************************************************
skipping: [192.168.0.31]
TASK [Post task] *******************************************************************
ok: [192.168.0.31] => {
"msg": "This is the post task"
}
PLAY RECAP *******************************************************************
192.168.0.31 : ok=20 changed=0 unreachable=5 failed=0 skipped=8 rescued=0 ignored=0
@john-westcott-iv, from your recent Ansible post: https://www.ansible.com/blog/tolerable-ansible
What controls the decrement of safety_counter? You set it to 5 in run_check_test.yml, but I'm not seeing what decrements it.
Wouldn't you need something like this in main.yml:
- set_fact:
    safety_counter: "5"
and then in run_check_test.yml something like:
- set_fact:
    safety_counter: "{{ safety_counter | int - 1 }}"
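The suggestion above can be sketched a little more fully. This is only an illustration of the idea in the comment, not the gist's actual files: the file names come from the thread, while the task names and the split between the two files are assumptions.

```yaml
# main.yml (sketch): initialise the counter once, before the first check.
- name: Initialise the safety counter
  ansible.builtin.set_fact:
    safety_counter: 5

# run_check_test.yml (sketch): count down on every pass through the file.
- name: Decrement the safety counter
  ansible.builtin.set_fact:
    # A templated set_fact value is normally stored as a string,
    # so cast with `| int` each time the counter is read.
    safety_counter: "{{ safety_counter | int - 1 }}"
```

The key point is that the initial value must be set outside the file that re-includes itself. If it were reset inside run_check_test.yml, every pass would start from 5 again and the counter would never reach zero.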
I was thinking the same; I expected the safety_counter to be decremented somewhere.
When I tested this I got: ERROR! A recursion loop was detected with the roles specified. Make sure child roles do not have dependencies on parent roles
The problem was that I was using import_tasks. When I changed it to include_tasks, everything worked like a charm.
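The difference matters because import_tasks is static: Ansible expands it while parsing the playbook, so a file that imports itself never finishes expanding. include_tasks is dynamic and is only evaluated when the task is reached at run time, so a `when` condition can stop the recursion. A minimal sketch of such a self-include follows; `file_found` is a hypothetical fact name used only for illustration, and the gist's real condition may differ.

```yaml
# run_check_test.yml (sketch): the file includes itself at run time.
- name: Check again until the file exists or the counter runs out
  ansible.builtin.include_tasks: run_check_test.yml
  when:
    - not file_found | default(false) | bool   # file_found is a hypothetical fact
    - safety_counter | int > 0                 # stop once the counter is used up
```

When either condition fails, the include is skipped and execution falls through to whatever follows the first include in the calling play, which matches the "skipping" line for include_tasks at the end of the output above.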