Automation, both mechanical and computational, has become a hot topic over the last few years. Software systems and infrastructures are more complex than ever, which makes Infrastructure-as-Code (IaC) tools almost compulsory when working with non-trivial environments.
In this article, we will take a look at an interesting use case that involves Red Hat's Ansible Tower Automation Platform, defined as "an enterprise automation platform for the entire IT organization, no matter where you are in your automation journey".
This is not an introductory article on Ansible Tower; we assume that you have experience with the following topics:
- Designing Ansible Tower Workflows
- Using Ansible Collections
- Working with Ansible Tower's APIs
The following use case arises from a tricky problem we faced while using Ansible Tower Workflows in a challenging environment, which comprises:
- Up to 15,000 target hosts in production with different operating systems and middleware
- Security health checks run with Ansible on every host
- Multiple teams involved in the security analysis of the reports generated by Ansible playbooks
- A complex Ansible Tower Workflow
One of the main goals for our customer, in this scenario, is to keep track of the status of every single host during all Workflow executions. Unfortunately, the Ansible Tower platform doesn't offer an out-of-the-box solution to this problem.
So far, our customer had never had a global view of the Ansible Tower Workflow execution status, which made troubleshooting tough and time-consuming. In particular, to check a single host's status, we needed to click on each Workflow Node, scroll down to the end of the log, and read the PLAY RECAP written in it! 😞
As you can imagine, when working with thousands of hosts and Workflows composed of dozens of Workflow Nodes, the probability of something failing is not negligible, which makes easy troubleshooting a key requirement for quick fixes and improvements.
We therefore asked ourselves: can we create some reports showing the status of the Workflow for each host at a glance?
The solution we suggest is based on an Ansible role which achieves the following:
- Get the job execution status for every node composing the Workflow
- Manipulate the data in order to get the information grouped per host
- Generate a `.csv` report with this information
- Optionally, send this report by email
Note: all the code shown in this post can be found on this GitHub repository.
We will need:
- Ansible Tower: Ansible Tower Automation Platform from Red Hat. The community version of this project, AWX, will also do.
- Ansible Collection: ansible.tower, available on Ansible Automation Hub, or the community version awx.awx
The tasks for our `ansible_tower_workflow_report` role are divided into three task groups:
https://gist.github.com/91ba0544790840c1eeb31da8fd55d1a4
This file includes all the tasks necessary to get the information needed to generate a report:
- Retrieve a list of Workflow Node Job IDs by calling the Ansible Tower API through the ansible.tower.tower_api lookup plugin.
https://gist.github.com/5e9320ef4a2d10b0d983f137f471bece
- Iterate over this list to get the Job IDs related to each Workflow Node Job ID.
https://gist.github.com/f391ef20e2e94eab620ea8c0812b0510
- Query Job Host Summaries and create a list of dictionaries with info for each host during Job execution.
https://gist.github.com/481db0eaa9610dbf63ce2d0fa378c225
- Finally, the 🎩 trick: group by `host_name` to get all the information grouped for each host.
https://gist.github.com/ca24f5f057764f955a4261b4be7181d4
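The role does all of this with Ansible tasks; purely as an illustration, the four steps above can be sketched in Python. The endpoint paths in the comments and the field names (`summary_fields.job`, `host_name`, `failed`) are our assumptions about what the Tower/AWX API returns, and the three helper functions are hypothetical names, not part of the role. The sketch works on already-fetched JSON, so it makes no network calls.

```python
from itertools import groupby
from operator import itemgetter


def job_ids_from_nodes(nodes_response):
    """Collect the IDs of the Jobs spawned by each Workflow Node.

    `nodes_response` is assumed to look like the JSON returned by
    GET /api/v2/workflow_jobs/<id>/workflow_nodes/.
    Nodes that never ran carry no job and are skipped.
    """
    return [
        node["summary_fields"]["job"]["id"]
        for node in nodes_response["results"]
        if "job" in node.get("summary_fields", {})
    ]


def flatten_summaries(summaries_by_job):
    """Turn {job_id: job_host_summaries response} into one flat list of dicts.

    Each response is assumed to look like the JSON returned by
    GET /api/v2/jobs/<job_id>/job_host_summaries/.
    """
    rows = []
    for job_id, response in summaries_by_job.items():
        for summary in response["results"]:
            rows.append({
                "host_name": summary["host_name"],
                "job_id": job_id,
                "job_name": summary["summary_fields"]["job"]["name"],
                "success": not summary["failed"],
            })
    return rows


def group_by_host(rows):
    """Group the flat rows into [[host_name, [row, ...]], ...]."""
    # groupby only merges adjacent items, so sort by host first.
    ordered = sorted(rows, key=itemgetter("host_name", "job_id"))
    return [
        [host, list(items)]
        for host, items in groupby(ordered, key=itemgetter("host_name"))
    ]
```

Sorting before grouping matters here: without it, a host appearing in several Jobs would end up split across several groups.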
Take a moment to understand `host_summaries`, a data structure composed of a list of lists where:
- `list[*].list[0]` contains the `host_name`
- `list[*].list[1]` is a list of dictionaries, `job_host_summaries`, which contains all the information about every Job Template executed in the Workflow for that host.
https://gist.github.com/08fbb35380b810db01ee76ace02744ed
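To make the shape of `host_summaries` more concrete, here is a small Python sketch that walks the structure and lists, for each host, the Jobs that failed on it. The dictionary keys `job_name` and `success` are assumptions borrowed from the report columns, and `failed_jobs_per_host` is a hypothetical helper, not something the role provides.

```python
def failed_jobs_per_host(host_summaries):
    """Map each host to the names of the Jobs that failed on it.

    `host_summaries` is a list of [host_name, job_host_summaries] pairs,
    where job_host_summaries is a list of dictionaries.
    Hosts without failures are left out of the result.
    """
    failures = {}
    for host_name, job_host_summaries in host_summaries:
        failed = [s["job_name"] for s in job_host_summaries if not s["success"]]
        if failed:
            failures[host_name] = failed
    return failures


# Illustrative data only, shaped like the structure described above.
example = [
    ["node01.local", [
        {"job_id": 46921, "job_name": "prepare", "success": True},
        {"job_id": 46923, "job_name": "calibration", "success": False},
    ]],
    ["node02.local", [
        {"job_id": 46921, "job_name": "prepare", "success": True},
    ]],
]
```

With the illustrative data above, `failed_jobs_per_host(example)` returns a dictionary with a single entry for `node01.local`.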
To generate the `.csv` report we use a Jinja2 template.
https://gist.github.com/878453b907023bd1d5cecf379beb3634
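The role renders the report with Jinja2; as a rough equivalent for readers who want to see the logic outside a template, the Python sketch below produces one row per host with a repeating `job_id, job_name, success` triplet per Job. The function name `render_report`, the dictionary keys and the comma delimiter are assumptions made for this example.

```python
import csv
import io


def render_report(workflow_id, host_summaries, delimiter=","):
    """Render [[host_name, [job dicts]], ...] as CSV text, one row per host."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")

    # The header repeats the job columns as many times as the busiest host needs.
    widest = max((len(jobs) for _, jobs in host_summaries), default=0)
    writer.writerow(
        ["Workflow id", "host"] + ["job_id", "job_name", "success"] * widest
    )

    for host_name, jobs in host_summaries:
        row = [workflow_id, host_name]
        for job in jobs:
            row += [job["job_id"], job["job_name"], job["success"]]
        writer.writerow(row)

    return buffer.getvalue()
```

Hosts that ran fewer Jobs than others simply get shorter rows, which most spreadsheet tools display as empty trailing cells.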
To use this role you need an Ansible Playbook such as the following one:
https://gist.github.com/4cb1dd75687712fc7f0f2c1eff617280
Make sure you run this playbook on localhost (i.e. the Ansible Tower machine).
Then set up the Ansible Tower platform:
- Create a Project that points to your repository
- Create a Job Template referring to the `workflow_report.yaml` playbook
- Add a Survey for this Job Template with a variable named `workflow_id` of type Number
- Find an existing Workflow Job ID for which you want to get the report
- Finally, launch the Job Template, passing the Workflow Job ID in the Survey
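The last step can also be scripted instead of clicked. The Python sketch below only builds the HTTP request, without sending it, so it is safe to experiment with. It assumes the Job Template is launched through `POST /api/v2/job_templates/<id>/launch/`, that the Survey answer is passed as `extra_vars`, and that you authenticate with an OAuth2 token; the URL, the template ID and the token shown in the usage note are placeholders.

```python
import json
import urllib.request


def build_launch_request(tower_url, template_id, workflow_id, token):
    """Build (but do not send) the request that launches the report Job Template."""
    url = f"{tower_url.rstrip('/')}/api/v2/job_templates/{template_id}/launch/"
    # The Survey variable travels inside extra_vars.
    body = json.dumps({"extra_vars": {"workflow_id": workflow_id}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

For example, `build_launch_request("https://tower.example.com", 42, 46919, "my-token")` returns a request object that can then be passed to `urllib.request.urlopen` once the placeholders are replaced with real values.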
Here is an example of the generated `.csv` report.
Workflow id | host | job_id | job_name | success | job_id | job_name | success | job_id | job_name | success |
---|---|---|---|---|---|---|---|---|---|---|
46919 | node01.local | 46921 | prepare | True | 46923 | calibration | True | 46925 | scan | True |
46919 | node02.local | 46921 | prepare | True | 46923 | calibration | True | 46925 | scan | True |
46919 | node03.local | 46921 | prepare | True | 46923 | calibration | True | 46925 | scan | True |
46919 | node04.local | 46921 | prepare | True | 46923 | calibration | True | 46925 | scan | True |
46919 | node05.local | 46921 | prepare | True | 46923 | calibration | True | 46925 | scan | True |
As you can see, we have obtained a complete status of the Workflow per host at a glance.
The complex scenario described here paves the way for many other topics regarding Ansible Tower Workflow management, which we will describe in future posts. For instance, it is possible to implement a log analysis infrastructure using ARA.
What do you think of this approach? Have you ever tried achieving a similar goal? Your feedback is very welcome!