LucaSantopadre/blog.md Secret

Ansible Tower - Global Workflow Report

Automation, both mechanical and computational, has become a hot topic over the last few years. Software systems and infrastructures are more complex than ever, making Infrastructure-as-Code (IaC) tools almost compulsory when working with non-trivial environments.
In this article, we will look at an interesting use case involving Red Hat's Ansible Tower Automation Platform, defined as "an enterprise automation platform for the entire IT organization, no matter where you are in your automation journey".

This is not an introductory article on Ansible Tower; we assume that you have experience with the following topics:


Designing Ansible Tower Workflows
Using Ansible Collections
Working with Ansible Tower's APIs

The scenario

The following use case arises from a tricky problem we faced while using Ansible Tower Workflows in a challenging environment, which comprises:

Up to 15,000 target hosts in production with different operating systems and middleware
Security healthchecks run in Ansible on every host
Multiple teams involved in the security analysis of reports generated by Ansible playbooks
A complex Ansible Tower Workflow


An example of our customer's workflow


One of the main goals for our customer, in this scenario, is to keep track of the status of every single host during all Workflow executions. Unfortunately, the Ansible Tower platform doesn't offer an out-of-the-box solution to this problem.
So far, our customer had never had a global view of the Ansible Tower Workflow execution status, which made troubleshooting tough and time-consuming. In particular, to check a single host's status, we needed to click on each Workflow Node, scroll down to the end of the log, and read the PLAY RECAP written there! 😞
As you can imagine, when working with thousands of hosts and Workflows composed of dozens of Workflow Nodes, the probability of something failing is not negligible, which makes easy troubleshooting a key requirement for quick fixes and improvements.
We therefore asked ourselves: can we create reports that show the status of the Workflow for each host at a glance?
The solution: Ansible Tower Workflow Report

The solution we suggest is based on an Ansible role which achieves the following:

get the job execution status for every node composing the Workflow
manipulate the data, in order to get information grouped per host
generate a .csv report with this information
optionally, send this report by email


Note: all the code shown in this post can be found on this GitHub repository.

Prerequisites

We will need:


Ansible Tower: Ansible Tower Automation Platform from Red Hat. The community version of this project, AWX, will also do.


Ansible Collection: ansible.tower, available on Ansible Automation Hub, or the community version awx.awx


The tasks for our ansible_tower_workflow_report role are divided into 3 task groups:
https://gist.github.com/91ba0544790840c1eeb31da8fd55d1a4
Task #1: Get workflow jobs

This file includes all the tasks necessary to get the information needed to generate a report:

Retrieve a list of Workflow Node Job IDs by calling the Ansible Tower API through the ansible.tower.tower_api lookup plugin.

https://gist.github.com/5e9320ef4a2d10b0d983f137f471bece
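For readers who want to see what such a lookup amounts to at the HTTP level, here is a minimal Python sketch; it is not the role's code. It assumes the usual paginated Tower/AWX response shape (a `results` list plus a `next` link) and the `/api/v2/workflow_jobs/<id>/workflow_nodes/` endpoint. `fetch` is a hypothetical callable standing in for an authenticated HTTP GET, and `FAKE_PAGES` is made-up data, so the example runs without a Tower instance.

```python
from typing import Callable, Dict, List, Optional


def workflow_nodes_path(workflow_job_id: int) -> str:
    """Build the API path that lists the nodes of one Workflow Job."""
    return f"/api/v2/workflow_jobs/{workflow_job_id}/workflow_nodes/"


def collect_results(fetch: Callable[[str], Dict], path: str) -> List[Dict]:
    """Follow `next` links until every page of results has been read."""
    results: List[Dict] = []
    next_path: Optional[str] = path
    while next_path:
        page = fetch(next_path)
        results.extend(page.get("results", []))
        next_path = page.get("next")
    return results


# Hypothetical two-page response used in place of a real Tower instance.
FAKE_PAGES = {
    "/api/v2/workflow_jobs/46919/workflow_nodes/": {
        "next": "/api/v2/workflow_jobs/46919/workflow_nodes/?page=2",
        "results": [{"id": 1}, {"id": 2}],
    },
    "/api/v2/workflow_jobs/46919/workflow_nodes/?page=2": {
        "next": None,
        "results": [{"id": 3}],
    },
}

nodes = collect_results(FAKE_PAGES.__getitem__, workflow_nodes_path(46919))
print([node["id"] for node in nodes])
```

With thousands of hosts, reading every page matters: stopping at the first page would silently drop part of the data.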

Iterate over this list to get the Job IDs related to each Workflow Node Job ID.

https://gist.github.com/f391ef20e2e94eab620ea8c0812b0510
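As an illustration of this step, the Python sketch below extracts the Job ID behind each Workflow Node. It assumes that each node result exposes the job it spawned under `summary_fields.job`, and that nodes which never ran simply lack that entry; `sample_nodes` is hypothetical data that reuses the job names from the example report.

```python
from typing import Dict, List


def job_ids_from_nodes(nodes: List[Dict]) -> List[int]:
    """Return the ID of the job spawned by each node, skipping nodes that never ran."""
    job_ids = []
    for node in nodes:
        job = node.get("summary_fields", {}).get("job")
        if job:  # assumption: a node on a branch that was not taken has no job
            job_ids.append(job["id"])
    return job_ids


# Hypothetical node results, shaped like the assumption described above.
sample_nodes = [
    {"id": 101, "summary_fields": {"job": {"id": 46921, "name": "prepare"}}},
    {"id": 102, "summary_fields": {"job": {"id": 46923, "name": "calibration"}}},
    {"id": 103, "summary_fields": {}},  # node that did not run
    {"id": 104, "summary_fields": {"job": {"id": 46925, "name": "scan"}}},
]

print(job_ids_from_nodes(sample_nodes))
```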

Query Job Host Summaries and create a list of dictionaries with info for each host during Job execution.

https://gist.github.com/481db0eaa9610dbf63ce2d0fa378c225
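The same idea in plain Python, as a sketch under stated assumptions: each Job Host Summary is assumed to carry a `host_name` string and a `failed` boolean, and the report's `success` column is assumed to be the negation of `failed`. The function name and the sample data are hypothetical.

```python
from typing import Dict, List


def summarize_job(job_id: int, job_name: str, host_summaries: List[Dict]) -> List[Dict]:
    """Turn the host summaries of one job into flat per-host dictionaries."""
    rows = []
    for summary in host_summaries:
        rows.append(
            {
                "host_name": summary["host_name"],
                "job_id": job_id,
                "job_name": job_name,
                # assumption: a host succeeded when the API does not flag it as failed
                "success": not summary["failed"],
            }
        )
    return rows


# Hypothetical host summaries for the "prepare" job.
sample_summaries = [
    {"host_name": "node01.local", "failed": False},
    {"host_name": "node02.local", "failed": True},
]

rows = summarize_job(46921, "prepare", sample_summaries)
for row in rows:
    print(row)
```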

Finally, the 🎩 trick: group by host_name to collect all the information for each host.

https://gist.github.com/ca24f5f057764f955a4261b4be7181d4
Take a moment to understand host_summaries, a data structure composed of a list of lists where:

list[*].list[0] contains the host_name
list[*].list[1] is a list of dictionaries, job_host_summaries, which contains the information about every Job Template executed in the Workflow for that host.

https://gist.github.com/08fbb35380b810db01ee76ace02744ed
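To make the shape of host_summaries concrete, here is a small Python sketch of the grouping step. It uses `itertools.groupby` rather than the Ansible filters of the role, and the input rows are hypothetical. Because Python's sort is stable, the jobs of each host keep the order in which they ran.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, List


def group_by_host(rows: List[Dict]) -> List[list]:
    """Return [[host_name, [row, ...]], ...] with one entry per host."""
    by_host = itemgetter("host_name")
    ordered = sorted(rows, key=by_host)  # groupby needs equal keys to be adjacent
    return [[host, list(items)] for host, items in groupby(ordered, key=by_host)]


# Hypothetical flat rows, one per (job, host) pair, in execution order.
flat_rows = [
    {"host_name": "node02.local", "job_id": 46921, "job_name": "prepare", "success": True},
    {"host_name": "node01.local", "job_id": 46921, "job_name": "prepare", "success": True},
    {"host_name": "node01.local", "job_id": 46923, "job_name": "calibration", "success": True},
    {"host_name": "node02.local", "job_id": 46923, "job_name": "calibration", "success": True},
]

host_summaries = group_by_host(flat_rows)
for host_name, job_host_summaries in host_summaries:
    print(host_name, [job["job_name"] for job in job_host_summaries])
```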
Task #2: Generating a report

To generate the .csv report, we use a Jinja2 template.
https://gist.github.com/878453b907023bd1d5cecf379beb3634
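For comparison, the sketch below produces the same kind of report with Python's `csv` module instead of a Jinja2 template. It assumes the list-of-lists structure described earlier and the column layout of the example report (a `job_id`, `job_name`, `success` triple repeated once per job); the function name and the sample data are hypothetical.

```python
import csv
import io
from typing import List


def render_report(workflow_id: int, host_summaries: List[list]) -> str:
    """Render one CSV line per host, with one column triple per job."""
    max_jobs = max((len(jobs) for _, jobs in host_summaries), default=0)
    header = ["Workflow id", "host"] + ["job_id", "job_name", "success"] * max_jobs

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    for host_name, jobs in host_summaries:
        row = [workflow_id, host_name]
        for job in jobs:
            row += [job["job_id"], job["job_name"], job["success"]]
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical grouped data for a single host.
sample = [
    [
        "node01.local",
        [
            {"job_id": 46921, "job_name": "prepare", "success": True},
            {"job_id": 46923, "job_name": "calibration", "success": True},
        ],
    ]
]

report = render_report(46919, sample)
print(report)
```

Letting a CSV writer handle quoting avoids broken rows if a job name ever contains a comma, which a hand-written template has to deal with explicitly.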
Gluing the blocks

To use this role you need an Ansible Playbook such as the following one:
https://gist.github.com/4cb1dd75687712fc7f0f2c1eff617280

Make sure you run this playbook on localhost (i.e. the Ansible Tower machine).

Then set up the Ansible Tower platform:

Create a Project that points to your repository
Create a Job Template referring to the workflow_report.yaml playbook
Add a Survey for this Job Template named workflow_id of type Number
Find an existing Workflow Job ID for which you want to get the report

Finally, launch the Job Template, passing the Workflow Job ID in the Survey.
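The launch can also be scripted instead of being done from the web UI. The Python sketch below only builds the HTTP request and never sends it. It assumes the `/api/v2/job_templates/<id>/launch/` endpoint, OAuth2 bearer-token authentication, and that the Survey answer travels as an `extra_vars` entry named workflow_id; the base URL, the Job Template ID 42 and the token are placeholders.

```python
import json
import urllib.request


def build_launch_request(base_url: str, job_template_id: int,
                         workflow_id: int, token: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST that launches the report Job Template."""
    url = f"{base_url.rstrip('/')}/api/v2/job_templates/{job_template_id}/launch/"
    body = json.dumps({"extra_vars": {"workflow_id": workflow_id}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values: replace them with those of your own Tower instance.
request = build_launch_request("https://tower.example.com/", 42, 46919, "REPLACE_ME")
print(request.get_method(), request.full_url)
# To actually launch the job: urllib.request.urlopen(request)
```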


Final results

Here is an example of the generated .csv report.


Workflow id	host	job_id	job_name	success	job_id	job_name	success	job_id	job_name	success
46919	node01.local	46921	prepare	True	46923	calibration	True	46925	scan	True
46919	node02.local	46921	prepare	True	46923	calibration	True	46925	scan	True
46919	node03.local	46921	prepare	True	46923	calibration	True	46925	scan	True
46919	node04.local	46921	prepare	True	46923	calibration	True	46925	scan	True
46919	node05.local	46921	prepare	True	46923	calibration	True	46925	scan	True

As you can see, we have obtained a complete status of the Workflow per host at a glance.
Conclusions

The complex scenario described here paves the way for many other topics regarding Ansible Tower Workflow management, which we will cover in future posts. For instance, it is possible to implement a log analysis infrastructure using ARA.
What do you think of this approach? Have you ever tried achieving a similar goal? Your feedback is very welcome!