judell/save-query-results-for-analysis.md

## save-query-results-for-analysis.md

      
    Raw
  

              save-query-results-for-analysis.md
            
          
    How to save Steampipe outputs for analysis

Steampipe queries persist in cache for a default 5 minutes. You can increase the TTL to keep results in cache longer, but sometimes you'll want to save query results for later analysis.
Here are some common scenarios.
I want to analyze the results of a CIS 1.5 control or benchmark

You can use steampipe check with the --export flag to capture a CSV representation of the data that powers an individual control, or a suite of controls packaged as a benchmark.
Suppose you want to capture the data for an individual control, for example Control: 2.1.2 Ensure S3 Bucket Policy is set to deny HTTP requests. Here's the command to run that control, from a clone of AWS Compliance in ~/steampipe-mod-aws-compliance.
steampipe check control.cis_v150_2_1_2

How can you find the name control.cis_v150_2_1_2 in order to run this command? If you run the complete CIS v1.5.0 benchmark in a dashboard, you can find its human-readable label: 2.1.2 Ensure S3 Bucket Policy is set to deny HTTP requests. Copy the label as shown here.

Then paste it into the search box at hub.steampipe.io to find the landing page for the control. There, along with helpful instructions for remediating an alarm, you'll find a usage note that gives you the needed steampipe check command.

Given the base command, you can use --output csv and redirect to a file.
steampipe check control.cis_v150_2_1_2 --output csv > ~/csv/cis_1_5_s3_bucket_enforces_ssl.csv

Now, using the CSV plugin, and assuming its configured paths include ~/csv, you can query that data.
select * from csv.s3_bucket_enforces_ssl where status = 'alarm'

To capture the data for a complete benchmark:
steampipe check benchmark.cis_v150_2_1 --output csv > ~/csv/cis_1_5_storage.csv

To query that data:
select * from csv.cis_1_5_storage where status = 'alarm'

To capture the entire suite of benchmarks:
steampipe check all --output csv > ~/csv/cis_1_5_all.csv

To query that data:
select * from csv.cis_1_5_all where status = 'alarm'

Alternatively you can run the benchmark with steampipe dashboard, use the download button to capture CSV output, move it to your CSV directory, and run the same query.
I want to analyze the results of a named query

Controls and benchmarks often use named queries.
To run a named query using steampipe query, you'll again need to find the name. When you're operating in a mod workspace, e.g. ~/steampipe-mod-aws-insights, you can list named queries like so.
steampipe query list

Let's look for the query behind control.cis_v150_2_1_2
select resource_name, sql from steampipe_query where resource_name ~ 's3_bucket'

Scanning the results, we find:
s3_bucket_enforces_ssl

This is an alternate, console-based way to find the named query reported on the above hub page.
To export the data:
steampipe query query.s3_bucket_enforces_ssl --export csv > ~/csv/s3_bucket_enforces_ssl.csv

The named query s3_bucket_enforces_ssl doesn't require a parameter, but many do. For example s3_bucket_public_access is parameterized by arn. If the bucket in question is arn:aws:s3:::my-bucket-27 then you pass the arn as a parameter to the query like so:
steampipe query 'query.s3_bucket_public_access("arn:aws:s3:::my-bucket-27")`

You can then use --output csv and redirect to a file as above.
I want to analyze the raw results of a query.

There are several ways to do it.
Export to CSV

steampipe query "select now(), instance_id, instance_state, account_id, region from aws_ec2_instance" --output csv > ~/csv/ec2_instance_states_2023_04_27.csv

You can then query like so:
select * from csv.ec2_instance_states_2023_04_27 where instance_state = 'running'

Create a table in the public schema

steampipe query
> create table ec2_instance_states_2023_04_27 as 
    select 
      now(), 
      instance_id, 
      instance_state, 
      account_id, 
      region
    from
      aws_ec2_instance

Now you can query like so:
> select 
    * 
  from 
    ec2_instance_states_2023_04_27
  where 
    instance_state = 'running'

Create a materialized view in the public schema.

> create materialized view ec2_instance_states as 
    select 
      now(), 
      instance_id, 
      instance_state, 
      account_id, 
      region
    from
      aws_ec2_instance

What's the difference? Use create table to save results into a table that won't change. In this example, if you're saving tables like ec2_instance_states_2023_04_27 on a daily basis, you'll never need to drop and recreate that per-day table with fresh data.
If you just want a single view of instance states that you'll update on a daily basis to record the current state of affairs, it makes sense to create a materialized view. That way, instead of dropping and recreating the table every day, you can create it once and then schedule the query steampipe query "refresh materialized view ec2_instance_states" to update the view daily.