
Data Export Request API

What is this?

The "data export request API" enables a Koku customer to request an export of some of the data in Koku's database that originated from sources/providers managed by that customer. The export generally includes results from Koku's "daily" and "daily summary" database tables and are generally bound by specific requested dates. Exported files are separated by Koku account, source/provider, date, and table name. These files will be written by Koku to an AWS S3 bucket that the customer specified in the request.

Upon initial request, Koku gives the customer a synchronous response to indicate that it has started the process, but the remainder of the operations are asynchronous. The customer must check back at the API if she wants to see the current state of her data export request.

Where does it live?

/api/cost-management/v1/dataexportrequests/

  • HTTP GET that URL to retrieve a list of data export requests that were created by the authenticated user.
  • HTTP POST that URL to create a new request.
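
For example, listing your existing requests might look like the following (a sketch in the same httpie style as the POST example below; the host and the ${IDENTITY} variable holding the identity header value are carried over from that example):

# GET with no data fields; httpie defaults to GET here.
http localhost:8000/api/cost-management/v1/dataexportrequests/ \
    HTTP_X_RH_IDENTITY:"${IDENTITY}"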

/api/cost-management/v1/dataexportrequests/{uuid}/

  • HTTP GET that URL to retrieve the requested data export request, assuming it is reachable by the authenticated user.
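
To retrieve a single request, GET its URL by UUID (again a sketch; <request-uuid> is a placeholder for the UUID presumably returned when the request was created, and the host and ${IDENTITY} follow the example below):

# Fetch one data export request by its UUID.
http localhost:8000/api/cost-management/v1/dataexportrequests/<request-uuid>/ \
    HTTP_X_RH_IDENTITY:"${IDENTITY}"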

How do I create a data export request?

The customer's HTTP POST must include the following required parameters:

  • start_date: ISO-8601 formatted start date (inclusive)
  • end_date: ISO-8601 formatted end date (exclusive)
  • bucket_name: destination S3 bucket name

For example:

http localhost:8000/api/cost-management/v1/dataexportrequests/ \
    HTTP_X_RH_IDENTITY:"${IDENTITY}" \
    start_date='2019-09-01' \
    end_date='2019-10-01' \
    bucket_name='koku-customer-sample-bucket'

Start date is inclusive and end date is exclusive. This pattern was chosen because we expect a typical use case to be requesting periods like "one month", and it means the customer doesn't have to figure out the last day of the requested month. The arguments 2019-09-01 and 2019-10-01 for start and end respectively mean the customer is effectively requesting the month of 2019-09.

The bucket name indicates a bucket that the customer (presumably) has configured to allow for Koku to list/write files. (More on this later.)

What does it do behind the scenes?

Upon initial request, Koku checks for any in-progress requests with the same date parameters. If Koku is already working on an export with the same dates (status is pending or processing), the new request is rejected with a 400 error and a message indicating as such ('A pending or processing data export already exists with the given "start_date" and "end_date".'). If no export with the same dates exists, or one exists but is in a terminal state, Koku allows the new request to be created.

Koku runs a periodic asynchronous job every day that exports data to a Red Hat-managed AWS S3 bucket, irrespective of data export requests. When a new data export request is processed, Koku simply copies the archived files from the Red Hat-managed AWS S3 bucket into the customer's specified AWS S3 bucket.
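
Conceptually, that copy step is equivalent to a bucket-to-bucket sync like the following AWS CLI sketch (purely illustrative: Koku performs the copy internally, and the Red Hat-managed bucket name and account prefix here are hypothetical):

# Hypothetical illustration only; Koku does this internally, not via the CLI.
aws s3 sync \
    s3://<redhat-managed-archive-bucket>/data_archive/acct1460290/ \
    s3://koku-customer-sample-bucket/data_archive/acct1460290/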

Why am I specifying an AWS S3 bucket?

Alternatively: if our customers have already given us access to their AWS account, why isn't Koku simply using that access?

The simple answer is that this API is provider-agnostic in how it writes these exported files. This distinction is important because some customers may not have an AWS source/provider configured (perhaps Azure only?), but we still need some way to send these files.

For now, this means the customer must set up an AWS S3 bucket to receive these files, but (this is hopeful speculation) this API may evolve to support sending to Azure storage or other file bucket systems.

How do I set up my AWS S3 bucket to receive files from Koku?

  1. Create an AWS S3 bucket or choose one you wish to use for receiving data exports from Koku.
  2. Edit this bucket's Permissions/Access Control List.
  3. Add "Access for other AWS accounts" to allow Koku's canonical ID for "list objects" and "write objects"
    • Koku's AWS account canonical ID is 0674b6ccec2e5d6212b35910e6c5fac24e94cee66371847b46c981edfa3df63c.
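
If you prefer the CLI to the S3 console, the grants in step 3 can be expressed roughly as follows (a sketch; the bucket name matches the earlier example, and <your-canonical-id> is a placeholder):

# Look up your own (bucket owner) canonical ID first, because put-bucket-acl
# replaces the bucket's entire ACL rather than adding to it.
aws s3api get-bucket-acl --bucket koku-customer-sample-bucket

# Grant Koku READ ("list objects") and WRITE ("write objects") on the bucket,
# keeping full control for the owner.
aws s3api put-bucket-acl \
    --bucket koku-customer-sample-bucket \
    --grant-full-control id=<your-canonical-id> \
    --grant-read id=0674b6ccec2e5d6212b35910e6c5fac24e94cee66371847b46c981edfa3df63c \
    --grant-write id=0674b6ccec2e5d6212b35910e6c5fac24e94cee66371847b46c981edfa3df63c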

Where does Koku write files into my AWS S3 bucket?

Files are written with hierarchical paths that include the Koku account ID, source/provider type and UUID, and date, with the table name in the file name. An example file path looks like:

/data_archive/acct1460290/aws/4cef9a22-4302-4776-a4b7-fac050da23a0/2019/11/24/reporting_awscostentrylineitem_daily_summary.csv.gz
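
Assuming the bucket from the earlier example, you can confirm what Koku has written by listing the bucket recursively with the AWS CLI:

# List the exported files Koku has written into your bucket.
aws s3 ls --recursive s3://koku-customer-sample-bucket/data_archive/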