@catherinedevlin
Forked from kevinli-work/download_v2.md
Last active September 25, 2017 20:25
Download v2 API

/v2/download/columns (GET) (NOT YET IMPLEMENTED)

Returns a list of available columns that can be requested in CSV generation for a specific type.

Request

  • type - possible values: award or transaction

Response

{
    "columns": [
        {
            "title": "Assistance Type",
            "value": "assistance_type"
        },
        {
            "title": "Awarding Agency",
            "value": "awarding_agency"
        }
    ]
}
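
Since this endpoint is not yet implemented, the following is only a sketch of how a client might consume the response shape shown above. The helper name `column_values` is hypothetical; it pulls out the `value` strings, which are what the download endpoints expect in their `columns` array.

```python
from typing import Any, Dict, List


def column_values(response: Dict[str, Any]) -> List[str]:
    """Return the machine-readable column names from a columns response.

    Assumes the response has the shape documented above:
    {"columns": [{"title": ..., "value": ...}, ...]}
    """
    return [column["value"] for column in response["columns"]]


# Example using the documented sample response.
sample = {
    "columns": [
        {"title": "Assistance Type", "value": "assistance_type"},
        {"title": "Awarding Agency", "value": "awarding_agency"},
    ]
}
print(column_values(sample))
```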

/v2/download/awards (POST)

Request

POST a JSON body:

{
    "filters": {},
    "columns": [
        "award_type",
        "awarding_agency_code"
    ]
}
  • filters is a standard Search v2 JSON filter object
  • columns is an array of column names (using the value string from the /v2/download/columns endpoint)
    • API should generate a CSV with columns in the same order as the array
    • An empty columns array returns all available columns
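
As a rough illustration, the request body could be assembled in Python as below. This is a sketch under assumptions: `BASE_URL` is a placeholder host, the helper name `build_download_request` is hypothetical, and the `Content-Type` header is assumed rather than stated in this gist. The same helper would also fit the `/v2/download/transactions` endpoint, which takes the same body.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Placeholder host (an assumption); the gist does not say where the API is served.
BASE_URL = "http://localhost:8000"


def build_download_request(
    kind: str, filters: Dict[str, Any], columns: List[str]
) -> urllib.request.Request:
    """Build (but do not send) a POST request for a CSV download."""
    if kind not in ("awards", "transactions"):
        raise ValueError("kind must be 'awards' or 'transactions'")
    # A list preserves order, so the CSV columns follow the order given here.
    # An empty list asks for all available columns.
    body = {"filters": filters, "columns": list(columns)}
    return urllib.request.Request(
        url=f"{BASE_URL}/v2/download/{kind}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


request = build_download_request(
    "awards", {}, ["award_type", "awarding_agency_code"]
)
# Sending is left out of this sketch; urllib.request.urlopen(request) would submit it.
```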

Response

{
   "status":"ready",
   "total_rows":null,
   "file_name":"5757660_968336105_awards.zip",
   "total_size":null,
   "total_columns":null,
   "message":null,
   "url":"/Volumes/exlinux/Users/catherine/werk/dataact/usaspending-api/downloads/5757660_968336105_awards.zip",
   "seconds_elapsed":null
}
  • total_size is the estimated file size of the CSV in kilobytes, or null if not finished

  • total_columns is the number of columns in the CSV, or null if not finished

  • total_rows is the number of rows in the CSV, or null if not finished

  • file_name is the name of the zipfile containing CSVs that will be generated

    • File name is a timestamp followed by _awards
  • status is a string representing the current state of the CSV generation request. Possible values are:

    • ready - job is ready to be run
    • running - job is currently in progress
    • finished - job is complete
    • failed - job failed to complete

    For this endpoint, status will always be ready, since the response is returned before generation begins

  • url - the URL for the file

  • message - a human-readable error message if the status is failed, otherwise null

  • seconds_elapsed is time spent generating the CSVs; always null for this endpoint, since the response is returned before generation begins
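
One way a client might hold these fields is a small dataclass. This is a minimal sketch, not part of the API: the class name `DownloadJob` and its `is_done` helper are hypothetical, and the field types are inferred from the sample responses in this gist (note that `seconds_elapsed` appears as a string in the status example).

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DownloadJob:
    status: str
    file_name: str
    url: str
    message: Optional[str] = None
    total_rows: Optional[int] = None
    total_columns: Optional[int] = None
    total_size: Optional[float] = None  # kilobytes
    seconds_elapsed: Optional[str] = None

    @classmethod
    def from_json(cls, text: str) -> "DownloadJob":
        """Parse a response body, ignoring any keys not listed above."""
        data = json.loads(text)
        return cls(**{f.name: data.get(f.name) for f in fields(cls)})

    @property
    def is_done(self) -> bool:
        """True once generation has ended, successfully or not."""
        return self.status in ("finished", "failed")


# The url value here is a stand-in, not a real location.
job = DownloadJob.from_json(
    '{"status": "ready", "total_rows": null,'
    ' "file_name": "5757660_968336105_awards.zip",'
    ' "total_size": null, "total_columns": null, "message": null,'
    ' "url": "placeholder-url", "seconds_elapsed": null}'
)
```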

/v2/download/transactions (POST)

Request

POST a JSON body:

{
    "filters": {},
    "columns": [
        "award_type",
        "awarding_agency_code"
    ]
}
  • filters is a standard Search v2 JSON filter object
  • columns is an array of column names (using the value string from the /v2/download/columns endpoint)
    • API should generate a CSV with columns in the same order as the array
    • An empty columns array returns all available columns

Response

{  
   "status":"ready",
   "total_rows":null,
   "file_name":"5757388_622958899_transactions.zip",
   "total_size":null,
   "total_columns":null,
   "message":null,
   "url":"/Volumes/exlinux/Users/catherine/werk/dataact/usaspending-api/downloads/5757388_622958899_transactions.zip",
   "seconds_elapsed":null
}
  • total_size is the estimated file size of the CSV in kilobytes, or null if not finished

  • total_columns is the number of columns in the CSV, or null if not finished

  • total_rows is the number of rows in the CSV, or null if not finished

  • file_name is the name of the zipfile containing CSVs that will be generated

    • File name is a timestamp followed by _transactions
  • status is a string representing the current state of the CSV generation request. Possible values are:

    • ready - job is ready to be run
    • running - job is currently in progress
    • finished - job is complete
    • failed - job failed to complete

    For this endpoint, status will always be ready, since the response is returned before generation begins

  • url - the URL for the file

  • message - a human-readable error message if the status is failed, otherwise null

  • seconds_elapsed is time spent generating the CSVs; always null for this endpoint, since the response is returned before generation begins

/v2/download/status (GET)

Returns the current status of a download/CSV generation request.

Request

  • file_name is the file_name returned in the /v2/download/awards or /v2/download/transactions response

Response

{  
   "status":"finished",
   "total_rows":3317,
   "file_name":"5757388_622958899_transactions.zip",
   "total_size":3334.475,
   "total_columns":214,
   "message":null,
   "url":"/Volumes/exlinux/Users/catherine/werk/dataact/usaspending-api/downloads/5757388_622958899_transactions.zip",
   "seconds_elapsed":"0.438393"
}
  • total_size is the estimated file size of the CSV in kilobytes, or null if not finished
  • total_columns is the number of columns in the CSV, or null if not finished
  • total_rows is the number of rows in the CSV, or null if not finished
  • file_name is the name of the zipfile containing CSVs that will be generated
    • File name is timestamp followed by _transactions or _awards
  • status is a string representing the current state of the CSV generation request. Possible values are:
    • ready - job is ready to be run
    • running - job is currently in progress
    • finished - job is complete
    • failed - job failed to complete
  • url - the URL for the file
  • message - a human-readable error message if the status is failed, otherwise it is null
  • seconds_elapsed is the time taken to generate the file (if status is finished or failed), or the time taken so far (if running)
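
A client would typically poll this endpoint until the job reaches finished or failed. The sketch below assumes a caller-supplied `fetch_status` function that performs the GET and returns the decoded JSON as a dict; that function, the name `wait_for_download`, and the polling interval are all hypothetical, since the gist does not specify how `file_name` is passed or how often to poll.

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATES = ("finished", "failed")


def wait_for_download(
    fetch_status: Callable[[str], Dict[str, Any]],
    file_name: str,
    interval: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the job is finished; raise if it fails or never completes."""
    for _ in range(max_polls):
        status = fetch_status(file_name)
        if status["status"] in TERMINAL_STATES:
            if status["status"] == "failed":
                # message holds the error text when status is failed.
                raise RuntimeError(status.get("message") or "download failed")
            return status
        # ready or running: wait and ask again.
        sleep(interval)
    raise TimeoutError(f"{file_name} not finished after {max_polls} polls")
```

Passing `fetch_status` and `sleep` in as arguments keeps the loop independent of any HTTP library and makes it easy to exercise with canned responses.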
@ejgullo
ejgullo commented Sep 25, 2017

  • is does_not_exist no longer a possible option for the status field?
  • will url be an empty string when the file is not ready?
  • can message be an empty string if there is no error?
  • what is seconds_elapsed? It's not noted in the gist.
  • can we note in the /status endpoint that the type being passed in can only be award or transaction?

@catherinedevlin
Author

catherinedevlin commented Sep 25, 2017

> is does_not_exist no longer a possible option for the status field?

It isn't, but you bring up a good point - right now, if you ask for the status of a file that doesn't exist, it throws an error. It should throw a 404 with a message. I'll fix that.

> will url be an empty string when the file is not ready?

No, the URL is available even before there's anything there. Maybe it shouldn't be?

> can message be an empty string if there is no error?

If that's preferable, I can do that.

> what is seconds_elapsed? It's not noted in the gist.

That's true - it's the time required to generate the file, or the time spent generating so far when it's still in progress. I'll add that to the gist. Unless you think it just shouldn't be there - I thought it would be useful when troubleshooting the speed.

> can we note in the /status endpoint that the type being passed in can only be award or transaction?

You don't actually need to pass anything about the type, you just need the filename - is that OK?

@ejgullo
ejgullo commented Sep 25, 2017

> If that's preferable, I can do that.

If null is the standard (as it looks from the other response fields), then let's leave the message field null instead of an empty string, for consistency's sake! I got a little nitpicky, I think, in the wrong direction!

> That's true - it's the time required to generate the file, or the time spent generating so far when it's still in progress. I'll add that to the gist. Unless you think it just shouldn't be there - I thought it would be useful when troubleshooting the speed.

I don't mind leaving seconds_elapsed if it's for troubleshooting purposes (it could always come up!); we just do not use it right now for anything.

> You don't actually need to pass anything about the type, you just need the filename - is that OK?

Did not notice the lack of type in the request for /status - my mistake! Totally fine as it is.

@ejgullo
ejgullo commented Sep 25, 2017

> No, the URL is available even before there's anything there. Maybe it shouldn't be?

We're using the status field on the front end to decide whether or not to proceed with loading the file for the user, so it may be moot - whichever you think makes the most sense for the situation.
