pschmied/dsmpapi.org

## dsmpapi.org

      
    Publishing a dataset using DSMP API from R

These are notes I made to reason through and illustrate usage of the
  new DSMP API. The steps are presented linearly for pedagogical
  purposes; this is not an attempt at a reusable set of functions or a
  library.
Creating a new R client library based around the DSMP API is a task
  for the hopefully-not-too-distant future.
R libraries and setup

library(httr)          # For HTTP requests
library(readr)         # For serializing to CSV
library(snupassr)      # Securely gets passwords from LastPass
                       # devtools::install_github("pschmied/snupassr")
library(assertthat)    # For testing success

## Where we publish to
domain <- "https://peters.test-socrata.com"

## DSMP (publishing API) base path
dsmp_base <- "/api/publishing/v1"

## Get auth info from LastPass via snupassr.
username <- snupassr::get_user(site_entry = "test-socrata.com")
password <- snupassr::get_pass(site_entry = "test-socrata.com")

## Generate a unique filename
filepath <- tempfile(fileext = ".csv")

## Write example iris dataset to our tempfile
write_csv(tail(iris), filepath)
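
Since snupassr is a personal package, readers without it need some other way to supply credentials. One option is a minimal sketch like the following, which reads them from environment variables. The helper get_credential and the variable names SOCRATA_USERNAME and SOCRATA_PASSWORD are assumptions for illustration, not part of snupassr or the DSMP API.

```r
# Hypothetical fallback for readers without snupassr: read a credential
# from an environment variable and stop with a clear error if it is unset.
get_credential <- function(var) {
    value <- Sys.getenv(var, unset = NA)
    if (is.na(value) || !nzchar(value)) {
        stop("Environment variable ", var, " is not set", call. = FALSE)
    }
    value
}

# Usage (the variable names are assumptions; set them in ~/.Renviron
# rather than in the script itself):
# username <- get_credential("SOCRATA_USERNAME")
# password <- get_credential("SOCRATA_PASSWORD")
```
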
Publishing a new dataset

Step one - create the view

This involves sending a POST request to the /api/views.json
  endpoint to create a dataset view and get the corresponding 4x4 id
  of the new view. This effectively initializes an empty dataset on
  Socrata.
The /api/views.json endpoint needs a name at minimum, but a
  description is also nice.
resp_view <- POST(
    modify_url(domain, scheme = "https", path = "/api/views.json"),
    authenticate(username, password),
    body = list(
        name = "Iris dataset",
        description = "Example Iris dataset included with R"
    ),
    encode = "json",
    verbose()
)

assert_that(round(status_code(resp_view), -2) == 200)
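
The rounding check works for the codes an API normally returns on success (200, 201, 202), but rounding pushes a value such as 299 up to 300. A slightly stricter alternative is integer division, sketched here with a hypothetical helper named is_2xx that is not part of httr or assertthat.

```r
# Hypothetical helper: TRUE for any status code in the 200-299 range.
is_2xx <- function(code) {
    code %/% 100 == 2
}

codes <- c(200, 201, 299, 404)

# Integer division keeps every 2xx code in the "success" bucket
print(is_2xx(codes))
# [1]  TRUE  TRUE  TRUE FALSE

# Rounding to the nearest hundred sends 299 to 300
print(round(codes, -2) == 200)
# [1]  TRUE  TRUE FALSE FALSE
```
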
Step two - create a replace revision

A revision is a change to a dataset. At this point we have an empty,
  unpopulated dataset. We must create a revision to signal that we
  intend to modify our empty dataset into a dataset replete with sweet,
  juicy data.
The /api/publishing/v1/revision/{fourfour} endpoint takes a JSON
  payload, where action_type can be one of: update, replace, or
  metadata.
{"action":
 {"type": action_type}
}
fourfour <- content(resp_view)$id

resp_new_rev <- POST(
    modify_url(domain, scheme = "https",
               path = c(dsmp_base,
                        "revision",
                        fourfour)),
    authenticate(username, password),
    body = list(
        action = list(type = "replace")
    ),
    encode = "json",
    verbose()
)

assert_that(round(status_code(resp_new_rev), -2) == 200)
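
To see what the nested body list in this step looks like on the wire, it can be serialized by hand. This is a small sketch that assumes the jsonlite package is installed (httr relies on it for encode = "json"). It uses auto_unbox = TRUE so that length-one vectors become JSON scalars rather than one-element arrays.

```r
library(jsonlite)

# The same nested list passed as `body` for a replace revision
payload <- list(action = list(type = "replace"))

# auto_unbox = TRUE turns length-one vectors into scalars, not arrays
payload_json <- toJSON(payload, auto_unbox = TRUE)
print(payload_json)
# {"action":{"type":"replace"}}
```
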
Step three - create an upload

Uploading is a two-step process. We must first create an upload, which
  signals to the publishing API that we intend to upload a file. Here we
  POST to the /api/publishing/v1/upload endpoint. We supply a JSON
  payload: {"filename": "filename.ext"}
upload_url_path <- content(resp_new_rev)$links$create_upload
revision_seq <- content(resp_new_rev)$resource$revision_seq

resp_upload <- POST(
    modify_url(domain, scheme = "https",
               path = upload_url_path),
    authenticate(username, password),
    body = list(
        filename = basename(filepath),
        fourfour = fourfour,            # won't ultimately be needed
        revision_seq = revision_seq     # ditto
    ),
    encode = "json",
    verbose()
)

assert_that(round(status_code(resp_upload), -2) == 200)
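
Steps two through five each take their next URL path from the links element of the previous response. A small guard makes a missing link fail with a readable message instead of passing NULL along. This is a sketch only: get_link is a hypothetical helper, and the mock list stands in for a parsed response, using field names taken from the code in these notes with made-up values.

```r
# Hypothetical helper: pull a named link out of a parsed response and
# fail loudly if it is missing.
get_link <- function(parsed, name) {
    link <- parsed$links[[name]]
    if (is.null(link)) {
        stop("Response has no link named '", name, "'", call. = FALSE)
    }
    link
}

# Mock of what content(resp_new_rev) might look like. Only the field
# names come from the steps in these notes; the values are made up.
mock_rev <- list(
    resource = list(revision_seq = 0),
    links = list(create_upload = "/api/publishing/v1/upload")
)

get_link(mock_rev, "create_upload")
```
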
Step four - upload content into the upload

The upload created in step three is like an empty container. The
  previous step created the container but did not upload the content of
  the file. Here we upload that content to the
  /api/publishing/v1/upload/{uploadid} endpoint. The full API path,
  including the uploadid, is returned in the response to our previous
  POST in step three.
upload_url_path <- content(resp_upload)$links$bytes

resp_upload_content <- POST(
    modify_url(domain, path = upload_url_path),
    authenticate(username, password),
    body = upload_file(filepath, "text/csv"),
    verbose()
)

assert_that(round(status_code(resp_upload_content), -2) == 200)
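
Before sending bytes, it can be worth confirming that the file on disk is what we expect. The sketch below is self-contained and uses base R's write.csv and read.csv instead of readr, purely so it runs without extra packages. The variable check_path is a hypothetical name for a scratch file, not the filepath used above.

```r
# Write the same six rows to a scratch file (hypothetical path variable)
check_path <- tempfile(fileext = ".csv")
write.csv(tail(iris), check_path, row.names = FALSE)

# The file should exist and be non-empty before we try to upload it
stopifnot(file.exists(check_path), file.size(check_path) > 0)

# Reading it back should give the six rows and five columns we wrote
round_trip <- read.csv(check_path)
print(dim(round_trip))
# [1] 6 5
```
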
Step five - apply the revision

Lastly, we must apply the revision. This requires a PUT to
  /api/publishing/v1/revision/{fourfour}/{revision_seq}/apply. The JSON
  payload looks like {"output_schema_id": number}. The call made in
  step four returned a list of output schemas including their ids. Since
  R users are unlikely to be doing data transformations via the web API
  (more likely munging data upstream in R), we can safely use the id of
  the first and only output schema.
apply_schema_id <- content(resp_upload_content)$resource$output_schemas[[1]]$id
revision_apply_path <- content(resp_new_rev)$links$apply

resp_apply_rev <- PUT(
    modify_url(domain, scheme = "https", path = revision_apply_path),
    authenticate(username, password),
    body = list(
        output_schema_id = apply_schema_id
    ),
    encode = "json",
    verbose()
)

assert_that(round(status_code(resp_apply_rev), -2) == 200)
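
Taking the first output schema relies on there being exactly one. A guard can make that assumption explicit. This is a sketch: only_output_schema_id is a hypothetical helper, and the mock list imitates only the fields this step reads (resource, output_schemas, id), with a made-up id.

```r
# Hypothetical helper: return the id of the only output schema, or stop
# if the parsed response does not contain exactly one.
only_output_schema_id <- function(parsed) {
    schemas <- parsed$resource$output_schemas
    if (length(schemas) != 1) {
        stop("Expected exactly one output schema, found ",
             length(schemas), call. = FALSE)
    }
    schemas[[1]]$id
}

# Mock of content(resp_upload_content); the id value is made up
mock_upload <- list(
    resource = list(output_schemas = list(list(id = 42)))
)

only_output_schema_id(mock_upload)
```
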
API Docs

http://docs.socratapublishing.apiary.io
Existing implementations


  Python
https://github.com/socrata/publish-py