@pschmied
Last active June 21, 2017 20:48
Using DSMP API from R

Publishing a dataset using DSMP API from R

These are some notes I made to reason through and illustrate usage of the new DSMP API. The steps are presented linearly for pedagogical purposes; this is not an attempt to create a reusable set of functions or a library.

Creating a new R client library based around the DSMP API is a task for the hopefully-not-too-distant future.

R libraries and setup

library(httr)          # For HTTP requests
library(readr)         # For serializing to CSV
library(snupassr)      # Securely gets passwords from LastPass
                       # devtools::install_github("pschmied/snupassr")
library(assertthat)    # For testing success

## Where we publish to
domain <- "https://peters.test-socrata.com"

## DSMP (publishing API) base path
dsmp_base <- "/api/publishing/v1"

## Get auth info from lastpass via snupassr.
username <- snupassr::get_user(site_entry = "test-socrata.com")
password <- snupassr::get_pass(site_entry = "test-socrata.com")

## Generate a unique filename
filepath <- tempfile(fileext = ".csv")

## Write example iris dataset to our tempfile
write_csv(tail(iris), filepath)
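
Before uploading anything, it can help to confirm what actually landed in the temp file. The sketch below is a base-R stand-in: it uses write.csv rather than readr::write_csv so that it runs without extra packages, and check_path and round_trip are hypothetical names not used elsewhere in these notes.

```r
## Base-R stand-in for the readr call above; check_path is a hypothetical name.
check_path <- tempfile(fileext = ".csv")
write.csv(tail(iris), check_path, row.names = FALSE)

## Read the file back and inspect its shape before sending it anywhere.
round_trip <- read.csv(check_path, stringsAsFactors = FALSE)
n_rows <- nrow(round_trip)        # tail() keeps the last six rows by default
col_names <- names(round_trip)    # taken from the header row of the CSV
```

If n_rows or col_names look wrong at this point, the upload in the later steps would publish the wrong data.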

Publishing a new dataset

Step one - create the view

This involves sending a POST request to the /api/views.json endpoint to create a dataset view and get the corresponding 4x4 id of the new view. This effectively initializes an empty dataset on Socrata.

The /api/views.json endpoint needs a name at minimum, but a description is also nice.

resp_view <- POST(
    modify_url(domain, scheme = "https", path = "/api/views.json"),
    authenticate(username, password),
    body = list(
        name = "Iris dataset",
        description = "Example Iris dataset included with R"
    ),
    encode = "json",
    verbose()
)

assert_that(round(status_code(resp_view), -2) == 200)
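
The round(..., -2) comparison works for the status codes seen here, but it is a little opaque. A sketch of a more explicit check follows, using integer division to test for any 2xx code. is_success is a hypothetical helper, not part of httr or assertthat.

```r
## Hypothetical helper: TRUE when an HTTP status code is in the 2xx range.
is_success <- function(code) {
    code %/% 100 == 2
}

is_success(201)   # a "Created" response counts as success
is_success(404)   # a "Not Found" response does not
```

With a real response this would read assert_that(is_success(status_code(resp_view))). httr also provides http_error(), which returns TRUE for status codes of 400 and above.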

Step two - create a replace revision

A revision is a change to a dataset. At this point we have an empty, unpopulated dataset. We must create a revision to signal that we intend to modify our empty dataset into a dataset replete with sweet, juicy data.

The /api/publishing/v1/revision/{fourfour} endpoint takes a JSON payload, where action_type can be one of: update, replace, or metadata.

{"action":
 {"type": action_type}
}

fourfour <- content(resp_view)$id

resp_new_rev <- POST(
    modify_url(domain, scheme = "https",
               path = c(dsmp_base,
                        "revision",
                        fourfour)),
    authenticate(username, password),
    body = list(
        action = list(type = "replace")
    ),
    encode = "json",
    verbose()
)

assert_that(round(status_code(resp_new_rev), -2) == 200)
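
To see how the nested R list passed as body maps onto the JSON payload, the sketch below serializes it with jsonlite directly. This assumes jsonlite is installed (httr depends on it) and that auto_unbox = TRUE approximates what encode = "json" does; rev_body and payload are hypothetical names.

```r
library(jsonlite)

## Same body as the revision request; the nested list becomes a nested object.
rev_body <- list(action = list(type = "replace"))

## auto_unbox = TRUE writes length-one vectors as scalars rather than arrays.
payload <- toJSON(rev_body, auto_unbox = TRUE)
cat(payload, "\n")
```

Without auto_unbox = TRUE, the string "replace" would be written as a one-element JSON array, which is not the shape the payload above describes.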

Step three - create an upload

Uploading is a two-step process. We must first create an upload, which signals to the publishing API that we intend to upload a file. Here we POST to the /api/publishing/v1/upload endpoint; the code takes the path from the create_upload link in the step two response rather than hard-coding it. We supply a JSON payload: {"filename": "filename.ext"}

upload_url_path <- content(resp_new_rev)$links$create_upload
revision_seq <- content(resp_new_rev)$resource$revision_seq

resp_upload <- POST(
    modify_url(domain, scheme = "https",
               path = upload_url_path),
    authenticate(username, password),
    body = list(
        filename = basename(filepath),
        fourfour = fourfour,            # won't ultimately be needed
        revision_seq = revision_seq     # ditto
    ),
    encode = "json",
    verbose()
)

assert_that(round(status_code(resp_upload), -2) == 200)

Step four - upload content into the upload

The upload created in step three is like an empty container: it exists, but the content of the file has not been sent yet. Here we upload that content to the /api/publishing/v1/upload/{uploadid} endpoint. The full API path, including the uploadid, is returned as the bytes link in the response to our POST in step three.

upload_url_path <- content(resp_upload)$links$bytes

resp_upload_content <- POST(
    modify_url(domain, path = upload_url_path),
    authenticate(username, password),
    body = upload_file(filepath, "text/csv"),
    verbose()
)

assert_that(round(status_code(resp_upload_content), -2) == 200)

Step five - apply the revision

Lastly, we must apply the revision. The code below sends a PUT to /api/publishing/v1/revision/{fourfour}/{revision_seq}/apply, using the apply link returned in step two. The JSON payload looks like {"output_schema_id": number}. The call made in step four returned a list of output schemas including their ids. Since R users are unlikely to be doing data transformations via the web API (they are more likely to munge data upstream in R), we can safely use the id of the first and only output schema.

apply_schema_id <- content(resp_upload_content)$resource$output_schemas[[1]]$id
revision_apply_path <- content(resp_new_rev)$links$apply

resp_apply_rev <- PUT(
    modify_url(domain, scheme = "https", path = revision_apply_path),
    authenticate(username, password),
    body = list(
        output_schema_id = apply_schema_id
    ),
    encode = "json",
    verbose()
)

assert_that(round(status_code(resp_apply_rev), -2) == 200)
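
The assumption that the first output schema is also the only one can be checked before applying the revision. A minimal sketch follows, using a hand-built list in place of content(resp_upload_content). The nesting mirrors the fields accessed above, but the id value and the names mock_content, single_schema_id, and schema_id are made up for illustration.

```r
## Hypothetical stand-in for content(resp_upload_content); the id is invented.
mock_content <- list(
    resource = list(
        output_schemas = list(list(id = 42))
    )
)

## Hypothetical helper: return the schema id, failing loudly if the
## "first and only" assumption does not hold.
single_schema_id <- function(parsed) {
    schemas <- parsed$resource$output_schemas
    stopifnot(length(schemas) == 1)
    schemas[[1]]$id
}

schema_id <- single_schema_id(mock_content)
```

If a response ever carried zero or several output schemas, this would stop with an error instead of silently applying whichever schema happened to come first.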

API Docs

http://docs.socratapublishing.apiary.io

Existing implementations

Python
https://github.com/socrata/publish-py