These are some notes I made to reason through / illustrate usage of the new DSMP API. Usage here is presented as a linear set of steps for pedagogical purposes rather than attempting to create a reusable set of functions or a library.
Creating a new R client library based around the DSMP API is a task for the hopefully-not-too-distant future.
library(httr) # For HTTP requests
library(readr) # For serializing to CSV
library(snupassr) # Securely gets passwords from lastpass
# devtools::install_github("pschmied/snupassr")
library(assertthat) # For testing success
## Where we publish to
domain <- "https://peters.test-socrata.com"
## DSMP (publishing API) base path
dsmp_base <- "/api/publishing/v1"
## Get auth info from lastpass via snupassr.
username <- snupassr::get_user(site_entry = "test-socrata.com")
password <- snupassr::get_pass(site_entry = "test-socrata.com")
## Generate a unique filename
filepath <- tempfile(fileext = ".csv")
## Write example iris dataset to our tempfile
write_csv(tail(iris), filepath)
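Before publishing anything, it can be worth confirming that the temporary file holds what we expect. The following is a minimal sketch, not part of the original workflow, that writes the same six rows and reads them back. It recreates the file itself so that it can run on its own.

```r
library(readr)

# Recreate the example file so this snippet is self-contained
filepath <- tempfile(fileext = ".csv")
write_csv(tail(iris), filepath)

# Read the file back; col_types = cols() lets readr guess the column
# types without printing the column specification message
roundtrip <- read_csv(filepath, col_types = cols())

# tail() keeps the last six rows by default, so we expect six rows
# and the same column names as iris
nrow(roundtrip)
names(roundtrip)
```

Note that the Species column comes back as character rather than factor, since CSV has no notion of factors.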
The first step is to send a POST request to the /api/views.json
endpoint, which creates a dataset view and returns the corresponding
4x4 id of the new view. This effectively initializes an empty dataset
on Socrata.
The /api/views.json endpoint needs a name at minimum, but a
description is also nice.
resp_view <- POST(
modify_url(domain, scheme = "https", path = "/api/views.json"),
authenticate(username, password),
body = list(
name = "Iris dataset",
description = "Example Iris dataset included with R"
),
encode = "json",
verbose()
)
assert_that(round(status_code(resp_view), -2) == 200)
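The rounding check above is repeated after every request in these notes. A slightly more explicit way to express "any 2xx status" is integer division, sketched below in base R. The helper name is_success is hypothetical and is not part of httr; httr itself also ships http_error() and stop_for_status() for this purpose.

```r
# Hypothetical helper: TRUE when a status code falls in the 2xx range
is_success <- function(status) {
  status %/% 100 == 2
}

# Try it on a few common status codes
codes <- c(200, 201, 204, 404, 500)
result <- is_success(codes)
print(result)
```

With a helper like this, each check could read assert_that(is_success(status_code(resp_view))).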
A revision is a change to a dataset. At this point we have an empty, unpopulated dataset. We must create a revision to signal that we intend to modify our empty dataset into a dataset replete with sweet, juicy data.
The /api/publishing/v1/revision/{fourfour} endpoint takes a JSON
payload, where action_type can be one of: update, replace, or
metadata.
{"action": {"type": action_type}}
fourfour <- content(resp_view)$id
resp_new_rev <- POST(
modify_url(domain, scheme = "https",
path = c(dsmp_base,
"revision",
fourfour)),
authenticate(username, password),
body = list(
action = list(type = "replace")
),
encode = "json",
verbose()
)
assert_that(round(status_code(resp_new_rev), -2) == 200)
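To see what the request above actually sends, the URL and the JSON body can be built offline without contacting the server. This is a sketch under two assumptions: the fourfour value is a made-up placeholder, and httr's encode = "json" is taken to serialize the body with jsonlite using auto_unbox = TRUE, which is my understanding of its behaviour.

```r
library(httr)
library(jsonlite)

domain    <- "https://peters.test-socrata.com"
dsmp_base <- "/api/publishing/v1"
fourfour  <- "abcd-1234"  # placeholder id, not a real dataset

# modify_url() joins the elements of a character-vector path with "/"
revision_url <- modify_url(domain, scheme = "https",
                           path = c(dsmp_base, "revision", fourfour))

# auto_unbox = TRUE turns length-one vectors into JSON scalars
# instead of one-element arrays
revision_body <- toJSON(list(action = list(type = "replace")),
                        auto_unbox = TRUE)

print(revision_url)
print(revision_body)
```

Printing both values before sending is a cheap way to catch a malformed path or payload.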
Uploading is a two-step process. We must first create an upload, which
signals to the publishing API that we intend to upload a file. Here we
POST to the /api/publishing/v1/upload endpoint. We supply a JSON
payload: {"filename": "filename.ext"}
upload_url_path <- content(resp_new_rev)$links$create_upload
revision_seq <- content(resp_new_rev)$resource$revision_seq
resp_upload <- POST(
modify_url(domain, scheme = "https",
path = upload_url_path),
authenticate(username, password),
body = list(
filename = basename(filepath),
fourfour = fourfour, # won't ultimately be needed
revision_seq = revision_seq # ditto
),
encode = "json",
verbose()
)
assert_that(round(status_code(resp_upload), -2) == 200)
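The code above reaches into the parsed responses with chains such as content(resp_new_rev)$links$create_upload, which silently yield NULL if a field is absent. Below is a sketch of a more defensive lookup. The JSON is a made-up stand-in for a revision response: only the field names come from the code above, the values are placeholders, and get_field is a hypothetical helper. It parses with simplifyVector = FALSE on the assumption that this matches how httr's content() parses JSON.

```r
library(jsonlite)

# Made-up response body; only the field names mirror the code above
mock_json <- '{
  "resource": {"revision_seq": 0},
  "links": {"create_upload": "/placeholder/create-upload",
            "apply": "/placeholder/apply"}
}'
parsed <- fromJSON(mock_json, simplifyVector = FALSE)

# Hypothetical helper: walk a nested list key by key and fail loudly
# as soon as a field is missing
get_field <- function(x, ...) {
  for (key in c(...)) {
    x <- x[[key]]
    if (is.null(x)) stop("Missing field in response: ", key)
  }
  x
}

upload_url_path <- get_field(parsed, "links", "create_upload")
revision_seq    <- get_field(parsed, "resource", "revision_seq")
```

Failing at the lookup gives a clearer error than a later request built from a NULL path.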
The upload created in step three is like an empty container: it
exists, but the content of the file has not been uploaded yet. Here we
upload that content to the /api/publishing/v1/upload/{uploadid}
endpoint. The full API path, including the uploadid, is returned in
the response to our POST in step three.
upload_url_path <- content(resp_upload)$links$bytes
resp_upload_content <- POST(
modify_url(domain, path = upload_url_path),
authenticate(username, password),
body = upload_file(filepath, "text/csv"),
verbose()
)
assert_that(round(status_code(resp_upload_content), -2) == 200)
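The body of the request above is built by upload_file(), which does not read the file immediately but records its path and content type for the request. The sketch below inspects that object offline, using a throwaway two-line CSV so that it runs on its own. It assumes the returned object exposes path and type fields, as the form_file objects from the curl package do.

```r
library(httr)

# Throwaway file so the snippet is self-contained
filepath <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1,2"), filepath)

# upload_file() checks that the file exists and records the path
# together with the MIME type we pass in
file_body <- upload_file(filepath, "text/csv")

print(class(file_body))
print(file_body$type)
print(file.size(file_body$path))
```

Checking the size before sending is a simple guard against uploading an empty file.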
Lastly, we must apply the revision. This requires a PUT to
/api/publishing/v1/revision/:fourfour/:revision_seq/apply. The JSON
payload looks like {"output_schema_id": number}. The call made in
step four returned a list of output schemas including their ids. Since
R users are unlikely to be doing data transformations via the web API
(likely munging data upstream in R), we can safely use the id of the
first and only output schema.
apply_schema_id <- content(resp_upload_content)$resource$output_schemas[[1]]$id
revision_apply_path <- content(resp_new_rev)$links$apply
resp_apply_rev <- PUT(
modify_url(domain, scheme = "https", path = revision_apply_path),
authenticate(username, password),
body = list(
output_schema_id = apply_schema_id
),
encode = "json",
verbose()
)
assert_that(round(status_code(resp_apply_rev), -2) == 200)
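The "first and only output schema" assumption can be made explicit rather than implied by indexing with [[1]]. The sketch below does so against a made-up response: only the field names resource, output_schemas, and id come from the code above, and the id value 42 is a placeholder.

```r
library(jsonlite)

# Made-up response body; only the field names mirror the code above
mock_json <- '{"resource": {"output_schemas": [{"id": 42}]}}'
parsed <- fromJSON(mock_json, simplifyVector = FALSE)

schemas <- parsed$resource$output_schemas

# Guard the assumption that exactly one output schema came back
stopifnot(length(schemas) == 1)
apply_schema_id <- schemas[[1]]$id

# The JSON payload that the apply request would carry
apply_body <- toJSON(list(output_schema_id = apply_schema_id),
                     auto_unbox = TRUE)
print(apply_body)
```

If the API ever returns several output schemas, the guard stops the script before the wrong one is applied.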