RubenVerborgh/Accessing data through APIs.md

## Accessing data through APIs

Update: my blog post "The lie of the API" details the issues with current APIs.

Background: I'm a researcher in semantic hypermedia, currently comparing different APIs for accessing metadata for human and machine consumption.

Story: I am browsing a cultural website and want to retrieve the metadata of the object I'm looking at in a machine-readable format. The steps below are the actual steps I took on different sites.
### Example: Cooper-Hewitt museum

I'm looking at the object http://collection.cooperhewitt.org/objects/35460799/.

To retrieve it in JSON, I just copy that URL and do:

```
$ curl -H "Accept: application/json" http://collection.cooperhewitt.org/objects/35460799/
```

### Example: DBpedia

I'm looking at the person http://dbpedia.org/resource/Arthur_Rimbaud

To retrieve it in JSON, I just copy that URL and do:

```
$ curl -L -H "Accept: application/json" http://dbpedia.org/resource/Arthur_Rimbaud
```


There's even RDF if I need it (same URL):
```
$ curl -L -H "Accept: text/turtle" http://dbpedia.org/resource/Arthur_Rimbaud
```
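Both examples rely on HTTP content negotiation: one URL, several representations, selected with the `Accept` header. As a minimal sketch (bash or POSIX sh plus curl; the function names `media_type_matches` and `check_negotiation` are hypothetical, not part of any site's API), the helper below requests a media type and reports the final status code and `Content-Type`, so that a refusal such as a 406, or an HTML page where JSON was expected, is caught before anything tries to parse the body. It follows redirects with `-L`, as the DBpedia commands do.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for checking content negotiation; not part of any site's API.

# Succeeds if a Content-Type value such as "application/json; charset=utf-8"
# names the expected media type (parameters, spaces and letter case are ignored).
media_type_matches() {
  local actual expected
  actual=$(printf '%s' "${1%%;*}" | tr -d ' ' | tr '[:upper:]' '[:lower:]')
  expected=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  [ "$actual" = "$expected" ]
}

# Requests URL $1 with "Accept: $2", following redirects (-L) and discarding
# the body, then prints "<status> <content-type>" of the final response.
# Returns 0 only for a 200 response in the requested media type.
check_negotiation() {
  local url="$1" wanted="$2" result status ctype
  result=$(curl -sL -o /dev/null -H "Accept: $wanted" \
    -w '%{http_code} %{content_type}' "$url") || return 1
  status=${result%% *}
  ctype=${result#* }
  printf '%s %s\n' "$status" "$ctype"
  [ "$status" = 200 ] && media_type_matches "$ctype" "$wanted"
}
```

For example, `check_negotiation http://dbpedia.org/resource/Arthur_Rimbaud text/turtle` prints whatever status and media type the server returns at that moment. Discarding the body with `-o /dev/null` keeps the check cheap; drop that option to keep the data.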
### Example: Europeana

I'm looking at the object http://www.europeana.eu/portal/record/92037/_http___www_bl_uk_onlinegallery_onlineex_apac_addorimss_s_019addor0000002u00000000_html.html?start=1&query=david+ochterlony+hookah&startPage=1&rows=24

1. To retrieve JSON, I try

   ```
   $ curl -H "Accept: application/json" http://www.europeana.eu/portal/record/92037/_http___www_bl_uk_onlinegallery_onlineex_apac_addorimss_s_019addor0000002u00000000_html.html
   ```


2. I try to make sense of the following output:

   ```
   <html><head><title>Apache Tomcat/6.0.24 - Error report</title><style><!--H1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} H2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;} H3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;} BODY {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} B {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;} P {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}A {color : black;}A.name {color : black;}HR {color : #525D76;}--></style> </head><body><h1>HTTP Status 406 - </h1><HR size="1" noshade="noshade"><p><b>type</b> Status report</p><p><b>message</b> <u></u></p><p><b>description</b> <u>The resource identified by this request is only capable of generating responses with characteristics not acceptable according to the request "accept" headers ().</u></p><HR size="1" noshade="noshade"><h3>Apache Tomcat/6.0.24</h3></body></html>
   ```


3. I search for the documentation.
4. I end up on this page and click "API documentation".
5. I end up on the Introduction page, where I see that I have to register.
6. On the registration page, I enter my e-mail address.
7. I receive an e-mail and click the link.
8. I receive my API key.
9. I click through to Working with the API and take a mental note about a field named `apikey`.
10. I go to Sample code. No, that's not it.
11. I go to API methods and see that `record.json` (is it a method or a file?) looks like what I need, so I click it.
12. I am informed that I need to use the URL template `http://europeana.eu/api/v2/record/[recordID].json`. This URL template has the parameters `recordID`, `callback`, and `profile`. Without reading further, I only understand the second one, but I don't need it (I'm not using JSON-P).
13. Hoping to find the Record ID, I go back to the page I opened in the beginning. I look through the whole page and find nothing called "Record ID", but I do find a field "Identifier" with the string `019ADDOR0000002U00000000`.
14. I now feel ready to make my first API call and try

    ```
    $ curl "http://europeana.eu/api/v2/record/019ADDOR0000002U00000000.json?apikey=xxxxxxxxx"
    ```

    where `xxxxxxxxx` is my actual API key, passed under the `apikey` field name I noted earlier.
15. I try to make sense of the following output:

    ```
    <html><head><title>Apache Tomcat/6.0.24 - Error report</title><style><!--H1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} H2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;} H3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;} BODY {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} B {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;} P {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}A {color : black;}A.name {color : black;}HR {color : #525D76;}--></style> </head><body><h1>HTTP Status 404 - /api/v2/record/019ADDOR0000002U00000000.json</h1><HR size="1" noshade="noshade"><p><b>type</b> Status report</p><p><b>message</b> <u>/api/v2/record/019ADDOR0000002U00000000.json</u></p><p><b>description</b> <u>The requested resource (/api/v2/record/019ADDOR0000002U00000000.json) is not available.</u></p><HR size="1" noshade="noshade"><h3>Apache Tomcat/6.0.24</h3></body></html>
    ```


16. Thinking I might not have used the API key properly, I go back to Working with the API and now see something about a `wskey` parameter. So the field is called `apikey`, but the parameter is `wskey`. I assume this is a URL query string parameter.
17. I try the request again:

    ```
    $ curl "http://europeana.eu/api/v2/record/019ADDOR0000002U00000000.json?wskey=xxxxxxxxx"
    ```

18. I visually check whether the error output is the same:

    ```
    <html><head><title>Apache Tomcat/6.0.24 - Error report</title><style><!--H1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} H2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;} H3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;} BODY {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} B {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;} P {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}A {color : black;}A.name {color : black;}HR {color : #525D76;}--></style> </head><body><h1>HTTP Status 404 - /api/v2/record/019ADDOR0000002U00000000.json</h1><HR size="1" noshade="noshade"><p><b>type</b> Status report</p><p><b>message</b> <u>/api/v2/record/019ADDOR0000002U00000000.json</u></p><p><b>description</b> <u>The requested resource (/api/v2/record/019ADDOR0000002U00000000.json) is not available.</u></p><HR size="1" noshade="noshade"><h3>Apache Tomcat/6.0.24</h3></body></html>
    ```


19. I suspect I might have gotten the identifier wrong. I go back to the original page and search its source code for an identifier. I only find `019ADDOR0000002U00000000`, which I have already tried.
20. I go back to the Working with the API page and click the link Europeana ID next to the `recordID` field, where I read the following explanation: _Digital records delivered to Europeana are assigned a unique identifier, Europeana ID, that serves to further identify the records when using the API. Usually, this identifier is based on the original metadata that are provided for the record and internal Europeana identifiers of the provider and the dataset containing the record. For example, a Europeana ID of an object can look as follows: `/09102/_GNM_1234` where 091 is the identifier of the provider, 02 is the id of the dataset and `GNM_1234` is derived from the unique identifier of the record in the context of the provider._
21. I inspect the URL to see whether I can find such an identifier: http://www.europeana.eu/portal/record/92037/_http___www_bl_uk_onlinegallery_onlineex_apac_addorimss_s_019addor0000002u00000000_html.html?start=1&query=david+ochterlony+hookah&startPage=1&rows=24. Indeed, there is a part "92037/", but what follows it does not look like the example. I find this strange, but try it anyway:

    ```
    $ curl "http://europeana.eu/api/v2/record/92037/_http___www_bl_uk_onlinegallery_onlineex_apac_addorimss_s_019addor0000002u00000000_html?apikey=xxxxxxxxx"
    ```


22. I get the error message:

    ```
    <html><head><title>Apache Tomcat/6.0.24 - Error report</title><style><!--H1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} H2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;} H3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;} BODY {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} B {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;} P {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}A {color : black;}A.name {color : black;}HR {color : #525D76;}--></style> </head><body><h1>HTTP Status 404 - /api/v2/record/92037/_http___www_bl_uk_onlinegallery_onlineex_apac_addorimss_s_019addor0000002u00000000_html</h1><HR size="1" noshade="noshade"><p><b>type</b> Status report</p><p><b>message</b> <u>/api/v2/record/92037/_http___www_bl_uk_onlinegallery_onlineex_apac_addorimss_s_019addor0000002u00000000_html</u></p><p><b>description</b> <u>The requested resource (/api/v2/record/92037/_http___www_bl_uk_onlinegallery_onlineex_apac_addorimss_s_019addor0000002u00000000_html) is not available.</u></p><HR size="1" noshade="noshade"><h3>Apache Tomcat/6.0.24</h3></body></html>
    ```


23. I search Google for "http://europeana.eu/api/v2/record" to see whether anybody else got the API working.
24. I arrive at the npm package registry and find a JSON fragment that mentions the link http://europeana.eu/api/v2/record/08501/03F4577D418DC84979C4E2EE36F99FECED4C7B11.json?wskey=abc123.
25. I add my own API key to test whether I can retrieve this random object:

    ```
    $ curl "http://europeana.eu/api/v2/record/08501/03F4577D418DC84979C4E2EE36F99FECED4C7B11.json?wskey=xxxxxxxxx"
    ```


26. This works, but it's not the object I wanted. Now let's try replacing the object identifier with `92037/_http___www_bl_uk_onlinegallery_onlineex_apac_addorimss_s_019addor0000002u00000000_html`:

    ```
    $ curl "http://europeana.eu/api/v2/record/92037/_http___www_bl_uk_onlinegallery_onlineex_apac_addorimss_s_019addor0000002u00000000_html.json?wskey=xxxxxxxxx"
    ```

27. This works.

28. I wonder why it didn't work in step 21, only to find out that I had not added the `.json` extension (that request also passed the key as `apikey` rather than `wskey`). I also wonder whether there is any other way of getting the object ID than copying it from the URL.
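The manual conversion that finally worked can at least be scripted, which answers the closing question in one narrow sense. This is a minimal sketch under a loud assumption inferred from this single record, not from Europeana's documentation: that a portal URL of the form `http://www.europeana.eu/portal/record/<collection>/<local>.html?...` corresponds to the Europeana ID `/<collection>/<local>`, and that the record is served at the `record/[recordID].json` template with the key passed as `wskey`. The function names are hypothetical. The local part here appears to be derived from the provider's page URL on bl.uk with punctuation replaced by underscores, which would explain why it looks so unlike the `GNM_1234` example.

```bash
#!/usr/bin/env bash
# Hypothetical helpers. The portal-URL-to-ID mapping is an assumption
# inferred from a single record; it is not taken from Europeana's docs.

# Prints the Europeana ID ("/<collection>/<local>") for a portal record URL,
# or fails if the URL does not contain /portal/record/.
europeana_id_from_portal_url() {
  local url="$1"
  url="${url%%[?]*}"                 # drop the query string (?start=1&...)
  case "$url" in
    */portal/record/*) ;;
    *) return 1 ;;
  esac
  url="${url#*/portal/record/}"      # drop scheme, host and /portal/record/
  url="${url%.html}"                 # drop the page extension, keep "_html"
  printf '/%s\n' "$url"
}

# Prints the v2 record URL for a Europeana ID and an API key. It appends
# .json (omitting it gave a 404 above) and passes the key as wskey.
europeana_record_url() {
  local id="${1#/}" key="$2"         # strip the ID's leading slash
  printf 'http://europeana.eu/api/v2/record/%s.json?wskey=%s\n' "$id" "$key"
}
```

To fetch the record, the result can be handed to `curl -fsS`, where `-f` suppresses the Tomcat error page and makes curl exit with a non-zero status on an HTTP error: `curl -fsS "$(europeana_record_url "$(europeana_id_from_portal_url "$PORTAL_URL")" "$WSKEY")"`, with `PORTAL_URL` and `WSKEY` as placeholders for the page address and your own key.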