@EmmanuelOga
Last active December 11, 2021 16:26
Simpler HTTP APIs

A quick note on writing HTTP APIs in an RPC style ... also sharing my enthusiasm for RDF and RDF schemas all around :-)

Typical HTTP API design

Someone put together a crazy big decision diagram explaining which status code to return under which circumstance, etc.

In practice, many HTTP service APIs work more like a Remote Procedure Call and less like a fully conforming "Hypermedia Service". Headers are usually considered "low level", used for things like caching, ETags, cookies, CORS, etc.

The focus when designing APIs is generally on picking a combination of request method, URL path and parameters, and on specifying the shape of the request and response bodies (usually with JSON).

A popular convention dictates a structure like this:

GET    /resource/id : Show a resource
POST   /resource/id : Create a resource, payload in the body of the request.
DELETE /resource/id : Delete a resource
...

Conventions do not cover all possible cases and invite long-winded discussions about the "correct" way to implement things like searching or deleting many things at once.

All in all, API design is difficult and can become a huge time sink.

API Specs

Tools like Swagger provide some conventions to help enumerate all request/response combinations.

Swagger itself covers most of the specification, but for describing the shape of JSON bodies it depends on an additional language: JSON Schema.

Swagger is a great tool and comes with many extra goodies like code and documentation generators. But it doesn't necessarily reduce the mental overhead of mapping service operations to components of the HTTP protocol. Also, a tool like Swagger introduces its own share of complexity.

A simple pattern

What if our APIs were smaller and more regular?

[POST /API{/ignored-postfix} | GET /API{/ignored-postfix}](request body) => 200 response body

  • POST /API{/ignored-postfix} handles non cacheable requests.
  • GET /API{/ignored-postfix} handles cacheable requests.
  • Both input and output come from req/resp bodies and should use the same data format.
  • The input should include a name of a "procedure" and its parameters. JSON example:
    • {"first" : {"proc": "foo", "params" : [1,2,3] }, "second" : {"proc": "bar", "params" : [4,5,6] }}
  • The ignored postfix is there to improve logging and to enable caching.
    • Could be as simple as GET /API/digest-of-body
    • Or have more info for log readability: GET /API/first-1-2-3/second-4-5-6/digest-of-body
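The digest postfix can be derived from a canonical serialization of the request body. A minimal sketch (the helper name and digest length are my own choices, not part of the pattern):

```python
import hashlib
import json

def cacheable_url(body: dict, prefix: str = "/API") -> str:
    """Build a GET URL whose ignored postfix is a digest of the body.

    The server ignores everything after the prefix; the digest only makes
    the URL unique per payload so intermediaries can cache the response.
    """
    # Canonical form: sorted keys, no whitespace, so equal payloads
    # always produce equal digests.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"{prefix}/{digest}"

url = cacheable_url({"first": {"proc": "foo", "params": [1, 2, 3]}})
```

The same idea works with the more readable `first-1-2-3/...` postfixes; only the digest part needs to be derived from the body.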

HTTP lawyers would like us to ignore the GET request body, but since its semantics are not defined, we can use it to send GET request parameters!

Since this API format can handle multiple requests at once, it makes sense to rely only on the response body and always return status 200. The server can still return 500s for lower-level errors, etc. Other paths on the server can stick to more usual HTTP semantics.

Advantages

It would be easy to dismiss this pattern as too opaque or simplistic. But it does have a bunch of nice properties:

  • Multiple requests can be performed on a single HTTP call!
    • Things like "deleting thing 1, thing 2 and thing 3" become trivial and require no additional planning (see example below).
  • A single schema language can be used to define:
    • Input: the target procedures with their parameters.
    • Output: the procedures' results.
    • No need to parse anything else (neither paths nor parameters)!
  • Very convenient for exploratory programming!
    • Reduces the mental overhead of trying to find "the one correct HTTP API".
    • Migrating to a more traditional API structure is made easier after a big part of the API surface has been defined this way.

Note that an app can still handle "pretty URLs" in any way required. Using this pattern perhaps even clears the way to complete freedom in picking pretty URLs! But the pattern is mostly concerned with specifying RPC-like APIs.

Also, this pattern stays in the land of simple HTTP, as opposed to more radical solutions like gRPC.

Example: JSON Schema all the things!

Requesting 4 things in a single request:

POST("/API", { "req0" : { "name": "delete" , "params" : { "id" : "abc" } },
               "req1" : { "name": "delete" , "params" : { "id" : "xyz" } },
               "req2" : { "name": "foobar" , "params" : [42] },
               "req3" : { "name": "sum"    , "params" : { "a": 1, "b" : 2} } })

Response, 4 results in one:

{ "req0" : { "name": "delete" , "params" : { "id" : "abc" }   , "result" : "deleted" },
  "req1" : { "name": "delete" , "params" : { "id" : "xyz" }   , "result" : "unauthorized" },
  "req2" : { "name": "foobar" , "params" : [42]               , "result" : { "error" : "Unhandled exception." } },
  "req3" : { "name": "sum"    , "params" : { "a": 1, "b" : 2} , "result" : 3 } }

Schemas are not shown, but the whole API could be specified using a single schema language like JSON Schema.
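For illustration, here is one way the request envelope above could be described with JSON Schema. This is a sketch under my own assumptions, not a full spec; the property names follow the example:

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "BatchRequest",
  "type": "object",
  "additionalProperties": {
    "type": "object",
    "required": ["name", "params"],
    "properties": {
      "name": { "type": "string" },
      "params": {}
    }
  }
}
```

A matching response schema would add a `result` property to each entry.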

This design includes the input body in the output body. I think this is great for logging and debuggability, but not strictly necessary.
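The dispatch loop implied by the example above can be sketched in a few lines. This is a hypothetical server-side sketch (the procedure table and handler names are made up), showing how per-request failures become error results rather than HTTP errors:

```python
import json

# Hypothetical procedure table; real handlers would live elsewhere.
PROCEDURES = {
    "sum": lambda params: params["a"] + params["b"],
    "delete": lambda params: "deleted",
}

def handle_batch(body: str) -> str:
    """Dispatch every request in the envelope; always a 200-style reply."""
    requests = json.loads(body)
    response = {}
    for key, req in requests.items():
        entry = dict(req)  # echo the input back for debuggability
        proc = PROCEDURES.get(req.get("name"))
        if proc is None:
            entry["result"] = {"error": f"Unknown procedure {req.get('name')!r}"}
        else:
            try:
                entry["result"] = proc(req.get("params"))
            except Exception as exc:
                entry["result"] = {"error": str(exc)}
        response[key] = entry
    return json.dumps(response)

out = json.loads(handle_batch(json.dumps({
    "req0": {"name": "sum", "params": {"a": 1, "b": 2}},
    "req1": {"name": "nope", "params": []},
})))
```

Note how the unknown procedure in `req1` produces an error result in the body while the call as a whole still "succeeds".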

Example: RDF all the things!

I actually implemented this pattern when working with RDF. I was experimenting with an app for generating schema.org data, using SHACL for schemas (a SHACL schema for schema.org entities was readily available).

I only had to think of procedure names. This was a nice relief from the usual burden of defining the "one correct restful API".

It was interesting to have the whole application domain, including "business model" and API components, specified in the same schema language!

Here's a schema ... for schemas!

prefix sh:  <http://www.w3.org/ns/shacl#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix api: <http://emmanueloga.com/myApi#>

api:ApiRequestSpec
  a sh:NodeShape ; # ApiRequestSpec is a SHACL schema for API specs.

  # API requests should be instances of this class.
  sh:targetClass api:ApiRequest ;

  # Description, parameters and output point to shapes that specify each of those.
  sh:property [ sh:path api:description ; sh:datatype xsd:string ; ] ;
  sh:property [ sh:path api:params      ; sh:class sh:NodeShape ; ] ;
  sh:property [ sh:path api:output      ; sh:class sh:NodeShape ; ] .

This uses SHACL's Turtle syntax ... a compact syntax is available that could be easier to read.

A request schema could look something like this:

prefix sh:  <http://www.w3.org/ns/shacl#>
prefix api: <http://emmanueloga.com/myApi#>

# This is the identifier of the request ("procedure name").
api:DB_BACKUP_SAVE a api:ApiRequestSpec ;
  api:description """Instructs the server to perform a backup
  of the data to a file with a given name.""" ;

  api:params api:NAME_PARAMETER ; # Takes a name parameter as input.
  api:output api:FILES_OUTPUT .   # Returns this output.

api:NAME_PARAMETER a sh:NodeShape .
  # ... define the shape for name parameters ...

api:FILES_OUTPUT a sh:NodeShape .
  # ... define the shape for files output ...

There are implementations of a pattern like this in GraphQL and Hydra, both of which are a lot more complicated than what is proposed here. GraphQL allows cherry-picking the shape of the result in each request. The same could be accomplished by way of an RPC method accepting a SPARQL query or something similar. But I digress ... this note is about simplifying APIs :-)

Sample code for working with RDF

Since RDF is such a great data model and goes well with the HTTP API pattern I describe, I want to show how simple working with RDF can be.

Server

Below is some example code in Kotlin. It uses RDF4J as an embedded triplestore and Jooby for HTTP serving.

I could use a tidier dispatch mechanism instead of a giant switch, but you get the idea.

post("/API*") {
    val reqModel = Rio.parse(request.body(), RDFFormat.TURTLE)

    val result = ModelBuilder()

    for (st in reqModel.filter(null, API.OPERATION, null)) {
        when (st.`object`) {
            OPERATION.KILL_SERVER -> {
                result.subject(st.subject).add(OPERATION.OP_RESULT, "Server Stopped.")
                stopServerAfterServingRequest();
            }

            OPERATION.DELETE_ALL_TRIPLES -> {
                repo.connection.use { conn -> conn.deleteEverything() }
                result.subject(st.subject).add(OPERATION.OP_RESULT, "All data was erased.")
            }

            OPERATION.OP_DB_BACKUP_LOAD,
            OPERATION.OP_DB_BACKUP_SAVE -> {
                val name = reqModel.firstObjStr(st.subject, OPERATION.NAME)
                if (name.matches(BACKUP_NAME_PATTERN)) {
                    val file = config.openFile("backup/$name.trigstar")
                    if (st.`object` == OPERATION.OP_DB_BACKUP_LOAD) {
                        val outcome = if (file.isFile && backupLoad(repo, config, file)) {
                            "Backup '$file' loaded."
                        } else {
                            "Backup '$file' error."
                        }
                        result.subject(st.subject).add(OPERATION.OP_RESULT, outcome)
                    } else {
                        backupSave(repo, file)
                        result.subject(st.subject).add(OPERATION.OP_RESULT, "Backup '$file' created.")
                    }
                } else {
                    result.subject(st.subject).add(OPERATION.OP_RESULT, "Invalid backup name '$name'")
                }
            }
        }
    }

    // Add the results to existing request model.
    result.build().forEach { st -> reqModel.add(st) }

    // Send same data as input request, PLUS the result of any handled operations.
    ctx.send(reqModel.toTurtle())
}

Sample Client Code

For completeness, here's what the client code looked like: working with RDF is not hard!

This one was a REPL app using Python with rdflib and prompt-toolkit.

Being able to perform SPARQL queries (Datalog!) over the response on the client side was a nice plus (as in the get_response_files method below).

from rdflib import *
from rdflib.plugins.sparql import prepareQuery
from client import post

# All known operations.
OPERATION = ClosedNamespace(
    uri=URIRef("https://emmanueloga.com/my-api#"),
    terms=[
        "name",
        "op",
        "opDbBackupLoad",
        "opDbBackupSave",
    ]
)

# SPARQL query to extract some things to print in a nice format to console.
def get_response_files(model):
    """Return tuples (file, valid) from a graph response."""
    q = prepareQuery(
        initNs={"rf": OPERATION},
        queryString="""
        SELECT ?file ?valid
        WHERE { ?result rf:file ?file ; rf:valid ?valid . }
        """)
    return [(path_from_uri(row[0].value), row[1].value) for row in model.query(q)]

def get_turtle(g):
    """Serialize graph to turtle."""
    return g.serialize(format="turtle").decode("utf-8")

def new_graph():
    """Return a new graph with some predefined namespaces already bound."""
    g = Graph()
    g.bind("rf", "https://eoga.dev/api#")
    return g

# Create a RDF graph, serialize it to turtle and POST it, printing the response.
def perform_backup(kind, name):
    req = URIRef("tetra:0")
    g = new_graph()
    g.add((req, OPERATION.name, Literal(name)))
    if kind == "save":
        g.add((req, OPERATION.op, OPERATION.opDbBackupSave))
    else:
        g.add((req, OPERATION.op, OPERATION.opDbBackupLoad))
    turtle = get_turtle(g)
    response = post(turtle)
    print_model(response)
    print("Files produced: ")
    print(get_response_files(response))

class AppClient:
    def handle_backup(self):
        name = prompt("Backup name: ")
        if is_backup_name(name):
            print(f"Performing backup with name {name}")
            perform_backup("save", name)
        else:
            print(f"Invalid backup name: '{name}'")

AppClient().handle_backup()
@EmmanuelOga (Author)

I realized that not only can I call multiple methods, but the APIs compose! I can do something like:

:request1 a api:request;
  api:params [ api:a 1 ; api:b 2; ] .

:request2 a api:request;
  api:params :request1 .

... So in the server, I can see that :request2 takes as params the output of :request1!
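A minimal sketch of how a server could resolve such composed requests, modeled with plain dicts instead of RDF triples (all names here are hypothetical): a params value naming another request means "use that request's result as my input".

```python
def run_composed(requests, procedures):
    """Resolve requests whose params reference other requests' results."""
    results = {}

    def resolve(rid, seen=()):
        if rid in results:
            return results[rid]
        if rid in seen:
            raise ValueError(f"Cycle involving {rid!r}")
        req = requests[rid]
        params = req["params"]
        # A params value naming another request feeds its result forward.
        if isinstance(params, str) and params in requests:
            params = resolve(params, seen + (rid,))
        results[rid] = procedures[req["proc"]](params)
        return results[rid]

    for rid in requests:
        resolve(rid)
    return results

results = run_composed(
    {"request1": {"proc": "sum", "params": {"a": 1, "b": 2}},
     "request2": {"proc": "double", "params": "request1"}},
    {"sum": lambda p: p["a"] + p["b"], "double": lambda p: p * 2},
)
```

The recursion also guards against reference cycles, which become possible as soon as requests can point at each other.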

@tomlarkworthy

Seems kinda like a GraphQL-lite, have you seen? (it tunnels everything over POST with its own wire protocol)

@EmmanuelOga (Author)


@tomlarkworthy I've skimmed over docs but never used it ... it's nice other ppl see similarities with this idea too :-)

I'm currently thinking of a data format similar to RDF but with less baggage and built on top of JSON

@tomlarkworthy (Dec 11, 2021)


sending req0,req1 in an object? why not put requests in an array to order them. I wonder if ndjson is a better fit than an array. http://ndjson.org/
