pedro/notes-apicraft-batch-apis.md

## Batch APIs


### Common requirements

- Are they atomic?
- What kind of operations are supported?


### Existing approaches

- Google storage
  - `multipart/mixed` content type
  - `POST /batch`
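
A minimal Python sketch of what a `multipart/mixed` batch body looks like. Each part wraps one HTTP request line and is labeled `Content-Type: application/http`. The `Content-ID` header, the boundary format and the `build_batch_body` helper are assumptions modeled on the general shape of published batch examples. Check the target API's batch documentation before relying on any of them.

```python
import uuid
from typing import List, Optional, Tuple


def build_batch_body(
    requests: List[Tuple[str, str]], boundary: Optional[str] = None
) -> Tuple[str, str]:
    """Serialize (method, path) pairs into a multipart/mixed batch body.

    Returns the Content-Type header value and the body. The part headers are
    an assumed layout (hypothetical helper), not a specific API's contract.
    """
    boundary = boundary or f"batch_{uuid.uuid4().hex}"
    lines = []
    for n, (method, path) in enumerate(requests, start=1):
        lines += [
            f"--{boundary}",                   # delimiter opening this part
            "Content-Type: application/http",  # the part body is an HTTP request
            f"Content-ID: <item{n}>",          # lets responses be matched to parts
            "",                                # blank line ends the part headers
            f"{method} {path} HTTP/1.1",
            "",
        ]
    lines.append(f"--{boundary}--")  # closing delimiter
    return f"multipart/mixed; boundary={boundary}", "\r\n".join(lines) + "\r\n"
```

The client sends `body` in a single `POST`, using the returned value as its `Content-Type`.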


- Facebook
  - JSON content type: `batch=[{"method":"GET", "relative_url":"me"}, ...]`
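
A sketch of how a Facebook-style `batch` field might be built and read back. The operations are serialized as one JSON array and sent as a single form-encoded field next to an access token. The helper names and the `access_token` field are assumptions for illustration. They are not a transcription of the Graph API documentation.

```python
import json
from typing import Dict, List
from urllib.parse import parse_qs, urlencode


def encode_batch(operations: List[Dict[str, str]], access_token: str) -> str:
    """Form-encode operations as a single `batch` field holding a JSON array."""
    for op in operations:
        # Each operation needs at least a method and a URL relative to the API root.
        if "method" not in op or "relative_url" not in op:
            raise ValueError(f"incomplete operation: {op!r}")
    return urlencode({
        "access_token": access_token,
        "batch": json.dumps(operations, separators=(",", ":")),
    })


def decode_batch(form_body: str) -> List[Dict[str, str]]:
    """Server side: pull the `batch` field back out and parse its JSON array."""
    fields = parse_qs(form_body)
    return json.loads(fields["batch"][0])
```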


- Neo4j
- Salesforce
- Route53
  - Batch changes: no update (delete + add again)
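
When the only primitives are delete and add, an "update" becomes a pair of changes in one batch. The field names below (`Changes`, `Action`, `ResourceRecordSet`) are modeled loosely on Route53's change batches and are only illustrative. `replace_record` and its name/type check are hypothetical. The pair is safe only if the service applies the whole batch atomically. Otherwise, a failure between the two changes would lose the record.

```python
from typing import Any, Dict


def replace_record(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Any]:
    """Express an update as DELETE(old) + CREATE(new) inside one change batch."""
    # Hypothetical rule: only the same name/type counts as an "update".
    if (old["Name"], old["Type"]) != (new["Name"], new["Type"]):
        raise ValueError("name/type must match to count as an update")
    return {
        "Changes": [
            {"Action": "DELETE", "ResourceRecordSet": old},  # must match the live record
            {"Action": "CREATE", "ResourceRecordSet": new},
        ]
    }
```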


- JSON Patch?
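
JSON Patch (RFC 6902) is one existing format for a batch of changes to a single document, and HTTP `PATCH` (RFC 5789) requires the whole set of changes to be applied atomically. The sketch below covers only a subset: `add`, `remove` and `replace`, with no `move`, `copy` or `test` and no root path. It applies the whole patch to a copy, so a failing operation leaves the original untouched. The helper names are hypothetical.

```python
import copy
from typing import Any, Dict, List


def _tokens(pointer: str) -> List[str]:
    """Split a JSON Pointer (RFC 6901) such as "/a/0" into unescaped tokens."""
    if not pointer.startswith("/"):
        raise ValueError(f"unsupported pointer: {pointer!r}")  # root ("") not handled
    return [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]


def apply_patch(doc: Any, patch: List[Dict[str, Any]]) -> Any:
    """Apply add/remove/replace operations all-or-nothing.

    Works on a deep copy: if any operation raises, the exception propagates
    and the caller's `doc` is never modified.
    """
    result = copy.deepcopy(doc)
    for op in patch:
        *parents, last = _tokens(op["path"])
        target = result
        for token in parents:  # walk down to the container that holds `last`
            target = target[int(token)] if isinstance(target, list) else target[token]
        kind = op["op"]
        if kind not in ("add", "remove", "replace"):
            raise ValueError(f"unsupported op: {kind}")
        if isinstance(target, list):
            index = len(target) if last == "-" else int(last)  # "-" means append
            if kind == "add":
                if index > len(target):
                    raise IndexError(f"index out of range: {op['path']}")
                target.insert(index, op["value"])
            elif kind == "remove":
                del target[index]
            else:
                target[index] = op["value"]
        else:
            if kind != "add" and last not in target:
                raise KeyError(last)  # remove/replace need an existing member
            if kind == "remove":
                del target[last]
            else:
                target[last] = op["value"]
    return result
```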


### Interesting design questions

- Scope: where can a batch be done atomically?
- One error per operation
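
A sketch of the two scope choices: an atomic batch that commits all operations or none, and a best-effort batch where each operation stands alone. Either way, the response carries one result entry per operation. Several details are assumptions for illustration:

- Operations are modeled as callables over an in-memory dict.
- The `create`/`delete` factories are hypothetical.
- `424 Failed Dependency` is used for operations skipped after an earlier failure.

```python
import copy
from typing import Callable, Dict, List, Tuple

Store = Dict[str, dict]
Operation = Callable[[Store], None]  # mutates the store or raises


def create(key: str, value: dict) -> Operation:
    """Hypothetical operation: add a new entry, failing if it already exists."""
    def op(store: Store) -> None:
        if key in store:
            raise ValueError(f"{key} already exists")
        store[key] = value
    return op


def delete(key: str) -> Operation:
    """Hypothetical operation: remove an entry (KeyError if it is missing)."""
    def op(store: Store) -> None:
        del store[key]
    return op


def run_batch(store: Store, ops: List[Operation], atomic: bool) -> Tuple[Store, List[dict], bool]:
    """Return (resulting store, one result per operation, whether changes were kept)."""
    working = copy.deepcopy(store) if atomic else store  # atomic: stage on a copy
    results: List[dict] = []
    failed = False
    for i, op in enumerate(ops):
        if atomic and failed:
            # The batch is already doomed; report the operation as not attempted.
            results.append({"index": i, "status": 424, "error": "not attempted"})
            continue
        try:
            op(working)
            results.append({"index": i, "status": 200})
        except (KeyError, ValueError) as exc:
            failed = True
            results.append({"index": i, "status": 400, "error": str(exc)})
    if atomic and failed:
        return store, results, False  # discard the staged copy: nothing committed
    return working, results, True
```

In atomic mode, a `200` entry before the failure means "this step worked", but the final `False` says it was rolled back with everything else.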


### Motivation for batching

- Transactional property
  - Are they really necessary? Sometimes people think they need transactions, but there are different strategies
  - Applying CAP: going for consistency might cost availability
    - But there's control-plane availability and service availability. It's important to separate them.
- A lot of data being transformed
- Some resources need to be updated together, by design
- Slow updates/async updates
  - POST creates a task; GET returns its state with a hypermedia link, so clients can fetch the results once it's done
  - The link is only available if the task/job succeeded
  - Has anyone used webhooks/queueing to handle updates?
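
A minimal in-memory sketch of the task pattern for slow updates. `POST` answers `202 Accepted` with a `Location` for the task. `GET` on the task reports its state and exposes a `result` link only after the job has succeeded. The `TaskStore` class, the `/tasks` and `/results` paths, and the body layout are hypothetical, and no HTTP framework is involved.

```python
import itertools
from typing import Dict, Optional, Tuple


class TaskStore:
    """In-memory stand-in for a hypothetical /tasks resource."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._tasks: Dict[str, dict] = {}

    def post(self) -> Tuple[int, Dict[str, str]]:
        """POST /tasks: record the job and answer 202 Accepted with its Location."""
        task_id = str(next(self._ids))
        self._tasks[task_id] = {"state": "pending"}
        return 202, {"Location": f"/tasks/{task_id}"}

    def finish(self, task_id: str, ok: bool, result_id: Optional[str] = None) -> None:
        """Called by the background worker when the job ends."""
        if ok:
            self._tasks[task_id] = {"state": "succeeded", "result_id": result_id}
        else:
            self._tasks[task_id] = {"state": "failed"}

    def get(self, task_id: str) -> dict:
        """GET /tasks/{id}: the state, plus a link to the results only on success."""
        task = self._tasks[task_id]
        body: dict = {"state": task["state"], "links": {}}
        if task["state"] == "succeeded":
            body["links"]["result"] = f"/results/{task['result_id']}"
        return body
```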


- Performance
  - Much harder to cache
- Sometimes batches really are expressing a workflow
  - What is going to happen after this operation?
  - A server-side process doing what clients would otherwise do themselves


### Techniques / real-world usage

- When transactions are not available, update tree-like structures starting with the children
  - You can have a process reaping orphaned (unassociated) objects
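
A sketch of child-first writes plus a reaper. It assumes a hypothetical in-memory `Repo` where a parent holds references to child ids. If the process dies between the two steps, the worst case is an orphaned child, never a parent pointing at something that doesn't exist. A background job can then clean up the orphans.

```python
from typing import Dict, List, Set


class Repo:
    """Hypothetical in-memory store: parents reference children by id."""

    def __init__(self) -> None:
        self.children: Dict[str, dict] = {}
        self.parents: Dict[str, List[str]] = {}  # parent id -> child ids

    def create_child(self, child_id: str, data: dict) -> None:
        self.children[child_id] = data

    def create_parent(self, parent_id: str, child_ids: List[str]) -> None:
        missing = [c for c in child_ids if c not in self.children]
        if missing:  # never let a parent point at something that doesn't exist
            raise KeyError(f"unknown children: {missing}")
        self.parents[parent_id] = list(child_ids)


def create_tree(repo: Repo, parent_id: str, children: Dict[str, dict],
                crash_before_parent: bool = False) -> None:
    """Write the leaves first and the parent last."""
    for child_id, data in children.items():
        repo.create_child(child_id, data)
    if crash_before_parent:  # simulate the process dying between the two steps
        raise RuntimeError("crashed before the parent was written")
    repo.create_parent(parent_id, list(children))


def reap_orphans(repo: Repo) -> Set[str]:
    """Background job: delete children that no parent references."""
    referenced = {c for kids in repo.parents.values() for c in kids}
    orphans = set(repo.children) - referenced
    for child_id in orphans:
        del repo.children[child_id]
    return orphans
```

A real reaper would probably need a grace period, such as only reaping objects older than some age. Without one, it could race with a create that has written its children but not yet its parent.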


- Batch on single resources: provide multiple keys, e.g. `GET /foo?id=1&id=2&id=3`
  - Also support updates, although not atomically (it's exactly the same as calling several updates)
  - In practice, used more for GETs
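
A sketch of the multiple-keys approach. The handler reads every repeated `id` parameter and returns one entry per requested key, each with its own status. The batch update is literally a loop of single updates, so a failure partway through leaves earlier writes in place. `get_many`, `update_many`, the sample data and the result shape are hypothetical.

```python
from typing import Dict, List
from urllib.parse import parse_qs, urlsplit

FOOS: Dict[str, dict] = {"1": {"name": "alpha"}, "3": {"name": "gamma"}}  # sample data


def get_many(url: str, store: Dict[str, dict]) -> List[dict]:
    """GET /foo?id=1&id=2: one entry per requested id, each with its own status."""
    ids = parse_qs(urlsplit(url).query).get("id", [])
    ids = list(dict.fromkeys(ids))  # drop duplicate ids, keep request order
    return [
        {"id": i, "status": 200, "body": store[i]} if i in store else {"id": i, "status": 404}
        for i in ids
    ]


def update_many(updates: Dict[str, dict], store: Dict[str, dict]) -> List[dict]:
    """Batch update as a plain loop of single updates: no rollback of earlier writes."""
    results = []
    for i, body in updates.items():
        if i in store:
            store[i] = body
            results.append({"id": i, "status": 200})
        else:
            results.append({"id": i, "status": 404})
    return results
```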


- Batching requests vs batching operations (`GET /foo?id=1&id=2` vs `POST /batch` asking for two GETs)
  - Batching requests doesn't seem much better than just making multiple requests with keep-alive + pipelining


- User agent constraints
  - Consider SPDY/keep-alive/pipelining support
  - JavaScript, for instance, doesn't support these at the moment


- What does SPDY give beyond regular keep-alive + pipelining?
  - Header compression
  - Server push of associated resources (e.g. send the JavaScript together with index.html instead of waiting for the subsequent request)
  - Still basically just used internally


- Creating resources with associated objects
  - You need to create the parent (and get its id) before you can create a child
  - Should this be represented by a single JSON document containing all the associated objects? Or should clients be forced to make several requests?
    - Usually you should avoid a 1:1 mapping between database tables and exposed API resources
    - Also consider not exposing every single DB id!
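
A sketch of accepting the parent and its children in one JSON document. The server generates the ids itself before writing anything, so the client never needs a round trip to learn the parent id first. The ids it returns are opaque tokens rather than database keys. The order/line-item model, `create_order`, and the `ord_`/`line_` prefixes are hypothetical.

```python
import secrets
from typing import Dict, List

ORDERS: Dict[str, dict] = {}  # hypothetical "tables"
LINES: Dict[str, dict] = {}


def new_id(prefix: str) -> str:
    """Opaque, non-sequential public id instead of a raw auto-increment key."""
    return f"{prefix}_{secrets.token_hex(8)}"


def create_order(payload: dict) -> dict:
    """POST /orders with nested lines: one client document, wired up server-side."""
    customer = payload["customer"]            # validate before writing anything
    lines: List[dict] = payload.get("lines", [])
    order_id = new_id("ord")                  # known up front, so children can reference it
    line_ids = []
    for line in lines:
        line_id = new_id("line")
        LINES[line_id] = {**line, "order_id": order_id}
        line_ids.append(line_id)
    ORDERS[order_id] = {"customer": customer, "line_ids": line_ids}
    return {
        "id": order_id,
        "customer": customer,
        "lines": [{"id": lid, **LINES[lid]} for lid in line_ids],
    }
```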


- There are definitely a lot of import/export hacks in different APIs
  - Is this a hack, or missing underlying support?
  - Is this a higher-level function? For the business, or for the system?


### How to document?

- Not RESTful
- Sometimes this can be represented as basically a search by primary key
- Minimize the diff from REST
  - Return meta info for each requested resource (e.g. individual status codes)
  - Batch only against the same resource