
@pedro
Last active December 20, 2015 11:39

Batch APIs

  • Common requirements

    • Are they atomic?
    • What kind of operations are supported?
  • Existing approaches

    • Google storage
      • Multipart/mixed content-type
      • POST /batch
    • Facebook
      • JSON content-type: batch=[ {"method":"GET", "relative_url": "me" }, ...]
    • Neo4j
    • Salesforce
    • Route53
      • Batch changes: no update (delete+add again)
    • JSON patch?
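As a rough illustration of the JSON-style batch noted under Facebook above, the sketch below encodes a list of operations into a single form field. The field name `batch` and the keys `method` and `relative_url` come from the notes; the second `relative_url` value and the helper name `build_batch_body` are made up for the example, and no real endpoint is contacted.

```python
import json
from urllib.parse import urlencode, parse_qs


def build_batch_body(operations):
    """Encode operations as one form field named `batch` holding a JSON array."""
    return urlencode({"batch": json.dumps(operations)})


ops = [
    {"method": "GET", "relative_url": "me"},
    {"method": "GET", "relative_url": "me/friends"},  # hypothetical second call
]
body = build_batch_body(ops)

# Round-trip the body the way a server might decode it.
decoded = json.loads(parse_qs(body)["batch"][0])
```

The whole batch travels as one request, so the server decides how (and whether) to run the operations together.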
  • Interesting design questions

    • Scope: where can a batch be done atomically?
    • One error per operation
  • Motivation for batching

    • Transactional property
      • Are they really necessary? People sometimes think they need transactions when a different strategy would do
      • Applying CAP, going for consistency might cost availability
        • But there's control plane and service availability. Important to separate them.
    • A lot of data being transformed
    • Some resources need to be updated together, by design
    • Slow updates/async updates
      • POST creates task, GET gives you a state with a hypermedia link so clients can fetch results once it's done
      • The link is only present if the task/job succeeded
      • Has anyone used web hooks/queueing to handle updates?
    • Performance
      • Much harder to cache
    • Sometimes batches really are expressing workflow
      • What is going to happen after this happens?
      • Server-side process to do what clients would otherwise do themselves
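The slow/async update flow above (POST creates a task, GET returns its state, and a result link appears only on success) could look roughly like this in-memory sketch. The class name `TaskStore`, the `/tasks/...` paths, the state names, and the 202 status are assumptions for illustration, not something the notes prescribe.

```python
import itertools


class TaskStore:
    """In-memory stand-in for a task/job resource (hypothetical)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._tasks = {}

    def create(self, payload):
        """POST /tasks: accept the work and tell the client where to poll."""
        task_id = next(self._ids)
        self._tasks[task_id] = {"state": "pending", "payload": payload}
        return 202, {"id": task_id, "state": "pending",
                     "links": {"self": f"/tasks/{task_id}"}}

    def finish(self, task_id, ok):
        """Called by the background worker when the job ends."""
        self._tasks[task_id]["state"] = "succeeded" if ok else "failed"

    def get(self, task_id):
        """GET /tasks/{id}: report state; link to the result only on success."""
        task = self._tasks.get(task_id)
        if task is None:
            return 404, {"error": "unknown task"}
        body = {"id": task_id, "state": task["state"],
                "links": {"self": f"/tasks/{task_id}"}}
        if task["state"] == "succeeded":
            body["links"]["result"] = f"/tasks/{task_id}/result"
        return 200, body
```

A client polls `get` until the state leaves `pending`, then follows the `result` link if one is present.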
  • Techniques / real world usage

    • When transactions are not available, update tree-like structures starting with the child
      • You can have a background process reaping unassociated (orphaned) objects
    • Batch on single resources: provide multiple keys eg GET /foo?id=1&id=2&id=3
      • Also support updates, although not atomically (it's exactly the same as calling several updates)
      • In practice, used mostly for GETs
    • Batching requests vs batching operations (GET /foo?id=1&id=2 vs POST /batch asking for two GETs)
      • Batching requests doesn't seem much better than just making multiple requests with keepalive+pipelining
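A minimal sketch of the multi-key GET described above (`GET /foo?id=1&id=2&id=3`), using only the standard library to read the repeated `id` parameter. The `STORE` contents and the function name `get_many` are placeholders for the example.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical backing data keyed by id.
STORE = {
    "1": {"id": "1", "name": "alpha"},
    "2": {"id": "2", "name": "beta"},
    "3": {"id": "3", "name": "gamma"},
}


def get_many(url, store):
    """Return (found records, missing ids) for every `id` in the query string."""
    ids = parse_qs(urlsplit(url).query).get("id", [])
    found = [store[i] for i in ids if i in store]
    missing = [i for i in ids if i not in store]
    return found, missing


found, missing = get_many("/foo?id=1&id=2&id=9", STORE)
```

This is a batch of keys against one resource, not a batch of arbitrary operations, which keeps it close to an ordinary lookup.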
  • User agent constraints

    • Consider SPDY/keepalive/pipeline support
    • JavaScript, for instance, doesn't support these at the moment
  • What does SPDY give beyond regular keepalive+pipeline?

    • Header compression
    • Send associated resources (eg send javascript together with index.html, don't wait for subsequent request!)
    • Still basically just used internally
  • Creating resources with associated objects

    • Need to create the parent/get an id before you can create a child
    • Should this be represented by a single JSON with all the associated objects? Or force clients to do several requests?
      • Usually you should avoid having a 1:1 relationship between database and exposed API resources
      • Also consider not exposing every single DB id!
    • There's definitely a lot of import/export hacks in different APIs
      • Is this a hack, or missing underlying support?
      • Is this a higher level function? For the business or the system?
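One way to picture the "single JSON with all the associated objects" option: the server creates the parent first to obtain its id, then creates each child against that id, so the client needs only one request. Everything here (the `children` key, the dict-based stores, the function name) is a hypothetical shape chosen for illustration.

```python
import itertools


def create_with_children(payload, parents, children, ids):
    """Create the parent, then each nested child pointing at the new parent id."""
    parent_id = next(ids)
    parents[parent_id] = {"id": parent_id, "name": payload["name"]}
    child_ids = []
    for child in payload.get("children", []):
        child_id = next(ids)
        # Server-assigned fields go last so the client cannot override them.
        children[child_id] = {**child, "id": child_id, "parent_id": parent_id}
        child_ids.append(child_id)
    return {"id": parent_id, "children": child_ids}
```

Without a transaction this is still several writes on the server side, so a failure partway through can leave a parent with only some of its children.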
  • How to document?

    • Not RESTful
    • Sometimes this can be represented as basically a search by primary key
    • Minimize the diff from REST
      • Return meta info for each requested resource (eg individual status codes)
      • Batch only against the same resource
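The last two points can be sketched as a batch update against a single resource type that reports one status per item rather than one status for the whole batch. The specific status codes, the `name` field, and the function name are assumptions; the updates are applied independently, so this is deliberately not atomic.

```python
def batch_update(store, updates):
    """Apply each update on its own and return one result entry per item."""
    results = []
    for item in updates:
        key = item.get("id")
        if key not in store:
            results.append({"id": key, "status": 404, "error": "not found"})
        elif not isinstance(item.get("name"), str):
            results.append({"id": key, "status": 422, "error": "name must be a string"})
        else:
            store[key]["name"] = item["name"]
            results.append({"id": key, "status": 200})
    return results
```

A failed item does not roll back the items that succeeded before it, which is the same outcome as calling several single updates in a row.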