-
Common requirements
- Are they atomic?
- What kind of operations are supported?
-
Existing approaches
- Google storage
- Multipart/mixed content-type
- POST /batch
- Facebook
- JSON content-type: batch=[ {"method":"GET", "relative_url": "me" }, ...]
- Neo4j
- Salesforce
- Route53
- Batch changes: no update (delete+add again)
- JSON patch?
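A minimal sketch of the Facebook-style payload noted above, where the operations travel as a JSON array in a form field named `batch`. Only the `method` and `relative_url` keys and the `me` URL come from these notes; the helper name `encode_batch` and the second operation's URL are made up for illustration, and no real endpoint is called.

```python
import json
from urllib.parse import urlencode, parse_qs

def encode_batch(operations):
    """Serialize a list of operations into a form-encoded body: batch=[...]."""
    return urlencode({"batch": json.dumps(operations)})

# Hypothetical operations; only "me" appears in the notes.
ops = [
    {"method": "GET", "relative_url": "me"},
    {"method": "GET", "relative_url": "me/friends"},
]
body = encode_batch(ops)

# Round trip: what a server would do to recover the list of operations.
decoded = json.loads(parse_qs(body)["batch"][0])
```

The multipart/mixed approach carries the same information, but each operation is a full HTTP message in its own part rather than a JSON object.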
-
Interesting design questions
- Scope: where can a batch be done atomically?
- One error per operation
-
Motivation for batching
- Transactional property
- Are they really necessary? Sometimes people think they need transactions, but there are different strategies
- Applying CAP, going for consistency might cost availability
- But there's control plane and service availability. Important to separate them.
- A lot of data being transformed
- Some resources need to be updated together, by design
- Slow updates/async updates
- POST creates task, GET gives you a state with a hypermedia link so clients can fetch results once it's done
- Link is only available if the task/job worked
- Has anyone used web hooks/queueing to handle updates?
- Performance
- Much harder to cache
- Sometimes batches really are expressing workflow
- What is going to happen after this happens?
- Server-side process to do what clients would otherwise do themselves
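The slow/async update flow above (POST creates a task, GET reports its state, and the result link only appears once the task worked) could look roughly like this. This is an in-memory sketch: the `TaskStore` class, the `/tasks/...` URLs, and the state names are assumptions, not taken from any of the APIs listed.

```python
import itertools

class TaskStore:
    """In-memory stand-in for a job queue (hypothetical, for illustration)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._tasks = {}

    def post_task(self, payload):
        """POST /tasks: accept the work, return 202 and where to poll."""
        task_id = next(self._ids)
        self._tasks[task_id] = {"state": "pending", "payload": payload}
        return 202, {"Location": f"/tasks/{task_id}"}

    def finish(self, task_id, ok=True):
        """Called by the worker once the slow update has run."""
        self._tasks[task_id]["state"] = "done" if ok else "failed"

    def get_task(self, task_id):
        """GET /tasks/{id}: the state, plus a result link only on success."""
        task = self._tasks[task_id]
        body = {"state": task["state"]}
        if task["state"] == "done":
            body["links"] = {"result": f"/tasks/{task_id}/result"}
        return 200, body

store = TaskStore()
status, headers = store.post_task({"op": "bulk-update"})
pending = store.get_task(1)[1]   # no result link yet
store.finish(1)
done = store.get_task(1)[1]      # result link now present
```

Web hooks or a queue would replace the polling GET with a push, but the task resource itself could stay the same.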
-
Techniques / real world usage
- When transactions are not available, update tree-like structures starting with the child
- You can have a process reaping unassociated (orphaned) objects
- Batch on single resources: provide multiple keys eg GET /foo?id=1&id=2&id=3
- Also support updates, although not atomically (it's exactly the same as calling several updates)
- In practice more used for GETs
- Batching requests vs batching operations (GET /foo?id=1&id=2 vs POST /batch asking for two GETs)
- Batching requests doesn't seem much better than just making multiple requests with keepalive+pipelining
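A sketch of the child-first technique, assuming two plain dictionaries in place of real storage. Children are written before the parent that references them, so a crash between the two writes leaves only unreferenced children behind, and a separate reaper can delete those. All names here (`create_tree`, `reap_orphans`, the id scheme) are hypothetical.

```python
children = {}   # child_id -> data
parents = {}    # parent_id -> {"child_ids": [...]}

def create_tree(parent_id, child_items, fail_before_parent=False):
    """Write the children first, then the parent that references them."""
    child_ids = []
    for i, data in enumerate(child_items):
        child_id = f"{parent_id}-c{i}"
        children[child_id] = data
        child_ids.append(child_id)
    if fail_before_parent:
        raise RuntimeError("simulated crash between the two writes")
    parents[parent_id] = {"child_ids": child_ids}

def reap_orphans():
    """Delete children that no parent references; return the removed ids."""
    referenced = {c for p in parents.values() for c in p["child_ids"]}
    orphans = sorted(set(children) - referenced)
    for child_id in orphans:
        del children[child_id]
    return orphans

create_tree("p1", ["x", "y"])
try:
    create_tree("p2", ["z"], fail_before_parent=True)
except RuntimeError:
    pass
removed = reap_orphans()
```

A real reaper would probably need a grace period, so that it does not delete children belonging to a create that is still in flight.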
-
User agents constraint
- Consider SPDY/keepalive/pipeline support
- JavaScript, for instance, doesn't support these at the moment
-
What does SPDY give beyond regular keepalive+pipeline?
- Header compression
- Send associated resources (eg send javascript together with index.html, don't wait for subsequent request!)
- Still basically just used internally
-
Creating resources with associated objects
- Need to create the parent/get an id before you can create a child
- Should this be represented by a single JSON with all the associated objects? Or force clients to do several requests?
- Usually you should avoid having a 1:1 relationship between database and exposed API resources
- Also consider not exposing every single DB id!
- There's definitely a lot of import/export hacks in different APIs
- Is this a hack, or missing underlying support?
- Is this a higher level function? For the business or the system?
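One way to read the "single JSON with all the associated objects" option, as a sketch: the server accepts one nested document, creates the parent to obtain its id, then creates the children against that id. The order/item domain, the table dictionaries, and the response shape are all assumptions made for illustration.

```python
import itertools

_next_id = itertools.count(1)
orders, items = {}, {}   # hypothetical tables

def create_order(doc):
    """Accept one nested document; create the parent, then its children."""
    order_id = next(_next_id)
    orders[order_id] = {"customer": doc["customer"]}
    item_count = 0
    for item in doc.get("items", []):
        item_id = next(_next_id)
        # Each child row carries the parent id obtained above.
        items[item_id] = {"order_id": order_id, **item}
        item_count += 1
    # The response describes the resource, not the rows behind it:
    # the item ids are deliberately not exposed.
    return {"id": order_id, "customer": doc["customer"], "item_count": item_count}

created = create_order({"customer": "acme", "items": [{"sku": "a"}, {"sku": "b"}]})
```

The client makes one request and never sees the intermediate ids, which also keeps the API from mirroring the database one to one.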
-
How to document?
- Not RESTful
- Sometimes this can be represented as basically a search by primary key
- Minimize the diff from REST
- Return meta info for each requested resource (eg individual status codes)
- Batch only against the same resource
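A sketch of a batch against a single resource, expressed as a search by primary key with one status per requested key. The `/foo?id=...` URL form comes from the notes above; the in-memory `FOO` store and the response envelope (`id`, `status`, `body`) are assumptions.

```python
from urllib.parse import urlsplit, parse_qs

FOO = {"1": {"name": "a"}, "2": {"name": "b"}}  # hypothetical store

def get_many(url):
    """Handle GET /foo?id=1&id=2&id=3, reporting a status per key."""
    ids = parse_qs(urlsplit(url).query).get("id", [])
    results = []
    for key in ids:
        if key in FOO:
            results.append({"id": key, "status": 200, "body": FOO[key]})
        else:
            results.append({"id": key, "status": 404, "body": None})
    return results

results = get_many("/foo?id=1&id=2&id=3")
statuses = [r["status"] for r in results]
```

Because every key is answered individually, one missing resource does not have to fail the whole request, which matches the "one error per operation" point raised earlier.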
Last active
December 20, 2015 11:39