
@korydraughn
Last active November 6, 2023 22:05
iRODS C++ REST API v1.0

Can the REST API be improved?

I feel the current REST API does not present a cohesive interface for users, and we need to investigate alternatives. The current interface attempts to be simple, but it exposes several options that may lead to incorrect usage or confusion. If we're planning on the REST API being absorbed by the iRODS server, then we had better make sure it is clean and unambiguous. Otherwise, it could hurt the adoption of iRODS.

An Alternative Approach

I feel a better approach would be to expose various operations for each entity in iRODS, hence /collections and /data-objects. Each path accepts an operation parameter (not shown below). My hope is that users find this approach easier to understand. For example, if you want to create a collection, look at /collections. If you want to schedule a delay rule, look at /rules.

The listing below presents each URL path along with the operations it can or should expose. Many of these operations translate directly to iRODS API calls, which is a good thing.

  • /auth
  • /collections
    • create
    • remove
    • move
    • list
    • adjust permissions
    • stat (i.e. view permissions)
  • /config
    • view
    • modify (later)
  • /data-objects
    • create
    • remove
    • read
    • write
    • move
    • trim
    • replicate
    • adjust permissions
    • stat (i.e. view permissions, replicas, etc.)
    • register
    • unregister
  • /metadata
    • add
    • set
    • modify
    • remove
    • list
  • /query
    • run general queries
    • run specific queries
    • list available specific queries
    • list available keywords
  • /resources
    • add
    • modify
    • remove
    • list
    • adjust hierarchies
    • rebalance, etc.
  • /rules
    • execute
    • list
    • remove delay rules
  • /tickets
    • create
    • modify
    • remove
  • /users
    • create
    • modify
    • remove
    • list
  • /groups
    • create
    • modify
    • remove
    • list
  • /zones
    • add
    • modify?
    • remove
    • list
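As a sketch of the noun/verb pattern above, a client could build requests by combining an endpoint with an op parameter. The helper name and exact parameter spellings here are illustrative, not a finalized API:

```python
from urllib.parse import urlencode

def build_request(endpoint: str, op: str, **params) -> str:
    """Build a URL for the op-parameter style described above.

    Only the noun/verb shape is taken from the listing; the helper
    itself is hypothetical.
    """
    query = urlencode({"op": op, **params})
    return f"/{endpoint}?{query}"

# Creating a collection maps to the /collections noun with a create verb.
url = build_request("collections", "create",
                    lpath="/tempZone/home/rods/new_coll")
```

The same helper covers every endpoint in the listing, which is the point of the design: learn one URL shape, then vary the noun and the op.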

Parallel Transfer

To do parallel transfer using port 1247, the steps are as follows:

  1. Open the first stream.
  2. Capture the replica access token.
  3. Open secondary streams.
    • Each stream must use its own connection
    • They must target the same replica
    • They must use the same open flags
    • They must be opened using the replica access token
  4. Send bytes across streams.
  5. Close secondary streams without updating the catalog.
  6. Close first stream.
    • The first stream is responsible for updates to the catalog
    • The first stream is responsible for triggering policy
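The steps above can be modeled as a small in-memory state machine. Nothing here talks to a real server; it only encodes the constraints from the list: secondary streams must present the first stream's replica access token, and only closing the first stream touches the catalog:

```python
class ParallelTransfer:
    """Toy model of the port-1247 parallel transfer rules (not real iRODS code)."""

    def __init__(self, logical_path: str, open_flags: str):
        self.logical_path = logical_path
        self.open_flags = open_flags
        # Steps 1-2: opening the first stream yields a replica access token.
        self.replica_access_token = "token-" + logical_path  # placeholder value
        self.streams = [0]  # stream 0 is the first (primary) stream
        self.catalog_updated = False

    def open_secondary(self, token: str) -> int:
        # Step 3: secondary streams must present the primary's token
        # (same replica and open flags are implied by sharing this object).
        if token != self.replica_access_token:
            raise PermissionError("replica access token mismatch")
        stream_id = len(self.streams)
        self.streams.append(stream_id)
        return stream_id

    def close(self, stream_id: int) -> None:
        # Steps 5-6: closing a secondary stream is catalog-neutral;
        # closing the first stream performs the catalog update.
        self.streams.remove(stream_id)
        if stream_id == 0:
            self.catalog_updated = True
```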

A possible translation to REST

First, instruct the server to initialize the transfer state.

/data-objects?op=parallel-write-init&channels=4&lpath=/tempZone/home/rods/f.txt[&dst-resource=some_resc|&replica_number=3]

This returns a handle to state that is specifically needed for the upcoming transfer. That state will be shared across multiple requests. The state may contain things such as: connections, chunk sizes, offsets per stream, the replica access token, etc. The state is never exposed to the client of the REST API. This simplifies the interface.

For example, the response could resemble the following:

{
  "error_code": 0,
  "transfer_handle": "<UUID>"
}
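A client would parse this response and hold on to the handle for the rest of the transfer. A minimal sketch, assuming the response shape shown above:

```python
import json

def extract_handle(body: str) -> str:
    """Pull the transfer handle out of a parallel-write-init response."""
    doc = json.loads(body)
    if doc["error_code"] != 0:
        raise RuntimeError(f"init failed: {doc['error_code']}")
    return doc["transfer_handle"]
```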

Now, the client sends chunks that will be written to various locations within the replica. Notice that the Content-Length header informs the server of the body's length, the body being the bytes to write. It's possible for things such as the offset to be stored in the server. As described, this means there are always at least two API calls per write.

Content-Length: 8196

/data-objects?op=write&transfer-handle=UUID&offset=1000
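One way a client might assemble that write request. The helper is hypothetical; the query parameter names mirror the URL shown above:

```python
from urllib.parse import urlencode

def build_write_request(transfer_handle: str, offset: int, chunk: bytes):
    """Return (path, headers, body) for writing one chunk.

    Content-Length carries the body size so the server knows how many
    bytes to write at the given offset.
    """
    query = urlencode({"op": "write",
                       "transfer-handle": transfer_handle,
                       "offset": offset})
    headers = {"Content-Length": str(len(chunk))}
    return f"/data-objects?{query}", headers, chunk
```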

Once all bytes are transferred, send a request instructing the server to shut down the transfer.

/data-objects?op=parallel-write-shutdown&transfer-handle=UUID

Each operation returns a JSON response containing an iRODS error code and an error message if available. For example:

{
  "error_code": 0,
  "error_message": ""
}
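Because every operation returns the same envelope, a client can centralize error handling. A minimal sketch, assuming the envelope shape shown above (raising an exception is just one design choice):

```python
import json

def check_response(body: str) -> None:
    """Raise if the iRODS response envelope reports a failure."""
    doc = json.loads(body)
    if doc["error_code"] != 0:
        # error_message may be empty, so fall back to the code.
        raise RuntimeError(doc.get("error_message")
                           or f"iRODS error {doc['error_code']}")
```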
@trel

trel commented Apr 10, 2023

agreed - i really like the noun/verb formalism.

@korydraughn

korydraughn commented Apr 10, 2023

From @trel:

  • consider linking the steps between the parallel transfer impls
  • consider bad clients: apply a timeout to the parallel transfer connections
    • what happens if someone opens several connections, but never closes them?
  • error_code vs return_code (normal and boring)
  • consider returning {"irods_response": {"code": 0, "message": ""}}
  • is it possible to return detailed messages (i.e. the payload?) when there's an http error?
  • use v0.9.5

@korydraughn

From @alanking:

  • control plane support
    • moving the delay server
    • starting/stopping servers in a zone

@korydraughn

I think we can improve parallel writes even more.

In the text, I mentioned that the parallel writes (as described) would result in two API operations (i.e. seek -> write). We can adjust rxDataObjWrite to take an optional offset which would bring the number of network calls down to one. Doing that also avoids triggering the REPF for seek operations. Yet another performance win.

Not only that, but all clients would be able to take advantage of that optimization too.

The downside is that any policy associated with seek will not be triggered. I don't see this being an issue as long as users understand that. The option to embed the offset into the write operation is purely an opt-in situation.

Thoughts?

I'm going to write this up as an issue if there are no objections.

I just realized that OpenedDataObjInp already contains an offset input value, so the data structure doesn't need to be modified at all. We just need to verify what offset means for rxDataObjWrite. See the following:

@korydraughn

More thoughts:

  • Consider merging the /groups endpoint into the /users endpoint
    • Perhaps these are replaced by /users-groups? Hmm
  • Should /metadata be absorbed into multiple endpoints? (i.e. /data-objects, /collections, /resources, /users, /groups)
    • At the moment, /metadata supports one operation and is just a pass through for the atomic API
    • If more operations are added to /metadata, then it's likely better to keep it separate
      • What would the other operations be though? Query operations? That feels more suited for /query

@trel

trel commented Apr 16, 2023

feels more natural to have metadata operations 'per noun'... but that's different than what we've always had.

so exploring a little...

/metadata could support only the atomic endpoint where you can do 'lots' at once

and all the rest below could just be convenience/sugar that point to the atomic call internally...

/data-objects/metadata/add
/data-objects/metadata/set
/data-objects/metadata/modify
/data-objects/metadata/remove
/collections/metadata/add
/collections/metadata/set
/collections/metadata/modify
/collections/metadata/remove
/resources/metadata/add
/resources/metadata/set
/resources/metadata/modify
/resources/metadata/remove
/users/metadata/add
/users/metadata/set
/users/metadata/modify
/users/metadata/remove
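If these per-noun routes really are sugar over the atomic call, the server-side translation could be a small mapping. The payload shape below is an assumption based on iRODS' atomic_apply_metadata_operations and should be verified against the real API; note the atomic API's operation vocabulary may not line up one-to-one with add/set/modify/remove:

```python
# Hypothetical noun -> atomic entity_type mapping.
NOUN_TO_ENTITY = {
    "data-objects": "data_object",
    "collections": "collection",
    "resources": "resource",
    "users": "user",
}

def to_atomic_payload(noun: str, verb: str, name: str,
                      attribute: str, value: str, units: str = "") -> dict:
    """Translate /<noun>/metadata/<verb> into one atomic operation.

    The entity type is implied by the route, so the client never
    has to spell it out.
    """
    return {
        "entity_name": name,
        "entity_type": NOUN_TO_ENTITY[noun],
        "operations": [{
            "operation": verb,
            "attribute": attribute,
            "value": value,
            "units": units,
        }],
    }
```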

agreed, /query feels the correct home for searching through metadata.

not opposed to /users-groups - need to map all functionality out to see if they're consistent enough (I suspect they are)

@korydraughn

/metadata could support only the atomic endpoint where you can do 'lots' at once

Yes, that is what the rewrite does today.

and all the rest below could just be convenience/sugar that point to the atomic call internally...

Well, add/set/modify/remove do not work with how the atomic API is meant to be used. And the only benefit to providing a /metadata underneath each noun is that the client doesn't have to define what the type of the entity is. The type is implied. This would lead to more work needing to be done within the REST API server.

The downside of introducing more levels (e.g. /data-objects/metadata) is the number of available URLs and the number of (route, handler) pairs grow by N. I don't see any real benefit to introducing more levels.

Right now, the rewrite is all about making it work and learning Boost.Beast. Things are going well. The pattern is mostly:

  • route -> handler -> if-ladder-of-operations -> execute-iRODS-API-calls -> send-response

By leveraging an op or action parameter, no one has to learn new URLs. They simply tweak the parameters and arguments passed.
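A minimal sketch of that dispatch pattern, using a dict of operations in place of an if-ladder. The handler names are illustrative and the handlers are stubs, not real iRODS calls:

```python
def create_collection(params: dict) -> dict:
    # Would invoke the corresponding iRODS API call here.
    return {"error_code": 0, "error_message": ""}

def remove_collection(params: dict) -> dict:
    return {"error_code": 0, "error_message": ""}

# One table per route; the op parameter selects the operation.
COLLECTION_OPS = {
    "create": create_collection,
    "remove": remove_collection,
}

def handle_collections(params: dict) -> dict:
    op = COLLECTION_OPS.get(params.get("op"))
    if op is None:
        return {"error_code": -1, "error_message": "unknown op"}
    return op(params)
```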

To see how the experimental rewrite is shaping up, see the following:

@trel

trel commented Apr 17, 2023

add/set/modify/remove do not work with how the atomic API is meant to be used

I'll request discussion on this point - not sure what you mean here.

Yes, extra 'levels' would mean more work/parsing/routing in the server (later, not necessarily now), but could be more approachable/expected for the programmer / user of the API.

@korydraughn

Ideas:

  • Approval test suite
    • JSON file containing inputs and outputs
    • Libraries built on top of this can reuse the JSON file

Also, /data-objects needs to handle checksums.

@korydraughn

korydraughn commented Apr 25, 2023

Apparently, the HTTP spec states that custom status codes can be returned as long as they don't conflict with the standardized ones and the class of the status code is maintained. See the following:

Also, according to https://developer.mozilla.org/en-US/docs/Web/HTTP/Status#server_error_responses, HTTP servers are required to implement support for GET and HEAD.
