I feel the current REST API does not present a cohesive interface for users, and we need to investigate alternatives. The current interface attempts to be simple, but it exposes several options that may lead to incorrect usage or confusion. If we're planning on the REST API being absorbed by the iRODS server, then we had better make sure it is clean and unambiguous. Otherwise, it could hurt the adoption of iRODS.
I feel a better approach would be to expose various operations for each entity in iRODS, hence /collections and /data-objects. That means each path comes with an operation parameter (not shown in the listing below). My hope is that users will find this approach easier to understand. For example, if you want to create a collection, look at /collections. If you want to schedule a delay rule, look at /rules.
The listing below presents each URL path along with the operations it can/should expose. Many of these operations translate directly to iRODS API calls, which is a good thing.
- /auth
- /collections
- create
- remove
- move
- list
- adjust permissions
- stat (i.e. view perms)
- /config
- view
- modify (later)
- /data-objects
- create
- remove
- read
- write
- move
- trim
- replicate
- adjust permissions
- stat (i.e. view perms, replicas, etc.)
- register
- unregister
- /metadata
- add
- set
- modify
- remove
- list
- /query
- run general queries
- run specific queries
- list available specific queries
- list available keywords
- /resources
- add
- modify
- remove
- list
- adjust hierarchies
- rebalance, etc.
- /rules
- execute
- list
- remove delay rules
- /tickets
- create
- modify
- remove
- /users
- create
- modify
- remove
- list
- /groups
- create
- modify
- remove
- list
- /zones
- add
- modify?
- remove
- list
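To make the scheme concrete, here is a small sketch of how a client might construct URLs under this proposal. The base URL, parameter names (`op`, `lpath`), and operation names are assumptions taken from the listing above, not a finalized API.

```python
from urllib.parse import urlencode

def build_url(base, entity, op, **params):
    """Build a URL for the proposed scheme: one path per iRODS entity,
    with the operation selected by an "op" query parameter."""
    query = urlencode({"op": op, **params})
    return f"{base}{entity}?{query}"

# Creating a collection targets /collections with op=create.
print(build_url("https://example.org/irods-rest", "/collections", "create",
                lpath="/tempZone/home/rods/new_coll"))

# Executing a rule targets /rules with op=execute.
print(build_url("https://example.org/irods-rest", "/rules", "execute"))
```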
To do parallel transfer using port 1247, the steps are as follows:
- Open the first stream.
- Capture the replica access token.
- Open secondary streams.
- Each stream must use its own connection
- They must target the same replica
- They must use the same open flags
- They must be opened using the replica access token
- Send bytes across streams.
- Close secondary streams without updating the catalog.
- Close first stream.
- The first stream is responsible for updates to the catalog
- The first stream is responsible for triggering policy
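The stream rules above imply that each stream writes a distinct, contiguous byte range of the same replica. A minimal sketch of how a client might partition a file across N streams (the function name and remainder handling are my own choices, not part of the protocol):

```python
def partition(file_size, channels):
    """Split [0, file_size) into one contiguous (offset, length) range
    per stream; the last stream absorbs any remainder."""
    base = file_size // channels
    ranges = []
    for i in range(channels):
        offset = i * base
        length = base if i < channels - 1 else file_size - offset
        ranges.append((offset, length))
    return ranges

# A 10 MiB file split across 4 streams.
print(partition(10 * 1024 * 1024, 4))
```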
First, instruct the server to initialize the transfer state.
/data-objects?op=parallel-write-init&channels=4&lpath=/tempZone/home/rods/f.txt[&dst-resource=some_resc|&replica_number=3]
This returns a handle to state that is specifically needed for the upcoming transfer. That state will be shared across multiple requests. The state may contain things such as: connections, chunk sizes, offsets per stream, the replica access token, etc. The state is never exposed to the client of the REST API. This simplifies the interface.
For example, the response could resemble the following:
{
"error_code": 0,
"transfer_handle": "<UUID>"
}
Now, the client sends chunks to be written at various offsets within the replica. Notice that the Content-Length header informs the server of the body's length, the body being the bytes to write. It's possible for things such as the offset to be stored in the server instead; that approach means there are always at least two API calls per write (i.e. a seek followed by a write).
content-length: 8196
/data-objects?op=write&transfer-handle=UUID&offset=1000
Once all bytes are transferred, send a request instructing the server to shut down the transfer.
/data-objects?op=parallel-write-shutdown&transfer-handle=UUID
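Putting the three operations together, the client-side request sequence for one upload might look like the following sketch. It yields only the URLs (the HTTP verbs, chunking strategy, and parameter names beyond those shown above are assumptions); in practice the transfer handle would come from the init response rather than being passed in.

```python
def parallel_write_requests(lpath, file_size, channels, chunk_size, handle):
    """Yield the URL for each request in a full parallel write:
    init, one write per chunk, then shutdown."""
    yield (f"/data-objects?op=parallel-write-init"
           f"&channels={channels}&lpath={lpath}")
    offset = 0
    while offset < file_size:
        yield (f"/data-objects?op=write"
               f"&transfer-handle={handle}&offset={offset}")
        offset += chunk_size
    yield f"/data-objects?op=parallel-write-shutdown&transfer-handle={handle}"

for url in parallel_write_requests("/tempZone/home/rods/f.txt",
                                   16384, 4, 8192, "UUID"):
    print(url)
```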
Each operation returns a JSON response containing an iRODS error code and an error message if available. For example:
{
"error_code": 0,
"error_message": ""
}
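Given that shape, a client would typically check error_code on every response. A minimal helper (the exception type is my own, purely illustrative):

```python
class IrodsRestError(Exception):
    """Raised when a response carries a non-zero iRODS error code."""

def check(response: dict) -> dict:
    """Return the response unchanged, or raise if error_code is non-zero."""
    if response.get("error_code", 0) != 0:
        raise IrodsRestError(
            f"{response['error_code']}: {response.get('error_message', '')}")
    return response

print(check({"error_code": 0, "error_message": ""}))
```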
I think we can improve parallel writes even more.
In the text, I mentioned that parallel writes (as described) would result in two API operations (i.e. seek -> write). We can adjust `rxDataObjWrite` to take an optional offset, which would bring the number of network calls down to one. Doing that also avoids triggering the REPF for seek operations, yet another performance win. Not only that, but all clients would be able to take advantage of that optimization too.
The downside is that any policy associated with seek will not be triggered. I don't see this being an issue as long as users understand that. Embedding the offset into the write operation would be purely opt-in.
Thoughts?
I'm going to write this up as an issue if there are no objections.
I just realized that `OpenedDataObjInp` already contains an `offset` input value, so the data structure doesn't need to be modified at all. We just need to verify what `offset` means for `rxDataObjWrite`. See the following: