# OPTION A:
{"wf_A": {"timestamp": 0000,
          "primary": ["list of transfer ids"],
          "secondary": ["list of transfer ids"]},
 "wf_B": {"timestamp": 0000,
          "primary": [],
          "secondary": []}
}
# OPTION B:
{"wf_A": {"timestamp": 0000,
          "primary": {"dset_1": ["list of transfer ids"]},
          "secondary": {"PU_dset_1": ["list of transfer ids"]}},
 "wf_B": {"timestamp": 0000,
          "primary": {"dset_1": ["list of transfer ids"],
                      "parent_dset_1": ["list of transfer ids"]},
          "secondary": {"PU_dset_1": ["list of transfer ids"],
                        "PU_dset_2": ["list of transfer ids"]}},
 "wf_C": {"timestamp": 0000,
          "primary": {},
          "secondary": {}}
}
# OPTION C (the chosen one!) - it assumes we store all the transfer information within the same Couch document:
{"wf_A": [{"timestamp": 000, "dataset": "/a/b/c", "dataType": "primary", "transferIDs": [1, 2, 3]},
          {"timestamp": 000, "dataset": "/a/b/c", "dataType": "secondary", "transferIDs": [4]}],
 "wf_B": [{"timestamp": 000, "dataset": "/a/b/c", "dataType": "primary", "transferIDs": [1, 2, 3]},
          {"timestamp": 000, "dataset": "/a/b/c", "dataType": "parent", "transferIDs": [4, 5, 6]}],
 "wf_C": []
}
# OPTION D - it assumes a new document is created for every request:
{"workflowName": "blah",
 "lastUpdate": 000,  # same kind of timestamp as above
 "transfers": [{"dataset": "/a/b/c", "dataType": "primary", "transferIDs": [1, 2, 3], "campaignName": "blah2017", "completion": [0.0]},
               {"dataset": "/a/b/c", "dataType": "secondary", "transferIDs": [4], "campaignName": "blah2018", "completion": [0.0]},
               {"dataset": "/a/b/c", "dataType": "parent", "transferIDs": [4, 5, 6], "campaignName": "blah2017", "completion": [0.0]}]
}
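For illustration, a minimal Python sketch of how an Option D document could be assembled; make_transfer_doc is a hypothetical helper, not part of any existing code:

import time

def make_transfer_doc(workflowName, transfers):
    # Build an Option D style document: one document per workflow.
    # `transfers` is a list of dicts with the dataset/dataType/transferIDs/
    # campaignName/completion keys shown above.
    return {"workflowName": workflowName,
            "lastUpdate": int(time.time()),
            "transfers": transfers}

doc = make_transfer_doc("wf_A",
                        [{"dataset": "/a/b/c", "dataType": "primary",
                          "transferIDs": [1, 2, 3],
                          "campaignName": "blah2017", "completion": [0.0]}])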
In general, timestamp should be a float, since we'll use time.time() and it returns a float. But of course we can cast it to int. Everything else is correct.
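For example (standard library behaviour, values illustrative):

import time

now = time.time()   # returns a float, e.g. 1551434299.81
ts = int(now)       # cast to int if integer timestamps are preferred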
@vkuznet Valentin, I created Option D for the case where we want to store a new document for each workflow. I believe that's going to be our best option, TBH.
Alan, your option D is almost identical to my original proposal (the difference being that I proposed a record per dataset, while you group them per workflow) and it is a good compromise, i.e. it represents a single entity (in this case a workflow) and we do not need to compose a gigantic single dictionary.
Ok, let's hope nothing else changes. Let's proceed with option D then, one record/document per workflow.
@vkuznet Valentin, I added a completion parameter to Option D, such that we can persist the transfer completion every time it gets calculated.
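As a sketch of how the completion list might be appended to on each polling cycle (record_completion is a hypothetical helper operating on an Option D document):

import time

def record_completion(doc, dataset, fraction):
    # Append the latest completion fraction to every transfer record
    # matching `dataset`, then refresh the document timestamp.
    for rec in doc["transfers"]:
        if rec["dataset"] == dataset:
            rec["completion"].append(fraction)
    doc["lastUpdate"] = int(time.time())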
@vkuznet this last option looks reasonable. I added it to the list of formats as option C.
Just to clarify the fields:
- timestamp: (integer type) timestamp for when this request + dataset was acted on (created/updated)
- dataset: (string type) string with the dataset name (block names are not supported and must be mapped to the dataset name)
- dataType: (string type) string with the dataset type. It can be one of the following values: primary | secondary | parent
- transferIDs: (list of integers) list containing the transfer identifiers

As a general rule, a workflow can have the following data:
0-1 primary dataset, 0-1 parent dataset, 0-N secondary datasets (normally up to 2)
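A minimal sketch of how that cardinality rule could be enforced; validate_transfers is a hypothetical helper, not existing code:

from collections import Counter

def validate_transfers(transfers):
    # Enforce the rule above: at most one primary, at most one parent,
    # any number of secondary datasets; reject unknown dataType values.
    counts = Counter(rec["dataType"] for rec in transfers)
    if counts["primary"] > 1:
        raise ValueError("at most one primary dataset per workflow")
    if counts["parent"] > 1:
        raise ValueError("at most one parent dataset per workflow")
    unknown = set(counts) - {"primary", "secondary", "parent"}
    if unknown:
        raise ValueError("unexpected dataType values: %s" % sorted(unknown))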