
@amaltaro
Last active August 19, 2019 13:03
Data structure for the MS Transferor document
# OPTION A:
{"wf_A": {"timestamp": 0000,
          "primary": ["list of transfer ids"],
          "secondary": ["list of transfer ids"]},
 "wf_B": {"timestamp": 0000,
          "primary": [],
          "secondary": []},
}
# OPTION B:
{"wf_A": {"timestamp": 0000,
          "primary": {"dset_1": ["list of transfer ids"]},
          "secondary": {"PU_dset_1": ["list of transfer ids"]}},
 "wf_B": {"timestamp": 0000,
          "primary": {"dset_1": ["list of transfer ids"],
                      "parent_dset_1": ["list of transfer ids"]},
          "secondary": {"PU_dset_1": ["list of transfer ids"],
                        "PU_dset_2": ["list of transfer ids"]}},
 "wf_C": {"timestamp": 0000,
          "primary": {},
          "secondary": {}},
}
# OPTION C (the chosen one!) - it assumes we store all the transfer information within the same Couch document:
{"wf_A": [{"timestamp": 000, "dataset": "/a/b/c", "dataType": "primary", "transferIDs": [1,2,3]},
          {"timestamp": 000, "dataset": "/a/b/c", "dataType": "secondary", "transferIDs": [4]}],
 "wf_B": [{"timestamp": 000, "dataset": "/a/b/c", "dataType": "primary", "transferIDs": [1,2,3]},
          {"timestamp": 000, "dataset": "/a/b/c", "dataType": "parent", "transferIDs": [4,5,6]}],
 "wf_C": [],
}
# OPTION D - it assumes a new document is created for every request:
{"workflowName": "blah",
"lastUpdate": 000, # just as timestamp above
"transfers": [{"dataset":"/a/b/c", "dataType": "primary", "transferIDs": [1,2,3], "campaignName": "blah2017", "completion": [0.0]},
{"dataset":"/a/b/c", "dataType": "secondary", "transferIDs": [4], "campaignName": "blah2018", "completion": [0.0]},
{"dataset":"/a/b/c", "dataType": "parent", "transferIDs": [4,5,6], "campaignName": "blah2017", "completion": [0.0]}]
}
@vkuznet

vkuznet commented May 27, 2019

In general, the timestamp should be a float, since we'll use time.time(), which returns a float. We can of course cast it to int. Everything else is correct.
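
A minimal sketch of the two timestamp choices discussed here. `time.time()` is the standard-library call mentioned above; the helper name `make_timestamp` and its `as_int` flag are hypothetical and used only for illustration.

```python
import time


def make_timestamp(as_int=True):
    """Return the current epoch time (hypothetical helper).

    time.time() returns a float; casting with int() drops the
    fractional seconds.
    """
    now = time.time()
    return int(now) if as_int else now
```

Storing the int keeps the document compact, at the cost of sub-second precision.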

@amaltaro
Author

@vkuznet Valentin, I created Option D for the case where we want to store a new document for each workflow. I believe that's going to be our best option, TBH.

@vkuznet

vkuznet commented Jul 25, 2019

Alan, your option D is almost identical to my original proposal (the difference being that I proposed one record per dataset, whereas you group them by workflow), and it is a good compromise: it represents a single entity (in this case a workflow), and we do not need to compose a gigantic single dictionary.
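
To illustrate the difference, here is a rough sketch that groups per-dataset records into one Option D style document per workflow. The flat per-dataset record shape (a dict carrying `workflowName` next to the transfer fields) is an assumption, since the original per-dataset proposal is not shown in this gist; `group_by_workflow` is a hypothetical helper name.

```python
import time
from collections import defaultdict


def group_by_workflow(records, now=None):
    """Group flat per-dataset records into one document per workflow.

    Each input record is assumed to be a dict with a "workflowName" key
    plus the per-transfer fields used in Option D.
    """
    now = int(time.time()) if now is None else now
    grouped = defaultdict(list)
    for rec in records:
        # Everything except the workflow name belongs to the transfer entry
        transfer = {key: val for key, val in rec.items() if key != "workflowName"}
        grouped[rec["workflowName"]].append(transfer)
    return [{"workflowName": wflow, "lastUpdate": now, "transfers": transfers}
            for wflow, transfers in grouped.items()]
```

Each returned dictionary describes a single workflow, so no single gigantic dictionary has to be composed.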

@amaltaro
Author

Ok, let's hope nothing else changes. Let's proceed with option D then, one record/document per workflow.

@amaltaro
Author

@vkuznet Valentin, I added a completion parameter to option D so that we can persist the transfer completion every time it gets calculated.
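
As a sketch of how that could look against an Option D document: the helper below appends each newly calculated completion value to the `completion` list of the matching transfer and refreshes `lastUpdate`. Treating `completion` as an append-only history is an assumption (the gist only shows `[0.0]`), `record_completion` is a hypothetical name, and the actual write to CouchDB is left out.

```python
import time


def record_completion(doc, dataset, data_type, completion, now=None):
    """Append a completion value to the matching transfer in an Option D doc.

    Returns True if a transfer matched and was updated, False otherwise.
    Assumes "completion" is a list that keeps every calculated value.
    """
    for transfer in doc["transfers"]:
        if transfer["dataset"] == dataset and transfer["dataType"] == data_type:
            transfer["completion"].append(completion)
            doc["lastUpdate"] = int(time.time()) if now is None else now
            return True
    return False
```

The caller would then save the updated document back to the database.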
