@dangunter
Last active August 29, 2015 14:18
data redesign discussion notes 4/1/2015
# Weaknesses
* Sharing/granularity
* Versioning
* Narratives together with biological objects
* References
  - shock vs. W.S. vs. CDS
  - permissions relaxed on refs
  - all refs hierarchical/one-way
  - data copies may not be complete
* Types
  - ownership/namespacing
  - extensibility and evolution
  - polymorphism
* CDMI API doesn't work
* Reference data updates
* Search
  - Validation of raw data
  - Uniformity across data sources
* Multiple copies of the data (reference data in particular)
  - central store
  - WS copy
  - WS indexable copy
  - flat files (Solr)
* Lack of subset capability -- always need to get the entire object
* Sharing is all or nothing
# Model
## Information we need in objects
* Object metadata
  - id (hash)
  - owner
* Datum (bytes)
  - Type
* Metadata (non-computable)
* Links/references
  - Derived-from
  - Other, named, links
* Provenance (computable) -- special metadata in addition to derived-from tree
* Path, replacing current 'name' with: owner/project[/subprojects..]/name
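
A minimal sketch of how the fields listed above might fit together, written in Python purely for illustration. All class and field names (`ObjectPath`, `DataObject`, `type_name`, and so on) are hypothetical, and deriving the id as a SHA-256 hash of the datum bytes is an assumption; the notes only say the id is a hash.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ObjectPath:
    """Path of the form owner/project[/subprojects..]/name."""
    owner: str
    project: str
    subprojects: Tuple[str, ...]
    name: str

    @classmethod
    def parse(cls, text: str) -> "ObjectPath":
        parts = text.split("/")
        # Need at least owner, project, and name; no empty components.
        if len(parts) < 3 or any(not p for p in parts):
            raise ValueError(
                f"expected owner/project[/subprojects..]/name, got {text!r}")
        return cls(parts[0], parts[1], tuple(parts[2:-1]), parts[-1])

    def __str__(self) -> str:
        return "/".join((self.owner, self.project, *self.subprojects, self.name))


@dataclass
class DataObject:
    path: ObjectPath                  # replaces the current 'name'
    datum: bytes                      # the raw data
    type_name: str                    # type of the datum
    metadata: Dict[str, str] = field(default_factory=dict)   # non-computable
    derived_from: List[str] = field(default_factory=list)    # ids of parent objects
    links: Dict[str, str] = field(default_factory=dict)      # other named links: name -> id
    provenance: List[dict] = field(default_factory=list)     # computable provenance records

    @property
    def owner(self) -> str:
        return self.path.owner

    @property
    def id(self) -> str:
        # Assumption: the id is a content hash of the datum alone.
        return hashlib.sha256(self.datum).hexdigest()
```

For example, `ObjectPath.parse("alice/genomes/ecoli/k12/assembly")` yields owner `alice`, project `genomes`, subprojects `("ecoli", "k12")`, and name `assembly`.
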
## Sharing
Issue: When user Alice shares X with Bob, and Bob derives Y from X and shares Y with Carol, the system must let Carol see X in order to show the full provenance of Y, whether or not Alice has shared X with her directly.

Solution: Sharing is transitive across Derived-from links regardless of object permissions, so in the example above Carol can always see X if she has permission to see Y. This preserves transparency. However, users can opt for another kind of sharing, which we called "radius 1" but I would now call "non-transitive". If Alice shared X non-transitively with Bob, then Bob could create Y but could not share Y with anyone else.

Note that in this discussion, the "object" could be a collection of objects, e.g. a subpath of the Path described above.

It probably makes sense to store the transitivity of sharing as a property of the object (or subpath) itself, rather than as a property of each individual sharing action. This would make it easier to preserve the property in copies, for example.

We noted that all of these sharing controls are easily circumvented by simply downloading the raw data.
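
The transitive and non-transitive sharing rules described above could be sketched roughly as follows. This is an illustration under stated assumptions, not a design: the names `Obj`, `visible`, `can_see`, and `share` are hypothetical, transitivity is stored on the object as suggested in the notes, only the owner of an object may share it, and a non-transitive ancestor blocks sharing only when the sharer does not own that ancestor.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class Obj:
    owner: str
    derived_from: List[str] = field(default_factory=list)  # names of parent objects
    transitive: bool = True            # stored on the object, not on each share
    shared_with: Set[str] = field(default_factory=set)


def closure(store: Dict[str, Obj], roots: Iterable[str]) -> Set[str]:
    """Return the roots plus everything reachable via Derived-from links."""
    seen: Set[str] = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(store[name].derived_from)
    return seen


def visible(store: Dict[str, Obj], user: str) -> Set[str]:
    """Objects the user owns or was given, plus all of their ancestors."""
    direct = [n for n, o in store.items()
              if o.owner == user or user in o.shared_with]
    return closure(store, direct)


def can_see(store: Dict[str, Obj], user: str, name: str) -> bool:
    return name in visible(store, user)


def share(store: Dict[str, Obj], name: str, sharer: str, recipient: str) -> None:
    obj = store[name]
    if obj.owner != sharer:            # assumption: only owners may share
        raise PermissionError(f"{sharer} does not own {name}")
    ancestors = closure(store, obj.derived_from)
    for anc in ancestors:
        a = store[anc]
        # A non-transitive ancestor owned by someone else blocks re-sharing.
        if not a.transitive and a.owner != sharer:
            raise PermissionError(f"{name} derives from non-transitive {anc}")
    obj.shared_with.add(recipient)
```

With the scenario from the notes: if Alice shares a transitive X with Bob, Bob derives Y from X and shares Y with Carol, then `can_see(store, "carol", "X")` is true even though X was never shared with Carol directly. If X is created with `transitive=False`, Bob's attempt to share Y raises `PermissionError`.
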