darobin/dependency.md

## dependency.md

      
In npm you can express the basic information about a project and its dependencies in the following manner:
{
  "name": "dahut",
  "version": "1.42.17",
  "dependencies": {
    "cryptozoology": "^0.9.2",
    "jsdom": "1.2.7 || >=1.2.9 <2.0.0",
    "web-verse": "1.0.1"
  }
}
How the version specifiers on the right-hand side of the dependency map are resolved is detailed in
npm semver.
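As a rough illustration of how such specifiers resolve, here is a minimal Python sketch of range matching. It is not the npm implementation: it assumes plain `MAJOR.MINOR.PATCH` versions and covers only exact versions, the `=`, `>`, `>=`, `<`, `<=` comparators, `^`, `~`, space-separated conjunctions, and `||`. Prerelease tags, x-ranges, and hyphen ranges are left out, and the function names (`parse`, `caret_upper`, `comparator_matches`, `satisfies`) are made up for this example.

```python
import re


def parse(version):
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints (no prerelease tags)."""
    m = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if m is None:
        raise ValueError(f"unsupported version: {version!r}")
    return tuple(int(part) for part in m.groups())


def caret_upper(base):
    """Exclusive upper bound for ^base: bump the leftmost non-zero component."""
    major, minor, patch = base
    if major > 0:
        return (major + 1, 0, 0)
    if minor > 0:
        return (0, minor + 1, 0)
    return (0, 0, patch + 1)


def comparator_matches(version, comparator):
    """Test one comparator such as '^0.9.2', '>=1.2.9', '<2.0.0' or '1.0.1'."""
    m = re.fullmatch(r"(\^|~|>=|<=|>|<|=)?(.+)", comparator)
    op = m.group(1) or "="  # a bare version means an exact match
    bound = parse(m.group(2))
    v = parse(version)
    if op == "^":
        return bound <= v < caret_upper(bound)
    if op == "~":
        # For full three-part versions, ~ allows patch-level changes only.
        return bound <= v < (bound[0], bound[1] + 1, 0)
    return {
        "=": v == bound,
        ">=": v >= bound,
        "<=": v <= bound,
        ">": v > bound,
        "<": v < bound,
    }[op]


def satisfies(version, range_spec):
    """True if any '||' alternative has all of its comparators satisfied."""
    return any(
        all(comparator_matches(version, c) for c in alternative.split())
        for alternative in range_spec.split("||")
    )
```

With the `jsdom` specifier from the example above, `satisfies("1.2.7", "1.2.7 || >=1.2.9 <2.0.0")` holds through the first alternative, while `1.2.8` matches neither alternative and is rejected.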
We have a use case concerning data citations. Today, researchers who publish data sets don't get much credit for it; notably, they
do not get cited the way they would if they had published an article. This is in part because people don't really
know how to cite data properly. One of the issues is versioning. Data sets get updated (some of them a lot). If you provide no way of
addressing a version, you lose reproducibility. But if the relationship is too strict, you lose the ability to state that your analysis
can be expected to be resilient to various degrees of change in the data. Labelling data sets with semver and
matching them with npm semver ranges fits this need well.
My immediate concern is the dependency from article to data, but the same applies to dependencies from articles to software (and then
between any pair of software, data, article, and likely a bunch of other things).
The kind of thing I had in mind was the following (note that this assumes that isBasedOnUrl gets generalised to isBasedOn; there may be a
better way to capture the idea):
For data:
{
  "@type":  "Dataset",
  "isBasedOn": [
    {
      "@type":  "DependencyRole",
      "isBasedOn": {
        "@type":  "DataDownload",
        "contentUrl": "http://..."
      },
      "matchVersion": "^1.1.0"
    }
  ]
}
For code:
{
  "@type":  "SoftwareSourceCode",
  "isBasedOn": [
    {
      "@type":  "DependencyRole",
      "isBasedOn": {
        "@type":  "SoftwareSourceCode",
        "codeRepository": "http://..."
      },
      "matchVersion": "~0.9.0"
    }
  ]
}
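To make the intent of `matchVersion` concrete, here is a hedged Python sketch of how a consumer might check a document's dependencies against the versions currently published. Several things here are assumptions and not part of the proposal: the sketches above do not say where a target's current version comes from, so it is passed in as a hypothetical `published` mapping from URL to version; the URL `http://example.org/dahut-census` is a placeholder; and only the `^` and `~` operators used in the two sketches are handled, for plain `MAJOR.MINOR.PATCH` versions.

```python
import json


def parse(version):
    """Turn 'MAJOR.MINOR.PATCH' into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def upper_bound(op, base):
    """Exclusive upper bound for the '^' and '~' operators."""
    major, minor, patch = base
    if op == "~" or (major == 0 and minor > 0):
        return (major, minor + 1, 0)
    if major > 0:
        return (major + 1, 0, 0)
    return (0, 0, patch + 1)  # ^0.0.x pins the patch level


def matches(version, match_version):
    """Check a version against a '^x.y.z' or '~x.y.z' matchVersion."""
    op, base = match_version[0], parse(match_version[1:])
    if op not in ("^", "~"):
        raise ValueError(f"unsupported operator in {match_version!r}")
    return base <= parse(version) < upper_bound(op, base)


def unmet_dependencies(document, published):
    """List (url, matchVersion, version) for dependencies outside their range.

    `published` is a hypothetical lookup of the current version per URL.
    """
    unmet = []
    for role in document.get("isBasedOn", []):
        target = role["isBasedOn"]
        url = target.get("contentUrl") or target.get("codeRepository")
        version = published[url]
        if not matches(version, role["matchVersion"]):
            unmet.append((url, role["matchVersion"], version))
    return unmet


# Same shape as the data sketch above, with a placeholder URL.
DOCUMENT = json.loads("""
{
  "@type": "Dataset",
  "isBasedOn": [
    {
      "@type": "DependencyRole",
      "isBasedOn": {
        "@type": "DataDownload",
        "contentUrl": "http://example.org/dahut-census"
      },
      "matchVersion": "^1.1.0"
    }
  ]
}
""")
```

Under these assumptions, a data set published at `1.4.2` still falls within `^1.1.0`, so the analysis can be expected to hold, whereas a `2.0.0` release would be reported as an unmet dependency.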