When I create a dat, it gives me a dat.json file. The dat.json file currently describes a single dat. For example, from the existing dat.json spec:
{
  "url": "dat://37ad8eb8d7f...",
  "title": "This is My Dataset",
  "description": "This is a one or two line description of the archive for Human People",
  "author": "Karissa McKelvey <karissa@mckelvey.edu>"
}
A dat.json file is great for compiling the metadata about a dat in a single file. It is also good if you want to send the dat to another person: you can just send them the file, or add the file to GitHub, an HTTP server, or anywhere else, and they can see what it is.
What if I create a project that includes downloading multiple dats? For example, using the census combined with another dataset I collected from Twitter. That means I need a way to specify that there are many dats in one package of data. For those familiar with software development, we call this a 'dependency.'
We want to add a new field to the dat.json file that will make it easy to add other dats as dependencies.
A dat dependency could be any of:
- an HTTP URL containing the Hyperdrive-Key header
- a Dat URL
- the dat.json file (seen above)
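The three dependency forms above can be told apart from the value alone. The following is a minimal sketch in Python, not part of any dat tooling: the function name `classify_dependency` and the returned labels are hypothetical, and it assumes an inline dat.json object must carry at least a "url" field. It does not make a network request, so it cannot confirm that an HTTP URL actually returns the Hyperdrive-Key header.

```python
from urllib.parse import urlparse


def classify_dependency(dep):
    """Return 'dat-url', 'http-url', or 'inline-dat-json' for one dependency value.

    Hypothetical helper for illustration; the labels are not defined by the dat.json spec.
    """
    if isinstance(dep, dict):
        # An embedded dat.json object. Assumption: it must at least carry a "url".
        if "url" not in dep:
            raise ValueError("inline dat.json is missing 'url'")
        return "inline-dat-json"
    if isinstance(dep, str):
        scheme = urlparse(dep).scheme
        if scheme == "dat":
            return "dat-url"
        if scheme in ("http", "https"):
            # Checking for the Hyperdrive-Key header would need a network
            # request, which this sketch deliberately does not perform.
            return "http-url"
    raise ValueError(f"unsupported dependency: {dep!r}")
```

For example, `classify_dependency("dat://37ad8eb8d7f#d7e7ehff")` returns `'dat-url'`, and a value that is neither a string nor an object raises `ValueError`.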
You map the name of the folder the data should be downloaded into to the dat dependency:
{
  "dats": {
    "tweet-data": "https://datproject.org/karissa/more-tweets-more-votes",
    "data-in-kind": "dat://37ad8eb8d7f#d7e7ehff",
    "another-dataset": {
      "url": "dat://37ad8eb8d7f...",
      "title": "This is My Dataset",
      "description": "This is a one or two line description of the archive for Human People",
      "author": "Karissa McKelvey <karissa@mckelvey.edu>"
    }
  }
}
Then someone can type dat . in a directory that contains a dat.json file, and it would download the data to the given folders.
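To make the folder mapping concrete, here is a rough sketch of how a tool might turn the proposed "dats" field into a list of downloads. It is an illustration only: `download_plan` is a hypothetical name, the rule that folder names must stay inside the project directory is an assumption not stated in the proposal, and the sketch stops at producing the plan rather than fetching anything.

```python
import json
from pathlib import PurePosixPath


def download_plan(dat_json_text):
    """Map each target folder to the address its data should be fetched from.

    Hypothetical helper; it only reads the proposed "dats" field.
    """
    manifest = json.loads(dat_json_text)
    plan = {}
    for folder, dep in manifest.get("dats", {}).items():
        path = PurePosixPath(folder)
        # Assumption: a dependency may not write outside the project directory.
        if path.is_absolute() or ".." in path.parts:
            raise ValueError(f"unsafe folder name: {folder!r}")
        # An inline dat.json object carries its address in "url";
        # a plain string is already the address.
        plan[folder] = dep["url"] if isinstance(dep, dict) else dep
    return plan
```

A dat.json without a "dats" field yields an empty plan, so existing single-dat files would keep working unchanged under this reading of the proposal.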