Notes from Documenting History, Loughborough University, 5-6 September 2016
The following text represents my notes rather than precisely what was said on the day and should be taken in that spirit.
On GitHub
Draft Concordat on Open Research Data:
Research Data are quantitative information or qualitative statements collected by researchers in the course of their work by experimentation, observation, interview or other methods. Data may be raw or primary (e.g. direct from measurement or collection) or derived from primary data for subsequent analysis or interpretation (e.g. cleaned up or as an extract from a larger data set). The purpose of open research data is to provide the information necessary to support or validate a research project's observations, findings or outputs. Data may include, for example, statistics, collections of digital images, sound recordings, transcripts of interviews, survey data and fieldwork observations with appropriate annotations.
Digital Curation Centre data management plan: what data will be created and how, how it will be made available, and what restrictions apply. It is mostly common sense but worth articulating: it isn't exciting, but it is needed. Responsibility is tied to public money.
Why not have a plan? Why not plan out how we go about doing our research? Why not be efficient? But besides the carrots (efficiency!) there are sticks: permissions must fit the circumstances in which you want the data to be used, and mandates may be in place.
And yet: be flexible! The landscape will change! Be aware of differences between international jurisdictions depending on who your collaborators are.
Backup rule of thumb: keep 3 copies of the data, on 2 different types of media, 1 of which is geographically separate from the others.
Be realistic and don't overthink it: data management plans are mostly common sense, combined with ensuring you have the relevant expertise supporting the project (either already at your institution or from new hires).
Good resource: DMPOnline
Definition exercise (on what the DPA, CC licences, copyright, data formats, et al. are): our responses
FAIR (Findable, Accessible, Interoperable, and Re-usable) Data Principles: FORCE11 Website
Aims: 1) define what Linked Data is and what problems it solves; 2) explain what you gain; 3) point to some tools to ease the pain!
Tom Heath and Christian Bizer (2011) Linked Data: Evolving the Web into a Global Data Space (1st edition). Synthesis Lectures on the Semantic Web: Theory and Technology, 1:1, 1-136. Morgan & Claypool. dx.doi.org/10.2200/S00334ED1V01Y201102WBE001 http://linkeddatabook.com/editions/1.0/
The web was designed for people to share (and publish) information, not for machines to share information.
Linked data takes data out of silos.
Give all the things a name: make names for concepts and everything else unique (e.g. John Smith, person, city, London) and connect them with statements such as has_name or is_name. Taken together, these statements form a linked data graph. Finally, we also need to make the meaning of things explicit: assign types to things and put them in a hierarchy.
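A minimal sketch of the idea above in plain Python (not a real RDF library): a linked data graph is just a set of subject-predicate-object statements over uniquely named things. The entity names and predicates (has_name, is_a, lives_in) are illustrative assumptions, not from any real vocabulary.

```python
# A toy linked data graph: each statement is a (subject, predicate, object)
# triple, and every thing has a unique name so statements can be merged.
triples = [
    ("person/1", "has_name", "John Smith"),
    ("person/1", "is_a", "Person"),
    ("person/1", "lives_in", "city/london"),
    ("city/london", "has_name", "London"),
    ("city/london", "is_a", "City"),
    # making meaning explicit: a simple type hierarchy
    ("City", "subclass_of", "Place"),
]

# Because names are unique, statements about "city/london" from different
# sources would land on the same node in the graph.
names = {s: o for (s, p, o) in triples if p == "has_name"}
print(names["city/london"])  # London
```

In a real system the names would be URIs and the vocabulary would come from a shared ontology; the merging-by-unique-name behaviour is the same.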
Albert Meroño-Peñuela, Ashkan Ashkpour, Marieke van Erp, Kees Mandemakers, Leen Breure, Andrea Scharnhorst, Stefan Schlobach, Frank van Harmelen, 'Semantic Technologies for Historical Research: A Survey' SWJ (2012) http://www.semantic-web-journal.net/content/semantic-technologies-historical-research-survey
Summary:
- lots of metadata
- focus on time, geography, people
- vocab/terminologies to describe historical things, processes, events (especially when there are variants)
"Ways in which semantic technologies are being used for historical research - @albertmeronyo #dochist pic.twitter.com/iIA2Jdcj2B"
— Anne Welsh (@AnneWelsh) September 5, 2016
"Different resources created using Linked Data - @albertmeronyo #dochist pic.twitter.com/8NsSZTSD8L"
— Anne Welsh (@AnneWelsh) September 5, 2016
Why?
- efficiency
- better described
- easier to find
- provenance
Three different practical problems
- Creating Linked Data
- Publishing Linked Data
- Accessing Linked Data
The main purpose of CLARIAH is to solve these problems without users needing to write, for example, SPARQL queries.
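To see what CLARIAH is sparing users from: at its core, a SPARQL SELECT query is pattern matching over triples, with variables (written ?name) that get bound to matching values. This is a hedged plain-Python sketch of that idea over hypothetical data, not how any real SPARQL engine is implemented.

```python
# Toy matcher mimicking the SPARQL query:
#   SELECT ?person WHERE { ?person is_a Person }
triples = [
    ("person/1", "is_a", "Person"),
    ("person/1", "has_name", "John Smith"),
    ("city/london", "is_a", "City"),
]

def match(pattern, graph):
    """Return one binding dict per triple matching the pattern.

    Terms starting with '?' are variables and match anything;
    other terms must match the triple exactly.
    """
    results = []
    for triple in graph:
        if all(term.startswith("?") or term == part
               for term, part in zip(pattern, triple)):
            results.append({term: part
                            for term, part in zip(pattern, triple)
                            if term.startswith("?")})
    return results

print(match(("?person", "is_a", "Person"), triples))
# [{'?person': 'person/1'}]
```

A real query engine adds joins across multiple patterns, filters, and indexes, but the variable-binding model is the same.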
OpenRefine makes LOD from spreadsheets http://openrefine.org/
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Some sections reworked from James Baker, "Preserving Your Research Data," Programming Historian (30 April 2014), http://programminghistorian.org/lessons/preserving-your-research-data
Exceptions: quotations and embeds to and from external sources