
Documenting History Workshop

Notes from Documenting History, Loughborough University, 5-6 September 2016

The following text represents my notes rather than precisely what was said on the day and should be taken in that spirit.


Programme

On GitHub


Day 1


Data Management as Part of a Research Workflow (Dr Gareth Cole, Loughborough University)

Draft Concordat on Open Research Data:

Research Data are quantitative information or qualitative statements collected by researchers in the course of their work by experimentation, observation, interview or other methods. Data may be raw or primary (e.g. direct from measurement or collection) or derived from primary data for subsequent analysis or interpretation (e.g. cleaned up or as an extract from a larger data set). The purpose of open research data is to provide the information necessary to support or validate a research project's observations, findings or outputs. Data may include, for example, statistics, collections of digital images, sound recordings, transcripts of interviews, survey data and fieldwork observations with appropriate annotations.

Digital Curation Centre data management plan: what data will be created and how, how it will be managed appropriately, and any restrictions. It is mostly common sense but worth articulating: it isn't exciting, but it is needed. Responsibility comes tied to public money.

Why not have a plan? Why not plan out how we go about doing our research? Why not be efficient? But besides the carrots (efficiency!) there are sticks: permissions must work with the circumstances in which you want the data to be used, and mandates may be in place.

And yet: be flexible! The landscape will change! Be aware of differences between international jurisdictions depending on who your collaborators are.

3 copies of the data, on 2 different types of media, 1 of which is geographically separate from the others.
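
A minimal sketch of how one leg of that 3-2-1 routine might be scripted, assuming a local working copy and an external drive; the paths and directory names are hypothetical, and the third (geographically separate) copy would need its own arrangement, e.g. an institutional store.

```python
import shutil
from pathlib import Path

# Hypothetical locations: copy 1 is the working data, copy 2 sits on a
# different type of media (an external drive). Copy 3 should live somewhere
# geographically separate, e.g. an institutional repository or cloud folder.
WORKING_COPY = Path("~/research/project-data").expanduser()
LOCAL_BACKUP = Path("/Volumes/external-drive/project-data")

def mirror(source: Path, destination: Path) -> None:
    """Replace any stale backup with a fresh copy of the data directory."""
    if destination.exists():
        shutil.rmtree(destination)
    shutil.copytree(source, destination)

if __name__ == "__main__":
    mirror(WORKING_COPY, LOCAL_BACKUP)
```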

Be realistic and don't overthink it: data management plans are mostly common sense, combined with ensuring you have the relevant expertise supporting the project (either already in place at your higher education institution or brought in through hires).

Good resource: DMPOnline


Knowing the Vocabulary – Data Management & Grant Capture (Dr Gareth Cole, Loughborough University)

Definition exercise (on what the DPA, CC, copyright, data formats, et al. are): our responses

FAIR (Findable, Accessible, Interoperable, and Re-usable) Data Principles: FORCE11 Website


Linked Data for the Documenting Historian (Dr Albert Meroño Peñuela, CLARIAH)

Aims: 1) define what linked data is and what problems it solves; 2) explain what you gain; 3) introduce some tools to ease the pain!

Tom Heath and Christian Bizer (2011) Linked Data: Evolving the Web into a Global Data Space (1st edition). Synthesis Lectures on the Semantic Web: Theory and Technology, 1:1, 1-136. Morgan & Claypool. dx.doi.org/10.2200/S00334ED1V01Y201102WBE001 http://linkeddatabook.com/editions/1.0/

The web was designed for people to share (and publish) information, not for machines to share information.

Linked data takes data out of silos.

Give all the things a name: make names, concepts, and everything else unique (e.g. John Smith, person, city, London), and connect them with statements such as has_name, is_name, et al. Together these form a linked data graph (see the sketch below). Finally, we also need to make the meaning of things explicit: assign types to things and put them in a hierarchy.
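
A small sketch of that idea in Python using the rdflib library (my own illustration, not something shown in the talk): unique names become URIs, statements become triples, and types plus a small hierarchy make the meaning explicit.

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.org/")  # hypothetical namespace for our unique names
g = Graph()

# Make the meaning of things explicit: types, arranged in a hierarchy.
g.add((EX.Person, RDF.type, RDFS.Class))
g.add((EX.Place, RDF.type, RDFS.Class))
g.add((EX.City, RDFS.subClassOf, EX.Place))

# Give all the things a name and connect them with statements.
g.add((EX.john_smith, RDF.type, EX.Person))
g.add((EX.john_smith, EX.has_name, Literal("John Smith")))
g.add((EX.london, RDF.type, EX.City))
g.add((EX.john_smith, EX.lives_in, EX.london))

# Together these triples form a small linked data graph.
print(g.serialize(format="turtle"))
```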

Albert Meroño-Peñuela, Ashkan Ashkpour, Marieke van Erp, Kees Mandemakers, Leen Breure, Andrea Scharnhorst, Stefan Schlobach, Frank van Harmelen, 'Semantic Technologies for Historical Research: A Survey' SWJ (2012) http://www.semantic-web-journal.net/content/semantic-technologies-historical-research-survey

Summary:

  • lots of metadata
  • focus on time, geography, people
  • vocab/terminologies to describe historical things, processes, events (especially when there are variants)

Ways in which semantic technologies are being used for historical research - @albertmeronyo #dochist pic.twitter.com/iIA2Jdcj2B

— Anne Welsh (@AnneWelsh) September 5, 2016

Different resources created using Linked Data - @albertmeronyo #dochist pic.twitter.com/8NsSZTSD8L

— Anne Welsh (@AnneWelsh) September 5, 2016

Why?

  • efficiency
  • better described
  • easier to find
  • provenance

Three different practical problems

  • Creating Linked Data
  • Publishing Linked Data
  • Accessing Linked Data

The main purpose of CLARIAH is to solve these problems without researchers needing to write, for example, SPARQL queries (an example of such a query is sketched below).
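
For context, this is the kind of hand-written query that such tooling aims to spare researchers from; the endpoint, the query, and the use of the SPARQLWrapper library are my own illustration and were not part of the workshop.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Ask a public SPARQL endpoint (DBpedia) for a few people born in London.
sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    SELECT ?person ?name WHERE {
        ?person a dbo:Person ;
                dbo:birthPlace dbr:London ;
                rdfs:label ?name .
        FILTER (lang(?name) = "en")
    } LIMIT 5
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["name"]["value"], "->", row["person"]["value"])
```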

OpenRefine can make Linked Open Data (LOD) from spreadsheets: http://openrefine.org/


Some admin...

Creative Commons Licence
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Some sections reworked from James Baker, "Preserving Your Research Data," Programming Historian (30 April 2014), http://programminghistorian.org/lessons/preserving-your-research-data

Exceptions: quotations and embeds to and from external sources
