@drjwbaker
Last active August 29, 2015 13:58
'Reusing digital content: towards making research using this content limited by what is possible rather than what is permissible', Mining Digital Repositories: Challenges and Horizons event at the KB, The Hague, 11 April 2014

###'Reusing digital content: towards making research using this content limited by what is possible rather than what is permissible'

Notes from a short talk I gave at the Mining Digital Repositories: Challenges and Horizons event at the KB, The Hague, 11 April 2014

The following text represents my notes rather than precisely what was said on the day and should be taken in that spirit.

Slides: http://slidesha.re/R71gBz Notes: https://gist.github.com/drjwbaker/10422453


Intro

Background of team, multi-disciplinary team with broad skill set S sense of importance of open S ethos of more than resource discovery, think of digital research with respect to societal change S deluge of data et al, too much to read Sx2 category of research we support S new contexts for scholarship in the humanities and social sciences. S changing libraries: places full of different things (web content), performing different roles (offering open linked data, running digital research labs S Labs competition open...), driving change (teams of humanities researchers, practitioners and scholars).


####Re-use scenarios

S Microsoft Live Books Search (2006-2008) ... 68k volumes from the BL digitised ... when project wound up MS gave content to us and we dedicated it into the Public Domain for unrestricted use and reuse ... access solution through catalogue. PDFs of books.

Last year we (here the Digital Research Team and the Andrew Mellon funded British Library Labs team) started investigating better access to this content, access that would encourage reuse. We began by wondering: what else is in there apart from text? ... found 1 million images in the OCR ... @MechCurBot story - from playing (faces), to idea (exhibiting), to Tumblr (serendipitous publication) ... http://mechanicalcurator.tumblr.com/

S Flickr story ... Improve discovery and research through distributed metadata generation (interesting side issue: what does this method of metadata collection mean for researchers?) http://www.flickr.com/photos/britishlibrary Since we released the collection in early December: over 132 million image views, over 81,000 tags. Built sets that refresh daily of the least seen and least tagged images: every image has now been seen, only around 70,000 images have had fewer than 5 views, and no more than 50k images have no human generated tags.
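As a rough illustration of the "least seen" and "least tagged" sets described above, the selection step could be sketched as follows. This is a minimal sketch and not the code the team ran: the `ImageRecord` class, its field names, and the two helper functions are hypothetical, and it assumes per-image view counts and human-generated tags have already been fetched into memory.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ImageRecord:
    """Hypothetical per-image record; field names are illustrative only."""
    image_id: str
    views: int = 0
    human_tags: List[str] = field(default_factory=list)


def least_seen(records: List[ImageRecord], limit: int) -> List[ImageRecord]:
    """Return up to `limit` records with the fewest views, lowest first.

    Ties are broken by image_id so that repeated runs give a stable order.
    """
    return sorted(records, key=lambda r: (r.views, r.image_id))[:limit]


def least_tagged(records: List[ImageRecord], limit: int) -> List[ImageRecord]:
    """Return up to `limit` records with the fewest human-generated tags."""
    return sorted(records, key=lambda r: (len(r.human_tags), r.image_id))[:limit]
```

Re-running both functions once a day against fresh counts would give sets that refresh daily in the sense described; how the counts are fetched and how the resulting sets are published is left out here.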

All this has encouraged reuse.

Automated reuse of the data:

S Wikimedia

S Digital NZ

Manual, creative reuse of the content:

S Nicola Demonte has been using the images in his art history - 'Memory Lane' - classes for students with memory loss, brain injuries and Alzheimer's today at Summerwood of Chanhassen, a senior living institution in Minnesota.

S Michael Hancher, University of Minnesota ('Doing things with a million British Library book illustrations' http://blog.lib.umn.edu/mh/dh2/2014/01/doing-things-with-a-million-british-library-book-illustrations.html and exercise at https://docs.google.com/file/d/0B08KKnzlfYMNRmVuMjl1SEFSdjA/edit ) Getting students to sample the book illustrations and assess the experience of coming up with informative tags for several dozen images. For each image, he has asked them to be prepared to discuss the appropriateness of the tags and the questions that they raise. Such as, for this image:

  • Do the buildings include churches?
  • Seated man or seated woman? An adult, not a child? Evidence for that?
  • What kind of hat?
  • Is that really a sketch pad?
  • What does the signature say?
  • Does the text in this book relate in a significant way to this picture?

S Secret santa map...

S Comics and memes...


####Architecture

Mention of the text that surrounds the images brings me to architecture and to the evolving role of libraries. The MS books were low-hanging fruit, a consolidated collection unrestricted by copyright or licensing conditions in an otherwise fragmented and complex landscape. We used Tumblr and Flickr as means of enabling reuse because both are deployed architectures we could exploit with minimal engineering, because they have APIs for machine readable access, and because both present visual material relatively well to humans ... (to stress, were it needed, we are not wedded to Tumblr or Flickr - or for that matter Yahoo services - as platforms for our content.)

As we explore a vision for architecture and infrastructure around our digital collections, we bring this desire to exploit off the shelf technologies with us (we don't want to continually reinvent the wheel.) And we have chosen this approach in response to a set of problems we have identified:

  • That infrastructures are restrictive and proscriptive.
  • That assets are distributed unevenly across organisations and systems.
  • That access restrictions unpredictably limit where, how and by whom items can be used.

Alongside these problems we are experiencing changing demand from researchers with respect to digital content:

  • For scalable access to large quantities of digital content, be that text, images, sound, video or data.
  • For the ability to bring their own tools, work in whatever way they want, use any workflow, address any sort of problem.
  • For the ability to work across collections irrespective of content owner or licence terms.

Deployed technologies and services are available to create interoperable infrastructures and virtual research environments that would address these problems and meet these needs. And outside of the humanities, they are being used; at the European Bioinformatics Institute, at CERN.

We see this infrastructure vision as a priority because, as suggested independently but almost simultaneously a few months back by Andrew Prescott (DH KCL ... and in the audience) and Bob Nicholson (a periodicals scholar at Edge Hill in the UK), the present situation is that research that uses digital assets is limited by what is permissible as much as what is possible.

This isn't good enough.

For research using digital content at scale to thrive and flourish in the humanities, we need to somehow bridge the gap between a notion (though it is of course an illusion) that anything is possible with traditional, hand-crafted, 'non-digital' humanities research approaches, and a digital research landscape shaped by unevenly distributed and restricted digital content; a landscape that restricts creative, novel and unexpected reuse of the digital assets we have invested substantial funds, time and effort to create; digital assets that have the potential to change the understanding of and engagement with past experience, our shared heritage; the stuff heritage institutions have spent so much time, effort and money trying to capture.

S [slides and notes]


Some admin...

This work is licensed under a Creative Commons Attribution 3.0 Unported License.
