British Library Labs Symposium 2016, British Library, 7 November 2016

Live notes, so an incomplete, partial record of what actually happened.

Tag: bldigital


Melissa Terras, Unexpected Repurposing: The BL's digital collections and UCL teaching, research and infrastructure

Mandates require you to make things available. But digital materials are really fragile: look at what was funded circa 2000 by the Lottery New Opportunities Fund. What has improved this: standards, licensing, repos/infrastructure. All coalesced into OpenGLAM: a "have a go, have a play" ethos.

Work with C19 books. We can do more than work with texts one by one. Scaling up with Flickr release of images, Mechanical Curator, Jisc Historical Texts.

Lots of stuff: skills + tech.

Learning: British Library Big Data Experiment dx.doi.org/10.5281/zenodo.18567

Testing HPC infrastructure. No-one at UCL used the HPC, until recently! http://dh2016.adho.org/abstracts/230 It was not set up to work with the kind of data humanists use: lots of metadata! Work on text and work on images.

What is different here? We need to document what we do. We need to not iterate datasets without fixing versions in time. We need to normalise.
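One way to read "fixing versions in time" is to give each state of a dataset a content-derived identifier, so any analysis can say exactly which version it ran against. A minimal sketch only, assuming the dataset can be treated as an ordered list of text records without embedded newlines; `version_id` is a hypothetical helper, not something described in the talk.

```python
import hashlib


def version_id(records):
    """Return a short identifier derived from the dataset's content.

    Assumption: `records` is an ordered list of strings. Any change to a
    record (or to the order) produces a different identifier.
    """
    digest = hashlib.sha256()
    for record in records:
        digest.update(record.encode("utf-8"))
        digest.update(b"\n")  # separator between records
    return digest.hexdigest()[:12]


# Hypothetical example: a normalised record changes the version.
raw = ["Title One", "Title  Two"]
normalised = ["Title One", "Title Two"]
print(version_id(raw), version_id(normalised))
```

Recording the identifier alongside results documents which iteration of the data was used.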

.@melissaterras advocating the training of library staff to run query "recipes" against common text databases for researchers #BLdigital

— M. H. Beⓐls (@mhbeals) November 7, 2016

British Library Labs Competition Awards (2016)

Roly: Labs influenced Living Knowledge https://www.bl.uk/projects/living-knowledge-the-british-library-2015-2023

Winner: SherlockNet http://blogs.bl.uk/digital-scholarship/2016/08/sherlocknet-tagging-and-captioning-the-british-librarys-flickr-images.html


Hannah-Rose Murray, Black Abolitionist Performances in Britain

PhD Student, Nottingham .. http://blogs.bl.uk/digital-scholarship/2016/11/black-abolitionist-performances-and-their-presence-in-britain-an-update.html .. 1830s-1890s .. Newspaper data. Inspired by Katrina's meetings mapper project.

Want to humanise data. Eg Frederick Douglass: freed slave and public speaker who came over from the US.

Used on research archive as a training dataset.

Loving the finding of hidden stories uncovered during searches, including a lion tamer and ant wars :D #bldigital

— Steph Taylor (@CriticalSteph) November 7, 2016

Worked on some corrected OCR, both hand-keyed and using training sets.

You can see more of @Hannah_RoseM's work at https://t.co/7xPOyG9lGN #bldigital

— Jane Winters (@jfwinters) November 7, 2016

This is about finding voices in the archives.


Karen Wang, Ludo Zhao, & Brian Do, SherlockNet

Winning team for 2016 BL Labs competition now, Wang, Zhao & Do on using machine learning to automatically tag & caption BL images #bldigital

— Paul Gooding (@pmgooding) November 7, 2016

Super exciting what SheclockNet managed to do use neural networks to mass tag the BL Flickr collection. Dream come true. #bldigital

— Ernesto Priego (@ernestopriego) November 7, 2016

Convolutional Neural Networks (CNNs) are optimised for image analysis. Manually annotated 10k images into categories.

Classified 1x10^6 images into 12 categories, training their machine. 200k images were classified as people. #bldigital

— Daniel Pett (@DEJPett) November 7, 2016

More on @BL_Labs winner SherlockNet's machine learning approach to tag 1 Million @britishlibrary Images. https://t.co/ghlKOIJTw5 #bldigital

— Nora McGregor (@ndalyrose) November 7, 2016

Then used surrounding text to make sense of the images. Lots of noise. So pooled surrounding text from similar images (similarity measured by vectoring)
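A rough sketch of the pooling idea in the note above: represent each image as a feature vector, find the most similar images by cosine similarity, and count words across their surrounding text so that shared terms outweigh noise. This is an illustration under assumptions, not SherlockNet's actual code; the vectors, texts and the `pooled_words` helper are all made up for the example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pooled_words(target, images, k=2, top=3):
    """Pool surrounding text from the k images most similar to `target`.

    `images` maps an image id to {"vec": [...], "text": "..."} (hypothetical
    structure). Returns the `top` most frequent words in the pooled text.
    """
    others = [name for name in images if name != target]
    others.sort(
        key=lambda name: cosine(images[target]["vec"], images[name]["vec"]),
        reverse=True,
    )
    counts = Counter()
    for name in [target] + others[:k]:
        counts.update(images[name]["text"].lower().split())
    return [word for word, _ in counts.most_common(top)]


# Invented toy data: three ship-like images and one unrelated image.
images = {
    "a": {"vec": [1.0, 0.0, 0.1], "text": "ship harbour sail"},
    "b": {"vec": [0.9, 0.1, 0.0], "text": "ship mast storm"},
    "c": {"vec": [0.0, 1.0, 0.0], "text": "church tower"},
    "d": {"vec": [0.8, 0.0, 0.2], "text": "ship sail sea"},
}
print(pooled_words("a", images, k=2, top=2))
```

In a real pipeline the vectors would presumably come from a layer of the trained network rather than being written by hand.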

Captioning. Common practice in books, et cetera. Autogenerating these is an active area of CS research. 10k images tagged in an hour of computer time.

Category tags are now on Flickr BL set.

IIRC SherlockNet project used Visual Geometry Group’s Deep ConvNet for BM Prints & Drawings data: https://t.co/CXEAw8WIn3 #bldigital

— Giles Bergel (@GilesBergel) November 7, 2016

Neural networks can find similar images that go beyond composition.

#SherlockNet - trying to use Neural Networks to automatically tag and caption images from #bldigital https://t.co/68Gu7H9Sco

— ostephens (@ostephens) November 7, 2016

Hypothesis generation by using neural networks to dig out connections between images.

Like the idea of computationally-generated hypotheses, but NNets need to become more responsive. What are they thinking? #bldigital

— Giles Bergel (@GilesBergel) November 7, 2016

New BL Labs projects

Michael Takeo Magruder, Imaginary Cities

Magruder: we’re going to take historical information about cities in the BL collections to create new artworks. #BLdigital

— Alastair Horne (@pressfuturist) November 7, 2016

Jennifer Batt, Datamining verse in C18 newspapers

Lots of poems in newspapers.


Stella Wisdom, Off the Map

Partnership with GameCity / National Videogame Arcade. Been going since 2013. Shakespeare theme for 2016.

New #BLdigital blog post about the 2016 Off the Map winners @TheNVA @gamecity @dmuleicester @tombattey https://t.co/71NhSyzSLv

— Digital Scholarship (@BL_DigiSchol) October 28, 2016

Next year on Victorian Entertainments.


Rachel Foss, Kureishi Archive

I seriously think hybrid archives will be with us for a long time. Skills needed to manage both elements are v important. #bldigital

— Steph Taylor (@CriticalSteph) November 7, 2016

Rachel Foss identifies user education and expectations as a major challenge for libraries delivering born digital data #bldigital

— Jane Winters (@jfwinters) November 7, 2016

Nice comment by Rachel Foss about there being a history of technological obsolescence in the variety of formats, media archived #bldigital

— Ernesto Priego (@ernestopriego) November 7, 2016

Research Award

Runner-up: Paul Fyfe et al (images in Victorian newspapers) https://ncna.dh.chass.ncsu.edu/imageanalytics/

Images made from engraved lines can be hard for computers to see #bldigital

— Rodger Kibble (@rodgerkibble) November 7, 2016

Winner: Melodee Beals https://github.com/mhbeals/scissorsandpaste/ We know there was lots of copying going on but we don't know how to deal with it. Unpicking structures of the C19 newspaper network.
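To give a feel for how copying between newspapers can be spotted computationally, here is a minimal sketch that compares two articles by their shared three-word sequences (shingles) and scores the overlap with Jaccard similarity. It is a generic text-reuse technique chosen for illustration, not necessarily the method used in the scissorsandpaste project; the sample sentences and function names are invented.

```python
import re


def shingles(text, n=3):
    """Return the set of n-word sequences in `text`, ignoring case and punctuation."""
    words = re.findall(r"[a-z]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def reuse_score(a, b, n=3):
    """Jaccard overlap of word shingles: 0.0 means no shared runs, 1.0 identical."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


# Invented examples: a reprint with a small addition, and an unrelated item.
original = "the ship arrived at liverpool on tuesday with a cargo of cotton"
reprint = "The ship arrived at Liverpool on Tuesday, with a cargo of cotton and sugar"
unrelated = "parliament met yesterday to debate the corn laws"

print(reuse_score(original, reprint))
print(reuse_score(original, unrelated))
```

Noisy OCR would lower these scores, so a real pipeline would likely need fuzzier matching than exact word sequences.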

#bldigital @mhbeals use of plagiarism detection software if super innovative. For more see @IHRDigHist seminar talk https://t.co/2ZXiKaxzMU

— James Baker (@j_w_baker) November 7, 2016

Very slow memes with paper, ink, scissors, paste & postal service :) brilliant project #bldigital

— Steph Taylor (@CriticalSteph) November 7, 2016

Commercial

Runner-up: Poetic Places http://www.poeticplaces.uk/

Poetic Places app was built using GoodBarber - details about how this was chosen at https://t.co/LIyXRhGtRK #bldigital

— ostephens (@ostephens) November 7, 2016

Winner: Bibliolabs. Digital collections go mobile. https://www.biblioboard.com/


Artistic Award

Accidental/unplanned engagement with artists..

Brilliant artistic digital output @BL_Labs this year, love Fashion Utopias, runner up this year #bldigital

— Maja Maricevic (@MajaMaricevic) November 7, 2016

You can watch the rather lovely Hey There, Young Sailor animated video here: https://t.co/25NvxMO9Y6 #BLdigital Using BL Flickr images!

— Alastair Horne (@pressfuturist) November 7, 2016

Teaching and Learning

Winner: Library Carpentry http://librarycarpentry.github.io/


Alan Turing Institute

Andrew Blake: data science .. partnership with BL still looking for a big project (maybe web archives) .. AI not machine learning

Jane Winters - Working with the archived web, 1996-2013: .uk domain crawl 2014 was 56TB .. big bucket of stuff ..

Fantastic work being done on mind-bogglingly large dataset! #bldigital https://t.co/D75H0ZbgXz

— M. H. Beⓐls (@mhbeals) November 7, 2016

.. future work: overlap between archives, linguistic communities, transnational studies

Stunning how important the UK Web Archive is. Nearly 50% of archived websites from 2014 are no longer on open web. @jfwinters #bldigital

— Paul Gooding (@pmgooding) November 7, 2016

Contemplating the multiple archives of the UK web space ... @jfwinters #bldigital pic.twitter.com/3j5S67IVfT

— david bawden (@david_bawden) November 7, 2016

Kenji Takeda: cloud as enabling experimental research, quick access to infinite compute .. microsoft.com/tate .. aka.ms/academicgraph (the academic graph collection held by Microsoft) .. making the data available and making APIs you can build on: Microsoft Cognitive Services ..

Microsoft Cognitive APIs https://t.co/ujks7fUl9O #bldigital

— Gary Greeeeeeen (@ggnewed) November 7, 2016

Scott Hale (OII): differences between archived and live web complex .. representativeness is more important than completeness for research (surely?!)

.@computermacgyve analysis of tripadvisor shows bias towards prominent attractions in the web archive. Q of representativeness #bldigital

— Paul Gooding (@pmgooding) November 7, 2016

Staff award

Important to know that not all @britishlibrary books are in the online catalogue &how https://t.co/v9AewXstRI is combatting this #bldigital

— Amara Thornton (@amalexathorn) November 7, 2016

Some admin...

Creative Commons Licence
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Exceptions: embeds to and from external sources, and direct quotations from speakers
