Live notes, so an incomplete, partial record of what actually happened.
Tag: bldigital
Melissa Terras, Unexpected Repurposing: The BL's digital collections and UCL teaching, research and infrastructure
Mandates require you to make things available. But digital materials are really fragile: look at what was funded circa 2000 by the Lottery New Opportunities Fund. What has improved this: standards, licensing, repos/infrastructure. All coalesced into OpenGLAM: have a go, have a play ethos.
Work with C19 books. We can do more than working with texts one by one. Scaling up with Flickr release of images, Mechanical Curator, Jisc Historical Texts.
Lots of stuff: skills + tech.
Learning: British Library Big Data Experiment dx.doi.org/10.5281/zenodo.18567
Testing HPC infrastructure. No-one at UCL used the HPC, until recently! http://dh2016.adho.org/abstracts/230 It was not set up to work with the data humanists use: lots of metadata! Work on text and work on images.
What is different here? We need to document what we do. We need to stop iterating on datasets without fixing versions in time. We need to normalise.
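One way to picture "fixing versions in time" is a content fingerprint recorded alongside each documented analysis. The following is only a minimal sketch, not something shown in the talk: the function name `dataset_fingerprint` and the in-memory `files` mapping are hypothetical stand-ins for a real dataset on disk.

```python
import hashlib
import json


def dataset_fingerprint(files):
    """Return a SHA-256 per file plus one digest for the whole set.

    `files` maps a file name to its content as bytes (a hypothetical
    in-memory stand-in for files on disk).
    """
    per_file = {name: hashlib.sha256(data).hexdigest()
                for name, data in sorted(files.items())}
    # Hash the sorted manifest so the overall digest ignores input order.
    overall = hashlib.sha256(
        json.dumps(per_file, sort_keys=True).encode("utf-8")
    ).hexdigest()
    return {"files": per_file, "version": overall}


v1 = dataset_fingerprint({"a.txt": b"old text", "b.txt": b"metadata"})
v2 = dataset_fingerprint({"b.txt": b"metadata", "a.txt": b"old text"})
v3 = dataset_fingerprint({"a.txt": b"new text", "b.txt": b"metadata"})
print(v1["version"] == v2["version"])  # same content, listed in another order
print(v1["version"] == v3["version"])  # one file changed
```

Quoting the `version` digest in the write-up lets a later reader check whether they are working with the same snapshot of the data.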
.@melissaterras advocating the training of library staff to run query "recipes" against common text databases for researchers #BLdigital
— M. H. Beⓐls (@mhbeals) November 7, 2016
Roly: Labs influenced Living Knowledge https://www.bl.uk/projects/living-knowledge-the-british-library-2015-2023
Winner: SherlockNet http://blogs.bl.uk/digital-scholarship/2016/08/sherlocknet-tagging-and-captioning-the-british-librarys-flickr-images.html
PhD Student, Nottingham .. http://blogs.bl.uk/digital-scholarship/2016/11/black-abolitionist-performances-and-their-presence-in-britain-an-update.html .. 1830s-1890s .. Newspaper data. Inspired by Katrina's meetings mapper project.
Want to humanise data. Eg Frederick Douglass: freed slave and public speaker who came over from the US.
Used a research archive as a training dataset.
Loving the finding of hidden stories uncovered during searches, including a lion tamer and ant wars :D #bldigital
— Steph Taylor (@CriticalSteph) November 7, 2016
Worked on some corrected OCR, both hand-keyed and using training sets.
You can see more of @Hannah_RoseM's work at https://t.co/7xPOyG9lGN #bldigital
— Jane Winters (@jfwinters) November 7, 2016
This is about finding voices in the archives.
Winning team for 2016 BL Labs competition now, Wang, Zhao & Do on using machine learning to automatically tag & caption BL images #bldigital
— Paul Gooding (@pmgooding) November 7, 2016
Super exciting what SheclockNet managed to do use neural networks to mass tag the BL Flickr collection. Dream come true. #bldigital
— Ernesto Priego (@ernestopriego) November 7, 2016
Convolutional Neural Networks (CNNs) are optimised for image analysis. Manually annotated 10k images into categories.
Classified 1x10^6 images into 12 categories, training their machine. 200k images were classified as people. #bldigital
— Daniel Pett (@DEJPett) November 7, 2016
More on @BL_Labs winner SherlockNet's machine learning approach to tag 1 Million @britishlibrary Images. https://t.co/ghlKOIJTw5 #bldigital
— Nora McGregor (@ndalyrose) November 7, 2016
Then used surrounding text to make sense of the images. Lots of noise. So pooled surrounding text from similar images (similarity measured by comparing image feature vectors).
Captioning. Common practice in books, et cetera. Autogenerating these is an active area of CS research. 10k images tagged in an hour of computer time.
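The pooling idea can be sketched roughly as follows. This is an illustration under assumptions, not the SherlockNet code: the toy three-number vectors stand in for features from a trained network, and the names `cosine`, `pooled_tags` and `images` are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pooled_tags(target, images, k=2, top=3):
    """Pool words from the text around the k images most similar to `target`."""
    others = [name for name in images if name != target]
    ranked = sorted(
        others,
        key=lambda n: cosine(images[target]["vec"], images[n]["vec"]),
        reverse=True,
    )
    counts = Counter(images[target]["text"].lower().split())
    for name in ranked[:k]:
        counts.update(images[name]["text"].lower().split())
    return [word for word, _ in counts.most_common(top)]


# Toy data: made-up vectors and surrounding text, for illustration only.
images = {
    "img1": {"vec": [0.9, 0.1, 0.0], "text": "the ship sailed"},
    "img2": {"vec": [0.8, 0.2, 0.1], "text": "a ship at sea"},
    "img3": {"vec": [0.85, 0.15, 0.0], "text": "ship in harbour"},
    "img4": {"vec": [0.0, 0.1, 0.9], "text": "portrait of a lady"},
}
print(pooled_tags("img1", images, k=2, top=1))  # ['ship']
```

Words that recur across the neighbours rise to the top, while noise from any single page is diluted. A real pipeline would also need stop-word filtering, which this sketch leaves out.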
Category tags are now on Flickr BL set.
IIRC SherlockNet project used Visual Geometry Group’s Deep ConvNet for BM Prints & Drawings data: https://t.co/CXEAw8WIn3 #bldigital
— Giles Bergel (@GilesBergel) November 7, 2016
Neural networks can find similar images that go beyond composition.
#SherlockNet - trying to use Neural Networks to automatically tag and caption images from #bldigital https://t.co/68Gu7H9Sco
— ostephens (@ostephens) November 7, 2016
Hypothesis generation by using neural networks to dig out connections between images.
Like the idea of computationally-generated hypotheses, but NNets need to become more responsive. What are they thinking? #bldigital
— Giles Bergel (@GilesBergel) November 7, 2016
Michael Takeo Magruder, Imaginary Cities
Magruder: we’re going to take historical information about cities in the BL collections to create new artworks. #BLdigital
— Alastair Horne (@pressfuturist) November 7, 2016
Jennifer Batt, Datamining verse in C18 newspapers
Lots of poems in newspapers.
Partnership with GameCity / National Videogame Arcade. Been going since 2013. Shakespeare theme for 2016.
New #BLdigital blog post about the 2016 Off the Map winners @TheNVA @gamecity @dmuleicester @tombattey https://t.co/71NhSyzSLv
— Digital Scholarship (@BL_DigiSchol) October 28, 2016
Next year on Victorian Entertainments.
I seriously think hybrid archives will be with us for a long time. Skills needed to manage both elements are v important. #bldigital
— Steph Taylor (@CriticalSteph) November 7, 2016
Rachel Foss identifies user education and expectations as a major challenge for libraries delivering born digital data #bldigital
— Jane Winters (@jfwinters) November 7, 2016
Nice comment by Rachel Foss about there being a history of technological obsolescence in the variety of formats, media archived #bldigital
— Ernesto Priego (@ernestopriego) November 7, 2016
Runner-up: Paul Fyfe et al (images in Victorian newspapers) https://ncna.dh.chass.ncsu.edu/imageanalytics/
Images made from engraved lines can be hard for computers to see #bldigital
— Rodger Kibble (@rodgerkibble) November 7, 2016
Winner: Melodee Beals https://github.com/mhbeals/scissorsandpaste/ We know there was lots of copying going on but we don't know how to deal with it. Unpicking structures of the C19 newspaper network.
#bldigital @mhbeals use of plagiarism detection software if super innovative. For more see @IHRDigHist seminar talk https://t.co/2ZXiKaxzMU
— James Baker (@j_w_baker) November 7, 2016
Very slow memes with paper, ink, scissors, paste & postal service :) brilliant project #bldigital
— Steph Taylor (@CriticalSteph) November 7, 2016
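To give a feel for how copying between newspapers might be detected at all, here is a minimal sketch using word n-gram "shingles" and Jaccard overlap. It is a deliberately simple stand-in, not the plagiarism detection software the project used, and the example sentences and the function names `shingles` and `jaccard` are invented for illustration.

```python
import re


def shingles(text, n=3):
    """Return the set of overlapping n-word sequences in `text`."""
    words = re.findall(r"[a-z]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b, n=3):
    """Share of n-word shingles two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


# Invented examples, not taken from any real newspaper.
original = "The steamer arrived at Liverpool on Tuesday with news from New York"
reprint = "We learn that the steamer arrived at Liverpool on Tuesday with news"
unrelated = "Prices of corn fell sharply at the market in Leeds this week"

print(jaccard(original, reprint) > jaccard(original, unrelated))  # True
```

Real C19 newspaper text would add OCR errors and editorial trimming, so a workable threshold for "reprint" would have to be tuned against hand-checked pairs.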
Runner-up: Poetic Places http://www.poeticplaces.uk/
Poetic Places app was built using GoodBarber - details about how this was chosen at https://t.co/LIyXRhGtRK #bldigital
— ostephens (@ostephens) November 7, 2016
Winner: Bibliolabs. Digital collections go mobile. https://www.biblioboard.com/
Accidental/unplanned engagement with artists..
Brilliant artistic digital output @BL_Labs this year, love Fashion Utopias, runner up this year #bldigital
— Maja Maricevic (@MajaMaricevic) November 7, 2016
You can watch the rather lovely Hey There, Young Sailor animated video here: https://t.co/25NvxMO9Y6 #BLdigital Using BL Flickr images!
— Alastair Horne (@pressfuturist) November 7, 2016
Winner: Library Carpentry http://librarycarpentry.github.io/
Andrew Blake: data science .. partnership with BL still looking for a big project (maybe web archives) .. AI not machine learning
Jane Winters - Working with the archived web, 1996-2013: .uk domain crawl 2014 was 56TB .. big bucket of stuff ..
Fantastic work being done on mind-bogglingly large dataset! #bldigital https://t.co/D75H0ZbgXz
— M. H. Beⓐls (@mhbeals) November 7, 2016
.. future work: overlap between archives, linguistic communities, transnational studies
Stunning how important the UK Web Archive is. Nearly 50% of archived websites from 2014 are no longer on open web. @jfwinters #bldigital
— Paul Gooding (@pmgooding) November 7, 2016
Contemplating the multiple archives of the UK web space ... @jfwinters #bldigital pic.twitter.com/3j5S67IVfT
— david bawden (@david_bawden) November 7, 2016
Kenji Takeda: cloud as enabling experimental research, quick access to infinite compute .. microsoft.com/tate .. aka.ms/academicgraph (the academic graph collection held by Microsoft) .. making the data available and making APIs you can build on: Microsoft Cognitive Services ..
Microsoft Cognitive APIs https://t.co/ujks7fUl9O #bldigital
— Gary Greeeeeeen (@ggnewed) November 7, 2016
Scott Hale (OII): differences between the archived and the live web are complex .. representativeness is more important than completeness for research (surely?!)
.@computermacgyve analysis of tripadvisor shows bias towards prominent attractions in the web archive. Q of representativeness #bldigital
— Paul Gooding (@pmgooding) November 7, 2016
Important to know that not all @britishlibrary books are in the online catalogue &how https://t.co/v9AewXstRI is combatting this #bldigital
— Amara Thornton (@amalexathorn) November 7, 2016
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Exceptions: embeds to and from external sources, and direct quotations from speakers