##British Library Labs Symposium, 3 November 2014
Live notes, so an incomplete, partial record of what actually happened.
Tags: #bldigital #bl_labs
See also notes from Melodee Beals https://github.com/mhbeals/Conference-Notes/commit/bed0efda5dc9c9eea863eab79a59a5736d34bf6d
###Tim Hitchcock (keynote)
Worrying in public about a different issue. As data gets big and our ability to store and work with that data improves, a profound issue of scale emerges.
The macroscope should allow the researcher to see a data point both in the context of scale and on its own.
Guldi and Armitage - Paper Machines and The History Manifesto - calling for the longue durée - but instead of looking at both the large and the small scale at the same time, they stop looking at the small scale.
Scott Weingart, Shawn Graham, Ian Milligan - The Historian's Macroscope - important text. Weingart on 'The Moral Role of DH in a Data-Driven World' - large-scale view made intelligible through network analysis. Convincing argument for network analysis.
Both authors moving the humanities towards a formal conversation with social science - trying to get us a place at their table.
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>@scott_bot is getting discussed at BL Labs Symposium by @TimHitchcock The blog post: http://t.co/gP4VpaKFf3
— Daniel Powell (@djp2025) November 3, 2014
But only Ben Schmidt uses the macroscope to its best effect - looking for anachronisms in texts, programmes - and gets results: the language of Mad Men - arc from overstated 60s masculinity to overstated 70s masculinity.
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>@timhitchcock exploring the ramifications of macroscopic approaches to the humanities at #bllabs
— Andrew Prescott (@Ajprescott) November 3, 2014
Are we losing touch with what humanists do best? Forgetting what we spent much of the late-20th century discovering - complexity.
Humanists shouldn't speak to power. They should enable the powerless to speak.
For all the promise of balancing the large and small, the small is often ignored.
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>Humanists, don't forget skills of close reading & small data in rush to adopt new methods- Tim Hitchcock #bldigital pic.twitter.com/TE5PC7TFGF
— Vicky Garnett (@Vickstar79) November 3, 2014
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>If all distant reading needs close reading, then shouldn't all close reading need distant reading? #bl_labs #bllabs
— Adam_Crymble (@Adam_Crymble) November 3, 2014
Assumption with big data is that the signal will come through - that the bigger you get, the stronger the signal.
Call for digital tools that allow us to see small.
Call for radical contextualisation of every word ever.
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>#bl_labs Concentrating on the small isnt ignoring big data. we need to see both near & far. Hitchcock calls for "radical contextualization"
— Katie McGettigan (@KatieMcGettigan) November 3, 2014
In the rush to big data, to encompass the scale, we're excluding most of the data - the weather when a brush stroke was made.
'Banal world of the average'
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>Really liking the conversation - historians' chemists & museum ppl all represented in the discussion and we're still on first talk! #bl_labs
— Steph Taylor (@CriticalSteph) November 3, 2014
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>Keynote Q&A at #bl_labs across multiple disciplines and sectors interesting follow up to our domain analysis #citylis lecture last week
— Alison Pope (@alisonpope) November 3, 2014
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>BL Labs Tim Hitchcock - fantatsic idea of the Macroscope - that's what we need for @TheContentMine
— Peter Murray-Rust (@petermurrayrust) November 3, 2014
###Desmond Schmidt, TILT (Text to Image Linking Tool)
Three parts: the problem, how to automate it, the GUI.
Why link texts to images?
We digitise to bring our stuff to people. But images on their own are dull.
One solution is to put the text over the top of the image: Google Books, for example.
Another is to put the text next to the image. But it is hard to keep the image and text in sync.
#bl_labs Now Desmond Schmidt on his tool to connect MS images to transcriptions and so make them available to online audiences
— Katie McGettigan (@KatieMcGettigan) November 3, 2014
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
Another is markup...
But this doesn't work for transcriptions. What TILT does is connect transcriptions to something other than the content (the words). GeoJSON connects the transcription to shapes in the original image. TILT keeps looking through polygons until it finds one that should match a transcription.
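The polygon-linking idea might be sketched roughly like this - a hypothetical GeoJSON feature tying one transcribed word to a region of the page image. The property names here are my assumption for illustration, not TILT's actual schema:

```python
import json

def word_polygon(word_id, text, points):
    """Return a GeoJSON Feature linking a transcription word to an image region."""
    return {
        "type": "Feature",
        "geometry": {
            "type": "Polygon",
            # GeoJSON polygons are a list of linear rings; a ring must
            # close by repeating its first point.
            "coordinates": [points + [points[0]]],
        },
        "properties": {"word_id": word_id, "text": text},
    }

# One word, outlined by a quadrilateral of pixel coordinates:
feature = word_polygon("w1", "palimpsest", [[10, 40], [95, 40], [95, 62], [10, 62]])
print(json.dumps(feature)[:40])
```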
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>'Sexy polygons': hug the contours of the words. #bl_labs #citylis #TexttoImageLinkingTool
— Caitlin Moore (@MsCaitlinMoore) November 3, 2014
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>Alignment problems corrected by using anchors (words in image) to update links to text. #bl_labs #citylis #TexttoImageLinkingTool
— Caitlin Moore (@MsCaitlinMoore) November 3, 2014
Trying to avoid recognising characters in the source.
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>The live demo shows TILT works remarkably well! 95% successful for typed manuscript, 80% for a 'clear' handwritten one. #bl_labs
— TIME/IMAGE (@time_image) November 3, 2014
Thai would be interesting due to its use of spaces as punctuation - moving away from characters enables the tool to work across many languages.
Not doing training sets, machine learning.
TILT is interesting because it doesn't care about meaning. Links digitised image to transcription by treating words as polygons #bl_labs
— Alison Pope (@alisonpope) November 3, 2014
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>Website for @bltilt: http://t.co/6pNvWfzulL #bl_labs #citylis
— Caitlin Moore (@MsCaitlinMoore) November 3, 2014
###Bob Nicholson, Victorian Meme Machine
Warning: this presentation will contain some Victorian jokes. They don't seem funny. But they were to the Victorians. Context so important.
Mixed quality OCR okay for an archive. But not for publishing jokes.
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>Excellent presentation on liberating Victorian jokes from books & newspapers by Bob Nicholson at #bl_labs symposium
— Lotte Wilms (@Lottewilms) November 3, 2014
The VMM uses Omeka with a Scripto plugin.
Often quicker to transcribe than fix broken OCR.
Joke structure as features - title, content, attribution.
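A minimal sketch of a record carrying those structural features (the field names and placeholder values are my assumptions, not the project's actual Omeka/Scripto schema):

```python
from dataclasses import dataclass

@dataclass
class Joke:
    """One transcribed joke, split into the structural features noted above."""
    title: str
    content: str
    attribution: str  # e.g. the newspaper the joke was reprinted from

joke = Joke(title="(title)", content="(joke text)", attribution="(source newspaper)")
print(joke.title)
```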
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>Nice use of Omeka (and student labour) for Victorian Meme Machine transcription http://t.co/4ni9SKPrMK #bl_labs
— artefacto (@artefactors) November 3, 2014
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>@DigiVictorian Reusing British Library images on Flickr for the Victorian Meme Machine project. #bl_labs #citylis #openaccess
— Caitlin Moore (@MsCaitlinMoore) November 3, 2014
Putting bits of collections together to bring fresh perspectives.
Mechanical Comedian is the wayward brother of the Mechanical Curator.
Idea to track how people use the jokes.
Trying to examine 19th century culture of reprinting through 21st century culture of retweeting.
Experimenting with publishing and reinvention of the jokes.
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>Conference notes: @Digivictorian's presentation at @BL_Labs event: https://t.co/mLW7qVllaW
— M. H. Beals (@mhbeals) November 3, 2014
###Beatrice Alex, Palimpsest: an Edinburgh Literary Cityscape.
http://palimpsest.blogs.edina.ac.uk/
Abstract: We are creating a literary cityscape through text mining vivid, evocative and dramatic extracts of Edinburgh-based literary works from the early modern period to the twentieth century. Palimpsest enables users to explore the dimensions of literary Edinburgh through their encounters with geo-located extracts of literary works either via the web resource or in the city streets via a smartphone or tablet. The project is producing an interactive website and database of Edinburgh-based literary texts which is revealing a range of cityscapes that will newly engage scholars and the public with the urban environment and its literature.
Had to find a method of identifying Edinburghness.
Using BL 19th century books + HathiTrust.
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>Beatrice Alice talks about mining data from literature set in Edinburgh to create interactive maps. #citylis #bl_labs
— Dom Allington-Smith (@domallsmi) November 3, 2014
Start with digitised collections they have access to, apply filtering (metadata) for Edinburghness, then curate a sub-list.
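The filter-then-curate step could be sketched like this - the place terms and record fields are illustrative assumptions, not the project's actual metadata or matching method:

```python
# Crude metadata filter: flag records whose title/subjects mention a place
# term, leaving a smaller candidate sub-list for human curation.
PLACE_TERMS = {"edinburgh", "leith", "canongate", "holyrood"}

def edinburghness(record):
    """Does any place term appear in the record's title or subject metadata?"""
    text = " ".join([record.get("title", "")] + record.get("subjects", [])).lower()
    return any(term in text for term in PLACE_TERMS)

records = [
    {"title": "Picturesque Notes on Edinburgh", "subjects": ["travel"]},
    {"title": "A Tale of Two Cities", "subjects": ["london", "paris"]},
]
candidates = [r for r in records if edinburghness(r)]
print(len(candidates))  # → 1
```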
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>Begbie from Irvine Welsh's Trainspotting quoted extensively, colourful language and all, to illustrate Edinburgh Palimpsest project #bl_labs
— Neil Stewart (@neilstewart) November 3, 2014
Looking to end up with close data. Need a human in the loop for that.
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>Palimpsest project, Beatrice Alex, who has swears on her slides http://t.co/ydxT7nVV2E ;) I note @suchprettyeyes is involved ;) #BL_Labs
— Steph Taylor (@CriticalSteph) November 3, 2014
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>#bl_labs Now listening to presentation on Palimpsest - Mapping Literary Edinburgh: http://t.co/zCoWMjmFZP #gis
— John Levin (@anterotesis) November 3, 2014
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>Fab text mining project mapping mentions of Edinburgh in literature to places in map of city. Trainspotting = swears, of course ;) #BL_Labs
— Steph Taylor (@CriticalSteph) November 3, 2014
Discovery paramount - and text mining brought annotators to new texts.
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>End product of #Palimpsest: web-based visualisations and "literature-on-the-go" mobile app, useful for pub crawls! #citylis #bl_labs
— Dom Allington-Smith (@domallsmi) November 3, 2014
Turning big data into small data.
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>#bl_labs More on #LitPalimpsest just presented by @bea_alex: http://t.co/7B1X0eqtXH via @LitPalimpsest #digitalhumanities
— Max Kaiser (@maxkaiser) November 3, 2014
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>@LitPalimpsest thanks. It is interesting... legal barriers to mining (even for 'NC') are still significant, e.g. RightsLink #bl_labs
— Ernesto Priego (@ernestopriego) November 3, 2014
###Digital Music Lab
Developing software and methods for analysing large-scale music collections. Creating datasets useable by researchers.
Visualising text - extract 'object' data on musical elements - dynamics, timing, chord.
Similarity metrics.
Using BL Music, CHARM database, and ILikeMusic. Heterogeneous datasets.
Generating tags to browse collections.
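One common way to compare recordings via extracted features is cosine similarity over feature vectors; this is an illustrative sketch of that general technique, not the Digital Music Lab's actual method, and the 4-bin "chord histograms" are invented:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Two hypothetical chord-frequency profiles for two performances:
perf_a = [0.5, 0.2, 0.2, 0.1]
perf_b = [0.4, 0.3, 0.2, 0.1]
print(round(cosine_similarity(perf_a, perf_b), 3))
```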
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>BL Music Collections has properties of a good dataset: large volume, broad coverage, detailed metadata and legal access framework #bl_labs
— Alison Pope (@alisonpope) November 3, 2014
###Peter Balman, ViS
ICTomorrow funding from Technology Strategy Board - BL the challenge partner.
Feedback to measure use and impact of public domain materials.
Using BL Flickr collections http://www.flickr.com/photos/britishlibrary
What are people talking about when using this data?
What is the value of investment in releasing 1 million images?
###Ben O'Steen
Bridge building. Getting researchers to the data in meaningful ways. Making things that may not live but serve a purpose. Starting with something that gets the job done. Trying not to be too perfect.
Researchers say give me everything, but there is always context that changes the request... We want to understand that context. And making it accessible in an understandable way is the key to doing that.
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>Facilitated by @benosteen, the chief "bridge-builder", filterer and anti-perfectionist #citylis #bl_labs
— Dom Allington-Smith (@domallsmi) November 3, 2014
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>@benosteen the simplest questions can be the most complex #informationretrieval #bl_labs #citylis
— David Phillips (@dpp202) November 3, 2014
All search engines are built on the assumption that the most relevant words are what you want. All services are made with these compromises.
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>Healthy reminder about the compromises and assumptions made with search engines #bl_labs
— Lawrence (@chilesl) November 3, 2014
What about services that fit the model 'I want all the works that relate to X'...
What if instead of searching for results you searched for trends? The Sample Generator asks: should we really just use the digital corpus? It makes clear to us the chasm between digital and print.
People need tools, see the library as an impediment to work with, and don't like APIs.
Machine Learning 101: turn the data into numbers; process the data; annotate ... or you just skip to the end, use off-the-shelf solutions.
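The "turn the data into numbers" step might look like this toy bag-of-words sketch (illustrative only, not the Labs' actual pipeline):

```python
from collections import Counter

def vectorise(text, vocabulary):
    """Count occurrences of each vocabulary word: raw text becomes a numeric vector."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

vocab = ["ship", "steam", "rail"]
doc = "the steam ship left before the steam train"
print(vectorise(doc, vocab))  # → [1, 2, 0]
```

Once documents are vectors like this, the "process" and "annotate" steps - or an off-the-shelf model - can take over.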
Search engines and machine learning are important because the work always starts with the bulk data: directing it at someone, making decisions.
On 13 December 2013 we put 1 million images on Flickr and people came. They built sets. They added 100k tags. Hundreds of contributors. Started a debate (Jonathan Jones).
Working with people and finding out what they actually need. Small bridges that become larger bridges. We do the first bit.
#bl_labs @benosteen: @britishlibrary Flickr images; 20M hits/month, 100K+ tags, hundreds of contributors, crowdsourcing ongoing #OpenGlam
— Max Kaiser (@maxkaiser) November 3, 2014
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>@benosteen #bl_labs #bigdata project with @uclcs, Great journey of discovery for humanities with #compsci @azure4research
— Kenji Takeda (@ktakeda1) November 3, 2014
###Adam Farquhar, Lessons from the Labs
Digital Scholarship survey: 82% say the BL plays an important role in digital research.
Adam Farquahar presenting results of British library reader survey wch shows imp of data and computing for the library’s readers #bl_labs
— Andrew Prescott (@Ajprescott) November 3, 2014
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>figures presented by Farquahar important for rise of dh. 1 in 3 readers use data for research; 1 in 6 programme. Lot of scripting. #bl_labs
— Andrew Prescott (@Ajprescott) November 3, 2014
Money and conservation needs are the main biases in the digital content that researchers have available to use.
BL has about 9m digital 'items'... but that under-represents the collection - a single book is one item. The volume of data is growing incrementally - up to 450,000GB (450TB)... but even that undercounts due to compression.
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>#bl_labs Adam Farquhar: @britishlibrary's digital library: about 9M objects, 450TB (in DVDs: 1,5 times the height of the Shard...)
— Max Kaiser (@maxkaiser) November 3, 2014
But the variety is the important bit.
All this has put quite a bit of pressure on the BL. Changed the type of researcher who comes to us.
Digital Scholarship Training Programme http://britishlibrary.typepad.co.uk/digital-scholarship/2014/10/british-library-digital-scholarship-training-programme-round-up-of-resources-you-can-use.html Infrastructure work (Big Data Experiment, http://figshare.com/articles/Interoperable_Infrastructures_for_Digital_Research_A_proposed_pathway_for_enabling_transformation/1092550 )
David Normal heard about #bl_labs Flickr images from guitarist of punk band Flipper. Peer to peer marketing wins!
— Tom Miles (@tommilesz) November 3, 2014
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>"Reinvention in new context" David Normal on @britishlibrary collage project, R @librarywalled project similar aims, but textbased #BL_Labs
— artefacto (@artefactors) November 3, 2014
Off the Map - engaging a new generation and audience with historical content we have and mixing it with tech.
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>Fonthill Abbey inspired game called Nix using Oculus Rift. http://t.co/0n3ZWMjGmx #bl_labs #citylis #immersive
— Alison Pope (@alisonpope) November 3, 2014
Key lessons...
- More is more.
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>The more digital content and the more accessible it is, the more digital scholarship there will be. #bl_labs #citylis
— Caitlin Moore (@MsCaitlinMoore) November 3, 2014
We struggle to meet all the expectations, but we are trying!
- Less is more... in terms of tools and services. Lack of consensus in DH around which tool is best et cetera is okay and we embrace that. But that does conflict with how we try to build tools and services.
- Bring your own tools. We can't (and shouldn't) force people to use things.
- Be creative.
- Enable folks to start small and finish big.
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>#bl_labs Adam Farquhar going through the key lessons from Labs. Some of those are on our #dh2014 poster http://t.co/4QucJWDVqE
— James Baker (@j_w_baker) November 3, 2014
Taken together, embracing this is a radical retooling of what we do, offer, support.
This work is licensed under a Creative Commons Attribution 3.0 Unported License.
Exceptions: embeds to and from external sources