@drjwbaker
British Library Labs Symposium, 3 November 2014

##British Library Labs Symposium, 3 November 2014

Live notes, so an incomplete, partial record of what actually happened.

Tags: #bldigital #bl_labs

See also notes from Melodee Beals https://github.com/mhbeals/Conference-Notes/commit/bed0efda5dc9c9eea863eab79a59a5736d34bf6d


###Tim Hitchcock, Big Data, Small Data and Meaning: The Conundrums of Tool use in the Humanities

Worrying in public about a different issue. As data gets big and our ability to store and work with that data improves, profound issue of scale.

The macroscope should allow the researcher to see a data point both in the context of scale and on its own.

Guldi and Armitage - Paper Machines and History Manifesto - calling for the longue durée - but instead of looking at both the large and the small scale at the same time, they stop looking at the small scale.

Scott Weingart, Shawn Graham, Ian Milligan - Historian's Macroscope - important text. Weingart on 'Moral Role of DH in a Data Driven world' - large scale view made intelligible through network analysis. Convincing argument for NA.

Both authors moving the humanities towards a formal conversation with social science - trying to get us a place at their table.

@scott_bot is getting discussed at BL Labs Symposium by @TimHitchcock The blog post: http://t.co/gP4VpaKFf3

— Daniel Powell (@djp2025) November 3, 2014

But only Ben Schmidt uses the macroscope to its best effect - looking for anachronisms in texts, programmes - and gets results: language of Mad Men - arc from overstated 60s masculinity to overstated 70s masculinity.

@timhitchcock exploring the ramifications of macroscopic approaches to the humanities at #bllabs

— Andrew Prescott (@Ajprescott) November 3, 2014

Are we losing touch with what humanists do best? Forgetting what we spent much of the late-20th century discovering - complexity.

Humanists shouldn't speak to power. They should enable the powerless to speak.

For all the promise of balancing the large and small, the small is often ignored.

Humanists, don't forget skills of close reading & small data in rush to adopt new methods- Tim Hitchcock #bldigital pic.twitter.com/TE5PC7TFGF

— Vicky Garnett (@Vickstar79) November 3, 2014

If all distant reading needs close reading, then shouldn't all close reading need distant reading? #bl_labs #bllabs

— Adam_Crymble (@Adam_Crymble) November 3, 2014

Assumption with big data that the signal will come through. There will be a strong enough signal the bigger you get.

Call for digital tools that allow us to see small.

Call for radical contextualisation of every word ever.

#bl_labs Concentrating on the small isnt ignoring big data. we need to see both near & far. Hitchcock calls for "radical contextualization"

— Katie McGettigan (@KatieMcGettigan) November 3, 2014

In the rush to big data, to encompass the scale, we're excluding most of the data - the weather when a brush stroke was made.

'Banal world of the average'

Really liking the conversation - historians' chemists & museum ppl all represented in the discussion and we're still on first talk! #bl_labs

— Steph Taylor (@CriticalSteph) November 3, 2014

Keynote Q&A at #bl_labs across multiple disciplines and sectors interesting follow up to our domain analysis #citylis lecture last week

— Alison Pope (@alisonpope) November 3, 2014

BL Labs Tim Hitchcock - fantatsic idea of the Macroscope - that's what we need for @TheContentMine

— Peter Murray-Rust (@petermurrayrust) November 3, 2014

###Desmond Schmidt (Queensland), Text to Image Linking Tool

Three parts: The problem, how to automate it, GUI

Why link texts to images?

We digitise to bring our stuff to people. But images on their own are dull.

One solution is to put text over the top of an image: Google Books, for example.

Another is to put the text next to the image. But hard to keep the image and text in sync.

#bl_labs Now Desmond Schmidt on his tool to connect MS images to transcriptions and so make them available to online audiences

— Katie McGettigan (@KatieMcGettigan) November 3, 2014

Another is markup...

But this doesn't work for transcriptions. What TILT does is connect transcriptions to something other than the content (the words). GeoJSON connects the transcription to shapes in the original image. TILT keeps looking through polygons until it finds one that should match a transcription.
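To make the idea of linking word shapes to a transcription concrete, here is a minimal sketch in Python. It is not TILT's actual algorithm or data model: it assumes the word polygons are already known, pairs them naively one-to-one with whitespace-separated words in reading order, and stores pixel coordinates in GeoJSON geometry. The property names `offset` and `length` are hypothetical.

```python
import json


def word_feature(polygon, text_offset, text_length):
    """Build a GeoJSON Feature tying one word-shaped polygon to a span of the transcription."""
    # GeoJSON polygon rings must be closed: the last point repeats the first.
    ring = polygon if polygon[0] == polygon[-1] else polygon + [polygon[0]]
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        # Hypothetical property names: where the word sits in the transcription.
        "properties": {"offset": text_offset, "length": text_length},
    }


def link_words(polygons, transcription):
    """Pair polygons with words in reading order (naive one-to-one assumption)."""
    features = []
    position = 0
    for polygon, word in zip(polygons, transcription.split()):
        offset = transcription.index(word, position)
        features.append(word_feature(polygon, offset, len(word)))
        position = offset + len(word)
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    # Two made-up rectangles in image pixel coordinates.
    shapes = [
        [[0, 0], [10, 0], [10, 5], [0, 5]],
        [[12, 0], [20, 0], [20, 5], [12, 5]],
    ]
    print(json.dumps(link_words(shapes, "Dear Sir"), indent=2))
```

The one-to-one pairing breaks as soon as a polygon covers two words or a word is split across shapes, which is the kind of alignment problem the talk described correcting with anchors.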

'Sexy polygons': hug the contours of the words. #bl_labs #citylis #TexttoImageLinkingTool

— Caitlin Moore (@MsCaitlinMoore) November 3, 2014

Alignment problems corrected by using anchors (words in image) to update links to text. #bl_labs #citylis #TexttoImageLinkingTool

— Caitlin Moore (@MsCaitlinMoore) November 3, 2014

Trying to avoid recognising characters in the source.

The live demo shows TILT works remarkably well! 95% successful for typed manuscript, 80% for a 'clear' handwritten one. #bl_labs

— TIME/IMAGE (@time_image) November 3, 2014

Thai would be interesting due to use of spaces as punctuation - going away from characters enables the tool to work across many languages.

Not doing training sets, machine learning.

TILT is interesting because it doesn't care about meaning. Links digitised image to transcription by treating words as polygons #bl_labs

— Alison Pope (@alisonpope) November 3, 2014

Website for @bltilt: http://t.co/6pNvWfzulL #bl_labs #citylis

— Caitlin Moore (@MsCaitlinMoore) November 3, 2014

###Bob Nicholson, Victorian Meme Machine

Warning: this presentation will contain some Victorian jokes. They don't seem funny. But they were to the Victorians. Context so important.

Mixed quality OCR okay for an archive. But not for publishing jokes.

Excellent presentation on liberating Victorian jokes from books & newspapers by Bob Nicholson at #bl_labs symposium

— Lotte Wilms (@Lottewilms) November 3, 2014

The VMM uses Omeka with a Scripto plugin.

Often quicker to transcribe than fix broken OCR.

Joke structure as features - title, content, attribution.
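The three-part joke structure could be represented roughly as below. This is an illustrative sketch only, not the project's Omeka schema: the `Joke` class, the `parse_joke` helper and the assumed layout (first line is the title, an optional last line starting with a hyphen is the attribution) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Joke:
    title: str
    content: str
    attribution: Optional[str] = None


def parse_joke(transcription):
    """Split a transcribed joke into title, content and attribution.

    Assumed layout (hypothetical): first non-empty line is the title,
    an optional final line beginning with '-' names the source.
    """
    lines = [line.strip() for line in transcription.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty transcription")
    title, body = lines[0], lines[1:]
    attribution = None
    if body and body[-1].startswith("-"):
        attribution = body.pop().lstrip("- ").strip()
    return Joke(title=title, content=" ".join(body), attribution=attribution)
```

For example, `parse_joke("A Hint\nWhy is a dog like a tree?\n- Punch")` yields a `Joke` whose attribution is `"Punch"`.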

Nice use of Omeka (and student labour) for Victorian Meme Machine transcription http://t.co/4ni9SKPrMK #bl_labs

— artefacto (@artefactors) November 3, 2014

@DigiVictorian Reusing British Library images on Flickr for the Victorian Meme Machine project. #bl_labs #citylis #openaccess

— Caitlin Moore (@MsCaitlinMoore) November 3, 2014

Putting bits of collections together to bring fresh perspectives.

Mechanical Comedian is the wayward brother of the Mechanical Curator.

Idea to track how people use the jokes.

Trying to examine 19th century culture of reprinting through 21st century culture of retweeting.

Experimenting with publishing and reinvention of the jokes.

Conference notes: @Digivictorian's presentation at @BL_Labs event: https://t.co/mLW7qVllaW

— M. H. Beals (@mhbeals) November 3, 2014

###Beatrice Alex, Palimpsest: an Edinburgh Literary Cityscape.

http://palimpsest.blogs.edina.ac.uk/

Abstract: We are creating a literary cityscape through text mining vivid, evocative and dramatic extracts of Edinburgh-based literary works from the early modern period to the twentieth century. Palimpsest enables users to explore the dimensions of literary Edinburgh through their encounters with geo-located extracts of literary works either via the web resource or in the city streets via a smartphone or tablet. The project is producing an interactive website and database of Edinburgh - based literary texts which is revealing a range of cityscapes that will newly engage scholars and the public with the urban environment and its literature.

Had to find a method of identifying Edinburghness.

Using BL 19th century books + HathiTrust.

Beatrice Alice talks about mining data from literature set in Edinburgh to create interactive maps. #citylis #bl_labs

— Dom Allington-Smith (@domallsmi) November 3, 2014

Start with digitised collections they have access to, apply filtering (metadata) for Edinburghness, then curate sub-list.
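A rough sketch of the metadata filtering step, under stated assumptions: the notes do not record how Palimpsest actually scored Edinburghness, so the term list, the field names (`title`, `place`, `subjects`) and the helper names here are all hypothetical. The output is only a candidate list for a human to curate.

```python
# Hypothetical place-name terms; a real list would be far longer.
EDINBURGH_TERMS = ("edinburgh", "leith", "canongate")


def looks_edinburgh(record, terms=EDINBURGH_TERMS):
    """Return True if any term appears in the record's descriptive metadata."""
    fields = ("title", "place", "subjects")  # assumed field names
    haystack = " ".join(str(record.get(field, "")) for field in fields).lower()
    return any(term in haystack for term in terms)


def shortlist(records):
    """Keep only candidate records; curation of this sub-list is still manual."""
    return [record for record in records if looks_edinburgh(record)]
```

A filter like this trades recall for speed: a novel set in Edinburgh whose metadata never mentions the city is missed, which is one reason to keep a human in the loop.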

Begbie from Irvine Welsh's Trainspotting quoted extensively, colourful language and all, to illustrate Edinburgh Palimpsest project #bl_labs

— Neil Stewart (@neilstewart) November 3, 2014

Looking to end up with close data. Need a human in the loop for that.

Palimpsest project, Beatrice Alex, who has swears on her slides http://t.co/ydxT7nVV2E ;) I note @suchprettyeyes is involved ;) #BL_Labs

— Steph Taylor (@CriticalSteph) November 3, 2014

#bl_labs Now listening to presentation on Palimpsest - Mapping Literary Edinburgh: http://t.co/zCoWMjmFZP #gis

— John Levin (@anterotesis) November 3, 2014

Fab text mining project mapping mentions of Edinburgh in literature to places in map of city. Trainspotting = swears, of course ;) #BL_Labs

— Steph Taylor (@CriticalSteph) November 3, 2014

Discovery paramount - and text mining brought annotators to new texts.

End product of #Palimpsest: web-based visualisations and "literature-on-the-go" mobile app, useful for pub crawls! #citylis #bl_labs

— Dom Allington-Smith (@domallsmi) November 3, 2014

Turning big data into small data.

#bl_labs More on #LitPalimpsest just presented by @bea_alex: http://t.co/7B1X0eqtXH via @LitPalimpsest #digitalhumanities

— Max Kaiser (@maxkaiser) November 3, 2014

@LitPalimpsest thanks. It is interesting... legal barriers to mining (even for 'NC') are still significant, e.g. RightsLink #bl_labs

— Ernesto Priego (@ernestopriego) November 3, 2014

###Digital Music Lab

http://dml.city.ac.uk/

Developing software and methods for analysing large-scale music collections. Creating datasets useable by researchers.

Visualising text - extract 'object' data on musical elements - dynamics, timing, chord.

Similarity metrics.

Using BL Music, CHARM database, and ILikeMusic. Heterogeneous datasets.

Generating tags to browse collections.

BL Music Collections has properties of a good dataset: large volume, broad coverage, detailed metadata and legal access framework #bl_labs

— Alison Pope (@alisonpope) November 3, 2014

###Peter Balman, ViS

IC tomorrow funding from the Technology Strategy Board - BL the challenge partner.

Feedback to measure use and impact of public domain materials.

Using BL Flickr collections http://www.flickr.com/photos/britishlibrary

What are people talking about when using this data?

What is the value of investment in releasing 1 million images?


###Ben O'Steen

Bridge building. Getting researchers to the data in meaningful ways. Making things that may not live but serve a purpose. Starting with something that gets the job done. Trying not to be too perfect.

Researchers say give me everything, but there is always context to that that changes the request... We want to understand that context. And making it accessible in an understandable way is the key to doing that.

Facilitated by @benosteen, the chief "bridge-builder", filterer and anti-perfectionist #citylis #bl_labs

— Dom Allington-Smith (@domallsmi) November 3, 2014

@benosteen the simplest questions can be the most complex #informationretrieval #bl_labs #citylis

— David Phillips (@dpp202) November 3, 2014

All search engines are built on the assumption that the most relevant words are what you want. All services are made with these compromises.

Healthy reminder about the compromises and assumptions made with search engines #bl_labs

— Lawrence (@chilesl) November 3, 2014

What about services that fit the model 'I want all the works that apply to something'...

What if, instead of searching for results, you searched for trends? Sample Generator asks: should we really just use the digital corpus? Makes clear to us the chasm between digital and print.
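Searching for trends rather than results can be sketched as counting how often a term appears per year, relative to how much text there is for that year. This is an illustration only, not how any BL Labs tool works; the `(year, text)` input shape and the whitespace tokenisation are simplifying assumptions.

```python
from collections import Counter


def trend(documents, term):
    """Relative frequency of `term` per year.

    `documents` is an iterable of (year, text) pairs (an assumed shape).
    """
    hits = Counter()
    totals = Counter()
    needle = term.lower()
    for year, text in documents:
        words = text.lower().split()
        totals[year] += len(words)
        hits[year] += words.count(needle)
    # Skip years with no words at all to avoid dividing by zero.
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}
```

Dividing by the yearly word total matters because digitised coverage is uneven: a raw count would mostly show which years were scanned, not how usage changed.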

People need tools, see the library as an impediment to work with, and don't like APIs.

Machine Learning 101: turn the data into numbers; process the data; annotate ... or you just skip to the end, use off-the-shelf solutions.
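The three steps can be sketched in a few lines of standard-library Python: turn text into word-count vectors, compare them with cosine similarity, and annotate a new text with the label of its nearest labelled example. This is a toy nearest-neighbour illustration with hypothetical function names, not the approach used at the Labs; an off-the-shelf library would replace all of it.

```python
import math
from collections import Counter


def vectorise(text):
    """Step 1: turn the data into numbers (a bag-of-words count vector)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Step 2: process the numbers (cosine similarity between two vectors)."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def annotate(text, labelled_examples):
    """Step 3: annotate with the label of the most similar (text, label) example."""
    vector = vectorise(text)
    best = max(labelled_examples, key=lambda example: cosine(vector, vectorise(example[0])))
    return best[1]
```

For instance, given the examples `[("ship sea harbour", "maritime"), ("mountain peak snow", "landscape")]`, the text `"a ship in the harbour"` shares words only with the first example and so receives its label.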

Search engines and machine learning are important because the work always starts with the bulk data, directing it at someone, making decisions.

13 December 2013 we put 1 million images on Flickr and people came. They built sets. They added 100k tags. Hundreds of contributors. Started a debate (Jonathan Jones)

Working with people and finding out what they actually need. Small bridges that become larger bridges. We do the first bit.

#bl_labs @benosteen: @britishlibrary Flickr images; 20M hits/month, 100K+ tags, hundreds of contributors, crowdsourcing ongoing #OpenGlam

— Max Kaiser (@maxkaiser) November 3, 2014

@benosteen #bl_labs #bigdata project with @uclcs, Great journey of discovery for humanities with #compsci @azure4research

— Kenji Takeda (@ktakeda1) November 3, 2014

###Adam Farquhar, Lessons from the Labs

Digital Scholarship survey: 82% say the BL plays an important role in digital research.

Adam Farquahar presenting results of British library reader survey wch shows imp of data and computing for the library’s readers #bl_labs

— Andrew Prescott (@Ajprescott) November 3, 2014

figures presented by Farquahar important for rise of dh. 1 in 3 readers use data for research; 1 in 6 programme. Lot of scripting. #bl_labs

— Andrew Prescott (@Ajprescott) November 3, 2014

Money and conservation needs are the main biases in the digital content that researchers have available to use.

BL has about 9m 'items' ... but that under-represents the collection - a single book is an item. Volume in size of data incrementally growing - up to 450,000GB (450TB) ... but then undercounting due to compression.

#bl_labs Adam Farquhar: @britishlibrary's digital library: about 9M objects, 450TB (in DVDs: 1,5 times the height of the Shard...)

— Max Kaiser (@maxkaiser) November 3, 2014

But the variety is the important bit.

All this has put quite a bit of pressure on the BL. Changed the type of researcher who comes to us.

Digital Scholarship Training Programme http://britishlibrary.typepad.co.uk/digital-scholarship/2014/10/british-library-digital-scholarship-training-programme-round-up-of-resources-you-can-use.html

Infrastructure work (Big Data Experiment, http://figshare.com/articles/Interoperable_Infrastructures_for_Digital_Research_A_proposed_pathway_for_enabling_transformation/1092550)

David Normal heard about #bl_labs Flickr images from guitarist of punk band Flipper. Peer to peer marketing wins!

— Tom Miles (@tommilesz) November 3, 2014

"Reinvention in new context" David Normal on @britishlibrary collage project, R @librarywalled project similar aims, but textbased #BL_Labs

— artefacto (@artefactors) November 3, 2014

Off the Map - engaging a new generation and audience with historical content we have and mixing it with tech.

Fonthill Abbey inspired game called Nix using Oculus Rift. http://t.co/0n3ZWMjGmx #bl_labs #citylis #immersive

— Alison Pope (@alisonpope) November 3, 2014

Key lessons...

  1. More is more.

The more digital content and the more accessible it is, the more digital scholarship there will be. #bl_labs #citylis

— Caitlin Moore (@MsCaitlinMoore) November 3, 2014

We struggle to meet all the expectations, but we are trying!

  2. Less is more... in terms of tools and services. Lack of consensus in DH around which tool is best et cetera is okay and we embrace that. But that does conflict with how we try to build tools and services.

  3. Bring your own tools. We can't (and shouldn't) force people to use things.

  4. Be creative.

  5. Enable folks to start small and finish big.

#bl_labs Adam Farquhar going through the key lessons from Labs. Some of those are on our #dh2014 poster http://t.co/4QucJWDVqE

— James Baker (@j_w_baker) November 3, 2014

Taken together, embracing this is a radical retooling of what we do, offer, support.


Some admin...

This work is licensed under a Creative Commons Attribution 3.0 Unported License.


Exceptions: embeds to and from external sources
