Last active April 5, 2016 14:58
British Library Labs Roadshow, Sussex Humanities Lab, 4 April 2016

Live notes, so an incomplete, partial record of what actually happened.

Tags: bldigital

My asides in {}



James Baker, Born digital big data and approaches for history and the humanities

The paper archive has been replaced by the hard disk – a new format that requires historians, archivists, and humanists to think and act afresh. In just 35 years most people – in Britain and worldwide – have come to create text and data in a fundamentally new way. Alongside this, the published record has migrated from paper to screen, from physical to digital. In this talk James will consider what this means for our work. In doing so, he will introduce 'Born digital big data and approaches for history and the humanities', an AHRC-funded network that brings together researchers and practitioners from a range of stakeholder groups, to discern if there is a genuine humanities approach to born-digital data, and to establish how this might inform, complement and draw on other disciplines and practices.

My text

Alice Eldridge, Ecoacoustics: Machine Listening in the Wild, Developing Acoustic Methods for Biodiversity Monitoring

Numerous major multi-lateral initiatives aim to promote and protect biodiversity -- at the governmental level biodiversity needs to be incorporated into national accounting by 2020, yet the cost-effective tools necessary to achieve this remain elusive. Operating within the conceptual and methodological framework of the burgeoning field of Ecoacoustics (Sueur and Farina, 2015), we are investigating the potential for the acoustic environment, or soundscape, as a resource from which to infer ecological information. In this talk Alice will give an overview of work at Sussex researching machine listening tools for ‘listening’ to the acoustic environment and invite consideration of how similar tools might be usefully applied in digital archives.

How do we monitor the whole planet? Audio as an assessment of phenomena. Soundscape approach to diversity. Soundscapes as sites of interaction. The maths is working out but we don't really know what this means? Pull out components of a soundscape to isolate species.

#bldigital Alice Eldridge, on doc-acoustics - just wild sounds...

— Tim Hitchcock (@TimHitchcock) April 4, 2016

Possible historical uses of ecoacoustics: identifying regional accents, identifying music playing machines, etc. #bldigital

— John Levin (@anterotesis) April 4, 2016

Julie Weeds, More Meaningful Text Analysis

Increasing access to large volumes of machine-readable text requires users to have more sophisticated ways of searching, classifying and clustering documents at their fingertips. Julie will give an overview of technology being developed for this purpose by members of the Sussex Humanities Lab in Informatics. Method52 is a piece of in-house software which primarily allows users to rapidly build custom document classifiers using a technique called active learning. Distributional semantics refers to a collection of techniques where the meaning of words (or phrases) is represented in terms of their co-occurrences, hence allowing for searching, classification and clustering to be carried out in a more meaning-sensitive way. It also opens up the possibility of exploring the differences between collections or clusters of documents in terms of the meanings of the words within them.

Automated Thesaurus generation. Using TAGLab to study impact of academic work/publication.
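A rough illustration of the distributional idea in the abstract above (words represented by their co-occurrences), not the Sussex software itself. This is a minimal sketch: the toy corpus, the window size and the function names `cooccurrence_vectors` and `cosine` are all invented for illustration, and real systems use far larger corpora and weighting schemes.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy corpus, invented for illustration only.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
    "parliament passes the bill",
]
vectors = cooccurrence_vectors(corpus)
print(cosine(vectors["cat"], vectors["dog"]))
print(cosine(vectors["cat"], vectors["parliament"]))
```

Words that appear in similar contexts ("cat" and "dog" here) end up with more similar vectors than words that do not, which is the property a generated thesaurus would build on.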

Hana Lewis / Mahendra Mahey, British Library Labs

The British Library Labs project supports and inspires scholars to use the British Library’s incredible digital collections in exciting and innovative ways for their research, through various activities such as competitions, awards, events and projects. Labs will highlight some of the work that they and others are doing around digital content in libraries and also talk about ways to encourage researchers to engage with the British Library. They will present information on the annual BL Labs Competition, which closes this year on 11th April 2016. Through the Competition, Labs encourages researchers to submit their important research question or creative idea which uses the British Library’s digital content and data. Two Competition winners then work in residence at the British Library for five months and then showcase the results of their work at the annual Labs Symposium in November 2016. Labs will also discuss the annual BL Labs Awards, which recognise outstanding work already completed that has used the British Library’s digital collections and data. This year, the Awards will commend work in at least four key areas: Research, Artistic, Commercial and Teaching / Learning. The deadline for entering the BL Labs Awards this year is 5th September 2016.

Deadline for BL Labs Competition next week -- 7 November 2016 British Library Labs Symposium -- Adam Crymble's Crowdsource Arcade a previous winner: gaming to generate tags (active and by absence) -- BL Labs Awards are a 'show us what you've already done' category (deadline 5 September)

Analysis of BL digitized images shows that "nothing spells trouble like a hat on the floor"! #bldigital

— John Levin (@anterotesis) April 4, 2016

Katrina Navickas, The Political Meetings Mapper

Katrina Navickas, one of the winners of the BL Labs Competition (2015) will talk about her project Political Meetings Mapper, a tool for text mining and geo-locating the records of political meetings, enabling anyone to access the maps and data on an interactive website.

Trying to find a way to automate the discovery of places mentioned in the Northern Star (a Chartist newspaper) -- ABBYY FineReader 12 results good, you only need a student to do a little QAing -- locating places that don't exist -- locations can then lead back out into thinking contextually
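One simple way to picture the "discovery of places mentioned" step is matching OCR text against a gazetteer. The sketch below is not the Political Meetings Mapper's actual method: the gazetteer, its approximate coordinates, the sample sentence and the function name `find_places` are all hypothetical.

```python
import re

# Hypothetical mini-gazetteer: place name -> approximate (latitude, longitude).
GAZETTEER = {
    "Manchester": (53.48, -2.24),
    "Leeds": (53.80, -1.55),
    "Halifax": (53.72, -1.86),
}


def find_places(text, gazetteer=GAZETTEER):
    """Return (place, (lat, lon), character offset) for each gazetteer name found in text."""
    # Longest names first so that a longer name wins over a shorter one it contains.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return [(m.group(1), gazetteer[m.group(1)], m.start()) for m in pattern.finditer(text)]


# Invented sentence standing in for a line of OCR output.
sample = "A public meeting will be held at Halifax on Monday, and at Leeds on Tuesday."
for place, coords, offset in find_places(sample):
    print(place, coords, offset)
```

Exact matching like this finds nothing for misspelt OCR output or for places missing from the gazetteer, which is where the student QA and the "places that don't exist" problem come in.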

Ben O'Steen, Overview projects that have used British Library’s Digital Content and data

Labs will further present information on various projects such as the ‘Mechanical Curator’ and other interesting experiments using the British Library’s digital content and data.

Next up, @benosteen talking about 'Farces and failures', on miscommunications #bldigital

— John Levin (@anterotesis) April 4, 2016

We have our own remit: about connecting researchers to the data -- working on a specific problem with a researcher to help the library better understand what it does next for all researchers -- researchers want to grab the data and run -- digital archives could always do with more documentation -- [] approximations in library data, approximations and handwaving that we always need to keep in mind -- working through a specific problem to solve a generic issue (often with access) -- Flickr images now have over 330m views (amazing sustained use) -- Crowdsourcing means different things to different people; try to find out who your experts are early to stop all crowdsourcing only being done by 1% of people

Crowdsourcing is managing a preexisting or newly formed community where a small % will carry out the work #bldigital

— Sharon Webb (@wsharon145) April 4, 2016

Jody Butterworth, Endangered Archives Programme

The Endangered Archives Programme aims to contribute to the preservation of archival material that is in danger of destruction, neglect or physical deterioration world-wide and to make the material available for scholarly research. To date, we have over 5 million images online via our website. A recent example of a project that is now available is the wonderful archive of photographs taken by Annemarie Heinrich - our first project to give us a CC BY-NC license. We also have close to 10,000 sound recordings available through BL Sounds.

283 projects in 80 countries -- lots of projects in Mali, Nepal -- nothing leaves the country (even digitisation material so as to help other materials) -- originally a digital repository at the library, but became an online portal -- first CC BY-NC EAP project: 'EAP755: A modern gaze on old cultural practices in Argentina: relocation and preservation of the 'Heinrich Sanguinetti Archive' (1930-1956)'

Audio collections getting gradually online:

Typically pre-modern.

Mahendra Mahey, Examination of British Library data and previous Labs ideas

Labs will be coming along with terabytes of the British Library’s digital data on the day which the team will give an overview of, highlighting some of the challenges faced when working with “messy” data. They will also give a brief outline of the various ideas and projects which explore working with the British Library’s digital content and data.

Ideas Lab

Delegates will then have the opportunity to work in small groups and come up with their own ideas. The team and Sussex Humanities Lab staff will be on hand to help and advise.

Pitching ideas to the panel

Each group will pitch their ideas to the Labs and Sussex Humanities Lab panel who will give feedback on how they might be implemented - and there’s even the chance to win a goody bag! -- -- opendata -- ftp -n --

`cat BL_Labs_EThOS_File_150301.csv | csvfilter -f 0,1,5,7 | egrep '([A-Z][A-Z][A-Z])' | egrep -o '.{0,3}([A-Z][A-Z][A-Z]).{0,3}'`

`egrep -o '.{0,3}\(SOS\).{0,3}'`
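For anyone who prefers to see the first pipeline spelled out, here is a rough Python equivalent. It is a sketch under assumptions: that `csvfilter -f` takes zero-based column indices, that the selected fields can simply be rejoined with commas, and that Python's regex matching is close enough to `egrep -o` for this pattern. The sample rows and the name `acronyms_in_context` are invented; the real column layout of the EThOS file is not shown in these notes.

```python
import csv
import re

# Up to three characters of context either side of three consecutive capitals.
ACRONYM_IN_CONTEXT = re.compile(r".{0,3}[A-Z]{3}.{0,3}")


def acronyms_in_context(lines, columns=(0, 1, 5, 7)):
    """Yield each three-capital run, with context, from the selected CSV columns."""
    for row in csv.reader(lines):
        # Keep only the requested columns; short rows simply contribute fewer fields.
        selected = ",".join(row[i] for i in columns if i < len(row))
        for match in ACRONYM_IN_CONTEXT.finditer(selected):
            yield match.group(0)


# Invented rows standing in for lines of the metadata file.
sample = [
    "t1,A study of DNA repair,x,x,x,2001,x,Sussex",
    "t2,Chartist meetings in the north,x,x,x,1999,x,Leeds",
]
for hit in acronyms_in_context(sample):
    print(hit)
```

To run it over the real file, pass an open file handle instead of `sample`, for example `acronyms_in_context(open("BL_Labs_EThOS_File_150301.csv", newline=""))`.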

Some admin...

Creative Commons Licence
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Exceptions: embeds to and from external sources, and direct quotations from speakers
