@drjwbaker
Last active January 28, 2018 22:35
After the Digital Revolution: Bringing together archivists and scholars to preserve born-digital records and produce new knowledge, 25-26 January

Live notes, so an incomplete, partial record of what actually happened.

Tags: afterdigrev

My asides in {}

http://www.afterthedigitalrevolution.com/


Day 1

Processing Born-digital Records

Jenny Bunn (University College London): Processing and Making Sense of Born-digital Records

Archivists worry about accuracy of data science approaches to - eg - similarity matching .. archivists extract sense and then form it within the archive, researchers extract sense and form it in their articles/papers/books .. Victoria Lemieux Archival Science: making sense of how archivists make sense of archival work through their arrangements - time, form, materiality - by asking them to say what they are thinking when sorting.

{sorting/clustering by form .. chimes with things I care about .. and how people engage with objects: Kress (Gunther Kress, Multimodality: A Social Semiotic Approach to Contemporary Communication (London: Routledge, 2010)) on multi-modality, text isn't how we engage with documents, we engage with the whole page .. not that - of course - we want to import forward the paper mindset}

James Lappin (Loughborough University): The Routine Deletion of Correspondence from UK Government Email Accounts

Want a stat on how many emails civil servants put into a document management system compared with how many they send .. email does not pass from person to person as seamlessly as physical documents

#afterdigrev: Email accounts are the first form of public record where we cannot guarantee that they will be handed over after their creator leaves the organisation,

— Andrew Prescott (@Ajprescott) January 25, 2018

Guy Baxter (University of Reading): Joining the Dots: How do we make “archives” and “digital archives” synonymous?

Too many solutions to get from what we've always done, to dealing with digital - and from too many sectors .. archives worry about digital preservation, asset management first .. {sense here that processing born digital archives is always embedded in organisational history and change}

Q&A

Legalities in the UK compared to US: in the UK data protection clashes with the principles of keeping everything from some people, per the US Govt Capstone archival policy (though, DPA does state that keeping records for research is legitimate) .. what are the lines between professional, personal, and private? .. will we be so surprised by what is being captured as our behaviours change .. materiality as way of contextualising bulk ..

We're discussing visualization of information at micro and macro levels to create representation of born digital archives #AfterDigRev

— Amy Hildreth Chen (@AmyHildrethChen) January 25, 2018

.. do we even need series level descriptions if we can find a way of describing them in semi-automated ways? .. 'violence to the original order'

Discovering born-digital records

Jonathan Pledge and Eleanor Dickens (British Library): The Complexities of Providing Access to Personal Born-digital Archives

Since 2015: curator led processing, research into born digital led by digital preservation team ..

Most born digital archives @britishlibrary migrated into PDFs. Interested in more discussion around the use of PDFs (positives/negatives) #afterdigrev

— Amy Hildreth Chen (@AmyHildrethChen) January 25, 2018

.. Case study of Anne McLaren archive: hybrid archive, 453 digital objects (cds, floppies, disks), forensic capture + extracted capture + metadata capture, use FTK Imager (though this is poor for CD) and Kryoflux

Jonathan Pledge talking about the brilliant work the British Library is doing with born digital (and hybrid) archives - focusing on the Anne McLaren Archive https://t.co/G0jlnOtYAF #AfterDigRev

— Jane Winters (@jfwinters) January 25, 2018

#AfterDigRev Processing workflow for the Anne McLaren archive @britishlibrary pic.twitter.com/uIarVJDvXc

— James Baker (@j_w_baker) January 25, 2018

Migration: eg PPT via MacPro to .ps to PDF/A.

Pledge: interesting decision to include corrupted files (PowerPoint and JPEGs) in the final archival arrangement, with an explanatory note #AfterDigRev

— Jane Winters (@jfwinters) January 25, 2018

But, so many exceptions it is hard to put together a coherent workflow ..

Pledge: accessing migrated digital archives through only one physical location replicates (& worsens) restrictions imposed by paper records. #afterdigrev

— Amy Hildreth Chen (@AmyHildrethChen) January 25, 2018

#AfterDigRev The British Library has made a major leap forward in releasing three born digital archives within their Reading Rooms but migration of files into individual PDF/As has in many cases frustrated researchers who expect cross search functionality. [Pledge]

— UEA Archives (@UEAArchives) January 25, 2018

Users want to search across content.

Isabelle Le Pape (Bibliothèque Nationale de France): Contemporary Literature in the Web Archives: Access at the Bibliothèque nationale de France

Legal deposit in France stems from the medieval period .. web legal deposit from 2006, priority to bulk harvesting (heritrix) .. new GDPR https://www.eugdpr.org/ .. 29 billion sites in BnF web collection .. but, you have to go to reading rooms to use them ..

Learning from Isabelle Le Pape from @laBnF how forward thinking the French National Library is concerning digital legal deposit, access to collections, and web archiving. A model organization https://t.co/axhIMRJ93U #AfterDigRev

— Matt Huculak (@jmhuculak) January 25, 2018

Sustained effort at archiving has drawn researchers towards the collections.

Q&A

Does this change work with depositors? Locked down nature of the digital record.

Food for thought: « Many digital resources have LESS functionality than a medieval manuscript » (because of locked-down nature of digital files/legal restrictions/ etc.) Jonathan Pledge #AfterDigRev

— Matt Huculak (@jmhuculak) January 25, 2018

Conversation at #afterdigrev now focused on stats for online vs. physical visitors at national libraries and then smaller, academic repositories

— Amy Hildreth Chen (@AmyHildrethChen) January 25, 2018

Restrictions on access to the digital are creating a knock-on effect: libraries/archives opening more local reading rooms ..

#AfterDigRev Fascinating discussion here about how neoliberal economic structures undermines cultural heritage institutions.

— Chris Prom (@chrisprom) January 25, 2018

(Lise Jaillant): The Cycle of Born-Digital Records: Process, Discover and Use

#AfterDigRev @lisejaillant created this project to bring scholars and archivists together to debate these issues. She has succeeded.

— Guy Baxter (@archivesgb) January 25, 2018

Interdisciplinary Projects and Networks

Nicola Wilson (University of Reading): Feminism and Labour at the Modernist Archives Publishing Project

The Modernist Archives Publishing Project (MAPP) .. critical digital archive ..

Now hearing from Nicola Wilson about the Modernist Archives Publishing Project https://t.co/TfDJGwEenU #AfterDigRev

— Jane Winters (@jfwinters) January 25, 2018

#AfterDigRev @Nicola_LWilson explores the “craftedness of the archive”. We’re very proud to be making the @UniRdg_SpecColl available

— Guy Baxter (@archivesgb) January 25, 2018

Data model based around Drucker (2014) on the book as a set of intersecting events, materials et al, not a thing ..

Old scholarly model: Robert Darton "The Communications Circuit" 1982 -Read here: https://t.co/RzwdwvEgjP #afterdigrev

— Amy Hildreth Chen (@AmyHildrethChen) January 25, 2018

Looking at bookshop order books, transcribing and machine reading .. working on mapping to linked data .. building in more library/librarian voices (from an academic base)

Julie McLeod (Northumbria University): RecordDNA: Delivering a Research Agenda to Ensure the Future Survival of the Digital Evidence Base

The container is no longer the record .. instead we have granular, scattered, distributed, linked objects .. but there is a string from hardware to software that connects around an object, 'DNA' metaphor useful here ..

Fixity and uniqueness challenged in #borndigital records - more about liquidity/changability now. #afterdigrev

— Amy Hildreth Chen (@AmyHildrethChen) January 25, 2018

Julie McLeod is sharing the findings of the Record DNA project, which explored the concept of the record in the digital age https://t.co/ZcYgipYwcA #AfterDigRev

— Jane Winters (@jfwinters) January 25, 2018

.. ICT vision of records: driven by content, just stuff. Contrast with societal vision of records: connected with society and trust ..

#borndigital record risks: exponential growth, loss of intelligence on location of storage, corrupt/missing metadata, lost linked components, adapting to changing tech #afterdigrev

— Amy Hildreth Chen (@AmyHildrethChen) January 25, 2018

.. what research/practice is needed: trust (of machines, how it is manifested), ethics, how do we know what has gone, embedding data protection .. top ranked: personal data management and privacy (though all very close) ..

"How do you design the archive of the deleted, the missing & the unknown?" From the questions brought to Julie McLeod #afterdigrev

— Amy Hildreth Chen (@AmyHildrethChen) January 25, 2018

Serenity Sutherland (SUNY Oswego): Leveraging Digital Archive Experience for Employment and Future Project Implementation

Giving students in group projects proper titles is good pedagogical practice ..

Cited by @serenitys37: @mkirschenbaum's "Done: Finishing Projects in DH" https://t.co/BhIjEelBOg #afterdigrev

— Amy Hildreth Chen (@AmyHildrethChen) January 25, 2018

Q&A

Is saying that the basic unit of DH is 'the project' shooting ourselves in the foot? More need to be seen as services.

Peter Chan and Josh Schneider (Stanford University): Discovering and Accessing Email of Potential Historical or Cultural Value with ePADD

Free, open source software that runs in the browser .. how can it help? Lots! eg, screen email for sensitive, restricted, or legally protected info (which can make donors more confident) .. ePADD phase 1 was 2013-2015, phase 2 runs until October 2018 .. Appraisal, Processing, Discovery, Delivery .. loads mbox files (can convert from PST to mbox) or can just use the IMAP protocol to pull across an account .. lots of named entity processing baked in by default ..

Impressed with @StanfordLibs #EPADD. Tool for #archivists and special collections #librarians to manage email. Includes de-duplication, named entity extraction, viz & privacy controls https://t.co/d8oueZda7V Here’s Creeley emails: https://t.co/e7LqAyXdqr #AfterDigRev

— Matt Huculak (@jmhuculak) January 25, 2018

.. entity matching against DBpedia .. create/edit lexicons so you can sculpt categories that work for your collection ..

We are all spellbound by the mesmerising possibilities of ePadd for ingesting, appraising and adding metadata of all types to email archives. #afterdigrev

— Andrew Prescott (@Ajprescott) January 25, 2018

.. export options are many: eg, headers, so you can do analysis .. eg public archive http://epadd.stanford.edu/epadd/collection-detail?id=ePADD%20archive%20of%20Robert%20Creeley-Discovery - idea here is that although email content is redacted, you can decide from afar if you want to come to the archive and see the collection {this seems to me a killer app} ..

#AfterDigRev @e_padd Query Generator another super-useful feature: find emails matching named entities found w/in any text block.

— Chris Prom (@chrisprom) January 25, 2018

.. future work focus on better screening of sensitive information

It came up at #DPCEmail2 yesterday, it is coming up at #afterdigrev today: given the volume/complexity of email that archives need to review, there is a vital need to develop/research AI/machine learning approaches to sensitive review that fit the needs of archives.

— James Baker (@j_w_baker) January 25, 2018
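A rough sketch of the kind of mbox loading, header export, and lexicon screening described in the ePADD notes above. This is not ePADD's own code or API: it uses Python's standard `mailbox` module, and the lexicon categories, term lists, and function names are hypothetical, made up for illustration.

```python
import mailbox
import re
from email.message import Message

# Hypothetical lexicon: category -> terms. A real collection would need its own.
SENSITIVE_TERMS = {
    "health": ["diagnosis", "prescription"],
    "finance": ["salary", "bank account"],
}


def header_row(msg: Message) -> dict:
    """Pull out the headers most useful for analysis, defaulting to ''."""
    return {
        "from": msg.get("From", ""),
        "to": msg.get("To", ""),
        "date": msg.get("Date", ""),
        "subject": msg.get("Subject", ""),
    }


def body_text(msg: Message) -> str:
    """Join the text/plain parts of a message into one string."""
    if msg.is_multipart():
        parts = [p for p in msg.walk() if p.get_content_type() == "text/plain"]
    else:
        parts = [msg]
    payloads = [p.get_payload(decode=True) or b"" for p in parts]
    return "\n".join(p.decode("utf-8", errors="replace") for p in payloads)


def flag_categories(text: str, lexicon: dict = SENSITIVE_TERMS) -> list:
    """Return the sorted lexicon categories with a whole-word match in text."""
    lowered = text.lower()
    return sorted(
        category
        for category, terms in lexicon.items()
        if any(re.search(r"\b" + re.escape(t) + r"\b", lowered) for t in terms)
    )


def screen_mbox(path: str) -> list:
    """Read an mbox file and return one header row plus flags per message."""
    rows = []
    for msg in mailbox.mbox(path, create=False):
        row = header_row(msg)
        row["flags"] = flag_categories(body_text(msg))
        rows.append(row)
    return rows
```

Simple keyword matching like this will over- and under-flag, which is one reason the session kept returning to machine learning approaches for sensitivity review.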

Keynote (David McKnight, University of Pennsylvania): All together Now: Accessing Publishers' Archives in the Digital Age - A Modest Proposal

What are the impediments to access? Copyright, Privacy, Conservation, Geography .. Copyright: not absolute, donors can renegotiate or reverse a decision .. Privacy: archivists don't want to impose access restrictions, but donors may wish to do so. Reminding donors that researchers cannot publish verbatim without seeking copyright clearance from the donor (or their estate) is important .. Conservation: fragility is an impediment .. Geography: dispersal

Should we place hope in moving wall of copyright and embargoes, advances in conservation, and federation of online search?

See move from Penn in Hand to OPenn: from turning a page on a webpage to downloading machine readable data .. {that is, demands shift fast?} .. OPenn values utilitarian content and data over beauty and discovery, should it replace Penn in Hand? ..

David McKnight, Director of Rare Books and MSS @upennlib, shows OPENN, a platform of researchers to access cultural heritage objects & "machine-readable descriptive and technical metadata.” Excellent model for GLAMs #AfterDigRev https://t.co/ySnsh61FWg @upennlib

— Matt Huculak (@jmhuculak) January 25, 2018

.. 'More Product Less Process' model gets a nod {Mark A. Greene and Dennis Meissner, “More Product, Less Process: Revamping Traditional Archival Processing,” The American Archivist 68 (2005), http://www.archivists.org/prof-education/pre-readings/IMPLP/AA68.2.MeissnerGreene.pdf.}


Day 2

Using Born-digital Records

Paul Gooding, Jos Smith and Justine Mann (University of East Anglia): The Forensic Imagination: Co-Developing Interdisciplinary Approaches to Writers’ Born-digital Archives

British Archive for Contemporary Writing .. hooked around Doris Lessing archive .. "store-house" model of loan given that UEA doesn't purchase: can be returned at short notice, targeting archives of immediate use .. pushed to digital as working with writers' archives much earlier in their careers - opportunity to influence how writers are retaining their material ..

Justine Mann starts off day 2 of #AfterDigRev with an introduction to the British Archive for Contemporary Writing https://t.co/Yy7pO4R5md

— Jane Winters (@jfwinters) January 26, 2018

.. Naomi Alderman archive: sends early drafts via Word .. Jos Smith: genetic criticism (French tradition, post-structuralist, language speaks) - embedding play into literary research, less emphasis on resolution, finding out stuff .. can this be aligned with building per DH?

Gooding: the author is a node in a wider network including publisher, editor, proofreader, printer, bookseller, family, friends. This is equally true of the dependencies inherent in use of software packages by writers. #afterdigrev

— Andrew Prescott (@Ajprescott) January 26, 2018

.@pmgooding identifies three discrete stages for born-digital archiving – creating, curating, using #afterdigrev

— Jane Winters (@jfwinters) January 26, 2018

Yunhyong Kim (University of Glasgow): Preservation, Forensics and Intelligence: Co-development Promoting “Provably Beneficial” Knowledge

Or, "My Life as a One-Person Information Band" ..

Yunhyong offers an overview of digital preservation over the last 25 years: she sees 2003 as a point where we start to be overwhelmed by the digital #afterdigrev

— Andrew Prescott (@Ajprescott) January 26, 2018

Yunhyong compares the history of AI development and the history of archival theory. AI was coined in 1956 at the Dartmouth Meeting & when Schellenberg's Modern Archives updated the Dutch manual #AfterDigRev

— Amy Hildreth Chen (@AmyHildrethChen) January 26, 2018

Maria Castrillo (Senate House Library, London): Born-digital Records and Public Engagement Programmes

Exhibiting born digital can be inhibited by copyright (eg, in emulation) and data protection .. micro-site exhibitions as born digital records that can (should) be preserved

Q&A

If embedded in teaching materials and archive withdrawn, challenges: written into agreement that PhDs will still have access if ongoing and collection withdrawn .. we're learning so much so worry about withdrawal not - at present - a big issue .. big challenge in what we are discussing re archival theory: we kinda broke the idea of having series (per Schellenberg), what if the object is broken down, and here we have loss of deposit ..

.@UEAArchives are absolutely innovative to combine short term loans for authors with institutional affinities with MPLP patron-driven arrangement and description to get contemporary literature out for research and teaching. #AfterDigRev

— Amy Hildreth Chen (@AmyHildrethChen) January 26, 2018

.@archivesgb notes the breakdown of archival theory over the past day at #AfterDigRev: repository, series, and record challenged. He asks if this is a paradigm shift in our field?

— Amy Hildreth Chen (@AmyHildrethChen) January 26, 2018

Practical Solutions

Amy Chen (University of Iowa): The Problem of Interoperability: ArchiveGrid as an Archival Discovery Platform

“Born digital content requires thinking about users AND algorithms” when it comes to archival networking. #AfterDigRev @AmyHildrethChen

— Matt Huculak (@jmhuculak) January 26, 2018

Born Digital: large extent (but not physical space), content is not disaggregated, privacy (hard), condition (digital doesn't change but ability to use it does), access (emulation expenses but replicates habitat), location (we have to go to the archive - industry standard).

But this is thinking that doesn't connect particular archives with other archives, it doesn't put them in conversation .. algorithms need standardised metadata

ArchiveGrid is a discovery system for finding archives across the USA .. limited by inconsistent metadata, but all stuff squished together ..

Rough quote: "If we are going to have an international, linked-open data resource for archives, we need to address inconsistent practices concerning MARC & Finding Aids in archives. Algorithms need standardized metadata.” @AmyHildrethChen #AfterDigRev

— Matt Huculak (@jmhuculak) January 26, 2018

.@AmyHildrethChen now discussing ArchiveGrid, which includes data from over 1000 archival institutions https://t.co/b90N6ISTa5 #afterdigrev

— Jane Winters (@jfwinters) January 26, 2018

.. Eg, on Allen Ginsberg. Stanford has the most stuff (by volume/size). And yet inconsistent metadata doesn't surface them to the top.

. @AmyHildrethChen hinting at a key topic: how better data structures and metadata consistency would drive better user experience in digital archives. #AfterDigRev

— Chris Prom (@chrisprom) January 26, 2018

Important point by @AmyHildrethChen: Amazon ux los fantastic to users by comparison with library and archive sites, but Amazon has data provided for it, so can concentrate on algorithmic potential. Libraries and archives have to prepare the data first. #afterdigitalrev

— Andrew Prescott (@Ajprescott) January 26, 2018

I think the main problem here is the variety of extent units allowed by archival 'standards': feet, photographs, and megabytes as different as apples, elephants, and hammers. #afterdigrev

— Chris Prom (@chrisprom) January 26, 2018
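A small sketch of the extent-unit problem raised above, assuming free-text extent statements of the form "number unit". The unit-to-kind table and function names are hypothetical, not taken from ArchiveGrid or any archival standard; the point is only that a parser can label the statements but cannot make feet, photographs, and megabytes comparable.

```python
import re
from collections import defaultdict

# Matches a leading number followed by a free-text unit, e.g. "12.5 linear feet".
EXTENT_PATTERN = re.compile(r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>[A-Za-z ]+)")

# Hypothetical mapping from unit to the kind of thing being measured.
UNIT_KINDS = {
    "linear feet": "physical_length",
    "feet": "physical_length",
    "boxes": "containers",
    "photographs": "item_count",
    "megabytes": "digital_size",
    "gigabytes": "digital_size",
}


def parse_extent(statement: str):
    """Return (value, unit, kind) for an extent statement, or None if no match."""
    match = EXTENT_PATTERN.search(statement)
    if match is None:
        return None
    unit = match.group("unit").strip().lower()
    return float(match.group("value")), unit, UNIT_KINDS.get(unit, "unknown")


def group_by_kind(statements: list) -> dict:
    """Group extent statements by kind; only same-kind values are comparable."""
    groups = defaultdict(list)
    for statement in statements:
        parsed = parse_extent(statement)
        kind = parsed[2] if parsed else "unparsed"
        groups[kind].append(statement)
    return dict(groups)
```

Ranking collections "by size" across these groups would need a shared measure that the source metadata does not supply.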

Gareth Cole (Loughborough University): Making Digital Objects FAIR: Findable, Accessible, Interoperable, and Reusable

Findable, Accessible, Interoperable, and Reusable .. making something human and machine readable at the same time is hard ..

So who should make data FAIR? Whose job is it? Huge human and financial resource question @DrGarethCole #AfterDigRev

— Amy Hildreth Chen (@AmyHildrethChen) January 26, 2018

What objects should have persistent identifier? Each collection? Each letter? Each e-mail? Fascinating subject. Current discussion about manuscript identifiers are probably not sensitive enough to complications of archives and of born-digital materials. #afterdigrev

— Andrew Prescott (@Ajprescott) January 26, 2018

.. we have identifiers, we have documentation, we have the things we need .. but social implementation is hard

Talking about data interoperability reminds me of how simplicity can be surprisingly difficult. #AfterDigRev

— Amy Hildreth Chen (@AmyHildrethChen) January 26, 2018

J. Matthew Huculak (University of Victoria): Building Library Infrastructure and Social Relations for Capturing Born Digital Scholarly Communication at the University of Victoria.

The Endings project. Based at University of Victoria .. DH Projects in Canada have a preservation mandate without enforcement .. One aim of the project was to look at what was left over from DH projects: not a lot! .. Aim to build sustainable model for bringing projects to closure. Often because they never really end! ..

.@jmhuculak talking about a really interesting initiative to think about what happens when digital projects finish - the Endings Project https://t.co/nHa5ale7G3 #afterdigrev

— Jane Winters (@jfwinters) January 26, 2018

.. if every DH prototype is an argument, then their loss suggests that there are many bad projects out there .. one model is freezing a project via Archive-it / Wayback Machine ..

Is part of the problem in contemplating sustainability or ending of DH projects the fact that we make the front end so prominent in the digital edition or resource? Maybe we should just focus on the data (harks back to the Openn approach yesterday). #afterdigrev

— Andrew Prescott (@Ajprescott) January 26, 2018

..collaboration with the library a realistic way of preserving projects .. creating a Digital Asset Management System (using Stanford software that was Hydra) that projects can build their sites on top of.

Q&A

Idealisation of the prototype rubs up against reality of a need to preserve .. is DH work so much different to work in the sciences? .. need for cross-disciplinary standards .. humanities data is different? More likely to be an aggregation than a creation? .. Are the FAIR principles FAIR? .. whose interoperability do we mean? what should it mean? how interoperable do we want things?

The Case of Emails: From Preservation and Discovery to Use

Ruth Panofsky (Ryerson University): Accessing Born Digital Content in Archival Repositories

Literary scholar .. uses born digital ..

Literary historians may know that individuals, families, repositories, and archivists all shape archival silences but they do not consistently consider how gaps can be intentional in order to make a more friendly record #afterdigrev

— Amy Hildreth Chen (@AmyHildrethChen) January 26, 2018

.. email exchange differs extensively from letters .. subject line modelled after business ..

Email based on business memoranda - might be more conversational but it favors "precision and brevity." #afterdigrev So how does this change how we think about contemporary correspondence? More data set, less close reading?

— Amy Hildreth Chen (@AmyHildrethChen) January 26, 2018

Grateful to Ruth Panofsky for citing my TLS article on Ian McEwan's emails: https://t.co/f9uoDCYpJk #AfterDigRev

— Lise Jaillant (@lisejaillant) January 26, 2018

.. email threads complicated and tricky to navigate in print .. telephone calls an absence rather than a presence in the archive .. perception of lack of significance a risk to preservation.

James Baker (University of Sussex): Outlook: Email Archives, 1990-2007

Me! Deck

Chris Prom (University of Illinois at Urbana-Champaign) and Kate Murray (Library of Congress): The Task Force on Technical Approaches to Email Archives: Project Briefing

{See my notes from Email Preservation: How Hard Can It Be? 2}

Just in case you didn’t get it, @chrisprom reminds us #afterdigrev pic.twitter.com/XUNYCaI5a5

— Amy Hildreth Chen (@AmyHildrethChen) January 26, 2018

Email is extensible. So new functions - eg calendar - keep being added.

Jane Winters (School of Advanced Study, University of London): Negotiating Web Archives: A Problem of Search?

Whether or not we like it, the archives we collect have changed. They are now digital. .. Web archives don't have any of the contextual information that informs Google search algorithms. So search is poor .. archiving a webpage makes it 'reborn digital' (Niels Brügger) .. in the UK, web archived per web page .. In 2014: 2.5 billion pages, ~5TB (inc GBs of viruses!) .. multimedia information missing, text has been privileged .. archived web pages patched together - 'temporal incoherence' - webpages that never existed made to look like a real page .. so, search is super hard .. new beta UK web archive search https://beta.webarchive.org.uk/ .. but super hard and dispiriting volume of responses, so most people do qualitative work ..

.@jfwinters notes problem of some @internetarchive page captures is that parts of page captured at different times—so historians have to be careful when viewing a page that may have not actually existed (frankensteined from different parts) #thepageisthemonster #AfterDigRev

— Matt Huculak (@jmhuculak) January 26, 2018

.. trends are hard to interpret due to duplication, but neologisms work well: chav, steampunk.
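The 'temporal incoherence' point above can be made concrete with a minimal sketch, assuming each resource on an archived page carries a 14-digit capture timestamp (YYYYMMDDhhmmss, the form used in Wayback-style replay URLs). The example page, its resource names, and the 24-hour tolerance are hypothetical; this is not how any particular web archive reports coherence.

```python
from datetime import datetime, timedelta

# 14-digit capture timestamp, e.g. "20140301120000".
TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"

# Hypothetical replayed page: resource name -> capture timestamp.
EXAMPLE_PAGE = {
    "index.html": "20140301120000",
    "style.css": "20140301120005",
    "banner.jpg": "20140915083000",
}


def parse_capture_time(stamp: str) -> datetime:
    """Convert a 14-digit capture timestamp into a datetime."""
    return datetime.strptime(stamp, TIMESTAMP_FORMAT)


def capture_spread(resource_stamps: dict) -> timedelta:
    """Return the gap between the earliest and latest capture on a page."""
    times = [parse_capture_time(s) for s in resource_stamps.values()]
    return max(times) - min(times)


def is_temporally_incoherent(
    resource_stamps: dict, tolerance: timedelta = timedelta(hours=24)
) -> bool:
    """Flag a page whose parts were captured further apart than the tolerance."""
    return capture_spread(resource_stamps) > tolerance
```

In this made-up example the banner image was captured about six months after the HTML, so the replayed page combines parts that may never have appeared together.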

Andrew Prescott (University of Glasgow): Is Search the Right Way?

Deck .. Development of the web took place over a long period of time .. hierarchical form of the web curiously recalls the structure of the corporate archive: guides to good stuff on the web rather than search (eg, Yahoo) .. 'faced with the complexity of the web, Google met our expectations' more so - that is - than guides .. interdependence between the shape of the information and how we structure it: think back to biblical concordances, for example .. library catalogues important in shaping our ideas of how search would work: keyword .. but 'keyword enough' didn't take enough account of how important the underlying subject catalogue was .. C19 archivists were worried about a "sea of print", we worry about an "ocean of data" ..

.@Ajprescott shows powerful image from @guardian/ @AP to show how millions of emails can be visualized and reveal useful patterns (apparently tool adapted from Glastonbury festival viz.) https://t.co/Pxa5SykVpn #AfterDigRev

— Matt Huculak (@jmhuculak) January 26, 2018

.. AI is already - in part - embedded into our work: eg Old Bailey Online. So we shouldn't be scared of AI .. should we speak to the archive like we do Alexa?


Some admin...

Creative Commons Licence
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Exceptions: embeds to and from external sources, and direct quotations from speakers
