Email Preservation: How Hard Can it Be?, National Archives, 6 July 2017

Live notes, so an incomplete, partial record of what actually happened.

Tags: dpc_email

My asides in {}

Stream/Deck: http://dpconline.org/events/past-events/email-preservation-2017


Talks

1040 – Introductory talk with Chris Prom, University of Illinois Urbana-Champaign and Kate Murray, Library of Congress.

Chris leads Archival Connections Project. Three goals:

  1. To evaluate, test, and implement the wide range of methods, data models, and tools that can help archivists and librarians complete the essential tasks of archival acquisition, arrangement, description, and access for born-digital materials.

  2. To foster collaborative research by archivists, functional experts, subject librarians, and teaching faculty at the University of Illinois, to inform the design and implementation of these emergent tools.

  3. To increase collaboration and sharing between the many external standards groups and technology development projects that seek to cope with the advent of ‘cloud-based’ archival materials and records.

Goals for the Mellon/DPC taskforce: survey what is being done, create a framework, make an agenda .. Not legal/policy work .. about technical solutions .. focus on what is different about email? .. not building tools/services .. make case that email is a cultural document worthy of preservation .. email as habitat ..

quick case studies of 'rescuing' an email archive of Nobel scientist - a disk image that included an image of an earlier disk #dpc_email

— William Kilbride (@WilliamKilbride) July 6, 2017

Email is an object, several objects & a verb says Email Pres. Task Force. All adds to the complexity of preserving email data #DPC_email

— Edith Halvarsson (@EdithHalvarsson) July 6, 2017

.. Email is super interoperable .. compatibility across systems makes it work .. but archivists get to work with what is left over at the end, as part of a process of transformation between sending and receiving .. on authenticity: archives can control what they get but can't deal with what happens before that .. how people use email - e.g. use deleted emails as an archive!!! - can often be unique to them (and not seem odd because they don't know much about the practice of others) ..

Great to hear Google are keen to respond to digital preservation issues re gmail - bodes well for my Google Drive work #dpc_email

— Jenny Mitcham (@Jenny_Mitcham) July 6, 2017

1130 – Case study 1: Collecting email archives with Jonathan Pledge, British Library

Since 2000 .. pioneered use of digital forensics .. move to curator-led born-digital work from May 2015 .. using DROID .. creating PDF/A files accessible via an FTP server in the Reading Rooms .. John Berger archive has .pst files; extracted .msg ..

#dpc_email Jonathan Pledge - Using Aid4mail Forensic and ePADD - as core technology in workflow

— Tim Gollins (@timgollins) July 6, 2017

.. email is personal and so UK Data Protection kicks in and so lots of manual checking and most can't be made available anyway (because whole threads of emails are removed if one email is deemed too personal)

#dpc_email Jonathan Pledge - Now discussing DPA 1998 (processing personal data) - this brings challenging issues - results in closed records

— Tim Gollins (@timgollins) July 6, 2017
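
{A minimal sketch of why thread-level closure removes so much: if one email in a thread is flagged as too personal, every message sharing that thread is withheld. This is only an illustration of the rule as described in the talk, not the British Library's actual workflow; the `id` and `in_reply_to` field names and both function names are my own assumptions.}

```python
def thread_root(msg_id, parent_of):
    """Follow in_reply_to links upwards until no known parent remains."""
    seen = set()  # guard so malformed, cyclic reply chains cannot loop forever
    while parent_of.get(msg_id) is not None and msg_id not in seen:
        seen.add(msg_id)
        msg_id = parent_of[msg_id]
    return msg_id


def releasable(messages, flagged_ids):
    """Return ids of messages whose thread contains no flagged message.

    `messages` is a list of dicts with an "id" key and an optional
    "in_reply_to" key (hypothetical field names for this sketch).
    """
    parent_of = {m["id"]: m.get("in_reply_to") for m in messages}
    # Every thread that contains a flagged message is closed as a whole.
    closed_roots = {thread_root(i, parent_of) for i in flagged_ids}
    return [
        m["id"]
        for m in messages
        if thread_root(m["id"], parent_of) not in closed_roots
    ]
```

{With two threads, a-b-c and d-e, flagging only c closes a and b as well, and just d and e remain available.}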

1155 – Case study 2: Email and the record of government with Anthea Seles, The UK National Archives, and Greg Falconer, UK Government Cabinet Office

The challenge around email isn't preservation, it is capture (choosing what to get and what can be got) and presentation (meaningful to researchers, even those who want to compute over it) .. email capture in organisations is subject to bloat .. finding good ways to present email can help both users and things like sensitivity review .. looking at ePADD to deal with accessibility .. can archives use email archives to understand what was important to government at a particular moment and therefore organise capture around the themes that emerge? .. Capstone: created by NARA as a concept, capture senior management level stuff

agreed - #dpc_email (to put back on #tag) - as ever recording the basis of selection of the collection is key to future interpretation. https://t.co/AAMYRF449j

— Tim Gollins (@timgollins) July 6, 2017

1220 – Case study 3: Michael Hope, Preservica Email Preservation

#dpc_email now Michael Hope from @Preservica - asserting that some issues of preservation have been addressed

— Tim Gollins (@timgollins) July 6, 2017

Preservica's approach to email preservation looks neat! #dpc_email

— Jenny Mitcham (@Jenny_Mitcham) July 6, 2017

1400 – Email Task force themes with Kate Murray, Library of Congress, and Technology Road map with Chris Prom, University of Illinois Urbana-Champaign

We're still on the upswing around what's coming with email ..

#dpc_email - @chrisprom & @fileformatology - Interesting that curation tools are driving need to "pre-migrate" email to enable processing

— Tim Gollins (@timgollins) July 6, 2017

Tools: ePADD, BitCurator, DArcMail, then proprietary stuff such as AccessData FTK, Preservica and Emailchemy .. MBOX as preservation target format .. ePADD just connects entities and doesn't disambiguate .. ePADD can be used by the donor to redact stuff they don't want archived ..
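
{A minimal sketch of what "MBOX as a target format" can look like in practice, using Python's standard `mailbox` module rather than any of the tools named above. The helper names, addresses and file name are my own assumptions, and a real migration from .pst or .msg would need a separate extraction step first.}

```python
import mailbox
from email.message import EmailMessage


def make_message(sender, recipient, subject, body):
    """Build a simple plain-text message (stand-in for an extracted email)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def write_mbox(path, messages):
    """Append messages to the MBOX file at `path`; return how many it now holds."""
    box = mailbox.mbox(path)  # creates the file if it does not exist
    try:
        for msg in messages:
            box.add(msg)
        box.flush()
        return len(box)
    finally:
        box.close()


def list_subjects(path):
    """Read the MBOX back and return the Subject header of each message."""
    box = mailbox.mbox(path, create=False)
    try:
        return [msg["Subject"] for msg in box]
    finally:
        box.close()
```

{Calling `write_mbox("sample.mbox", [...])` and then `list_subjects("sample.mbox")` is a quick round-trip check that what went in can be read back out. No file locking here, so only suitable for a single process.}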

1530 – Review and discussion, chaired by William Kilbride, DPC

GDPR will have a big impact across Europe; might be easier for risk-averse organisations to delete than retain; so the email archives we are talking about might not exist!; need for fine-grained redaction until the end of someone's life; chain of processing/custody needs to be documented; applies from May 2018 .. use of email client API to capture email during use rather than at the end of use ..


Some admin...

Creative Commons Licence
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Exceptions: embeds to and from external sources, and direct quotations from speakers
