Web Archiving Week, 12-16 June 2017

Live notes, so an incomplete, partial record of what actually happened.

Tags: WAWeek2017

My asides in {}


RESAW (A Research Infrastructure for the Study of Archived Web Materials) Conference

Wednesday 14 June


Session 1A

Valérie Schafer & Francesca Musiani: Do web archives have politics?

Yes! Because somewhere like the BnF collects 'political events' but has to choose what is political .. choices of topics, hashtags et al are political choices about what counts as political .. all archives are arrangements of power, so politics .. methods of access to get around the complexity of web archives tend to focus on web archives as big data: but we don't need to do this

Richard Deswarte: What can web link analysis reveal about the nature and rise of Euroscepticism in the UK?

Example is a historian struggling to work with web archives: mixed methods approach - don't forget the qualitative .. timeframe of the web is useful, dovetails nicely with the rise of Euroscepticism {but, is it a problem that all your quant charts/counts start with the start of the web and can't go earlier: historical analysis shaped/constrained by the source not the questions?} .. you need to know your context, non-UKIP on web

Tatjana Seitz: Digital desolation

90s homepages that are still alive .. much early web can be read as digital folklore (eg Geocities) .. look/feel/colours/gifs .. pluck the human out of the layout, the narrative it presented .. homepages have 'grammars of actions': designs that are designed to make us act in certain ways .. if we think in the logic of search we struggle to deal with early web pages .. Grab Them All plugin good for getting screenshots from a list of URLs .. not just about text but pictures

#WAweek2017 One rare conference where you can have a PPT with plenty of gifs & colors and be taken seriously, as colleagues have the same :)

— Valérie Schafer (@valerie_schafer) June 14, 2017

Ralph Schroeder: Web archives and theories of the web

Analogy with newspapers: historians don't want the whole thing, they want what is important .. we want a web archive that enables history from below that captures what people got from the web, 'web audiences studies' .. little in terms of 'theories of information seeking' .. what did people get from the web that they didn't/couldn't get from any other medium? .. user studies, but attention, information source - passive

tweets from other sessions

"We need less English language web archive content!" Jefferson Bailey presenting current work of @internetarchive #WAweek2017

— Naomi Wells (@NASWells) June 14, 2017

. @jfwinters stresses the importance of the role of oral histories in the history of technology #absolutely #WAweek2017

— Sally Chambers (@schambers3) June 14, 2017

Session 2A

Peter Webster: Utopia, dystopia and Christian ethics: early religious understandings of the web

How people ceased to tell their life stories in religious terms .. but lots of religious language lingers .. 1996-1999: burst of writing around spiritual implications of WWW .. Dystopia relied on reaction to fear of imposition of someone else's utopia .. use writing about the web and writing about the writing about the web .. 'Space' significant metaphor for the web ..

#WAweek2017 @pj_webster has given us a reading list of people talking about the web in religious terms. Thanks Peter!

— James Baker (@j_w_baker) June 14, 2017

.. feeling of the web as having an ontological reality .. presence of a sense that cyberspace has a soul .. pioneers / founding fathers histories of the WWW have a religious, priestly tone .. cyberspace pitched as a world that we have made (Jennifer Cobb, Cybergrace (1998)) .. not saying these are influential; more orthodox Christians less engaged with the idea that the web was a manifestation of god, conscious, working towards autonomy et cetera .. dystopian visions made a distinction between what machines/computers could do and what they should be allowed to do .. Christian principles brought to bear on ethical discussion of computing .. ethical discussion of web infused with religious language.

Fascinating presentation from @pj_webster on religious understandings of the web - was not expecting talk of teilhard de chardin #WAWeek2017

— Jane Winters (@jfwinters) June 14, 2017

Sally Chambers, Peter Mechant, Sophie Vandepontseele & Nadège Isbergue: Aanslagen, Attentats, Terroranschläge: developing a special collection for the academic study of the archived web related to the Brussels terrorist attacks in March 2016

Web archiving in Belgium in motion .. PROMISE project will work on pilot access and use of Belgian web archive ..

. @schambers3 is now giving a presentation about #webarchive and the #BrusselsAttacks in March 2016. Sadly familiar... #WAweek2017

— Dépôt légal web BnF (@DLWebBnF) June 14, 2017

.. Royal Library of Belgium: the web is published, but because it has to be collected rather than deposited (this is a legal deposit library), web collecting had to be added to the library's mission statement to justify it ..

@schambers3 discussing new Belgium WA program - DNS Belgium has lots of data about the Belgium web! #WAweek2017

— Abbie Grotke (@agrotke) June 14, 2017

Sally Chambers talking about archiving the Belgian terrorist attacks - starting from just 10 seeds they collected 60GB of data #WAWeek2017

— Jane Winters (@jfwinters) June 14, 2017

Lucien Castex: The web as a memorial: real-time commemoration of the November 2015 Paris attacks on Twitter

Post-mortem digital identities .. studying grieving online .. challenges with fine grain, minute-by-minute Twitter based research: correct times, capturing links, accurate geolocation ..

Castex: growing challenges of Twitter-based research: misidentification of language, date problems, conversations hard to follow #WAWeek2017

— Jane Winters (@jfwinters) June 14, 2017

.. temporal analysis helps us see the switch from private to public to commemorative communications via Twitter; or information, to helping, to commemorating, to reconstruction .. lots of ethical issues around using post-mortem data

Very interesting presentation by @LucienCastex about an analysis of Tweets around the Paris terrorist attacks #WAWeek2017

— Sally Chambers (@schambers3) June 14, 2017
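{A rough sketch of the "correct times" problem in minute-by-minute tweet analysis mentioned in Castex's talk. This is not his method: the function names are mine, the sample timestamps are invented, and it assumes each tweet carries an ISO 8601 timestamp with an explicit UTC offset. The idea is just to normalise everything to UTC before counting per minute, so tweets recorded in different time zones land in the same bucket.}

```python
from collections import Counter
from datetime import datetime, timezone


def minute_bucket(created_at):
    """Parse an ISO 8601 timestamp and truncate it to the minute, in UTC."""
    dt = datetime.fromisoformat(created_at)
    if dt.tzinfo is None:
        # Without an offset we cannot place the tweet on a shared timeline.
        raise ValueError("timestamp has no UTC offset: " + created_at)
    return dt.astimezone(timezone.utc).replace(second=0, microsecond=0)


def tweets_per_minute(timestamps):
    """Count tweets per UTC minute."""
    return Counter(minute_bucket(t) for t in timestamps)


# Invented sample values: the first two are the same minute once converted to UTC.
sample = [
    "2015-11-13T22:30:05+01:00",
    "2015-11-13T21:30:40+00:00",
    "2015-11-13T22:31:10+01:00",
]
counts = tweets_per_minute(sample)
for minute, n in sorted(counts.items()):
    print(minute.isoformat(), n)
```

{Rejecting timestamps without an offset, rather than guessing a zone, is a deliberate choice here: a silent guess would shift the whole minute-level timeline.}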

Gareth Millward: Lessons from ‘Lessons from failure with the UK Web Archive’ – the MMR crisis, 1998-2010

MMR (Measles, Mumps, Rubella) vaccine ..

.@MillieQED demonstrating how 'documentary historians', broadly defined, can integrate web archives into their research #WAWeek2017

— Jane Winters (@jfwinters) June 14, 2017

.. the web was publicly cited as a problem by health professionals during the crisis .. use .gov pages on MMR to gauge what people felt and what government felt people felt .. 'the live web and the old web coming together'

#waweek2017 @MillieQED I am not a "web historian" but to do the history of the 1990s onwards, I need webarchive !

— sophie gebeil (@SophieGebeil) June 14, 2017

Great presentation by @MillieQED about how he used web archives alongside other sources for his research on the MMR crisis #WAWeek2017

— Sally Chambers (@schambers3) June 14, 2017

tweets from other sessions

Capturing the Web at Large
A Critique of Current Web Referencing Practices #WAWeek2017 - this is for #wikicite too

— Raffaele Messuti (@atomotic) June 14, 2017

Session 3A

Chris Wemyss: Tracing the virtual community of Hong Kong Britons through the archived web

Focus on Brits in Hong Kong either side of 1997 handover .. sharp decrease in British population after 1997 .. 2001: 18-21k .. but significant community as they bridged colonial and decolonial world, more than just an expat class of business people et al .. aspects of virtual community to their presence that borders on diaspora .. use of three example websites to understand phenomena: using Wayback Machine to look at changes over time .. {combination of analysing web archives with oral histories with those who created and used websites} .. sites that tried to build something rather than just be a platform for ephemeral conversation .. if a website doesn't change in 16 years what do we make of it?

David Geiringer & James Baker: The home computer and networked technology: encounters in the Mass Observation Project archive, 1991-2004

Q&A: similar/same phenomena with typewriters? what do people say about archiving their lives in digital age? (anything on this beyond paper by Microsoft Research?) {more on studies and rooms in relation to class}

Thursday 15 June


John Sheridan, Web archiving the government

Archiving government on the web .. TNA as the archive of government ..

.@johnlsheridan: #webarchiving experience has shown TNA it's future as an archive #WAweek2017

— Sara Day Thomson (@sdaythomson) June 15, 2017

.. government looking at us, trying to understand us; TNA gives researchers the opportunity to see what the state saw .. Domesday Book: arguably the UK's first dataset .. 20 year web archive, 100 TB, 2k websites .. 'we care about the records of government that government is publishing' .. web has changed what is the published record, complicated what was a simple division between published stuff at the BL and unpublished at the TNA .. collect web at TNA for context .. release of born-digital records under the 20 year rule is coming .. narrow and deep, not a large crawl .. lots of effort on quality assurance of the crawl and replaying the content ..

Archived snapshot of the UK govt's EU Referendum website #WAWeek2017

— Jane Winters (@jfwinters) June 15, 2017

.. {emphasis on care/attention/patience in content capture} ..

UK web archive asked site owners to put in redirects to the archive when sites or pages were taken down #WAweek2017

— Abbie Grotke (@agrotke) June 15, 2017

.. what is the user need of a web archive? .. based around user stories: eg, someone wants to link to a web archive knowing that it was something government said a while ago; wikipedia user trying to substantiate a point; legal compliance angle; researcher needing to find stuff and identify trends (serving this need poorly now due to limitations of search); government wanting to use its own archive so it can better understand what it said in the past.

Fascinating talk by @johnlsheridan on the UK gov web archive.Interesting to see their work on user needs #WAweek2017

— Maria Ryan (@Maria_Brid) June 15, 2017

Sheridan: user need sometimes just 'because democracy' #WAWeek2017

— Jane Winters (@jfwinters) June 15, 2017

.. surprising number of takedown requests for a small government web archive (given that the archiver is part of government) .. point is that there is a difference between being an archiver and a publisher .. from 1 July TNA Web Archive content will be migrated to the cloud to improve replay, search, presentation (partner: MirrorWeb) .. moving to Python Wayback to replay the collection - need to maintain preservation of look and feel, and ensure interoperability .. active research organisation .. have been collecting paper, from now on born-digital records .. Jenkinson (1922) Manual of Archive Administration - the frame of mind you should have with respect to what to save and how to describe it - selection is key .. archival catalogue (hierarchically organised) is not what you'd design today .. moving beyond the simulation of the physical .. temptation to retrofit order onto chaotic digital records .. so, back to respect des fonds ..

Sheridan: the Web Archive is much more chaotic than traditional archives, and that's part of why it works. #WAWeek2017

— Jane Winters (@jfwinters) June 15, 2017

A view of second digital archives from TNA's @johnlsheridan #WAweek2017

— Garth Stewart (@GarthStewart1) June 15, 2017

#waweek2017 Back in March I tweeted at length about the @UkNatArchives Digital Strategy. In sum: it is great

— James Baker (@j_w_baker) June 15, 2017

.. from written textual descriptions to data orientated description

Plenary Panel - Web 25: histories from the first 25 years of the World Wide Web

Elisabetta Locatelli, The role of the Internet Wayback Machine in multi-method research project

Locatelli discusses studies into how people use social media #WAWeek2017

— Sara Day Thomson (@sdaythomson) June 15, 2017

Seeing blogs as a cultural artefact requires us to see them as cultural products with technological, economic, institutional, and cultural dimensions .. mixed methods - interviews, analyse content, examine design - helps overcome mix of permanence and ephemerality.

Interesting presentation by @DonnaBetta on the emphemerality of blogging and the relationship with web archives #WAWeek2017

— Sally Chambers (@schambers3) June 15, 2017

It creeps in every now and again, but there's a key issue with web use in the 1990s we aren't tackling head on.


— Gareth Millward (@MillieQED) June 15, 2017

Matthew Weber

#Webarchiving is folly ! #WAWeek2017

— Dépôt légal web BnF (@DLWebBnF) June 15, 2017

We talked about digital skills training yesterday, but @docmattweber raising problem of insufficient training in ethics #WAWeek2017

— Jane Winters (@jfwinters) June 15, 2017

Major issue for researchers & archivists working with user-generated data: de-identifying protects inds but makes data useless #WAWeek2017

— Sara Day Thomson (@sdaythomson) June 15, 2017

Federico Nanni, The Changing Digital Faces of Science Museums

Wanted to understand: what was the changing role of the website in the work of three science museums? .. phases: started with leaflet websites, moved onto virtual museums, to outreach, to social .. {is this just a move from museums doing something specific with web design, but then gradually moving to generic web design practice?}

Hearing about challenges of web archive work in Italy without national web archive and supporting structures for expert advice #WAweek2017

— Naomi Wells (@NASWells) June 15, 2017

Session 5A

Harry Raffal: Tracing the online development of the Ministry of Defence (MoD) and the Armed Forces through the UK Web Archive

Websites either side of the invasion of Iraq (2003) .. looking at home pages (not splash pages) .. we need to understand the integration of responsive design when analysing websites as historical sources .. care about how developers have tried to understand how users use websites .. need to analyse layout/form to study the purpose of websites .. when we do look at text, key language is the focus here: for example, move from language of 'career' to 'joining' in main navigation. But career remains part of the overall message, the wider literature .. army has always tried to be interactive, even before social media: the medium changed, not how they go about recruitment .. site shows awareness of need to respond to the news ..

{putting out a corporate image .. 'the army' .. periodical studies trying to avoid 'the Times' or 'Mr Punch' .. who are these people? Are they outsourcing media/comms?}

Ian Milligan: Pages by kids, for kids: unlocking childhood and youth history through the GeoCities web archive

The Enchanted Forest (c. 1996-1999) as an experiment in online publishing .. sites by kids for kids but patrolled by adult volunteers, but with a dark side .. What can we learn? 1 - how kids connected with each other; 2 - scale at play in age of web historiography .. late-1997 200k homesteaders aged 3-15 (of 1m) .. move away at the moment from focus on web histories via corporations .. childhood and grandparents are likely to be super interesting topics of histories that use web archives: you don't need as many filtered sources, you can hear from the children and grandparents with less mediation .. awards are a key to understanding GeoCities: helped enforce community standards .. as did webrings .. the point is, we have access to new voices and new ways of finding them .. top Enchanted Forest sites turned into part of the anti-Yahoo protest movement as it was clear Yahoo was letting GeoCities die .. GeoCities is fairly heavily policed in the context of pornography panic of late-90s, fear of kids having access to things they shouldn't .. ethics: Ian not doing work on people you can identify; not giving full quotes from some sensitive pages because they will be searchable soon

... to the Enchanted Forest and Kids in Geocities with @ianmilligan1 ...

— Valérie Schafer (@valerie_schafer) June 15, 2017

Niels Brügger, Ditte Laursen & Janne Nielsen: Methodological reflections about establishing a corpus of archived web. The case of the Danish web 2005-2015

How to identify a corpus in an archive? ..

Janne Nielsen on methodological challenges on delimiting a corpus for studying web archives #WAweek2017

— Anat Ben-David (@anatbd) June 15, 2017

A set of perplexing findings about the size of the .dk results. Shows how complicated web archiving can be! #WAWeek2017

— Ian Milligan (@ianmilligan1) June 15, 2017

.. deduplication central to making a corpus to work with .. and doing this is super hard! And depending on what you do, you create corpora with different biases
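{A toy illustration of the point about deduplication and bias. This is not the speakers' method: the `Capture` record, its field names and the sample data are all hypothetical. It only shows that two reasonable-sounding rules, "one capture per URL" and "one capture per distinct content", give different corpora from the same captures.}

```python
import hashlib
from collections import namedtuple

# Hypothetical capture record; field names are illustrative, not an archive schema.
Capture = namedtuple("Capture", ["url", "timestamp", "body"])


def dedupe_by_url(captures):
    """Keep the earliest capture of each URL, whatever its content."""
    kept = {}
    for c in sorted(captures, key=lambda c: c.timestamp):
        kept.setdefault(c.url, c)
    return list(kept.values())


def dedupe_by_content(captures):
    """Keep the earliest capture of each distinct body, whatever its URL."""
    kept = {}
    for c in sorted(captures, key=lambda c: c.timestamp):
        digest = hashlib.sha1(c.body.encode("utf-8")).hexdigest()
        kept.setdefault(digest, c)
    return list(kept.values())


# Invented sample: same page under two hostnames, plus a later redesign.
captures = [
    Capture("http://example.dk/", "20050101", "hello"),
    Capture("http://example.dk/", "20060101", "hello, redesigned"),
    Capture("http://www.example.dk/", "20050102", "hello"),
]

by_url = dedupe_by_url(captures)          # loses the 2006 redesign
by_content = dedupe_by_content(captures)  # loses the www. mirror

print([c.timestamp for c in by_url])
print([c.timestamp for c in by_content])
```

{Both results contain two captures, but not the same two: the URL rule hides change over time, the content rule hides how widely the same page was published.}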

Some admin...

Creative Commons Licence
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Exceptions: embeds to and from external sources, and direct quotations from speakers
