@drjwbaker
Last active June 3, 2022 12:46
2022-05_DH-Benelux-notes.md

DH Benelux 2022, University of Luxembourg, 31 May - 3 June 2022

Live notes, so an incomplete, partial record of what actually happened. Also, some material presented at this event was not intended to be shared, so notes may be patchy

Tags: DHBenelux2022

Site: https://2022.dhbenelux.org/

Presentations and abstracts: https://zenodo.org/communities/dhbenelux2022/

My asides in {}

Tweet embeds are things I liked that seemed relevant to include or that captured things I missed


Tuesday (Day 0: pre-conference workshop)

Morning

Track 2 (MSA 4.330 – cap: 20) Creating Data and Workflows for Humanities Research (Lorella Viola and Sean Takats)

{my notes here}

Framing of the session around the @DHARPAproject.

What do we think a research workflow is?

  • from inputs to outputs
  • from start to end.
  • reflexive.
  • often hidden; needs to be visible.
  • relational: depends on your domain, your influences, where you publish.
  • should be self-contained, but usually isn't.

A processing pipeline that takes inputs, produces outcomes, and consists of several modules (an atomic part of the workflow that does a single thing [e.g. read in a csv file]).

DH gets critiqued for being a black box. But that isn't a DH problem. It is a humanities problem.

Why did we register?

  • sustainable data collections.
  • improve documentation.
  • improve the backend of research.
  • support ECRs in developing transparent workflows.
  • supporting non-scholars in using workflows.
  • transparency and reproducibility.

DHARPA responding to the "graveyard" of VRE projects.

making workflows

  • human checking in the loop.
  • workflows that feed back into themselves.
  • {we talked through Beyond Notability and tried to describe its workflow .. comparable project is WeChangEd}
  • modelling uncertainty and decision making.

DHARPA

  • kiara as a data orchestration engine.
  • data isn't cleaned, it is moved from one state to the next.
  • kiara is a python module
  • gives us ability to tell us the lineages of data that we have.
  • creating data objects for steps, and remembering the relationship between bits of data that have been created.
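The idea of atomic modules plus remembered lineage can be sketched in a few lines of plain Python. This is not kiara's API (none of the names below come from kiara); `DataObject`, `run_module`, and the toy CSV are hypothetical, and only illustrate "data moves from one state to the next, and each state remembers where it came from".

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class DataObject:
    """A value plus a record of how it was produced (hypothetical, not kiara)."""
    value: Any
    produced_by: str = "input"  # name of the module that created this object
    parents: List["DataObject"] = field(default_factory=list)

    def lineage(self) -> List[str]:
        """Module names from the original input up to this object."""
        steps: List[str] = []
        for parent in self.parents:
            steps.extend(parent.lineage())
        steps.append(self.produced_by)
        return steps


def run_module(name: str, func: Callable[[Any], Any], source: DataObject) -> DataObject:
    """Apply one atomic step; the source object is kept, not overwritten."""
    return DataObject(value=func(source.value), produced_by=name, parents=[source])


# Toy input: a two-row CSV held as a string.
raw = DataObject("title,year\nA,1900\nB,1910\n")
rows = run_module(
    "parse_csv",
    lambda text: [line.split(",") for line in text.strip().splitlines()],
    raw,
)
titles = run_module("select_titles", lambda table: [row[0] for row in table[1:]], rows)

print(titles.value)
print(titles.lineage())
```

Because every step returns a new object, the earlier states (`raw`, `rows`) stay available for inspection, which is the point of treating data as moved rather than cleaned.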

Track 1 (MSA 2.220 – cap: 30) Greening Digital Humanities James Baker, Jo Lindsay Walton, Lisa Otty, Christopher Ohge and Lea Beiermann


Wednesday (Day 1)

Morning

9:30 – 10:30 Opening Keynote: Modeling and investigating variation in language use from a communicative perspective: methods, challenges, and types of evidence Stefania Degaetano-Ortlieb (University of Saarbrücken)

Studying language is about looking for both regularities and variations .. communication through language is rational: we strive for successful communication using reasonable effort .. efficiency shapes human language .. methods for modelling language variation can be either too theoretically motivated or too shallow, and neither is strong on integrating context into the study of language change .. instead: language use as situational (register theory) and usage-based communication

Literary Studies scenario. Work with Andrew Piper: hypothesis of scientization of (the language of) literary studies since the mid-20th century - analysis suggests that the language of literary studies diverges from both standard English and scientific papers .. divergence of (the language of) scholarship from past scholarship (say, 40 years ago) is profound in ways we don't see in standard English: this is the process of specialisation .. further method of analysis: how predictable is language (e.g. "Jane reads a" has a more predictable next word than "Jane buys a") - we find greater standardisation (in a language which, as we saw before, is diverging from standard English)

Scientific lit scenario. Analysing change over time shows both specialisation and conventionalisation. Using measures of divergence across 20/30 year blocks, we see rolling change over time.
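One common way to put a number on "divergence" between two time blocks is Kullback-Leibler divergence over word distributions. The notes don't record which measure the keynote used, so treat this as an assumption. The sketch below uses unigram counts with add-one smoothing; the two toy "corpora" are invented.

```python
import math
from collections import Counter


def word_distribution(tokens, vocabulary, smoothing=1.0):
    """Smoothed relative frequency of every vocabulary word in `tokens`."""
    counts = Counter(tokens)
    total = len(tokens) + smoothing * len(vocabulary)
    return {word: (counts[word] + smoothing) / total for word in vocabulary}


def kl_divergence(p, q):
    """D(p || q) in bits: extra cost of encoding p with a model built from q."""
    return sum(p[word] * math.log2(p[word] / q[word]) for word in p)


# Invented stand-ins for an earlier and a later block of texts.
early = "the air is a fluid and the air has weight".split()
late = "the oxygen content of the gas was measured".split()

vocabulary = sorted(set(early) | set(late))
p_late = word_distribution(late, vocabulary)
p_early = word_distribution(early, vocabulary)

print(kl_divergence(p_late, p_early))  # > 0: the later block differs from the earlier
print(kl_divergence(p_late, p_late))   # 0.0: no divergence from itself
```

Sliding the "early" and "late" windows along a timeline, block by block, gives the kind of rolling picture described above.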

Old Bailey Corpus. Women influencing language change, especially in informal settings.

Session 1.01 PRACTICES AND TOOLS – Chaired by Frédéric Clavert (C2DH) [11:00-12:30]

11:00 – 11:15 A two-way street between AI research and media scholars [long] Rasa Bocyte, Philo van Kemenade and Johan Oomen(@)

AI4Media Consortium .. published a roadmap ("Roadmap on AI technologies and applications for the Media Industry") with a section on AI for Social Sciences and Humanities Research .. use case example: matching parts of media using audio fingerprinting to understand their overlaps .. CLARIAH Media Suite

11:15 – 11:30 Developing Data Stories as Enhanced Publications in Digital Humanities [long] Willemien Sanders(+), Roeland Ordelman, Mari Wigham, Rana Klein, Jasmijn van Gorp and Julia Noordegraaf

Stories based on quantitative data .. CLARIAH Media Suite: 91 datasets from 62 orgs .. it is a platform for annotating, making collections, et al, but there is also a lot there, so quant approaches sometimes needed .. storyfinding and contextualisation as a reflexive/iterative process .. we need data stories, because money is often spent (e.g. by governments) on making data but money is rarely spent in explaining how the data was created, should be used, context of production etc .. Scalar can be seen as a platform for telling data stories ..

11:30 – 11:40 Open Science and Linked Data to Create a FAIR Corpus of IntraBelgian Literary Translations 1970-2020 [short] Sven Lieber(+) and Ann Van Camp(+)

performed work is often undocumented and thus unrecognised .. BELTRANS project: looking at intra-Belgian book translations, focusing on translation flows .. trying to make process FAIR, not just data ..

https://github.com/kbrbe/beltrans-data-integration

11:40 – 11:50 Extracting and providing online access to annotated and semantically enriched historical data. The AGODA project [short] Marie Puren(+), Pierre Vernus, Aurélien Pellet and Nicolas Bourgeois

Word embeddings (word2vec etc): words that appear in similar contexts will have similar vectors, and therefore cluster together.
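"Similar vectors" is usually measured with cosine similarity. A minimal sketch with hand-made three-dimensional vectors (invented for illustration; real word2vec vectors are learned from a corpus and have hundreds of dimensions):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors, not the output of any trained model.
vectors = {
    "king": [0.90, 0.80, 0.10],
    "queen": [0.85, 0.90, 0.15],
    "bicycle": [0.10, 0.05, 0.95],
}

print(cosine_similarity(vectors["king"], vectors["queen"]))    # close to 1
print(cosine_similarity(vectors["king"], vectors["bicycle"]))  # much lower
```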

11:50 – 12:00 Moving beyond tool-oriented teaching within Digital Humanities: The challenge of appropriating the CLARIAH Media Suite into tool-supported [short] Susan Aasman(@), Berber Hagedoorn(@), Sabrina Sauer(@), Iris Baas and Lisenka Bakker

Encourage students to use tools to develop a more reflexive approach to research methods/processes .. deGoogling students - how to encourage them to see the value of an archive like CLARIAH Media Suite, rather than just assume they can find anything on the net .. intuitive search doesn't really work in environments like the Suite, but that is okay, because students learn - serendipitously - through this mismatch between their search skills and the infrastructure .. students learn to see search as a craft (and educators learn things about their research process, things they need to communicate) ..

Session 1.02 CONTEMPORARY DH – Chaired by Valérie Schafer (C2DH) – [13:30 – 14:30]

13:30 – 13:45 A network analysis of Wikipedia editors’ engagement with History: Interests, Identities, Power, and Hierarchy [long] Petros Apostolopoulos(+)

network analysis to show us patterns that illuminate the motivations editors have for taking part .. editors the nodes, pages the edges .. network viz by topic .. a small number of editors make most edits, and the (vast) majority of those declare they are interested in these topics (on their wikipedia page) rather than general interest in history, so topic interest drives editing .. most central actors are bots, whose work is essential to the production/stabilisation of knowledge of wikipedia .. {I wonder if having a profile page on Wikipedia is a good proxy for being part of the community, and - maybe - a better proxy for experience than number of pages edited}
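A rough sketch of the "editors the nodes, pages the edges" construction: two editors are linked when they have edited the same page, and the number of links (degree) is one simple notion of centrality. The edit log and names are invented, and the talk may well have used a different centrality measure.

```python
from collections import defaultdict
from itertools import combinations

# Invented (editor, page) pairs standing in for a Wikipedia edit history.
edits = [
    ("alice", "Battle of Waterloo"),
    ("bob", "Battle of Waterloo"),
    ("bot1", "Battle of Waterloo"),
    ("bot1", "French Revolution"),
    ("carol", "French Revolution"),
]


def co_editing_network(edit_log):
    """Map each editor to the set of editors they share at least one page with."""
    editors_by_page = defaultdict(set)
    for editor, page in edit_log:
        editors_by_page[page].add(editor)

    neighbours = defaultdict(set)
    for editors in editors_by_page.values():
        for a, b in combinations(sorted(editors), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


network = co_editing_network(edits)
degree = {editor: len(linked) for editor, linked in network.items()}
most_central = max(degree, key=degree.get)

print(degree)
print(most_central)
```

In this toy log the bot touches both pages, so it ends up with the most connections, which mirrors the finding that bots are the most central actors.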

13:45 – 13:55 “Dragen van mondkapjes niet nodig is” “mondkapjes zijn verplicht” How the Netherlands dealt with the first wave of the COVID-19 pandemic [short] Iris Baas(+), Tommaso Caselli and Marc Esteve Del Valle

This #DHBenelux2022 presentation on the Dutch response to #Covid19 is so fascinating, it feels like a hundred years ago and yesterday, but it's like an advertisement for documentation and having crisis management plans set up in anticipation of public health breakdowns.

— Merisa Martínez (@merisamartinez) June 1, 2022

13:55 – 14:05 Challenges on the promising road to Automatic Speech Recognition of privacy-sensitive Dutch doctor-patient consultation recordings [short] Berrie van der Molen, Cristian Tejedor-García(@), Henk van den Heuvel, Roeland Ordelman, Toine Pieters, Sandra van Dulmen and Arjan van Hessen

ASR has improved a lot, but has significant privacy issues .. medical terms are often mispronounced, and terminology often has multiple spelling variations

Session 1.03 DIGITAL HISTORY – Chaired by Thomas Smits (UAntwerp) – [15:00 – 16:00]

I had to miss this session to take a call :(


Thursday (Day 2)

Morning

Session 2.01 LITERARY STUDIES: DUTCH LITERATURE – Chaired by Marijn Koolen (KNAW) – [9:00 – 10:15]

9:00 – 9:15 Defying Expectations: Stylistically Unconventional Anger in a Contemporary Dutch Literary Novel [long] Julia L. Neugarten(+), Lisanne M. Van Rossum(+) and Joris J. Van Zundert(+)

Kolmogorov complexity: measure of complexity (the length of the shortest description of a text; in practice approximated with a compression algorithm, which measures information by how easy the text is to compress: e.g. aaaaaaaaaa is 10a) .. exploring the relationship between complexity, anger, popular reception, and literary prestige .. what are the stylistic markers of anger in fiction beyond content words?
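Kolmogorov complexity itself cannot be computed, so a standard stand-in is the compressed size of a text relative to its raw size. A minimal sketch using Python's `zlib`; the choice of compressor and the toy strings are assumptions, not what the authors used.

```python
import random
import zlib


def compression_ratio(text: str) -> float:
    """Compressed size / raw size; lower means more repetitive (assumes non-empty text)."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw, 9)) / len(raw)


repetitive = "a" * 1000

# 1000 pseudo-random letters: very little structure for the compressor to exploit.
rng = random.Random(0)
varied = "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(1000))

print(compression_ratio(repetitive))  # very small
print(compression_ratio(varied))      # much larger
```

Applied to prose, the same ratio gives a crude per-passage complexity score that can then be set against annotations such as anger.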

9:15 – 9:30 Using CLIP to extract and analyze images of the family in 3,000 Dutch-language children’s books, 1800-1940 [long] Thomas Smits(+), Paavo Van der Eecken(+) and Vanessa Joosen

Representations of family in children's literature .. marked by (conservative) continuities despite changing notions and realities of family life .. 22k images in 3k books .. CLIP: strong at connecting image and text without labelled data; turns both images and text into embeddings/vectors; then looks for cosine similarity. Good performance on non-photographic images (so, generalisable). But choosing the cosine threshold is a classic precision/recall problem .. CLIP yields better results with contextual prompts: "an image of a family" better than "a family" .. surprise results of the model (e.g. little girls looking after dolls or other children) come back as "family", which indicates a comparable visual language .. use incomplete models and their results as starting points for new work .. but we as DHers do need standards to measure performance in ways that work for us
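The threshold trade-off is easy to see with a handful of scores. The sketch below does not call CLIP; the similarity scores and the hand labels ("is this really a family image?") are invented, and only the precision/recall arithmetic is the point.

```python
def precision_recall(scores, labels, threshold):
    """Treat every image scoring at or above `threshold` as a predicted match."""
    predicted = [score >= threshold for score in scores]
    true_pos = sum(1 for p, l in zip(predicted, labels) if p and l)
    false_pos = sum(1 for p, l in zip(predicted, labels) if p and not l)
    false_neg = sum(1 for p, l in zip(predicted, labels) if not p and l)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 1.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 1.0
    return precision, recall


# Invented cosine similarities between six images and the prompt
# "an image of a family", with invented hand-checked labels.
scores = [0.31, 0.28, 0.27, 0.24, 0.22, 0.19]
labels = [True, True, False, True, False, False]

print(precision_recall(scores, labels, threshold=0.30))  # strict: precise, misses images
print(precision_recall(scores, labels, threshold=0.20))  # loose: finds all, more noise
```

A strict threshold returns only sure hits; a loose one recovers every family image at the cost of false positives, some of which may be the interesting "surprise" results.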

9:30 – 9:45 A Distant Reading of Gender Bias in Dutch Literary Prizes [long] Noa Visser(@), Andreas van Cranenburgh and Dong Nguyen

Perceptions of literary quality aligned with (white) men ..

Noa Visser at #DHBenelux2022 on the gender equalities in Dutch literary prizes. 10% female winners, 90% male winners. Not something we didn’t know, but it baffles every time, doesn’t it? https://t.co/mVHEh2JWo3 Noa has chosen an important mission.

— Joris van Zundert (@jorisvanzundert) June 2, 2022

.. nominated and not-nominated novels are computationally distinguishable .. easier to classify nominated novels as written by men .. nominated and not-nominated novels have clusters of topics associated with them: e.g. health related topics cluster to nominated books written by men and not nominated novels written by women {!!}

Session 2.02 LITERARY STUDIES: AUTHORS – Chaired by Joris van Zundert (KNAW) – [10:45 – 12:00]

10:45 – 11:00 Claudine at the workshop: the impact of Willy and his secretaries on Colette’s writing [long] Florian Cafiero and Marie Puren(+)

1890s to 1920s: industrialised/workshop literature production under the name "Willy" .. hard to know who wrote what within the authoring stable .. co-authors often hidden .. use people who went on to be named authors after collaborating with Willy, and do a rolling stylometric analysis (of slices of 3000 words) to measure authorship ..
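A much simplified sketch of the rolling idea: cut the disputed text into fixed-size slices, describe each slice by its function-word frequencies, and assign it to the closest candidate profile. The word list, the Manhattan distance, and the non-overlapping slices are all assumptions for illustration; the authors' actual features and classifier are not recorded in these notes, and the real texts are French.

```python
from collections import Counter

# English stand-in list; a real study would use many more (French) function words.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in"]


def slices(tokens, size=3000, step=3000):
    """Yield consecutive windows of `size` tokens (one short window if text is shorter)."""
    for start in range(0, max(len(tokens) - size, 0) + 1, step):
        yield tokens[start:start + size]


def profile(tokens):
    """Relative frequency of each function word (assumes a non-empty token list)."""
    counts = Counter(tokens)
    return [counts[word] / len(tokens) for word in FUNCTION_WORDS]


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def rolling_attribution(tokens, candidate_profiles, size=3000, step=3000):
    """For each slice, name the candidate whose profile is nearest."""
    attributions = []
    for chunk in slices(tokens, size, step):
        chunk_profile = profile(chunk)
        nearest = min(
            candidate_profiles,
            key=lambda name: manhattan(chunk_profile, candidate_profiles[name]),
        )
        attributions.append(nearest)
    return attributions


# Tiny invented example with slices of 6 words instead of 3000.
candidates = {
    "author_a": profile(["the", "the", "of"]),
    "author_b": profile(["and", "and", "to"]),
}
disputed = ["the"] * 6 + ["and"] * 6
print(rolling_attribution(disputed, candidates, size=6, step=6))
```

Setting `step` smaller than `size` gives overlapping windows, which is what makes the analysis "rolling" in practice.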

11:00 – 11:15 The Shape of Doubt: Employing data visualization to investigate stylistic features in the narrative works of Italo Calvino [long] Margherita Parigini(+) and Tommaso Elli

The problem of transposing a messy analysis (e.g. the way Calvino uses negation) into data .. manual annotation ..

Yeah, love this question from Margherita Parigini #dhbenelux2022 https://t.co/Fn6p9E2ajj pic.twitter.com/u6Hbg3bvwR

— James Baker (@j_w_baker) June 2, 2022

.. change over time as interpreted through the data makes sense in the context of Calvino's literary career and scholarship ..

{this is really insightful analysis: abstract}

11:15 – 11:30 Un-mixing the re-mix. Publishing the complete manuscripts of Anne Frank [long] Peter de Bruijn, Ellie Bleeker(@), Marielle Scherer, Peter Boot, Karina van Dalen-Oskam, Ronald Haentjens Dekker, Gijsjan Brouwer and Bas Doppen

New digital edition .. bring work together in context ..

Afternoon

14:30 – 15:15 Zortify Round Table: Hybrid Knowledge: New Insights in Augmented Intelligence for Human Decision-Making. Evangelia Markidou [EC, robotics + AI], Marietjie Botes [digital/bio ethics], Anke Joubert [AI + data at Deloitte], Christopher Morse [Zortify], Katya Kamlovskaya [Zortify]. Moderator: Frederic Clavert

European Commission Co-ordinated Plan on Artificial Intelligence: trustworthy AI .. far away from general AI, but interested in collaborative AI .. machines ethically augmenting human decisions .. much of the frameworks for digital ethics are based on bio ethics .. we need explainable (reflexive?) AI

Panel here are saying things about Explainable AI that are drifting towards the notion of AIs explaining their decision making, and I'm suddenly very keen on having AI(ish) things that are reflexive (e.g. write their own documentation to accompany their outputs). #dhbenelux2022

— James Baker (@j_w_baker) June 2, 2022

Sure, the AIs would fib *massively*, but still wouldn't it be nice if DALLE painted a picture and at the same time wrote a description of why it "thinks" it painted that picture. #dhbenelux2022

— James Baker (@j_w_baker) June 2, 2022

Session 2.03 DIGITAL HISTORY – Chaired by Machteld Venken (C2DH) – [15:45 – 17:20]

15:45 – 16:00 User demand for supporting advanced analysis of historical text collections [long] Max Kemman(@) and Steven Claeyssens(+)

Lack of user studies of tool use in this area .. clear demand for discovery/selection, but no clear demand for analysis in a text suite (too much variety of data, too many tools [and advanced folk just want to do the analysis themselves], lack of broad application) ..

Should @KB_Nederland develop a digital research environment for historical text collections bridging the gap between simple search interfaces like @DelpherNL and advanced functionalities for (text) analysis? Report @dialogic_nl @MaxKemman #dhbenelux2022 https://t.co/NwYpBLdYcJ

— Steven Claeyssens (@sclaeyssens) June 2, 2022

.. recommend KB work on discovery/selection not analysis.

And the answer is, no.. Which is interesting, because @Jisc + @UkNatArchives + @ProgHist are collaborating to produce tutorials for analysing large-scale collections to plug the gap between 'researcher gets data' & 'researcher analyses data' https://t.co/3banUaFXvr #dhbenelux2022 https://t.co/yhTYLBd9rM

— James Baker (@j_w_baker) June 2, 2022

I agree. We recommend that "supporting advanced analysis" should consist of educational support through tutorials and documentation rather than technical development of a software platform #DHBenelux2022

— Max Kemman (@MaxKemman) June 2, 2022

16:00 – 16:15 Exploring the History of Digital History: Setting an Agenda [long] Gerben Zaagsma(+)

Three important centres: more mathematical in eastern europe, more social science in western europe, more public history in north america ..

@gerbenzaagsma just presented a positively exciting and multi-perspective agenda for the research of the history of digital history. #DHBenelux2022: https://t.co/weDcwLATPg

— Joris van Zundert (@jorisvanzundert) June 2, 2022

16:15 – 16:30 Historic machines from ‘prams’ to ‘Parliament’: new avenues for collaborative linguistic research [long] Mia Ridge(@), Giorgia Tolfo, Kalle Westerling, Nilo Pedrazzini and Barbara McGillivray

Key questions: what did machine mean in the 19th century? what was a machine?.. combined crowdsourcing as collection engagement with crowdsourcing as building research data .. volunteers learn about the role of training data in AI .. narrowed down from 28 OED notions of machine to 5, to enable volunteers to classify the type of machine they find ..

16:30 – 16:40 A clash of colorful worlds. Distant viewing the use of color in Western visual representations of the orient and occident, 1890- 1920 [short] Thomas Smits(+), Melvin Wevers and Eleonora Paklons(+)

Can we study the historical experience of colour? .. occident/orient: occidental notion of orient very colourful .. new visual mass media in this period mediated the experience of colour .. 6.5k photochromes (14 colours added in production) vs 65k autochromes ('actual' colour) .. In contrast to previous studies, hard(er) to maintain orientalist aesthetic in documentary medium of photography

16:40 – 16:50 Cutting history at its joints; a computational approach to periodizing the history of a concept [short] Hugo Hogenbirk(+)

Auto periodisation of lexical concepts (like the narrative of a football match) ..


Friday (Day 3)

Morning

{yeah, I didn't make the first session}

Session 3.02 NEWSPAPERS AND TEXT – Chaired by Estelle Bunout (C2DH) – [10:45 – 12:00]

10:45 – 11:00 Whose ‘I’ is it anyway? Comparing a rule-based approach and a BERT token- classifier for quote detection in Dutch newspapers [long] Kim Smeenk(+), Herbert Kruitbosch, Frank Harbers and Marcel Broersma

finding journalists speaking in the first person is hard - it is a varied, nebulous, and unruly genre

Just presented our work on quote detection in Dutch newspapers at the warm bath that is @DHBenelux Very exciting to be back in person #dhbenelux2022 pic.twitter.com/sodKBAv9Vo

— Kim Smeenk (@SmeenkKim) June 3, 2022

11:00 – 11:15 What’s a ‘Liberal’ newspaper anyway? [long] Kaspar Beelen(+), Jon Lawrence and Mariona Coll Ardanuy

Newspaper data as a hybrid construct: a sample, an uneven sample, it is big in terms of words but small in terms of the whole environment of Victorian publishing .. contextualise what we have .. Newspaper Press Directories: summary of where newspaper distributed, by whom, price, political leaning et al .. how representative is the Jisc Newspaper Corpus in the Victorian landscape: over-represented 'liberal' and 'conservative' newspapers (and expensive newspapers!), under-represented 'neutral' and 'independent' .. but these historical categories are in need of historical deconstruction, and change over time in the press directories (fluid labels) ..
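Over- and under-representation can be expressed as a ratio of shares: a category's share of the digitised corpus divided by its share of the wider landscape (here, the press directories). All counts below are invented to show the arithmetic; they are not figures from the talk.

```python
def representation_ratio(population_counts, sample_counts):
    """Share in the sample divided by share in the population, per category.

    Above 1.0 means over-represented in the sample; below 1.0, under-represented.
    """
    population_total = sum(population_counts.values())
    sample_total = sum(sample_counts.values())
    ratios = {}
    for label, population_count in population_counts.items():
        population_share = population_count / population_total
        sample_share = sample_counts.get(label, 0) / sample_total
        ratios[label] = sample_share / population_share
    return ratios


# Invented counts of titles by the political label given in a press directory.
directory_titles = {"liberal": 40, "conservative": 30, "neutral": 20, "independent": 10}
# Invented counts of titles that made it into a digitised corpus.
corpus_titles = {"liberal": 25, "conservative": 18, "neutral": 5, "independent": 2}

for label, ratio in representation_ratio(directory_titles, corpus_titles).items():
    print(label, round(ratio, 2))
```

Since the labels themselves shift across editions of the directories, any such ratio is only as stable as the categories behind it.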

Fascinating work by Kaspar Beelen at #DhBenelux2022 on press directories and the political leanings of the Victorian press! Also corroborates work by @hobbb and other @RS4VP colleagues https://t.co/CmCgo9fyP4

— Thomas Smits (@thomassmits) June 3, 2022

11:15 – 11:25 Comparing the performance and usability of state-of-the-art OCR workflows on French-Dutch bilingual historical sources [short] Alec van den Broeck(+), Tess Dejaeghere(+), Vincent Ducatteeuw, Lise Foket, Sally Chambers, Christophe Verbruggen, Julie M. Birkholz and Frederic Lamsens

Art exhibition catalogues, 1792-1914, Flemish, mostly digitised .. quite variable in how the 'data' is organised within the catalogues .. goal is to create easy to use packages that work well with historical documents (and with historians!) .. goal to get to a language agnostic workflow that is robust against noise, ink seepage, text skew .. {they basically built a custom workflow that out-performs Tesseract!}

Afternoon

Session 3.03 RESEARCH DATA – Chaired by Margherita Fantoli (KU Leuven) – [13:00 – 14:30]

13:00 – 13:05 Sorry, I Don’t Follow You: The Translation of Donald Trump’s Tweets in Dutch Newspaper De Volkskrant [poster] Elise van Berkum(@)

Paraphrasing often passed off as quote when in translation ..

13:05 – 13:10 To harvest or not to harvest? The importance of legal advice in BESOCIAL [poster] Fien Messens(@), Lise-Anne Denis(@), Alejandra Michel, Eva Rolin, Patrick Watrin, Julie M. Birkholz, Sally Chambers, Friedel Geeraert, Peter Mechant, Eveline Vlassenroot, Pieter Heyvaert and Sophie Vandepontseele

attempt to expand legal deposit collection to new forms of born digital collections: legal barriers created by poor adaptation to digital data ..

13:10 – 13:15 [eu-fo-nì-a]: a program to automatically compute euphonic phenomena in the Italian language [poster] Andrea Consalvi(@)

studying sound of words and their collocation .. Python package that pumps out data useable in R and as csv ..

13:15 – 13:30 Data Montage: Towards Coherence in Multimodal Data Representation [long] Sara Akhlaq(@), Arran Ridley(@), Mark-Jan Bludau and Marian Dörk

Akhlaq, Sara, Arran Ridley, Mark-Jan Bludau, and Marian Dörk. ‘Data Montage: Towards Coherence in Multimodal Data Representation’. 30 May 2022. https://doi.org/10.5281/zenodo.6594779.

Interweaving of multi-modal data interfaces .. data montage seeks to carefully be accessible (taking in mind range of expertise), emotionally provocative (whilst rigorous), and coherent (across modalities) .. visual, sonic, aural, olfactory, edible, physicalisation .. {yeah, this work is great} .. https://tiedinknots.io/#/

13:30 – 13:45 Enriching Cultural Heritage Data for Research – the Quest for Interoperability in Audiovisual Archives [long] Mari Wigham(+), Willem Melder and Roeland Ordelman

media data archives are heterogeneous, their cataloguing is produced in heterogeneous ways .. NL Sound & Vision have a SPARQL endpoint .. https://cat.apis.beeldengeluid.nl/sparql
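A SPARQL endpoint can be queried over plain HTTP using the standard SPARQL protocol (`query` parameter, JSON results via the `Accept` header). The sketch below only builds the request; `run_query` would send it. Whether this endpoint still accepts anonymous GET requests is an assumption, and the query is a generic "show me five triples" probe that relies on nothing about the archive's data model.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://cat.apis.beeldengeluid.nl/sparql"

# Generic probe: any five triples, with no assumptions about the vocabulary used.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_request(endpoint, query):
    """Build a SPARQL-protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(endpoint, query):
    """Send the query and return the result rows (needs network access)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=30) as response:
        return json.load(response)["results"]["bindings"]


request = build_request(ENDPOINT, QUERY)
print(request.full_url)
# rows = run_query(ENDPOINT, QUERY)  # uncomment to actually contact the endpoint
```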


Some admin...

Creative Commons Licence
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Exceptions: embeds to and from external sources, and direct quotations from speakers
