After a general introduction to the goals of Digital Humanities, we discuss the shift in perspective that occurs when we look at our objects of study as data points. What questions and methodologies does this new point of view prompt? We encourage you to start thinking about new research questions that could be answered by analysing the data. To facilitate this, we introduce a workflow that guides the process of getting from raw to presentable data. In this session we concentrate on acquiring data and structuring it in a meaningful way. To acquire data, we use Application Programming Interfaces (APIs) and write REST queries.
Reading material: Ben Fry - The Seven Stages of Visualizing Data
Slides
Tools
APIs
- Europeana (European portal for online collections of Galleries, Libraries, Archives and Museums)
- Open Library (Bibliographic records for all the world's books)
- Project Gutenberg (digital library with full-texts of public domain books)
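A REST query is, at its simplest, a web address made up of a base URL plus a set of URL-encoded parameters. The sketch below shows how such a query could be built and how a JSON response could be reduced to a few fields. It assumes the Open Library search endpoint at `https://openlibrary.org/search.json` and a response with a `docs` list; check both against the current API documentation. The helper names (`build_query_url`, `fetch_json`, `extract_records`) are made up for this illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Open Library search API; verify before relying on it.
BASE_URL = "https://openlibrary.org/search.json"


def build_query_url(base_url, **params):
    """Return a REST query URL: the base address plus URL-encoded parameters."""
    return base_url + "?" + urlencode(params)


def fetch_json(url):
    """Send a GET request and decode the JSON response (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)


def extract_records(payload, fields=("title", "first_publish_year")):
    """Keep only the chosen fields from each record in the assumed 'docs' list."""
    return [{f: doc.get(f) for f in fields} for doc in payload.get("docs", [])]


# Build the query; spaces in the search term are encoded automatically.
url = build_query_url(BASE_URL, q="max havelaar", limit=5)
print(url)
```

To run the query, pass the URL to `fetch_json(url)` and hand the result to `extract_records`. Keeping the URL construction separate from the request makes it easy to inspect a query in the browser before automating it.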
An introduction to the use of digital tools in historical disciplines by Charles van den Heuvel will be followed by a hands-on session, under the guidance of Jan Hartmann, to get acquainted with the open-source Geographical Information System QGIS. Using historical maps from the Atlas der Neederlanden (UB), you will practise making your own maps and learn how to link geo-located data, for instance from an Excel spreadsheet, to a map. After this experiment with mapping data in time and space, the second part of the afternoon will be dedicated to the visualization of topical data in historical network research. After a brief explanation of the basic terms used in network visualizations and a demonstration of some examples, we will experiment with the visualization tool Gephi to visualize a data set of Dutch correspondents from the Catalogus Epistularum Neerlandicarum.
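Before a correspondence network can be visualized, the letters have to be turned into a list of edges. A minimal sketch of that step follows, assuming each letter is reduced to a (sender, recipient) pair. The sample pairs are invented for illustration and are not taken from the Catalogus Epistularum Neerlandicarum; the `Source`, `Target` and `Weight` column names follow the edge-list layout that Gephi's spreadsheet import is generally understood to accept.

```python
import csv
import io
from collections import Counter

# Made-up letters as (sender, recipient) pairs; not real catalogue data.
LETTERS = [
    ("Huygens", "Hooft"),
    ("Hooft", "Huygens"),
    ("Huygens", "Hooft"),
    ("Grotius", "Huygens"),
]


def edge_weights(letters):
    """Count the number of letters per directed (sender, recipient) pair."""
    return Counter(letters)


def to_edge_csv(weights):
    """Serialise the weighted edges as CSV text, one edge per row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(weights.items()):
        writer.writerow([source, target, weight])
    return out.getvalue()


edge_csv = to_edge_csv(edge_weights(LETTERS))
print(edge_csv)
```

Each row stands for one directed edge; the weight records how many letters went from the first correspondent to the second.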
Slides
Tools
Data sets
Today, we continue with our workflow. First, we prepare the raw data so that we can start exploring it. We discuss how to parse, filter and extract information from our data sets. Once acquired, data is hardly ever in the shape and structure that we need to answer our research questions. Several transformation steps are needed before we can query, represent and visualise it. During these transformations, new insights and questions may come up that require us to go back to previous steps. We use Google Refine and Gephi to explore the data sets created during the first day.
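The parse, filter and extract steps can be sketched in a few lines of code. The example below is only an illustration in Python, not the Google Refine workflow used in the session, and the small data set and its column names are made up for the purpose.

```python
import csv
import io

# Illustrative sample data; the columns and rows are invented for this sketch.
RAW = """title,year,language
Max Havelaar,1860,nl
Untitled pamphlet,,nl
Camera Obscura,1839,nl
Walden,1854,en
"""


def parse(text):
    """Parse: turn raw CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def filter_rows(rows, language):
    """Filter: keep rows in the given language that have a year filled in."""
    return [r for r in rows if r["language"] == language and r["year"].strip()]


def extract(rows):
    """Extract: keep the fields of interest and convert the year to a number."""
    return [(r["title"], int(r["year"])) for r in rows]


records = extract(filter_rows(parse(RAW), "nl"))
print(records)
```

The row without a year is dropped in the filter step. In practice this is the point where you may want to go back and decide whether such incomplete records should be repaired instead of discarded.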
Slides
Tools
This session focuses on the analysis of digital text. The first part, taught by Karina van Dalen-Oskam, will start with a short overview of the main sources of available digital texts in the Dutch language. Then some of the most accessible text analysis tools will be demonstrated. One of these, AntConc, will be covered in more depth. For this, participants are advised to bring a small set of digital texts they would like to experiment with. These should be in plain-text (.txt) format (if necessary, we will help you convert your files to the right format). NB: The teachers will also bring some texts for experimenting.
In the second part, taught by Jelle Zuidema, we will focus on regular expressions as a more general and even more powerful way to search for patterns in text data. We will use the Cygwin environment (which provides a so-called Unix shell that runs on Windows) to practise with simple commands to find linguistic patterns in corpora (e.g., consonant triplets, past-tense inflections) and to collect frequency counts. To show how these techniques can also be used outside linguistics, we will use them to collect text statistics in an authorship attribution exercise.
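To give an idea of what such pattern searches look like, here is a small sketch that finds words containing three consecutive consonants and counts word frequencies. It is written in Python and is not the set of shell commands used in the session; the sample sentence is invented, and treating "y" as a vowel is a simplifying assumption.

```python
import re
from collections import Counter

# Invented sample sentence for illustration.
TEXT = "The strong strings stretch; the cat sat."

# Three consecutive consonant letters; 'y' is treated as a vowel here.
TRIPLET = re.compile(r"[bcdfghjklmnpqrstvwxz]{3}")


def tokenize(text):
    """Lower-case the text and split it into runs of the letters a-z."""
    return re.findall(r"[a-z]+", text.lower())


def words_with_triplet(words):
    """Return the words that contain a consonant triplet."""
    return [w for w in words if TRIPLET.search(w)]


def frequencies(words):
    """Count how often each word occurs."""
    return Counter(words)


words = tokenize(TEXT)
print(words_with_triplet(words))
print(frequencies(words).most_common(1))
```

The same two ingredients, a pattern and a frequency count, underlie simple authorship statistics such as the relative frequency of common function words.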
Slides
Tools
Data sets
We finish with the last stages of our workflow, focusing on presenting and interacting with the data through visualisation. There are many ways to visualise information, but only some of them present the data in a way that answers our research questions. We experiment with Many Eyes, a web-based visualisation tool that offers a broad range of visualisations to choose from, to find the one that best presents the story we want to tell with the data.
Slides
Tools