@interrogator
Last active November 8, 2016 06:46
daniel's blog post

Halfway through my PhD candidature in linguistics at Melbourne Uni, I was introduced by Fiona to the ResPlat family. One of their aims, I was told, was to train researchers across the university in emerging tools and methods for doing better, more reproducible research. A specific target of this agenda was the Humanities and Social Sciences, who, let's admit, sometimes lag behind a little when it comes to engagement with digital tools and methods.

IMAGE OF RESPLAT http://67.media.tumblr.com/ede2ddf22557269fd92dd13c4b344c53/tumblr_inline_nk9gcyW6pE1ssbz72.jpg "ResPlat Family"

My thesis was about corpus linguistics—that is, using computers to locate patterns in large collections of written text. Because of this, Fiona asked me if I could come on board and help out, teaching Python to researchers around the university, but with extra focus on those from the humanities. A key issue among corpus linguists, however, is that many don't really know how to code. A more common workflow is to load text files into graphical tools, which provide interesting, but in many senses limited, windows into natural language data. The expertise is more in the interpretation of results than in the generation of them.

My confession is that at the time, this was me. I ran decade-old software, and pressed the 'Keywords' button to get a list of words that were 'key' in the texts. I described and tried to explain the meaning behind whatever output the tool gave me, but the process was leaving me with doubts. When there were problems, could I fix them at their source? If someone gave me a new set of texts, or if I updated the old set, would I have to start all over again? Was what I was doing transparent and reproducible? And though it was all very interesting, was I really doing research that I could respect?

Regardless of how things were going in Thesisland, with only the most basic knowledge of shell scripting and Python under my belt, in December of 2014, I was invited on board with ResPlat, and was quickly apprenticed (read: hazed) in. While being an instructor was a key part of the job, really, I was a student at the same time. We were running the first ResBaz in February, and I was supposed to revise the course materials for "Text analysis with Python". Uh oh.

First #HackyHour of the year, snacks courtesy of @OKFNau pic.twitter.com/mhtcM5BskU

— Fiona Tweedie (@FCTweedie) January 15, 2015
"Hacky Hour/Frantic ResBaz Preparation"

Over the summer, I learned and practiced Python in the Jupyter Notebook, and put what I learned right into our lesson materials. It was a beginners' guide in more than one sense. Lachlan taught me Git with patience and mercy, so that our emerging materials stayed open-source and under version control.

As I learned, it became obvious how I could apply the code to my thesis research. So, I did. I started writing code that could extract the most common nouns from my dataset. Then, I wrote code that counted the number of imperatives. Before long, I was writing a Python module for getting texts annotated with grammatical features, for searching those annotated texts, and for visualising the results. An early version of the module was used during ResBaz, to show how you can progress from a series of text files to an analysis of meaning and pragmatics in Australian political discourse. Today, use of the tool is becoming more widespread. It bridges the divide between corpus and computational linguistics, and addresses some of the misgivings I had about the methodology of my thesis.
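
For a flavour of what that first step can look like, here is a minimal sketch of counting the most common nouns. It is not the module itself: it assumes the text has already been tokenised and tagged with Penn Treebank part-of-speech tags, and both the sample data and the function name `most_common_nouns` are made up for illustration.

```python
from collections import Counter

# Hypothetical sample: (word, tag) pairs with Penn Treebank tags, of the kind
# a part-of-speech tagger produces. Real input would come from an annotated corpus.
TAGGED = [
    ("The", "DT"), ("government", "NN"), ("announced", "VBD"),
    ("new", "JJ"), ("policies", "NNS"), ("and", "CC"),
    ("the", "DT"), ("government", "NN"), ("defended", "VBD"),
    ("its", "PRP$"), ("budget", "NN"),
]


def most_common_nouns(tagged_tokens, n=10):
    """Return the n most frequent nouns as (word, count) pairs.

    A token counts as a noun if its tag starts with 'NN'
    (NN, NNS, NNP, NNPS). Words are lowercased before counting.
    """
    nouns = (word.lower() for word, tag in tagged_tokens if tag.startswith("NN"))
    return Counter(nouns).most_common(n)


print(most_common_nouns(TAGGED, n=3))
```

Because the counting is a function rather than a button press, rerunning it on a new or updated set of texts is just another call with different input.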

Seems like #challengeaccepted is very quickly becoming our new #ResBaz mantra! Cheers @About_Memory!

— Research Bazaar (@ResBaz) February 15, 2015

Because of ResBaz, my research improved, and I homed in on what it is that I really enjoy doing. I also learned the terrifying art of teaching while live-coding, a skill that comes in handy all the time, both for teaching and for conference talks. By submission time, the code was, in my eyes, a key contribution of my work. Shortly after, a live demonstration of the module helped me land a postdoc position at the University of Tübingen, working within the European CLARIN (Common Language Resources and Technology Infrastructure) project. Like ResBaz, CLARIN aims to provide researchers, especially from the humanities and social sciences, with access to, and training in the use of, the digital resources that underpin more and more modern research. ResBaz showed me not only how important this aim is, but how much fun it can be to work toward. More specifically, my role will involve developing software and creating exemplar projects, using languages (German, Java) that I'm far from fluent in. No worries: ResBaz, via Jee, taught me to say "Challenge accepted".

ResBaz Germany, Summer 2017. You heard it here first.

Daniel
