@george-hawkins
Last active December 3, 2017 18:54

Below is page 2 of the Routledge Frequency Dictionary of German. This screenshot was taken using the "Look inside" feature of the Amazon product page for this book. The dictionary says it is based on the Leipzig/BYU Corpus of Contemporary German. Googling for information on this corpus turns up little or nothing.

Leipzig makes various corpora available here, but neither it nor Brigham Young University seems to have any web content related to something called the "Corpus of Contemporary German".

The dictionary says the corpus brings together words from spoken language, literature, newspapers, academic texts and instructional language. So the dictionary claims to use a broad-based corpus, and as its full title is "A Frequency Dictionary of German: Core Vocabulary for Learners", it doesn't sound like it's targeting some narrow, foreign-affairs-obsessed audience.

So it seems odd that this frequency dictionary claims that Afghanistan is a more frequently used proper noun than e.g. Frankreich (see next screenshot below).

It's especially odd if one compares it with the data from the Leipzig corpora that specifically focus on news (rather than their broader ones). Even in these news corpora for the years 2004/2005 (when the dictionary was published, and when German peacekeeping efforts in Afghanistan meant it appeared particularly often in the news), Frankreich is still far ahead in frequency. Given France's importance to Germany in terms of trade, EU politics, geographical proximity etc., this is hardly surprising.

Update: while the corpus covers literature from 1990 to 2000, the window described below for newspaper coverage is remarkably narrow: September 2001 to February 2002. If you look at the Leipzig news corpus for 2001, Afghanistan does spike above Frankreich for that year (before dropping below it again in 2002). However, even given this, it seems surprising that the authors let words from such a narrow window swamp what they claim is the input from more than a decade of other sources, including spoken language and literature.
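To make the "swamping" effect concrete, here is a minimal sketch of how pooling raw counts across sub-corpora can let a spike in one of them decide the overall ranking. Every number in it is invented for illustration and none comes from the dictionary or from the Leipzig corpora; the sub-corpus names, sizes and the helper functions are likewise assumptions, not a description of how the dictionary's authors actually combined their sources.

```python
# Hypothetical counts, invented purely for illustration. They are NOT taken
# from the dictionary, the Leipzig corpora, or any other real source.
SUBCORPORA = {
    # name: (total tokens, {word: occurrences})
    "spoken":     (1_000_000, {"Frankreich": 40, "Afghanistan": 2}),
    "literature": (1_000_000, {"Frankreich": 50, "Afghanistan": 1}),
    "newspapers": (1_000_000, {"Frankreich": 60, "Afghanistan": 300}),
    "academic":   (1_000_000, {"Frankreich": 45, "Afghanistan": 3}),
}


def pooled_per_million(word):
    """Occurrences per million tokens when all sub-corpora are simply pooled."""
    total_tokens = sum(size for size, _ in SUBCORPORA.values())
    total_hits = sum(counts.get(word, 0) for _, counts in SUBCORPORA.values())
    return total_hits * 1_000_000 / total_tokens


def subcorpora_where_ahead(word, other):
    """Names of the sub-corpora in which `word` occurs more often than `other`."""
    return [
        name
        for name, (_, counts) in SUBCORPORA.items()
        if counts.get(word, 0) > counts.get(other, 0)
    ]


if __name__ == "__main__":
    for w in ("Frankreich", "Afghanistan"):
        print(w, pooled_per_million(w))
    print(subcorpora_where_ahead("Afghanistan", "Frankreich"))
```

With these made-up figures Afghanistan comes out ahead in the pooled ranking even though it leads in only one of the four sub-corpora, which is the kind of outcome a dispersion check across sources would flag.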

page 2

Here's page 186 that lists the 100 most frequent proper nouns in German according to the dictionary's corpus.

page 186
