@stefanschmidt
Last active May 8, 2021 03:24
Convert a list of German last names from HTML to plain text
# Using htmlparser, convert an extensive list of German last names from HTML to plain text.
# The list comes from the Digitales Familiennamenwörterbuch Deutschlands (DFD, Digital Dictionary of Family Names in Germany),
# available at https://www.namenforschung.net
#
# Depends on the htmlparser Go package:
# https://github.com/htmlparser/htmlparser
# The single-page view currently lists 46035 names (May 2021).
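# Fetch the full list page silently, keep only the link text of each <li> entry
# inside the even-numbered <ul> children of #maincontent, and write one name per line.
curl -s 'https://www.namenforschung.net/dfd/woerterbuch/gesamtliste-veroeffentlichter-namenartikel/' |
htmlparser '#maincontent > ul:nth-child(even) > li > a text{}' > dfd.txt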
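To compare the result with the count noted above, it helps to count the non-blank lines in dfd.txt. The sketch below is a hypothetical helper (`check_names` is not part of the original gist) and assumes the extraction writes one name per line; it only counts lines and does not validate the names themselves.

```shell
# Hypothetical helper (not part of the original gist): sanity-check dfd.txt.
# Assumes the extraction step wrote one name per line.
check_names() {
  file="${1:-dfd.txt}"
  # Fail early if the output file is missing or unreadable
  if [ ! -r "$file" ]; then
    echo "error: cannot read $file" >&2
    return 1
  fi
  # Count non-blank lines; grep -c prints 0 when nothing matches
  count=$(grep -c -v '^[[:space:]]*$' "$file")
  echo "$count names in $file"
  # An empty result usually means the selector no longer matches the page
  if [ "$count" -eq 0 ]; then
    echo "warning: no names extracted; the page layout may have changed" >&2
    return 1
  fi
}
```

Run `check_names dfd.txt` after the pipeline. A count far from the 46035 noted above (allowing for newly published entries) suggests the page structure or the selector has changed.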