@Glench
Created January 19, 2014 18:19
Get all Wikipedia article titles (main namespace), one per line
curl https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-all-titles-in-ns0.gz | gunzip | sed 's/_/ /g' | grep -v '(redirect)$'

Glench commented Dec 24, 2014

Append this to the pipeline to remove everything in parentheses, along with the whitespace just before it. Good for fixing things like 'Barack Obama (politician)':

| perl -pe 's/[ \t]*\(.*?\)//g'