@cuevasclemente
Created February 21, 2017 18:37
Post-process extracted Wikipedia text: strip the <doc> wrapper lines
# Drop the <doc id=...> and </doc> wrapper lines; print every other line followed by a blank line.
cat wiki_* | awk '!/<doc id/ && !/<\/doc>/ {print $0 "\n"}' > wiki_text_extracted
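A minimal sketch of the intended filtering on a made-up sample, assuming the goal is to drop the `<doc id=...>` and `</doc>` wrapper lines and keep everything else. The function name `strip_doc_tags` and the sample text are hypothetical, and this version leaves out the extra blank line after each output line to keep the result easy to read.

```shell
# Hypothetical helper: read text on stdin and drop the <doc ...> and </doc>
# wrapper lines, passing every other line through unchanged.
strip_doc_tags() {
  awk '!/<doc id/ && !/<\/doc>/'
}

# Made-up sample in the layout the patterns above assume.
sample='<doc id="1" url="https://example.org/1" title="Example">
Example

First paragraph.
</doc>'

# Prints the title line, the blank line, and the paragraph; the two
# wrapper lines are removed.
printf '%s\n' "$sample" | strip_doc_tags
```

The two patterns are joined with `&&` because a line should be kept only when it matches neither pattern. With `||`, the condition holds for any line that fails to match at least one of them, which is nearly every line, so nothing would be filtered out.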