Post-process already-extracted Wikipedia text (the wiki_* files with <doc id=...> ... </doc> markup) into a single file
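# Assumed intent: drop the <doc id=...> and </doc> wrapper lines and print every other line followed by a blank line.
cat wiki_* | awk '!/<doc id/ && !/<\/doc>/ {print $0 "\n"}' > wiki_text_extracted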