@xecutioner
xecutioner / Wikimeida_extraction.md
Last active August 29, 2015 13:57
Wikimedia article extractions

STEP 1: Use the media labs tool to generate doc-based XML from the Wikipedia dump.


  • Get the latest copy of the articles from the Wikipedia download page.
> wget http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
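The dump is a very large file, so it can help to make the download resumable and to check the archive before moving on. The following is only a minimal sketch: `dump_url` and `fetch_dump` are hypothetical helper names that are not part of the original steps, and it assumes `wget` and `bzip2` are installed and that other wikis follow the same URL pattern as `enwiki`.

```shell
#!/bin/sh
# Hypothetical helper: build the "latest articles" dump URL for a wiki.
# Assumes other wikis use the same path layout as the enwiki URL above.
dump_url() {
  wiki="${1:-enwiki}"
  printf 'http://download.wikimedia.org/%s/latest/%s-latest-pages-articles.xml.bz2\n' "$wiki" "$wiki"
}

# Hypothetical helper: download the dump and verify the archive.
fetch_dump() {
  url="$(dump_url "$1")"
  # -c resumes a partially downloaded file instead of starting over.
  wget -c "$url" || return 1
  # -t tests the integrity of the compressed file without extracting it.
  bzip2 -t "$(basename "$url")"
}
```

After sourcing these functions, running `fetch_dump enwiki` would download the English dump into the current directory and return a non-zero status if the download or the integrity check fails.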