@idlecool
Created March 2, 2018 07:26
Copy Wikipedia dataset to HDFS
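# Install p7zip first; see https://gist.github.com/marcesher/7168642#gistcomment-2249579 (be careful about the typo there)
# Stream the extracted dataset straight into HDFS without writing the XML to local disk:
# "7z x -so" writes the extracted data to stdout, and "hadoop fs -put -" reads from stdin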
7z x -so enwiki-20080103-pages-meta-history.xml.7z | hadoop fs -put - /user/hadoop/enwiki-20080103-pages-meta-history.xml
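One weakness of a bare pipeline is that the shell reports only the exit status of the last command, so a failed extraction can go unnoticed if `hadoop fs -put` still succeeds on the partial stream. The following is a minimal sketch of the same pipeline wrapped in a function with `pipefail` enabled. It assumes bash, and `stream_to_hdfs` is a hypothetical helper name, not part of 7z or Hadoop.

```shell
#!/usr/bin/env bash
# Sketch only: assumes bash, and that 7z and hadoop are both on PATH.

# Make a pipeline fail when any stage fails, not only the last one.
set -o pipefail

# stream_to_hdfs is a hypothetical helper name used for illustration.
# $1: local .7z archive, $2: destination path on HDFS
stream_to_hdfs() {
    local archive="$1"
    local dest="$2"
    # -so sends the extracted data to stdout; "put -" reads it from stdin.
    7z x -so "$archive" | hadoop fs -put - "$dest"
}
```

It could then be called as `stream_to_hdfs enwiki-20080103-pages-meta-history.xml.7z /user/hadoop/enwiki-20080103-pages-meta-history.xml`, checking the return status before relying on the uploaded file.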