@caleb-kaiser
Created October 25, 2019 15:48
Extract data and transform XML to TSV
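# Stage 1: unzip the archive. DVC records data/Posts.xml as the stage output.
dvc run \
  -f extract.dvc \
  -d data/Posts.xml.zip \
  -o data/Posts.xml \
  'unzip data/Posts.xml.zip -d data'

# Stage 2: convert the XML to TSV. The stage depends on both the script and the XML.
dvc run \
  -f prepare.dvc \
  -d code/xml_to_tsv.py \
  -d data/Posts.xml \
  -o data/Posts.tsv \
  python \
  code/xml_to_tsv.py \
  data/Posts.xml \
  data/Posts.tsv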
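
Once both stages are recorded, the pipeline can be re-run from its last stage file. The following is a minimal sketch, assuming the same DVC version as the commands above (stages stored as `.dvc` files) and the same paths. `missing_deps` is a hypothetical helper introduced here for illustration and is not part of DVC. It checks that the raw inputs exist before calling `dvc repro`, which re-executes `prepare.dvc` and any upstream stage whose dependencies have changed.

```shell
# Hypothetical helper (not part of DVC): print every path that does not exist.
missing_deps() {
  for f in "$@"; do
    [ -e "$f" ] || printf '%s\n' "$f"
  done
}

# Inputs that must be present before the pipeline can run (paths from the stages above).
missing=$(missing_deps data/Posts.xml.zip code/xml_to_tsv.py)

if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so each path is reported on its own line.
  printf 'Missing input: %s\n' $missing >&2
else
  # Re-run prepare.dvc and any upstream stage that is out of date.
  dvc repro prepare.dvc
fi
```

If neither the archive nor the script has changed since the last run, `dvc repro` should find nothing to execute.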