@anjesh
Last active April 9, 2018 07:17
Script to prepare the individual sheets from multiple Excel files for PPTIN

Setup

  • make sure that curl and python are installed
  • run pip install csvkit to install csvkit, which provides the in2csv command used by the script
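
A quick way to confirm the setup before running anything is to check that the required commands are on the PATH. The following is only a minimal sketch: the function name missing_tools is hypothetical and not part of the gist, and it assumes the tools are invoked as curl, python and in2csv on your machine.

```bash
# Print the name of every tool in the argument list that is not on PATH.
# Returns 0 when all tools are found, 1 when at least one is missing.
# (missing_tools is an illustrative helper, not part of prep.sh.)
missing_tools() {
  local status=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}
```

Calling missing_tools curl python in2csv prints nothing when the setup is complete; any name it prints still needs to be installed.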

Running

  • after setup, run bash prep.sh
    • it creates a data folder, downloads the xlsx files for 2013 to 2017 from the PPTIN website into data, and writes one CSV per sheet to the out folder
## download data
mkdir -p data
curl http://ppip.gov.np/excel-download/2013 -o data/2013.xlsx
curl http://ppip.gov.np/excel-download/2014 -o data/2014.xlsx
curl http://ppip.gov.np/excel-download/2015 -o data/2015.xlsx
curl http://ppip.gov.np/excel-download/2016 -o data/2016.xlsx
curl http://ppip.gov.np/excel-download/2017 -o data/2017.xlsx
## prepare individual sheets from all the excel files
mkdir -p out
for sheet in releases awa_documents awa_suppliers awards contracts ten_criteria ten_items ten_tenderers
do
  # start each sheet's CSV from scratch with the header row taken from the 2017 workbook
  echo "Writing header for $sheet"
  in2csv --sheet "$sheet" data/2017.xlsx | head -n 1 > "out/$sheet.csv"
  for file in {2013..2017}
  do
    echo "Reading $sheet from $file"
    echo "Running in2csv --sheet $sheet data/$file.xlsx"
    # skip the header row and append only the data rows
    in2csv --sheet "$sheet" "data/$file.xlsx" | tail -n +2 >> "out/$sheet.csv"
  done
done
# in2csv -n data/2017.xlsx # lists the sheet names in a workbook
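
The core idea of the loop above, writing the header once and then appending only the data rows from each file, can be tried without in2csv or a download. This is a rough sketch under stated assumptions: merge_csvs is a hypothetical helper that is not part of prep.sh, it works on CSV files that already exist, it takes the header from the first input rather than from the 2017 workbook, and it assumes every input ends with a newline and shares the same header.

```bash
# Concatenate CSV files that share the same header row.
# Usage: merge_csvs OUTPUT INPUT...
# (merge_csvs is an illustrative helper, not part of prep.sh.)
merge_csvs() {
  local out=$1 first=1 f
  shift
  : > "$out"                      # create or truncate the output file
  for f in "$@"; do
    if [ "$first" -eq 1 ]; then
      cat "$f" >> "$out"          # first file: keep header and data
      first=0
    else
      tail -n +2 "$f" >> "$out"   # later files: drop the header row
    fi
  done
}
```

For example, merge_csvs out/all.csv a.csv b.csv would leave a single header row at the top of out/all.csv followed by the data rows of a.csv and then b.csv.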