@czuriaga
Created April 19, 2019 07:22
#!/bin/bash
# Generate one text file per page
for i in {1..448}; do pdftotext -f "${i}" -l "${i}" mueller-report-searchable.pdf "mueller_page_${i}.txt"; done
# Double embedded double quotes so they are escaped inside CSV fields
for i in {1..448}; do perl -pi -e 's/"/""/g' "mueller_page_${i}.txt"; done
# Set CSV header
echo "page,text" > mueller_pages.csv
# Generate CSV rows: page number, then page text enclosed in quotes.
# Quoting the command substitution prevents word splitting and glob expansion,
# so the page text is written literally and its line breaks stay inside the quoted field.
for i in {1..448}; do printf '%s,"%s"\n' "${i}" "$(cat "mueller_page_${i}.txt")" >> mueller_pages.csv; done
@czuriaga

Shell script that generates a 448-row CSV file, one row per page, from the original PDF mueller-report-searchable.pdf.
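
The quoting rule the script depends on (double every embedded double quote, then wrap the field in quotes) can be tried in isolation. This is a minimal sketch, not part of the original script: the helper names `csv_field` and `csv_row` are hypothetical, and the sample sentence is made up for illustration.

```shell
#!/bin/bash
# csv_field and csv_row are hypothetical helpers for illustration only;
# they are not part of the original script.

# Double every embedded double quote and wrap the result in quotes.
csv_field() {
  local text=$1
  local q='"'
  # Bash pattern substitution: replace each " with ""
  printf '"%s"' "${text//$q/$q$q}"
}

# Emit one CSV row from a page number and a string of page text.
csv_row() {
  printf '%s,%s\n' "$1" "$(csv_field "$2")"
}

# Made-up sample text containing embedded quotes.
csv_row 1 'The report says "see Volume II".'
# prints: 1,"The report says ""see Volume II""."
```

Doing the escaping in the shell avoids the in-place `perl` pass over the page files, at the cost of holding each page's text in a variable.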
