Julián Pérez largocreatura

largocreatura / batch_ocr_to_table
Created January 26, 2022 14:02
Shell script to batch-convert tables embedded in PDFs into CSVs using the Python module table-ocr
#!/bin/sh
# Convert each PDF to separate images per page
for pdf in ./*.pdf; do
    # Skip the unexpanded pattern when the directory contains no PDFs
    [ -e "$pdf" ] || continue
    python3 -m table_ocr.pdf_to_images "$pdf"
done
# Extract tables from each page in .png output
for images in $(find . -name "*.png"); do
    python3 -m table_ocr.extract_tables "$images"