Vinayak Mehta vinayak-mehta

vinayak-mehta / pdftables_extract.py
Last active Sep 22, 2018
A Python 2 script that extracts tables from a PDF file using pdftables and saves them as CSV files in the current working directory.
View pdftables_extract.py
#!/usr/bin/env python
"""
Usage: python pdftables_extract.py <filename>
"""
import os
import sys
import pandas as pd
from pdftables.pdf_document import PDFDocument
View hn-comments.ipynb
View pdfplumber_extract.py
import os
import sys
import pandas as pd
import pdfplumber
pdf = pdfplumber.open(sys.argv[1])
p0 = pdf.pages[0]
table = p0.extract_table()
print(table)  # parenthesized form runs under both Python 2 and Python 3
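The preview above stops at printing the first table. A minimal sketch of how that result could be saved as CSV, in the spirit of the other scripts listed here: it assumes `extract_table()` returns a list of rows (or `None` when no table is found), and the helper names and the `table_0.csv` output name are hypothetical, not taken from the gist.

```python
import csv


def table_to_csv(rows, csv_path):
    """Write a list-of-lists table to csv_path, turning None cells into ""."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        for row in rows:
            writer.writerow(["" if cell is None else cell for cell in row])


def extract_first_table(pdf_path):
    """Return the first table on the first page, or None if none is found."""
    import pdfplumber  # imported lazily so the CSV helper works without it

    with pdfplumber.open(pdf_path) as pdf:
        return pdf.pages[0].extract_table()


def main(pdf_path, csv_path="table_0.csv"):  # hypothetical output name
    """Extract the first table of pdf_path and save it next to the script."""
    table = extract_first_table(pdf_path)
    if table is not None:
        table_to_csv(table, csv_path)
    return table
```

Calling `main(sys.argv[1])` from a script would mirror the command-line usage of the original preview.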
vinayak-mehta / pdf_table_extract.py
Created Sep 22, 2018
A Python 2 script that extracts tables from a PDF file using pdf-table-extract and saves them as CSV files in the current working directory.
View pdf_table_extract.py
#!/usr/bin/env python
"""
Usage: python pdf_table_extract.py <filename>
"""
import os
import sys
import pandas as pd
import pdftableextract as pdf
View ttt.py
s = [" ", " ", " ", " ", " ", " ", " ", " ", " "]
def render_grid(s):
    grid = "\n-----------------\n".join(
        [
            f" {s[0]} | {s[1]} | {s[2]}\n (1) | (2) | (3)",
            f" {s[3]} | {s[4]} | {s[5]}\n (4) | (5) | (6)",
            f" {s[6]} | {s[7]} | {s[8]}\n (7) | (8) | (9)"
        ]
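The preview is cut off before the `join` call is closed. A self-contained sketch of a grid renderer in the same style is given below; returning the string and the exact cell padding are assumptions, since the rest of ttt.py is not shown.

```python
def render_grid(s):
    """Return a 3x3 board as text, with position numbers under each cell.

    `s` is a sequence of nine one-character strings, row by row.
    """
    rows = [
        f"  {s[0]}  |  {s[1]}  |  {s[2]}\n (1) | (2) | (3)",
        f"  {s[3]}  |  {s[4]}  |  {s[5]}\n (4) | (5) | (6)",
        f"  {s[6]}  |  {s[7]}  |  {s[8]}\n (7) | (8) | (9)",
    ]
    # Rows are separated by a rule of 17 dashes, as in the preview.
    return "\n-----------------\n".join(rows)
```

Printing `render_grid(s)` with the all-blank `s` from the preview would show an empty board with the numbers 1 to 9 under the cells.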
vinayak-mehta / pdf2png.txt
Created Sep 6, 2020 — forked from zooba/pdf2png.txt
Step by step converting a PDF page to PNG using WinRT
View pdf2png.txt
Python 3.7.8 (tags/v3.7.8:4b47a5b6ba, Jun 28 2020, 10:03:53) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os, time
>>> PDF_FILENAME = input("Path to PDF: ")
>>> OUT_FILE = os.path.abspath(input("Path to output PNG: "))
>>>
>>> import winrt.windows.data.pdf as PDF
>>> from winrt.windows.storage import StorageFile
>>> op = StorageFile.get_file_from_path_async(PDF_FILENAME)
>>> time.sleep(0.5) # should really await, but this is easier
vinayak-mehta / disease_outbreaks_camelot.ipynb
Last active Oct 5, 2020
A Jupyter notebook showing how Camelot can be used to extract tables from PDFs scraped from the IDSP website.
View disease_outbreaks_camelot.ipynb
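The notebook itself is not rendered here. As a rough illustration of the kind of call it describes, the sketch below reads tables from one PDF with Camelot and writes each to CSV; the function names, the output naming scheme, and the default `pages="1"` are assumptions for illustration and are not taken from the notebook.

```python
import os


def csv_name(pdf_path, index):
    """Build a hypothetical output name such as 'report-table-0.csv'."""
    base = os.path.splitext(os.path.basename(pdf_path))[0]
    return f"{base}-table-{index}.csv"


def export_tables(pdf_path, out_dir=".", pages="1"):
    """Extract tables from pdf_path with Camelot and save each one as CSV."""
    import camelot  # imported lazily; requires Camelot to be installed

    tables = camelot.read_pdf(pdf_path, pages=pages)
    written = []
    for i, table in enumerate(tables):
        out = os.path.join(out_dir, csv_name(pdf_path, i))
        # Each Camelot table exposes its cells as a pandas DataFrame in .df
        table.df.to_csv(out, index=False, header=False)
        written.append(out)
    return written
```

For a PDF already downloaded to disk, `export_tables("outbreaks.pdf")` (a hypothetical filename) would return the list of CSV paths it wrote.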