@keitazoumana
Last active December 6, 2021 17:02
from tabula import read_pdf
from tabulate import tabulate
import pandas as pd
import io
# Read only page 6 of the file; read_pdf returns a list of DataFrames
food_calories = read_pdf('./data/food_calories.pdf', pages=6,
                         multiple_tables=True, stream=True)
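# Render the first detected table as plain text
# (assumes at least one table was found on the page)
table = tabulate(food_calories[0], headers="keys",
                 tablefmt="plain", showindex=False)
# Parse the fixed-width text back into a DataFrame
df = pd.read_fwf(io.StringIO(table))
# Save the final result as an Excel file
df.to_excel("./data/food_calories.xlsx")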