@samkit-jain
Created December 13, 2017 15:20
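import tabula

# filepath is the path to the statement PDF, defined elsewhere
# reading table using tabula
rows = tabula.read_pdf(filepath,
                       pages='all',
                       silent=True,
                       pandas_options={
                           'header': None,
                           # pandas >= 1.3 replaces the two flags below with on_bad_lines='skip'
                           'error_bad_lines': False,
                           'warn_bad_lines': False
                       })
# converting to list (assumes read_pdf returned a single DataFrame;
# newer tabula-py releases return a list of DataFrames by default)
rows = rows.values.tolist()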

pragyag commented May 10, 2018

Hi Samkit,
I followed your article to build a similar analyser, but it has not been as simple for me as the article makes it seem. So I just want to confirm: with the snippet above, are you able to get all the relevant data, or are some data points missing (for me, many are)? Also, for most statements tabula is not able to find the table at all, so I used the area option, which had to be different for every bank. Please let me know whether this snippet is actually what gave you 90% accuracy.
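
For reference, the per-bank `area` workaround described above could be organised as a small lookup table. This is only a sketch under assumptions: the bank names and coordinates are hypothetical placeholders, `build_read_options` and `read_statement` are illustrative helpers rather than anything from the gist, and `area` is the `read_pdf` option that takes `[top, left, bottom, right]` in points.

```python
# Hypothetical per-bank table regions as [top, left, bottom, right] in points.
# The bank names and numbers are placeholders, not measured values.
BANK_AREAS = {
    "bank_a": [150, 20, 780, 580],
    "bank_b": [200, 30, 760, 570],
}


def build_read_options(bank, pages="all"):
    """Return keyword arguments for tabula.read_pdf for the given bank."""
    options = {
        "pages": pages,
        "silent": True,
        "pandas_options": {"header": None},
    }
    area = BANK_AREAS.get(bank)
    if area is not None:
        # Restrict extraction to the known table region for this bank.
        options["area"] = area
    return options


def read_statement(filepath, bank):
    """Read a statement PDF with bank-specific options (needs tabula-py and Java)."""
    import tabula  # imported lazily so the option builder works without it

    return tabula.read_pdf(filepath, **build_read_options(bank))
```

A bank that is missing from the table gets no `area` key, so tabula falls back to its own table detection for that statement.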
