@JoaoCarabetta
Created February 2, 2020 21:45
From request to PDF to string
import requests
# Download
res = requests.get('https://www.camara.leg.br/proposicoesWeb/prop_mostrarintegra?codteor=938381&filename=PL+2699/2011')
res.raise_for_status()  # stop on an HTTP error instead of saving an error page
# To PDF
with open('metadata.pdf', 'wb') as f:
    f.write(res.content)
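# To string (tika-python drives Apache Tika, so Java must be installed)
from tika import parser
parsed = parser.from_file('metadata.pdf')
# 'content' can be None when Tika extracts no text, so fall back to ''
raw_lines = (parsed['content'] or '').splitlines()
print('\n'.join(line for line in raw_lines if line != ''))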
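
The download and clean-up steps above can be made a little more defensive with two small pure-Python helpers. This is only a sketch: `looks_like_pdf` and `non_empty_lines` are hypothetical names, not part of `requests` or `tika`. The first checks that the downloaded bytes start with the PDF header before they are written to disk. The second tolerates a missing `content` value and, unlike the snippet above, also drops lines that contain only whitespace.

```python
from typing import Optional

# Every PDF file begins with this header, e.g. b"%PDF-1.4".
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Strict check: True only if the payload starts with the PDF header."""
    return data.startswith(PDF_MAGIC)


def non_empty_lines(content: Optional[str]) -> str:
    """Join the lines of `content`, skipping empty and whitespace-only ones.

    Returns an empty string when `content` is None or empty, which is an
    assumption about how a failed text extraction should be handled.
    """
    if not content:
        return ""
    return "\n".join(line for line in content.splitlines() if line.strip())
```

In the script above, `looks_like_pdf(res.content)` could guard the `f.write` call, and `non_empty_lines(parsed['content'])` could replace the final list comprehension.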