@mtboren
Last active March 14, 2024 20:49
Examples of getting PDF text contents for subsequent use
## use PowerShell and the PSWritePDF module
Install-Module -Name PSWritePDF -Scope CurrentUser
$strSomePdf = "/tmp/2024-cyber-threat-report.pdf" ## path to some local PDF
Convert-PDFToText -FilePath $strSomePdf | fabric --pattern analyze_threat_report
## or, use Python, the pypdf module, and some local PDF; these Python examples can be run from almost any shell (including PowerShell)
pip install pypdf --user ## however you like to install Python modules
python -c 'from sys import stdout; from pypdf import PdfReader; strSomePdf = "/tmp/2024-cyber-threat-report.pdf"; stdout.write("\n".join(page.extract_text() for page in PdfReader(strSomePdf).pages))' | fabric --pattern analyze_threat_report
## or, use Python, the pypdf module, and a PDF fetched from a URL
python -c 'import io; from sys import stdout; from urllib.request import urlopen; from pypdf import PdfReader; strSomePdfUri = "https://www.sonicwall.com/medialibrary/en/white-paper/2024-cyber-threat-report.pdf"; reader = PdfReader(io.BytesIO(urlopen(strSomePdfUri).read())); stdout.write("\n".join(page.extract_text() for page in reader.pages))' | fabric --pattern analyze_threat_report
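## or, as a rough sketch, the two Python one-liners above folded into one small script that accepts either a local path or an http(s) URL
## the file name pdf_to_text.py and the helper names (is_url, open_pdf_stream, pdf_to_text) are made up for illustration; only PdfReader, .pages, and extract_text() come from pypdf

```python
"""Print the text of a PDF (local path or http(s) URL) to stdout."""
import io
import sys
from urllib.parse import urlparse
from urllib.request import urlopen


def is_url(source: str) -> bool:
    """Return True when source looks like an http(s) URL rather than a file path."""
    return urlparse(source).scheme in ("http", "https")


def open_pdf_stream(source: str):
    """Return a binary file-like object for a local path or a URL."""
    if is_url(source):
        # Read the whole download into memory so pypdf can seek within it.
        with urlopen(source) as response:
            return io.BytesIO(response.read())
    return open(source, "rb")


def pdf_to_text(source: str) -> str:
    """Extract the text of every page, joined with newlines."""
    # Imported here so the helpers above still work where pypdf is not installed.
    from pypdf import PdfReader

    with open_pdf_stream(source) as stream:
        reader = PdfReader(stream)
        return "\n".join(page.extract_text() for page in reader.pages)


# Only act when exactly one argument (the path or URL) is supplied.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.stdout.write(pdf_to_text(sys.argv[1]))
```

## assuming the script is saved as pdf_to_text.py, usage mirrors the one-liners:
## python pdf_to_text.py /tmp/2024-cyber-threat-report.pdf | fabric --pattern analyze_threat_report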