@arky
Created October 3, 2022 06:24
#!/usr/bin/env python
# coding: utf-8
# Example Python code that runs named-entity recognition (NER)
# on a PDF document.
#
# The script uses spaCy and spacypdfreader.
# Install the required pip packages (uncomment in a Jupyter notebook):
#import sys
#!{sys.executable} -m pip install spacy spacypdfreader
#!{sys.executable} -m spacy download en_core_web_sm
# imports
import spacy
from spacy import displacy
from spacypdfreader import pdf_reader
# Load the spaCy model and parse the PDF document
nlp = spacy.load('en_core_web_sm')
doc = pdf_reader('testcase.pdf', nlp)
# Render the entities on each page with the displaCy renderer.
# Pages are numbered from 1, so the range must include the last page.
page_count = doc._.last_page
for page in range(1, page_count + 1):
    displacy.render(doc._.page(page), style="ent")
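# The loop above only visualises the entities. To work with them as data,
# one option is to group entity texts by label. The sketch below is not
# part of the original script: `group_entities_by_label` is a hypothetical
# helper, and `sample_page` is a stand-in object so the example runs without
# a PDF or a spaCy model. It relies only on the `.ents`, `.text` and
# `.label_` attributes that spaCy Doc and Span objects expose.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_entities_by_label(span):
    """Map each entity label to the unique entity texts found in `span`.

    `span` can be any object with an `ents` iterable whose items carry
    `text` and `label_` attributes (as spaCy Doc and Span objects do).
    """
    grouped = defaultdict(list)
    for ent in span.ents:
        # Keep first-seen order and skip repeated mentions.
        if ent.text not in grouped[ent.label_]:
            grouped[ent.label_].append(ent.text)
    return dict(grouped)


# Stand-in for one parsed page (assumption: made-up sample data).
sample_page = SimpleNamespace(ents=[
    SimpleNamespace(text="Ada Lovelace", label_="PERSON"),
    SimpleNamespace(text="London", label_="GPE"),
    SimpleNamespace(text="Ada Lovelace", label_="PERSON"),
])

entities = group_entities_by_label(sample_page)
print(entities)
```

# With the document loaded by the script, the same helper could presumably
# be called per page, e.g. group_entities_by_label(doc._.page(1)).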