@lettergram
Created January 22, 2020 02:35
import html

from bs4 import BeautifulSoup

# Load USPTO .xml document (one file holds many concatenated XML documents)
with open(filename, 'r', encoding='utf-8') as f:
    xml_text = html.unescape(f.read())

# Split out patent applications / grants
for patent in xml_text.split('<?xml version="1.0" encoding="UTF-8"?>'):
    # Skip empty chunks, such as the text before the first XML declaration
    if not patent.strip():
        continue
    # Load patent text as HTML document
    bs = BeautifulSoup(patent, 'html.parser')
    # Search patent for application
    application = bs.find('us-patent-application')
    # If no application, search for grant
    if application is None:
        application = bs.find('us-patent-grant')
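
The splitting step above can be tried without a real USPTO file. The following is a minimal sketch using only the standard library; `iter_patents`, `XML_DECLARATION`, `ROOT_TAG`, and the sample string are hypothetical names introduced for illustration, and a regular expression stands in for the BeautifulSoup lookup, so this is not the original author's method.

```python
import re

# Hypothetical helper names; none of these appear in the original snippet.
XML_DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>'
# Assumption: the root element is one of these two tags, as in the snippet.
ROOT_TAG = re.compile(r'<(us-patent-application|us-patent-grant)\b')


def iter_patents(xml_text):
    """Yield (kind, document_text) for each patent document in xml_text."""
    for chunk in xml_text.split(XML_DECLARATION):
        chunk = chunk.strip()
        # Skip empty chunks, such as the text before the first declaration
        if not chunk:
            continue
        match = ROOT_TAG.search(chunk)
        # Skip chunks that are neither an application nor a grant
        if match is None:
            continue
        yield match.group(1), chunk


# Made-up sample: two tiny documents concatenated the way the snippet expects
sample = (
    XML_DECLARATION + '\n<us-patent-application id="a1"></us-patent-application>\n'
    + XML_DECLARATION + '\n<us-patent-grant id="g1"></us-patent-grant>\n'
)
kinds = [kind for kind, _ in iter_patents(sample)]
print(kinds)
```

Each yielded `document_text` could then be handed to `BeautifulSoup` as in the loop above if the parsed tree is needed.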