@elnazsn1988
elnazsn1988 / pypdfx.py
Created November 19, 2020 01:00 — forked from yoavram/pypdfx.py
A Python client for pdfx 1.0, a "Fully-automated PDF-to-XML conversion of scientific text" service (http://pdfx.cs.man.ac.uk/). Written for use in Markx, a scientific-oriented Markdown editor (https://github.com/yoavram/markx).
# pdfx usage: http://pdfx.cs.man.ac.uk/usage
# requests docs: http://docs.python-requests.org/en/latest/user/quickstart/#post-a-multipart-encoded-file
import requests # get it from http://python-requests.org or do 'pip install requests'
url = "http://pdfx.cs.man.ac.uk"
def pypdfx(filename):
    '''
    filename is the name of a PDF file WITHOUT the extension.
    The function will print messages, including the status code,
def fonts(doc, granularity=False):
    """Extracts fonts and their usage in PDF documents.
    :param doc: PDF document to iterate through
    :type doc: <class 'fitz.fitz.Document'>
    :param granularity: also use 'font', 'flags' and 'color' to discriminate text
    :type granularity: bool
    :rtype: [(font_size, count), (font_size, count)], dict
    :return: most used fonts sorted by count, font style information
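The preview above stops at the docstring, so here is a minimal sketch of the counting step it describes. It assumes the page content is already available as the nested dict that PyMuPDF's `page.get_text("dict")` returns (blocks, then lines, then spans carrying `size`, `font`, `flags` and `color`). The helper name `count_font_usage` and the sample data are hypothetical and not taken from the gist.

```python
from collections import Counter


def count_font_usage(pages, granularity=False):
    """Count how often each font size (or size/flags/font/color combo) is used.

    `pages` is a list of dicts shaped like PyMuPDF's page.get_text("dict")
    output (an assumption); no PDF library is needed to run this sketch.
    """
    counts = Counter()
    styles = {}
    for page in pages:
        for block in page.get("blocks", []):
            # Image blocks have no "lines" key, so they contribute nothing.
            for line in block.get("lines", []):
                for span in line.get("spans", []):
                    if granularity:
                        key = "{0}_{1}_{2}_{3}".format(
                            span["size"], span["flags"], span["font"], span["color"])
                        styles[key] = {"size": span["size"], "flags": span["flags"],
                                       "font": span["font"], "color": span["color"]}
                    else:
                        key = "{0}".format(span["size"])
                        styles[key] = {"size": span["size"], "font": span["font"]}
                    counts[key] += 1
    # most_common() returns (key, count) pairs sorted by count, highest first.
    return counts.most_common(), styles


# Hypothetical sample data standing in for a parsed PDF page.
sample_pages = [{"blocks": [
    {"lines": [{"spans": [
        {"size": 12.0, "flags": 0, "font": "Helvetica", "color": 0, "text": "body"},
        {"size": 12.0, "flags": 0, "font": "Helvetica", "color": 0, "text": "more"},
        {"size": 18.0, "flags": 16, "font": "Helvetica-Bold", "color": 0, "text": "Title"},
    ]}]},
    {"type": 1},  # an image block without text lines
]}]

font_counts, styles = count_font_usage(sample_pages)
print(font_counts)  # [('12.0', 2), ('18.0', 1)]
```

With `granularity=True` the key also encodes flags, font name and color, so two spans of the same size but different weight are counted separately.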
elnazsn1988 / object_map_generation.py
Created June 16, 2020 10:06 — forked from dhavalpotdar/object_map_generation.py
Code to use Google Vision API to create a text object map from Structured Documents
import os
import cv2
from itertools import chain
import base64
import pandas as pd
import requests
import json
def ocr_using_google_api(image_path, request_url):
    '''
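The body of `ocr_using_google_api` is not shown in this preview. As a hedged illustration of what such a function typically has to prepare, the sketch below builds the JSON body for a Vision API `images:annotate` call: the image is base64-encoded and paired with a feature type. The helper name `build_vision_request` is hypothetical, and `TEXT_DETECTION` is an assumption; the gist may use a different feature such as `DOCUMENT_TEXT_DETECTION`.

```python
import base64
import json


def build_vision_request(image_bytes, feature_type="TEXT_DETECTION"):
    """Return the JSON body for a Vision API images:annotate request.

    The feature type is an assumption; the original gist's choice is not
    visible in the preview.
    """
    # The API expects the raw image bytes as a base64 string.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    body = {
        "requests": [
            {
                "image": {"content": encoded},
                "features": [{"type": feature_type}],
            }
        ]
    }
    return json.dumps(body)


# Placeholder bytes stand in for the contents of a real image file.
payload = build_vision_request(b"not a real image")
print(payload)
```

In a real call the payload would be sent with something like `requests.post(request_url, data=payload, headers={"Content-Type": "application/json"})`, where `request_url` already carries the API key. Keeping the payload construction separate from the network call makes it possible to check the request shape without contacting the service.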
from btgym import BTgymEnv
import IPython.display as Display
import PIL.Image as Image
from gym import spaces
import gym
import numpy as np
import random
#!/usr/bin/env python
'''Crop an image to just the portions containing text.
Usage:
./crop_morphology.py path/to/image.jpg
This will place the cropped image in path/to/image.crop.png.
For details on the methodology, see