iammosespaulr / get_hash_from_pdf.py
Last active May 26, 2024 17:20
Generates a custom MD5 hash for a PDF after stripping all the metadata and using a deterministic PDF ID
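import sys
import hashlib
import subprocess
from tempfile import NamedTemporaryFile


def get_hash_from_pdf(input_pdf):
    with NamedTemporaryFile(delete=True) as temp_pdf:
        # Run qpdf to strip metadata and output to a temporary file
        subprocess.run([
            'qpdf', '-empty', '-static-id', '-pages', input_pdf, '1-z', '--', temp_pdf.name
        ], check=True)
        # The source is cut off after the qpdf argument list. The lines below are an
        # assumed completion based on the gist description (MD5 of the stripped file).
        with open(temp_pdf.name, 'rb') as stripped:
            return hashlib.md5(stripped.read()).hexdigest()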
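
The description says the result is an MD5 hash of the PDF once its metadata has been stripped. As a minimal sketch of that hashing half only, assuming the stripped file already exists on disk, the digest could be computed in chunks so that large PDFs are not loaded into memory at once. The name `md5_of_file` and the `chunk_size` default are illustrative and do not come from the gist.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, read in chunks.

    Hypothetical helper for illustration; not part of the original gist.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files with identical bytes give the same digest, which is why the qpdf step has to remove metadata and fix the PDF ID before hashing.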
{"lastUpload":"2020-08-13T18:27:47.625Z","extensionVersion":"v3.4.3"}
Last login: Thu Apr 23 15:41:38 on ttys000
mosespaul@eiphohch0aYa ~ % codin
mosespaul@eiphohch0aYa Coding % cd GSoC/sympy2
mosespaul@eiphohch0aYa sympy2 % sympydev
(sympy-dev-py35) mosespaul@eiphohch0aYa sympy2 % python
Python 3.5.5 | packaged by conda-forge | (default, Jul 23 2018, 23:45:11)
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from sympy import *
>>> test('sympy/integrals/tests/test_integrals.py')