@shhyou
Last active December 2, 2023 02:51

Comparing PDF files in git diff commands

The typical approach to comparing PDF files in git diff output is to convert them to text with pdftotext and then diff the converted text.
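
The idea can be tried by hand before changing any git configuration. This is a minimal sketch, not part of the original setup: the helper name textconv_diff is hypothetical, and it assumes a converter that takes a file name as its only argument and writes text to stdout (as the pdf-astextplain script from step 3 does). It runs the converter on both files and diffs the results, which is roughly what git does once a textconv driver is configured:

```sh
#!/bin/sh
# textconv_diff CONVERTER OLD NEW  (hypothetical helper name)
# Converts both files to text with CONVERTER, which must take a file name
# as its only argument and write text to stdout, then prints a unified
# diff of the two texts. Returns 0 if the texts are identical, 1 if they
# differ, and 2 if the converter or diff itself fails.
textconv_diff() {
  tmpdir=$(mktemp -d) || return 2
  if "$1" "$2" > "$tmpdir/old.txt" && "$1" "$3" > "$tmpdir/new.txt"; then
    diff -u "$tmpdir/old.txt" "$tmpdir/new.txt"
    rc=$?
  else
    rc=2   # the converter failed on one of the files
  fi
  rm -rf "$tmpdir"
  return "$rc"
}
```

For example, textconv_diff pdf-astextplain old.pdf new.pdf compares two PDF files with the wrapper from step 3.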

  1. Install the pdftotext utility from the Poppler project.
  2. Assign the diff driver pdffiles to PDF files and configure that driver to convert them with the pdf-astextplain script (the attributes directory may need to be created first):
    mkdir -p ~/.config/git
    echo "*.pdf diff=pdffiles" >> ~/.config/git/attributes
    printf '[diff "pdffiles"]\n\ttextconv = pdf-astextplain\n\tbinary = true\n' >> ~/.gitconfig
    
    See: https://git-scm.com/docs/gitattributes#_marking_files_as_binary
    and https://git-scm.com/docs/git-diff#Documentation/git-diff.txt---textconv
  3. Add a wrapper around pdftotext that writes the extracted text to stdout. Create a script named pdf-astextplain in a directory on $PATH and make it executable (chmod +x pdf-astextplain):
    #!/bin/sh
    pdftotext -layout -enc UTF-8 "$1" -
    
    To compare the PDF metadata in addition to its content, add the line pdfinfo "$1" to pdf-astextplain before the pdftotext call.
  4. Some Git distributions, such as Git for Windows, ship an astextplain script that also converts PDF and other files to text for diffing.
    See: https://github.com/git-for-windows/build-extra/blob/c223c7757745c1df552c0dd4628c368aaea11f32/git-extra/astextplain
  5. In the same spirit, use zipinfo -l to list the contents of ZIP archives, and use the script racket-wxme-astext to show Racket WXME files as text.
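
Step 2 can be generalized to the other handlers mentioned in step 5. This is a sketch under stated assumptions rather than part of the original setup: the function names add_textconv_driver and show_diff_driver are hypothetical, and it assumes core.attributesFile is unset, so git reads the per-user attributes file from $XDG_CONFIG_HOME/git/attributes (by default ~/.config/git/attributes). It writes the driver settings with git config and uses git check-attr to confirm which driver a path resolves to:

```sh
#!/bin/sh
# Hypothetical helpers; assumes core.attributesFile is unset.

# add_textconv_driver PATTERN DRIVER COMMAND
# Maps files matching PATTERN to diff driver DRIVER in the per-user
# attributes file and sets diff.DRIVER.textconv and diff.DRIVER.binary.
add_textconv_driver() {
  attrs="${XDG_CONFIG_HOME:-$HOME/.config}/git/attributes"
  mkdir -p "$(dirname "$attrs")" || return 1
  # Append the attribute line only if it is not there yet
  grep -qxF "$1 diff=$2" "$attrs" 2>/dev/null || echo "$1 diff=$2" >> "$attrs"
  git config --global "diff.$2.textconv" "$3"
  git config --global "diff.$2.binary" true
}

# show_diff_driver PATH  (run inside a repository)
# Prints the diff driver git resolves for PATH and its textconv command.
show_diff_driver() {
  # git check-attr prints "<path>: diff: <value>"; keep only <value>
  driver=$(git check-attr diff -- "$1" | sed 's/^.*: diff: //')
  echo "driver: $driver"
  echo "textconv: $(git config --get "diff.$driver.textconv")"
}
```

For example, add_textconv_driver '*.pdf' pdffiles pdf-astextplain performs step 2, after which show_diff_driver paper.pdf inside a repository should print driver: pdffiles. A ZIP driver would follow the same pattern, such as add_textconv_driver '*.zip' zipfiles zip-astext, where zipfiles and zip-astext are placeholder names for the driver and for a script wrapping zipinfo -l.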
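
pdf-astextplain, variant that also prints the PDF metadata:
#!/bin/sh
# Print the PDF metadata, then the text content, to stdout
pdfinfo "$1"
pdftotext -layout -enc UTF-8 "$1" -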
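
If the metadata variant produces noisy diffs because regenerating a PDF updates its timestamps, one option is to drop those fields. This is a minimal sketch, written as a shell function with the hypothetical name pdf_astextplain_meta so it can be tried before its body is copied into pdf-astextplain; it assumes pdfinfo labels the timestamps CreationDate: and ModDate:, as Poppler's pdfinfo does:

```sh
#!/bin/sh
# pdf_astextplain_meta FILE  (hypothetical name)
# Like the metadata variant above, but omits the creation and modification
# timestamps, which change whenever the PDF is regenerated.
pdf_astextplain_meta() {
  pdfinfo "$1" | grep -v -e '^CreationDate:' -e '^ModDate:'
  pdftotext -layout -enc UTF-8 "$1" -
}
```

Other metadata fields, such as Title and Pages, still appear in the diff.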
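racket-wxme-astext:
#!/bin/bash
# Print Racket WXME files as plain text; other text files pass through as-is
if ! file --mime "$1" | grep -q octet-stream; then
  # file(1) does not report binary data: print the file unchanged
  cat "$1"
else
  # Binary WXME data: decode it with Racket's wxme library
  racket -n -l racket/base -l wxme -l racket/port \
    -e '(copy-port (wxme-port->text-port (current-input-port)) (current-output-port))' \
    < "$1"
fi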
ZIP archive listing script:
#!/bin/bash
# List the archive's contents; fall back to unzip -l if zipinfo fails
zipinfo -l "$1" || unzip -l "$1"