@ross-spencer
Last active May 27, 2021 07:13
Digitally signed documents in the govdocs selected corpus
govdocs_selected/PDF_1304/119596.pdf
govdocs_selected/PDF_1304/470680.pdf
govdocs_selected/PDF_997/475108.pdf
govdocs_selected/PDF_1304/475482.pdf
govdocs_selected/PDF_237/497449.pdf
govdocs_selected/PDF_1304/511649.pdf
govdocs_selected/PDF_997/518946.pdf
govdocs_selected/PDF_997/595453.pdf
govdocs_selected/PDF_1304/805594.pdf
govdocs_selected/PDF_997/866130.pdf
govdocs_selected/PDF_997/949949.pdf
govdocs_selected/PDF_1198/999150.pdf
govdocs_selected/PDF_1783/254330.pdf
govdocs_selected/PDF_627/021275.pdf

ross-spencer commented May 26, 2021

A list of legacy documents that have been digitally signed, shared to help researchers understand the preservation risks around digitally signed documents in a collection, and how to treat those that may already be exhibiting issues, e.g. signatures that no longer validate the document.

Method: cat and ack over the Govdocs selected corpus, searching for /Sig at first, but finally settling on strings that match the signature algorithms supported by the format.

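#!/usr/bin/env bash
# bash rather than sh: `read -d` and $'\0' are not POSIX.

# Print the path of every PDF in the corpus that contains the string "CAdES".
find /media/ross-spencer/Govdocs/govdocs_selected/ -name "*.pdf" -print0 | while IFS= read -r -d '' file
do
	# ack reads the PDF from stdin; its output is discarded, only the exit status is used.
	if cat "$file" | ack "CAdES" > /dev/null
	then
		echo "$file"
	fi
done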
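
The script above only looks for "CAdES". As a minimal sketch of how the same search might cover several signature types at once, the function below uses grep instead of ack. The function name find_signed_pdfs is hypothetical, and the list of strings is an assumption based on the signature SubFilter names I believe the PDF format defines; it is not the list used to produce the results above.

```shell
#!/usr/bin/env sh

# Hypothetical helper: list PDFs under directory $1 that contain any of the
# strings below. The strings are assumed signature SubFilter names and are
# not taken from the original gist.
find_signed_pdfs() {
	# -l: print only the names of matching files
	# -F: treat each pattern as a fixed string, not a regular expression
	find "$1" -name "*.pdf" -exec grep -l -F \
		-e "adbe.pkcs7.detached" \
		-e "adbe.pkcs7.sha1" \
		-e "adbe.x509.rsa_sha1" \
		-e "ETSI.CAdES.detached" \
		-e "ETSI.RFC3161" \
		{} +
}
```

Usage would look like: find_signed_pdfs /media/ross-spencer/Govdocs/govdocs_selected/. A match only shows that the string occurs somewhere in the file, so each hit still needs checking in a PDF tool before it is treated as a signed document.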
