# use ImageMagick convert
# the order is important. the density argument applies to input.pdf and resize and rotate to output.pdf
convert -density 90 input.pdf -rotate 0.5 -attenuate 0.2 +noise Multiplicative -colorspace Gray output.pdf
I just developed www.scanyourpdf.com for everyone to use. Code is open source if you'd like to contribute!
convert -density 130 input.pdf -rotate 0.33 -attenuate 0.15 +noise Multiplicative -colorspace Gray output.pdf
Closest to a modern scanner in my opinion.
Sometimes you'll have to replace a few pages with real scanned pages, e.g. the signature page.
The flow will be:
- Convert input as a scanned PDF.
- Split the sections that should be replaced with a real scan.
- Merge everything back to the output.
You'll need qpdf and img2pdf installed.
convert -density 130 input.pdf -rotate -0.33 -attenuate 0.15 +noise Multiplicative -colorspace Gray output.pdf
qpdf --empty --pages output.pdf 1-5 -- output_1.pdf
img2pdf --pagesize A4 --auto-orient signed.jpg -o output_2.pdf
qpdf --empty --pages output.pdf 7 -- output_3.pdf
qpdf --empty --pages output_*.pdf -- final_scan.pdf
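The commands above hard-code a 7-page document with page 6 replaced. As a rough sketch, the page ranges could be computed instead. `split_ranges` and `fmt_range` are hypothetical helper names introduced here for illustration; they are not part of qpdf.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of qpdf): print the page ranges that
# surround the page being replaced, one range per line.

# Print "a-b", or just "a" when the range holds a single page.
fmt_range() {
  if (( $1 == $2 )); then
    echo "$1"
  else
    echo "$1-$2"
  fi
}

# $1 = total number of pages, $2 = page to replace with a real scan.
split_ranges() {
  local total=$1 replace=$2
  if (( replace > 1 )); then
    fmt_range 1 $(( replace - 1 ))
  fi
  if (( replace < total )); then
    fmt_range $(( replace + 1 )) "$total"
  fi
}

# Example: a 7-page document whose page 6 is replaced.
split_ranges 7 6
```

Each printed range can then be passed to qpdf in the same form as above, for example `qpdf --empty --pages output.pdf 1-5 -- output_1.pdf`.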
I get the following:
C:\Program Files\ImageMagick-7.0.10-Q16-HDRI>magick convert -density 150 input.pdf -rotate "$([ $((RANDOM % 2)) -eq 1 ] && echo -)0.$(($RANDOM % 4 + 5))" -attenuate 0.4 +noise Multiplicative -attenuate 0.03 +noise Multiplicative -sharpen 0x1.0 -colorspace Gray output.pdf
convert: FailedToExecuteCommand `"gswin32c.exe" -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r150x150" "-sOutputFile=C:/Users/TURKEY~1/AppData/Local/Temp/magick-9420wtSmlrXSBcfh%d" "-fC:/Users/TURKEY~1/AppData/Local/Temp/magick-9420IJ2RKxHzcTQf" "-fC:/Users/TURKEY~1/AppData/Local/Temp/magick-9420t7VINjcK97Pq"' (The system cannot find the file specified.
) @ error/delegate.c/ExternalDelegateCommand/475.
convert: PDFDelegateFailed `The system cannot find the file specified.
' @ error/pdf.c/ReadPDFImage/662.
convert: invalid argument for option '-rotate': $([ $((RANDOM % 2)) -eq 1 ] && echo -)0.$(($RANDOM % 4 + 5)) @ error/convert.c/ConvertImageCommand/2643.
@turkeyphant: You seem to use Windows. Most commands here use features from UNIX shells like Bash (e.g. command substitution via $(), the $RANDOM variable, arithmetic expressions or conditionals). These features are not available in the default Windows command line, therefore you need to find another way (e.g. remove UNIX shell features from the command or use a UNIX shell like Cygwin, Git shell or WSL under Windows).
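To make it clearer what the -rotate argument in the failing command computes, here is the same logic unrolled into a small Bash function. `random_rotation` is a name made up for this sketch; it still needs a Bash-compatible shell, so on plain Windows cmd a fixed value such as `-rotate 0.5` is the simpler route.

```shell
#!/usr/bin/env bash
# Same logic as the inline -rotate expression, split into readable steps.
random_rotation() {
  local sign=""
  # $RANDOM is a Bash built-in that yields an integer from 0 to 32767.
  if (( RANDOM % 2 == 1 )); then
    sign="-"
  fi
  # RANDOM % 4 is 0..3, so the digit after the decimal point is 5..8.
  echo "${sign}0.$(( RANDOM % 4 + 5 ))"
}

angle=$(random_rotation)
echo "rotation angle: $angle"
```

The result is an angle between 0.5 and 0.8 degrees, tilted left or right at random.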
@muellermartin: oops, good point. However, I'm still having issues on an OS X machine when running brew install imagemagick. It seems to be either an issue with curl (I don't know how to sub in a different version) or a 301 redirect at kernel.org:
curl: (60) SSL certificate problem: Invalid certificate chain
More details here: http://curl.haxx.se/docs/sslcerts.html
curl performs SSL certificate verification by default, using a "bundle"
of Certificate Authority (CA) public keys (CA certs). If the default
bundle file isn't adequate, you can specify an alternate file
using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
the bundle, the certificate verification probably failed due to a
problem with the certificate (it might be expired, or the name might
not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
the -k (or --insecure) option.
Error: Failed to download resource "gnu-getopt"
Download failed: https://www.kernel.org/pub/linux/utils/util-linux/v2.35/util-linux-2.35.2.tar.xz
@turkeyphant Nice that you have macOS at hand; these ImageMagick commands should work there :) You seem to have bad luck due to your issue with Homebrew, though. The error seems to indicate that curl (probably used by brew to download the dependencies for ImageMagick) can't validate the SSL certificate (the redirect is likely not an issue). As the certificate for www.kernel.org seems to be valid from my end, there is likely some issue with the certificate bundle used by your curl. This is odd, as the pre-installed version of curl should use the system certificates and therefore it should work. Maybe you have "overwritten" the curl command via Homebrew (which is not recommended). You can check that by using which curl, which should output /usr/bin/curl. If the output is something like /usr/local/bin/curl or /usr/local/opt/curl/bin/curl then you might have linked the version from Homebrew (or other tools). With Homebrew you can try brew unlink curl to undo this.
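The check described above could be wrapped up roughly like this. `curl_origin` is a hypothetical helper name, and the path patterns are only the ones mentioned in this thread, so other installations may land in the "unknown" branch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a curl path using the locations named above.
curl_origin() {
  case "$1" in
    /usr/bin/curl) echo "system" ;;
    /usr/local/*)  echo "homebrew-or-other" ;;
    *)             echo "unknown" ;;
  esac
}

# command -v prints the path the shell would run (empty if curl is missing).
curl_origin "$(command -v curl)"
```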
I haven't messed with curl.
$ which curl
/usr/bin/curl
Any other workaround for this download?
@turkeyphant: Hm, I wonder why Homebrew tries to install gnu-getopt from the sources instead of using a bottle (pre-built binary). Maybe you could try to install gnu-getopt explicitly to work around this issue: brew install gnu-getopt
No dice I'm afraid. Still get Error: Failed to download resource "gnu-getopt" Download failed: https://www.kernel.org/pub/linux/utils/util-linux/v2.35/util-linux-2.35.2.tar.xz
Given I'm able to download the file manually, there must be a workaround? Any way to tell brew to make curl use -k? Or use wget --no-check-certificate instead?
Seem to have solved it (slowly running make as I write) with Homebrew/legacy-homebrew#6103 (comment) for each and every invalid cert.
I do think there must be a way to update my machine's certs so that curl can work correctly, though.
@turkeyphant: Well, if these SSL errors are not only related to curl then something is really off. Sometimes an utterly wrong system time causes such errors (because the certificates seem to be expired or not valid yet), or you're in a shitty corporate network that uses some kind of HTTPS interception and thus breaks security, or you're the victim of a MITM attack.
It seems to be a common macOS issue, to be honest. System time is correct, there is no VPN or other network issue, and I'm fairly certain there's no MITM going on (I have tested various Internet connections and other macOS machines, for example). It's 10.11, so the certificates might just be out of date?
Nice! The other day, I had 19 pages to sign with unique signatures. First, I used xournal on Ubuntu 20.04 with a stylus, and then I ran the following script:
#!/usr/bin/env bash
# Dependencies
sudo apt install pdftk imagemagick -y
# Output folder
mkdir -p output
# Keep pages in the right order
for i in {1..19}; do
  if (( $i < 10 )); then
    j=0$i
  else
    j=$i
  fi
  pdftk input.pdf cat $i output output/$j.pdf
  convert -density 200 -trim -flatten -quality 80 -attenuate 0.15 +noise Multiplicative -rotate 0.01 output/$j.pdf output/$j.jpg
  convert output/$j.jpg output/$j.pdf
  rm output/$j.jpg
done
pdftk output/* cat output result.pdf
The conversion to .jpg prevents the file from bloating.
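The if/else in the script exists only to zero-pad the page number so that `output/*` expands in page order. As a possible simplification, `printf` can do the padding; `pad_page` is a name invented for this sketch.

```shell
#!/usr/bin/env bash
# Zero-pad page numbers so that a glob such as output/* sorts in page order.
pad_page() {
  printf '%02d\n' "$1"
}

for i in 1 9 10 19; do
  echo "page $i -> output/$(pad_page "$i").pdf"
done
```

Inside the loop this would replace the whole conditional with `j=$(pad_page "$i")`. Two digits are enough for up to 99 pages; longer documents would need a wider format such as `%03d`.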
The +noise Multiplicative argument created a dappled background behind where I had text but not in other places. Using Gaussian, Laplacian, or Uniform instead of Multiplicative produced better results for me.
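One way to compare the noise types on your own document is to generate one output per type. The sketch below only prints the commands (a dry run) so they can be checked before running; `noise_commands` is a made-up helper, and the density, rotation and attenuate values are simply borrowed from an earlier comment in this thread.

```shell
#!/usr/bin/env bash
# Print (rather than run) one convert command per noise type, so the
# results can be compared side by side. File names are placeholders.
noise_commands() {
  local input=$1 noise
  for noise in Gaussian Laplacian Uniform Multiplicative; do
    echo "convert -density 130 $input -rotate 0.33 -attenuate 0.15 +noise $noise -colorspace Gray output_$noise.pdf"
  done
}

noise_commands input.pdf
```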
If you get this error:
convert-im6.q16: not authorized `input.pdf' @ error/constitute.c/ReadImage/412.
convert-im6.q16: no images defined `output.pdf' @ error/convert.c/ConvertImageCommand/3258.
you can run
sudo mv /etc/ImageMagick-6/policy.xml /etc/ImageMagick-6/policy.xml.off
to disable the policy. When done, you can restore the original with
sudo mv /etc/ImageMagick-6/policy.xml.off /etc/ImageMagick-6/policy.xml
Taken from here
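A narrower alternative, sketched here under the assumption that your policy.xml contains the PDF rule in the form shown in the demo line below (check your own file, it may differ), is to relax only that rule instead of switching the whole policy off. `allow_pdf` is a hypothetical helper that writes the edited policy to standard output and leaves the input file untouched.

```shell
#!/usr/bin/env bash
# Relax only the PDF rule instead of disabling the whole policy file.
# Prints the edited policy to stdout; the input file is not modified.
allow_pdf() {
  sed 's/rights="none" pattern="PDF"/rights="read|write" pattern="PDF"/' "$1"
}

# Demonstration on a throwaway file. The rule below is an assumption about
# how the restriction is written on some Debian/Ubuntu systems.
sample=$(mktemp)
echo '  <policy domain="coder" rights="none" pattern="PDF" />' > "$sample"
allow_pdf "$sample"
rm -f "$sample"
```

On a real system the output could be written to a new file, reviewed, and then put in place of /etc/ImageMagick-6/policy.xml with sudo.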
Guys - completely newbie here
I downloaded Visual studio and Git as per install-windows.txt
then ... how do I run the scanner.sh file?
Do I add this file into folder somewhere...?
tks .....
I've improved upon this script slightly (having used it for a while now). My version:
- Splits the PDF into one file per page
- Applies a slightly different rotation to each page
- Recombines the files
- Supports macOS Automator quick actions
- Fixes the noise so it appears across the whole document
See: https://gist.github.com/Pezmc/38017cb03daccb17d3835280c568dc0f
Thanks @Pezmc. To have the noise only at the edges instead of across the whole document is a feature IMO, and also keeps the file size much smaller. Unfortunately, I couldn't figure out how to get your script to use noise only at the edges.
I ended up modifying the original script to use a higher density, which makes the output sharper. Got to keep up with the increasing quality of scanners in the 3 years since then. 😉
convert -density 130 input.pdf -rotate 0.2 -attenuate 0.2 +noise Multiplicative -colorspace Gray output.pdf
For those on Windows: make sure to install Ghostscript as well, or else you'll get errors like
convert: FailedToExecuteCommand `"gswin32c.exe" -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r150x150" "-sOutputFile=C:/Users/TURKEY~1/AppData/Local/Temp/magick-9420wtSmlrXSBcfh%d" "-fC:/Users/TURKEY~1/AppData/Local/Temp/magick-9420IJ2RKxHzcTQf" "-fC:/Users/TURKEY~1/AppData/Local/Temp/magick-9420t7VINjcK97Pq"' (The system cannot find the file specified.
) @ error/delegate.c/ExternalDelegateCommand/475.
convert: PDFDelegateFailed `The system cannot find the file specified.
' @ error/pdf.c/ReadPDFImage/662.
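A quick way to see whether a Ghostscript executable is reachable is to look for it on the PATH. This is a sketch for a Bash-compatible shell (Git Bash, Cygwin or WSL on Windows). `find_ghostscript` is a hypothetical helper; gswin32c comes from the error message above, while gs and gswin64c are other executable names Ghostscript ships under.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first Ghostscript executable found on PATH.
find_ghostscript() {
  local name
  for name in gs gswin64c gswin32c; do
    if command -v "$name" >/dev/null 2>&1; then
      echo "$name"
      return 0
    fi
  done
  return 1
}

if gs_bin=$(find_ghostscript); then
  echo "Ghostscript found: $gs_bin"
else
  echo "Ghostscript not found on PATH"
fi
```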
Thank you! I have used some of these commands to build https://oakpdf.com, which not only applies a scanner effect but also lets you insert an image of a signature or draw one.
My observations regarding the -density parameter: 200 is good enough in most cases, while 300 gives the best quality, but the build time gets catastrophically slow.
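Part of the slowdown is plain arithmetic: the pixel count grows with the square of the density. As a rough illustration, assuming A4 pages (210 mm x 297 mm; other page sizes scale the same way), the rendered dimensions can be estimated like this. `a4_pixels` is a name made up for the sketch.

```shell
#!/usr/bin/env bash
# Pixel dimensions of an A4 page (210 mm x 297 mm) at a given density.
# 1 inch = 25.4 mm, so pixels = mm * dpi / 25.4, rounded to nearest integer.
a4_pixels() {
  local dpi=$1
  local w=$(( (210 * dpi * 10 + 127) / 254 ))
  local h=$(( (297 * dpi * 10 + 127) / 254 ))
  echo "${w}x${h}"
}

for dpi in 130 200 300; do
  echo "density $dpi -> $(a4_pixels "$dpi") pixels"
done
```

Going from 200 to 300 multiplies the number of pixels per page by 2.25, and every later step (rotation, noise, encoding) has to process all of them.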
Great
Thank you!
I used zenity to add graphical input and output prompts:
convert -density 150 "$(zenity --file-selection --title="Select Input File" --file-filter=*[PpDdFf])" -rotate "$([ $((RANDOM % 2)) -eq 1 ] && echo -)0.$(($RANDOM % 4 + 5))" -attenuate 0.4 +noise Multiplicative -attenuate 0.03 +noise Multiplicative -sharpen 0x1.0 -colorspace Gray "$(zenity --file-selection --save --title="Select Output File" --filename ".pdf")"
It can also be found here as a .desktop file, so the script can be started from the application launcher on Linux machines:
https://gist.github.com/fewaltix/c1437171d16671741aafe146751dbf9f
work
Very useful. Added as a function: