@wvengen
Created October 11, 2018 09:10
Generate PDF from Publitas folder URL
#!/bin/sh
#
# Generates PDF from Publitas images (online folder service)
# Stores the generated PDF and the JSON (which may contain links).
#
# Requirements:
# - wget https://www.gnu.org/software/wget/
# - jq https://stedolan.github.io/jq/
# - imagemagick https://www.imagemagick.org/
#
# You may need to remove the PDF-related security policy for ImageMagick for this to work.
#
if [ -z "$2" ]; then
  echo "Usage: $0 <publitas_folder_url> <output_name>"
  exit 1
fi
URL="$1"
OUT="$2"
DIR=$(mktemp -d --suffix=.getpublitas)
# Extract the JSON object assigned to `var data` in the page source
wget -q -O - "$URL" | sed 's/^\s*var\s\+data\s\+=\s\+\(.*\);\s*$/\1/p;d' > "$DIR/data.json"
# Pick the highest-resolution image URL available for each page
jq -r '.spreads[].pages[].images | .at2400 // .at2000 // .at1600 // .at1200 // .at1000' "$DIR/data.json" > "$DIR/img_urls"
i=1
while read -r u; do
  echo "$u" > "$DIR/cur_url" # use a file so that wget can apply --base
  wget -q --base="$URL" -O "$(printf '%s/image-page-%04d.jpg' "$DIR" "$i")" -i "$DIR/cur_url"
  i=$(( i + 1 ))
done < "$DIR/img_urls"
# The glob is quoted on purpose: ImageMagick expands it itself
convert "$DIR/image-page-*.jpg" "$OUT.pdf"
cp "$DIR/data.json" "$OUT.json"
rm -Rf "$DIR"
@luduma

luduma commented Oct 3, 2020

Hello, I'm trying to download a Publitas page and make a PDF out of it using your script. I'm new to .sh files but have managed to install all the dependencies. However, I'm getting this error when running:
bash get-publitas.sh https://view.publitas.com/malmberg/589122_bvj_4vwo_lob_a_bladerboek BVJ-4VA.
I've tried modifying the ImageMagick policy.xml file to this . Also in your script I changed line 34 to
convert -limit memory 8GiB -limit disk 8GiB -limit area 8GiB "$DIR/image-page-*.jpg" "$OUT.pdf"
Any idea how to solve this?

@srleojaco

Thanks for the script. Still working in 2023.

I would suggest showing that the program is downloading the images since it doesn't output anything and you might think it is not doing anything.
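
One way to do this, as a rough sketch rather than a change to the author's script: wrap the download loop in a function that prints a progress line to stderr before each fetch. The names `download_pages` and `fetch_page` are made up for this example; `fetch_page` uses the same `wget --base ... -i` approach as the script above.

```shell
#!/bin/sh
# Hypothetical helper: fetch one image.
# $1 = image URL (possibly relative), $2 = base URL, $3 = output file
fetch_page() {
  printf '%s\n' "$1" > "$3.url"   # wget applies --base to URLs read via -i
  wget -q --base="$2" -O "$3" -i "$3.url"
  rm -f "$3.url"
}

# Hypothetical helper: download every URL listed in a file, with progress.
# $1 = file with one image URL per line, $2 = target directory, $3 = base URL
download_pages() {
  list=$1
  dir=$2
  base=$3
  total=$(grep -c . "$list")      # number of non-empty lines
  i=1
  while read -r u; do
    [ -n "$u" ] || continue       # skip blank lines
    out=$(printf '%s/image-page-%04d.jpg' "$dir" "$i")
    # Progress goes to stderr so it never mixes with piped output
    printf 'Downloading page %d of %d\n' "$i" "$total" >&2
    fetch_page "$u" "$base" "$out"
    i=$(( i + 1 ))
  done < "$list"
}
```

In the script, the loop could then be replaced by a call such as `download_pages "$DIR/img_urls" "$DIR" "$URL"`.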

@Zerovelocity275

> Hello, I'm trying to download a Publitas page and make a pdf out of it, using your script. I'm new to .sh files but have managed to install all the dependencies, however I'm getting this error, when running: bash get-publitas.sh https://view.publitas.com/malmberg/589122_bvj_4vwo_lob_a_bladerboek BVJ-4VA. I've tried modifying the ImageMagick policy.xml file to this . Also in your script I changed line 34 to convert -limit memory 8GiB -limit disk 8GiB -limit area 8GiB "$DIR/image-page-*.jpg" "$OUT.pdf" Any idea how to solve this?

Did you eventually manage to solve this? I am also trying to download the biology books :).

@GlowingBulb

Hi @Zerovelocity275 & @luduma,
I would like to point out that this script is not necessary anymore, since you can just add /unsupported to the url and download the pdf from Publitas themselves.

@Zerovelocity275

> Hi @Zerovelocity275 & @luduma, I would like to point out that this script is not necessary anymore, since you can just add /unsupported to the url and download the pdf from Publitas themselves.

Oh, thank you so much, that's great.

@Cristark02

> you can just add /unsupported to the url and download the pdf from Publitas themselves.

Hi @GlowingBulb, I'm doing that and it just says: "Whoops! Something went wrong... We're sorry, but this part is no longer available." So they patched it, right? I don't know if I'm doing it right; I'm adding it at the end of the URL.

@GlowingBulb

Hi @Cristark02, as far as I know it still works. Make sure that you add /unsupported to the end of the "root" URL, in place of any /page/N suffix. For example,
https://view.publitas.com/four-hands/fourhands_fall23/page/1

becomes

https://view.publitas.com/four-hands/fourhands_fall23/unsupported
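
For anyone who wants to script that rewrite, here is a minimal sketch. The function name `unsupported_url` is made up for this example, and it assumes the viewer URL has the same shape as the one above (a root URL optionally followed by `/page/N`). It only builds the URL; it does not check whether Publitas still serves the page.

```shell
#!/bin/sh
# Hypothetical helper: turn a Publitas viewer URL into its "/unsupported" form.
unsupported_url() {
  root=${1%%[?#]*}   # drop any query string or fragment
  root=${root%/}     # drop a trailing slash
  case "$root" in
    */unsupported) root=${root%/unsupported} ;;  # already converted
    */page/*)      root=${root%/page/*} ;;       # strip "/page/N"
  esac
  printf '%s/unsupported\n' "$root"
}

unsupported_url "https://view.publitas.com/four-hands/fourhands_fall23/page/1"
# prints https://view.publitas.com/four-hands/fourhands_fall23/unsupported
```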