@outlyer
Created February 17, 2018 15:03
Tool to fix epub encoding for use with devices like the Amazon Kindle, which do not handle raw non-ASCII (high Unicode) characters and require HTML entities instead
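As a rough illustration of the naming convention the script follows (an input `book.epub` yields `book-html5.epub`), the sketch below derives the output name from an input path. The helper name `output_name` is hypothetical and not part of the script, and splitting on the last dot is an assumption chosen so that names such as `my.book.epub` keep their full stem.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original script: derive the
# "-html5" output name from an input path.
output_name() {
    in=$1
    ext=${in##*.}    # text after the last dot
    base=${in%.*}    # everything before the last dot
    # A name without any dot has no extension at all.
    if [ "$base" = "$in" ]; then
        ext=""
    fi
    # Reject anything that is not an .epub file.
    if [ "$ext" != "epub" ]; then
        return 1
    fi
    printf '%s-html5.epub\n' "$base"
}

output_name "book.epub"
output_name "my.book.epub"
output_name "notes.txt" || echo "not an epub"
```

The function returns a non-zero status for non-epub input, so a caller can print a usage message and stop before touching any files.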
#!/bin/sh
#
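# find(1) output below is '|'-delimited, so split on '|' instead of whitespace
IFS='|'
# Split on the last dot so that "my.book.epub" gives base "my.book" and ext "epub"
base=${1%.*}
ext=${1##*.}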
#if [[ $# -eq 0 ]] ; then
if [ "$ext" != "epub" ]; then
echo "Usage: $0 [EPUB FILE]"
echo
echo "No source epub file specified. Please specify a valid epub document."
echo
echo "This tool converts raw non-ASCII characters in badly formatted epub documents to"
echo "HTML entities so the books can be used on devices that require entities."
exit 255
fi
if type he >/dev/null 2>&1; then
# Be safe
tmp=$(mktemp -d epubfix.XXXXXX)
pwd=$(pwd) # Store original working directory to avoid ".." nonsense
echo "=== Processing $base ==="
mkdir -p "$tmp/orig" "$tmp/work" || true
echo "=== Creating Working Copies ==="
unzip -aoq "$base.$ext" -d "$tmp/orig/$base/"
unzip -aoq "$base.$ext" -d "$tmp/work/$base/"
echo "=== Processing HTML Encoding ==="
printf "\tFixing HTML encoding..."
for b in $(find "$tmp/orig/" -type f -regextype posix-egrep -regex ".*html" -printf "%P|")
do
# --allow-unsafe leaves &, <, > and quotes alone so the existing markup survives
he --encode --use-named-refs --allow-unsafe < "$tmp/orig/$b" > "$tmp/work/$b"
#printf "$b\t"
printf "."
# \diff -U 0 "orig/$b" "work/$b" | grep ^@ | wc -l # Show number of changed lines,
done
printf " Done!\n"
echo "=== Creating new epub book ==="
cd "$tmp/work/$base" || exit 1
epub="$pwd/$base-html5.epub"
zip -X0q "$epub" mimetype
zip -rDX9q "$epub" * -x "*.DS_Store" -x mimetype
printf '\t%s\n' "$base-html5.epub written."
echo "=== Cleaning up workspace ==="
#echo "Removing $tmp"
cd "$pwd" || exit 1
rm -rf "$tmp"
echo "=== Done ==="
else
echo "Missing he -- (https://github.com/mathiasbynens/he)"
echo "Please install he by running"
printf '\n\tnpm install -g he\n'
exit 1
fi
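
The conversion loop in the script relies on `find` emitting a `|`-delimited list that the shell then splits because `IFS='|'` is set. A minimal sketch of that splitting in isolation follows. The sample paths and the `count_fields` helper are made up for illustration, and the sketch assumes no path contains a `|` or glob characters.

```shell
#!/bin/sh
# Hypothetical helper: count how many paths a '|'-delimited list holds.
count_fields() {
    list=$1
    old_ifs=$IFS
    IFS='|'
    n=0
    # $list is deliberately unquoted so the shell splits it on '|'.
    for f in $list; do
        n=$((n + 1))
    done
    IFS=$old_ifs
    echo "$n"
}

# Two paths containing spaces, in the shape produced by -printf "%P|".
count_fields 'My Book/OEBPS/chapter 1.html|My Book/OEBPS/toc.xhtml|'
```

Because the split happens on `|` rather than on whitespace, each path reaches the loop body as a single word even when it contains spaces, which is why the script can then quote `"$tmp/orig/$b"` safely.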