@dsanson
Created August 30, 2011 17:52
any2pandoc.sh: script that tries to convert documents thrown at it to pandoc's extended markdown
#!/bin/sh
# any2pandoc.sh
#
# A shell script that tries its best to convert documents thrown at it
# to pandoc's extended markdown.
#
# https://gist.github.com/1181510
#
# Depends on:
#
#   pandoc: http://johnmacfarlane.net/pandoc/
#     a utility for converting lots of things to lots of things
#   textutil: a built-in OS X utility for converting lots of things to
#     lots of things.
#   pdftohtml: http://pdftohtml.sourceforge.net/
#     a utility for converting pdf to html
#
# Abort early (with a non-zero status) if pandoc is not on the PATH.
if ! command -v pandoc >/dev/null 2>&1; then
    echo "pandoc not found: unable to process files." >&2
    exit 1
fi
for file in "$@"
do
    # Split on the last dot only, so "draft.v2.docx" gives
    # base "draft.v2" and ext "docx".
    base="${file%.*}"
    ext="${file##*.}"
    case "$ext" in
        doc | docx | webarchive | rtf | rtfd | odt )
            # textutil converts these formats to HTML, which pandoc can read.
            if ! command -v textutil >/dev/null 2>&1; then
                echo "textutil not found:" >&2
                echo " unable to process doc, docx, webarchive, rtf, rtfd, or odt files" >&2
                exit 1
            fi
            textutil -format "$ext" -convert "html" -stdout "$file" \
                | pandoc -f html -s -o "${base}.markdown"
            ;;
        pdf )
            # pdftohtml converts the PDF to a single HTML stream for pandoc.
            if ! command -v pdftohtml >/dev/null 2>&1; then
                echo "pdftohtml not found: unable to process pdf files." >&2
                exit 1
            fi
            pdftohtml -noframes -stdout "$file" \
                | pandoc -f html -o "${base}.markdown"
            ;;
        tex )
            pandoc -f latex -s "$file" -o "${base}.markdown"
            ;;
        * )
            # Anything else: let pandoc guess the input format.
            pandoc -s "$file" -o "${base}.markdown"
            ;;
    esac
done
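
For filenames with more than one dot, the choice between shortest-match and longest-match parameter expansion decides what ends up as the base name and what as the extension. The sketch below shows both pairings side by side. `split_last` and `split_first` are hypothetical helper names used only for illustration; they are not part of any2pandoc.sh.

```shell
#!/bin/sh
# Hypothetical helpers (not in the original script) that print
# "base ext" for a filename, using POSIX parameter expansion.

# Split on the LAST dot: % removes the shortest matching suffix,
# ## removes the longest matching prefix.
split_last() {
    printf '%s %s\n' "${1%.*}" "${1##*.}"
}

# Split on the FIRST dot: %% removes the longest matching suffix,
# # removes the shortest matching prefix.
split_first() {
    printf '%s %s\n' "${1%%.*}" "${1#*.}"
}

split_last  "draft.v2.docx"   # prints: draft.v2 docx
split_first "draft.v2.docx"   # prints: draft v2.docx
```

Only the last-dot split yields an extension that a `case` pattern such as `docx` can match when the name itself contains dots.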
@yankchina

Great Command!

@RichardDooling

Wow! Nice. Thanks a bunch.

@RichardDooling

When I try converting from PDF, I get a message about bad encoding. Apparently the default for pdftohtml is latin1, which pandoc does not like.

@iiiw

iiiw commented Feb 1, 2014

Adding -enc "UTF-8" in line 45 works for me. Incredibly useful script, thanks.
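
A minimal sketch of that change, assuming the comment refers to the `pdftohtml` invocation in the pdf branch and that the installed `pdftohtml` accepts `-enc` to set the output text encoding. `pdf_to_markdown` is a hypothetical wrapper name, not something in the script.

```shell
#!/bin/sh
# Hypothetical wrapper around the script's pdf branch, with the
# suggested -enc UTF-8 option added so pandoc receives UTF-8 HTML.
pdf_to_markdown() {
    # Write the output next to the input, replacing the last extension.
    pdftohtml -enc UTF-8 -noframes -stdout "$1" \
        | pandoc -f html -o "${1%.*}.markdown"
}

# Usage (requires pdftohtml and pandoc on the PATH):
#   pdf_to_markdown paper.pdf
```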

@flintforge

Pandoc now supports RTF as an input format.
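
One way this could be used, sketched under the assumption that the installed pandoc lists `rtf` in the output of `pandoc --list-input-formats`; if it does not, the sketch falls back to the script's existing textutil route. `rtf_to_markdown` is a hypothetical helper name, not part of the script.

```shell
#!/bin/sh
# Hypothetical helper: convert an RTF file with pandoc's own reader
# when available, otherwise go through textutil and HTML as before.
rtf_to_markdown() {
    out="${1%.*}.markdown"
    if pandoc --list-input-formats | grep -qx rtf; then
        # pandoc can read RTF directly.
        pandoc -f rtf -s "$1" -o "$out"
    else
        # Older pandoc: convert to HTML with textutil first.
        textutil -format rtf -convert html -stdout "$1" \
            | pandoc -f html -s -o "$out"
    fi
}

# Usage:
#   rtf_to_markdown notes.rtf
```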

@dsanson
Author

dsanson commented Apr 9, 2022

Yes, this script is an artifact of its time.