@erochest
Created March 21, 2012 17:46
A script I wrote in Literate Haskell using Shelly
#!/bin/sh
# pandoc -f markdown+lhs -t html5 --smart --css https://raw.github.com/richleland/pygments-css/master/default.css s5topdf.lhs
# pandoc -f markdown+lhs -t html5 --smart --css s5topdf.css s5topdf.lhs
pandoc -f markdown+lhs -t html5 --smart s5topdf.lhs
nmap \h :w<CR>:call Send_to_Tmux("./generate-html.sh > index.html\n")<CR>
nmap \c :w<CR>:call Send_to_Tmux("./generate-html.sh \| pbcopy\n")<CR>
#!/bin/bash
runhaskell ./s5topdf.lhs "$@"
table.sourceCode, tr.sourceCode, td.lineNumbers, td.sourceCode {
margin: 0; padding: 0; vertical-align: baseline; border: none; }
table.sourceCode { width: 100%; }
td.lineNumbers { text-align: right; padding-right: 4px; padding-left: 4px; color: #aaaaaa; border-right: 1px solid #aaaaaa; }
td.sourceCode { padding-left: 5px; }
code > span.kw { color: #007020; font-weight: bold; }
code > span.dt { color: #902000; }
code > span.dv { color: #40a070; }
code > span.bn { color: #40a070; }
code > span.fl { color: #40a070; }
code > span.ch { color: #4070a0; }
code > span.st { color: #4070a0; }
code > span.co { color: #60a0b0; font-style: italic; }
code > span.ot { color: #007020; }
code > span.al { color: #ff0000; font-weight: bold; }
code > span.fu { color: #06287e; }
code > span.er { color: #ff0000; font-weight: bold; }
pre.sourceCode {
margin-left: 1cm;
margin-right: 1cm;
padding: 0.5em 1em;
border: 1px solid #888;
-moz-border-radius: 0.5em;
-webkit-border-radius: 0.5em;
border-radius: 0.5em;
-moz-box-shadow: 5px 5px 5px #888;
-webkit-box-shadow: 5px 5px 5px #888;
box-shadow: 5px 5px 5px #888;
}
Shell Programming in Haskell: Converting S5 Slides to PDF
=========================================================
Recently, I gave an introduction to Python for Chris' and Kelly's [GIS
Workshop][s12gis]. It was a really great experience, and we had a lot of fun
learning about Python and how to use it with ArcGIS.
I did [my slides][slides] for it in Markdown, using [S5][s5]. Others around the
Scholars' Lab have used [Show-off][showoff] to compose slide-shows in Markdown,
but I wanted something a little simpler, and it had been a while since I'd
looked at S5, so I used that instead.
Then Kelly asked me for a PDF version of the slideshow. Heh.
At first I thought I might have to convert it to Showoff or (worse yet)
PowerPoint. But I Googled around and found that converting it wouldn't be too
difficult. The process itself would be simple, and a small shell script would
make it even easier.
<img src="http://www.scholarslab.org/wp-content/uploads/2012/03/philosoraptor.jpg" style="float: left; padding-right: 0.5em;">
And then my infallible instinct to make any project ten times more interesting
(i.e., *complicated*) kicked in.
I remembered that I'd just read Greg Weber's post about [Shelly][shelly], a
library to make shell scripting a bit easier in Haskell. I've been seriously
playing with Haskell for almost a year now, using it for most of my
side-projects and for anything that no one else will have to maintain. The
thought of using Haskell for shell scripting was intriguing, just because it
would be another way for me to wrap my head around this very different computer
language.
But I was skeptical. At first glance, Haskell doesn't seem like a good
candidate for shell programming. Typically, these scripts are quick, one-off
programs, often written in anger, that need to be created quickly and nimbly
(dare I say, *agily*?). However, Haskell is statically-typed, and its type
system is not given to making quick changes. (Well, I've found that not to be
quite accurate, but it is the perception.) Generally, I think that languages
like Haskell are more suited to larger systems, because their power and
concision really only become apparent when working with large bodies of code.
Whatever my reaction, though, a small script like this, with limited scope,
seemed perfect.
The Process
-----------
The process I found to handle the conversion was fairly simple.
1. Get a PNG screenshot of each slide using [webkit2png][webkit2png];
2. Concatenate all of the PNGs into a PDF using the [ImageMagick][magick] tool
`convert`;
3. Clean up the PNGs.
With that laid out, let's jump in.
Preface
-------
First, some book-keeping: I have to let Haskell know that I'm going to use
string literals in places that require [Data.Text.Text][text] instances:
\begin{code}
{-# LANGUAGE OverloadedStrings #-}
\end{code}
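As a small illustration of what that pragma buys, here is a sketch that is not part of the script. The names `slidePrefix` and `slideName` are made up for this example, and it assumes only the `text` package:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text.Lazy as T

-- Hypothetical names, for illustration only.
-- With OverloadedStrings, this literal is typed as lazy Text, not String.
slidePrefix :: T.Text
slidePrefix = "slide-"

-- Build a name such as "slide-3" without calling T.pack on the literal.
slideName :: Int -> T.Text
slideName n = T.append slidePrefix (T.pack (show n))
```

Without the pragma, the literal `"slide-"` would be a `String`, and the type signature on `slidePrefix` would be rejected.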
Also, we have to import the [Shelly][shelly] module.
\begin{code}
import Shelly
\end{code}
And we need some other modules for working with characters, text, and other
things.
\begin{code}
import Control.Monad (forM_)
import qualified Data.Char as C
import qualified Data.Text.Lazy as T
import Filesystem.Path
import Prelude hiding (FilePath)
import System.Environment
\end{code}
Converting to PNGs
------------------
The first step is taking screenshots of each slide. To do that, I used the
[webkit2png][webkit2png] script.
For most things, I'm using Python 2.7, but I haven't bothered installing
`pyobjc` for it. `webkit2png` uses `pyobjc`, though, so I have to run that
program with Python 2.6, which is the default Python shipped with Mac OS X 10.6.
I only generate the full-sized screenshot, and I output it to a filename that
includes the slide number. In Bash, that would look like this:
```bash
python2.6 $(which webkit2png) \
--fullsize \
--filename pythongis-000 \
http://people.virginia.edu/~err8n/pythongis/#slide0
```
First, let's create a generic function to run commands in Python 2.6. In
Shelly, the convention is to add an underscore to functions that throw away
their output:
\begin{code}
python26_ script args = run_ "python2.6" (script:args)
\end{code}
This is kind of interesting because I wouldn't abstract this out if I were
writing this in Bash, Python, or Ruby. But adding this function felt quite
natural in Haskell, which tends to encourage smaller, more generic, yet more
focused, functions.
Now I'll build on that to create a command to look for the program
`webkit2png`, and if it finds it, pass it to Python 2.6:
\begin{code}
webkit2png_ filename url = do
    script <- which "webkit2png"
    case script of
        Nothing      -> echo "ERROR: webkit2png not installed."
        Just script' -> do
            s <- toTextWarn script'
            python26_ s [ "--fullsize"
                        , "--filename", filename
                        , url
                        ]
\end{code}
This could be better. The command does print an error message if `webkit2png`
isn't available, but when that happens it should probably also short-circuit
the rest of the script. The way to do this in Haskell would be to return a
[Maybe][maybe] value, which is what the `which` function above does. In this
case, I know that the program is installed and on the `PATH`, so I'm being a
little sloppy.
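A less sloppy version might look something like the following sketch. It is not part of the script: `webkit2pngCommand` and `allCommands` are hypothetical helpers, written with plain `String`s so that they depend only on `base`, and they only build command lines rather than run them.

```haskell
-- Hypothetical helper, not part of s5topdf.lhs: build the full command line
-- for one slide, or Nothing if the webkit2png script was not found.
webkit2pngCommand :: Maybe String -> String -> String -> Maybe [String]
webkit2pngCommand script filename url = do
  s <- script  -- a Nothing here short-circuits the rest of the block
  return ["python2.6", s, "--fullsize", "--filename", filename, url]

-- Command lines for several (file name, URL) pairs. If the script is missing,
-- the first failure makes the whole result Nothing.
allCommands :: Maybe String -> [(String, String)] -> Maybe [[String]]
allCommands script = mapM (\(file, url) -> webkit2pngCommand script file url)
```

The caller can then pattern match once on the result and bail out before any screenshots are taken, instead of printing the same error for every slide.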
Converting to PDF
-----------------
The next step is to concatenate all the PNGs into one PDF. I'm using the
`convert` program from [ImageMagick][magick] to do this. This takes a list of
PNG files to convert, the name of the PDF file, and generates the output.
\begin{code}
convert :: FilePath -> [FilePath] -> ShIO ()
convert pdf pngs = run_ "convert" =<< mapM toTextWarn (pngs ++ [pdf])
\end{code}
Working on Multiple Files
-------------------------
Right now, `webkit2png_` (the function to download the slides as PNGs) operates
on a single slide. But we'll need to do this for every slide in the show.
`downloadSlides` takes the number of slides and the base URL, and it calls
`webkit2png_` for each slide. It returns a list of file names for the
downloaded PNGs.
\begin{code}
downloadSlides :: Int -> String -> ShIO [FilePath]
downloadSlides slideCount baseUrl = do
    forM_ inputs $ \(url, file) -> webkit2png_ file url
    return files'
  where
    baseUrl' = T.pack $ baseUrl ++ "#slide"
    range    = map (T.pack . show) [0..slideCount]
    urls     = map (T.append baseUrl') range
    files    = map (T.append "slide-") range
    files'   = map (fromText . flip T.append "-full.png") files
    inputs   = zip urls files
\end{code}
The only wrinkle here is that the file names that are passed to `webkit2png`
aren't the ones that are output. Instead, the program appends the size of the
image (thumbnail, full, etc.) and the ".png" extension. Since I want to operate
on those files later, I have to create both the file name prefix to pass to
`webkit2png` and the full file name to process later. This is unfortunate and
brittle, because if `webkit2png` ever changes how it names the output files, my
script will break.
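To make the naming scheme concrete, here is a sketch in plain `String`s rather than `Text`. `slideTargets` is a hypothetical helper that is not part of the script, and the `-full.png` suffix is simply the one described above:

```haskell
-- Hypothetical helper, not part of s5topdf.lhs. For each slide it returns
-- (URL to capture, prefix passed to webkit2png, file name expected on disk).
slideTargets :: Int -> String -> [(String, String, String)]
slideTargets slideCount baseUrl =
  [ (baseUrl ++ "#slide" ++ n, prefix, prefix ++ "-full.png")
  | i <- [0 .. slideCount]          -- same range as downloadSlides
  , let n = show i
  , let prefix = "slide-" ++ n
  ]
```

Keeping all three values in one place at least confines the brittle assumption about `webkit2png`'s output names to a single expression.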
This is also shell-script sloppy in another way. I should really create a
temporary directory and download the PNGs there. Maybe someday.
Putting it all Together and Getting the Inputs
----------------------------------------------
All the pieces are in place. The only things left are to parse the command-line
arguments, call `downloadSlides` and `convert`, and delete the downloaded PNGs.
The `main` function is the entry-point for the script. It picks three
parameters from the command line and tries to make one an `Int`. If that can't
happen for any reason, it prints the usage message and exits. If the
command-line is right, the script continues processing.
\begin{code}
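main :: IO ()
main = shelly $ verbosely $ do
    args <- liftIO getArgs
    case args of
        -- Only accept a non-empty run of ASCII digits, so that `read` cannot fail.
        [slides, url, pdf] | not (null slides) && all C.isDigit slides -> do
            pngs <- downloadSlides (read slides) url
            convert (fromText $ T.pack pdf) pngs
            echo . T.pack $ "Wrote PDF to " ++ pdf
            mapM_ rm_f pngs
        -- Any other argument list falls through to the usage message.
        _ -> echo usage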
\end{code}
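The numeric guard and `read` could also be folded into a single step. A minimal sketch, assuming `Text.Read.readMaybe` from `base` (4.6 or later); `Options` and `parseArgs` are hypothetical names that are not part of this script:

```haskell
import Text.Read (readMaybe)

-- Hypothetical record for the three command-line parameters.
data Options = Options
  { optSlides :: Int
  , optUrl    :: String
  , optPdf    :: String
  } deriving (Eq, Show)

-- Nothing means "print the usage message"; Just carries the validated inputs.
parseArgs :: [String] -> Maybe Options
parseArgs [slides, url, pdf] = do
  n <- readMaybe slides             -- Nothing if slides is not an Int
  if n >= 0 then Just (Options n url pdf) else Nothing
parseArgs _ = Nothing               -- wrong number of arguments
```

With something like this, `main` would only need to match on `Just opts` or `Nothing`, and the parsing rules could be checked without running any external programs.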
This is the usage/help message.
\begin{code}
usage :: T.Text
usage = "\
\usage: s5topdf.lhs [slides] [url] [output] \n\
\ \n\
\ slides is the number of slides in the slideshow.\n\
\ url is the URL to access the slideshow at.\n\
\ output is the filename of the PDF file to create.\n"
\end{code}
Running
-------
To run this script, pass it to `runhaskell` with the right command-line
arguments. For example, here's a small [wrapper script][wrapper].
Conclusion
----------
Using Haskell for shell programming hasn't been bad, but it's not as fast as
shell programming usually is, either. This is still more verbose than the Bash,
Python, or Ruby versions would be, and it took me (a little) longer to write.
(Of course, I was unfamiliar with several of these libraries, and that slowed
me down.)
However, I needed to do almost no debugging. Once I got the types to line up
and `runghc` to stop complaining, it just worked. There were no bugs hiding in
parts that hadn't run yet. Based on experience with other languages, I'd
expected to have to tweak the `convert` function (the second stage of
processing) once I got the `webkit2png` part working (the first stage). But
that wasn't necessary. After I coaxed the complete script into printing the
usage message, everything else worked flawlessly.
The bottom line: For very short one-off scripts, this seems like overkill. For
scripts that you expect to grow, Haskell plus Shelly might be more attractive.
Second Conclusion
-----------------
One of the things that attracts me to Haskell is its history of using
[literate programming][literate]. In fact, I'm using it right now. This post
was generated from the script itself. I've posted the raw version to a
[gist][gist], so you can compare them.
Using literate Haskell was a success. I really liked being able to interleave
extended commentary with the code and to have both be part of the final
product. I think it changed the nature of both the script and the post. This
might not work as well for larger projects with more lines of code and multiple
modules, but for a small script, it was very comfortable. I can see doing this
again for descriptions of small algorithms, projects, or demos.
Also, having this file double as a script *and* the post is kind of neat, at
least for the moment.
[markdown]: http://daringfireball.net/projects/markdown/ "Markdown"
[s12gis]: http://tinyurl.com/s12gis "GIS Workshop"
[slides]: http://people.virginia.edu/~err8n/pythongis/ "The Slides in Question"
[s5]: http://meyerweb.com/eric/tools/s5/ "S5: A Simple Standards-Based Slide Show System"
[showoff]: https://github.com/schacon/showoff "showoff"
[haskell]: http://www.haskell.org/haskellwiki/Haskell "Haskell"
[shelly]: http://www.yesodweb.com/blog/2012/03/shelly-for-shell-scripts "Shelly for Shell Scripts"
[literate]: http://en.wikipedia.org/wiki/Literate_programming "Wikipedia: Literate Programming"
[gist]: https://gist.github.com/2150126 "The raw script"
[wrapper]: https://gist.github.com/2150126#file_s5topdf "A wrapper script"
[text]: http://hackage.haskell.org/package/text "Data.Text package"
[webkit2png]: http://www.paulhammond.org/webkit2png/ "webkit2png"
[maybe]: http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-Maybe.html "Data.Maybe package"
[magick]: http://www.imagemagick.org/script/index.php "ImageMagick"
\begin{code}
-- vim: set filetype=lhaskell:
\end{code}