@danprince
Last active April 24, 2016 22:43
Academic Pandoc

There's a hypothetical paradise where we write all of our documents with markdown. With readmes, websites, wikis and documentation this is well on the way to being a reality, but in academic writing, LaTeX reigns supreme.

Alone, markdown isn't much of a competitor. It can't do references, it doesn't understand chapters, it can't generate page numbers, you have to manage tables of contents manually, and there's no support for captions or automatic numbering of figures and listings. Markdown wasn't designed with these kinds of complex features in mind, and rightly so.

Maybe I've just been spoiled by the expressive nature of markdown but I don't enjoy writing LaTeX.

There's a project called Pandoc, which allows you to convert from one document format to another. We can use Pandoc to convert markdown into PDFs and HTML.

If you need to convert files from one markup format into another, pandoc is your swiss-army knife.

Pandoc supports an extended version of markdown which offers extra syntax and configuration for generating additional LaTeX before the document is compiled to PDF. This allows you to start taking advantage of all sorts of features that make academic writing not only feasible, but enjoyable.

I've just finished writing a Master's dissertation and want to run through the process, explaining the challenges I faced along the way.

Build System

This is the most minimal syntax for creating a PDF from a markdown file using Pandoc.

pandoc -o publish.pdf report.md 

It's fine for small projects where all your content lives inside one markdown file, but as soon as you want to distinguish between chapters, you'll probably also want to start splitting your files up.

Your compilation command can get quite long and complex. By the end of the project, mine looked something like this (Pandoc 1.18 and later spell --chapters as --top-level-division=chapter).

pandoc $input -o $output \
  --toc \
  --chapters \
  --number-sections \
  --listings \
  --highlight-style kate \
  --include-in-header=meta/header.tex \
  --include-before meta/before.tex \
  --bibliography=references.bib

My very first piece of advice would be to set up a build system with some bash scripts. First of all, this saves you typing out your compilation command yourself, but it also allows you to define more involved behaviour for managing multiple markdown files.

For instance, I keep my chapters in a chapters/ folder, along with a .index file which describes the order they should come in. My build scripts are responsible for reading the index, then concatenating the chapters in the appropriate order and storing them in a temporary file. Pandoc then runs on the temporary file and generates a PDF, after which the script cleans up the temporary file.

Doing this by hand would be crazy. And when your chapters are all multiple pages long, you almost certainly don't want to keep them all in a single document.
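As a rough illustration of the idea, here is a minimal sketch of such a script. It assumes the index lists one chapter filename per line; the function names concat_chapters and build_pdf are made up for this example and the flags are a trimmed-down subset of the command above, so treat it as a starting point rather than my exact script.

```shell
#!/usr/bin/env bash
# Sketch of a chapter-aware build script (function names are hypothetical).
set -eu

# Print every chapter named in the index file, in index order.
# Assumes one filename per line; blank lines in the index are skipped.
concat_chapters() {
  local dir="$1" index="$2" name
  while IFS= read -r name || [ -n "$name" ]; do
    [ -z "$name" ] && continue
    cat "$dir/$name"
    printf '\n\n'   # keep a chapter's last paragraph apart from the next heading
  done < "$index"
}

# Concatenate the chapters into a temporary file, run pandoc on it, then clean up.
build_pdf() {
  local output="$1" tmp status=0
  tmp="$(mktemp)"
  concat_chapters chapters chapters/.index > "$tmp"
  # The temporary file has no .md extension, so the input format is stated explicitly.
  pandoc --from markdown "$tmp" -o "$output" \
    --toc --number-sections --bibliography=references.bib || status=$?
  rm -f "$tmp"      # remove the temporary file even if pandoc failed
  return "$status"
}
```

With the functions loaded, running build_pdf publish.pdf from the project root would produce the PDF. Keeping the concatenation in its own function also makes it easy to inspect the combined markdown when something renders oddly.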

References

Pandoc accepts a bibliography argument, which you can point towards the file that contains your references.

pandoc $input -o $output --bibliography=references.bib

Then it's pretty simple to create a reference and cite it in your writing.

// references.bib

@book{okasaki1999purely,
  title={Purely functional data structures},
  author={Okasaki, Chris},
  year={1999},
  publisher={Cambridge University Press}
}

Then cite the reference.

See @okasaki1999purely

Personally, I relied heavily on Google Scholar for keeping track of references and generating BibTeX citations.

Although I used BibTeX, Pandoc also supports Natbib and BibLaTeX.

Figures

You'll need to complement your markdown syntax with a little bit of LaTeX if you want to be able to caption, list and reference your figures from elsewhere in your report.

We'll have to uniquely identify our image using a \label. You can add a caption and a label using the Alt Text field of your markdown images.

![Figure Caption\label{fig:unique-name}](figures/misc/some-figure.pdf)

Then you can reference the image from elsewhere, using the \ref command.

```markdown
See Figure \ref{fig:unique-name}.
```

Together this will generate a LaTeX figure with a caption and an appropriate number. It should look something like this.

![](https://i.imgur.com/xb4Eb44.png)

## Listings
To insert a captioned code block, you'll need a syntax similar to the one for figures: attach a caption and a label to the opening fence.

```markdown
\```{caption="Listing Caption" label=lst:unique-name}
(defn menu [state]
  (render [this]
    (dom/ul nil
      (map (fn [item]
        (dom/li nil item))
        (:items state)))))
\```
```

You can reference it the same way.

```markdown
See Listing \ref{lst:unique-name}.
```

The default rendering style isn't particularly great, but we can customise it by adding a header include to the metadata YAML for the project.

```yaml
---
header-includes:
 - \lstset{
     basicstyle=\ttfamily,
     breaklines=true,
     frame=single
   }
---
```

It's not enforced, but it's very handy to prefix your labels with lst: and fig:. This helps prevent collisions across your documents.

Appendix

You can use LaTeX to automatically generate a list of figures and listings for your appendix.

\listoffigures
\lstlistoflistings

These commands generate lists of all the figures and listings in your report. This can be very helpful for readers trying to locate referenced listings or figures.

Reproducible Diagrams

A large part of my project involved demonstrating tree-based algorithms. The report needed to show example trees with named nodes and two styles of highlighting in order to differentiate between the states of nodes.

Creating diagrams by hand is not only time consuming but, more importantly, prone to inconsistency. Where possible, use a system that allows you to create data-backed, reproducible diagrams.

In my case I wrote a tool that takes a JSON file describing a tree:

{
  "name": "A",
  "children": [
    {
      "name": "B"
    },
    {
      "name": "B",
      "shaded": true,
      "children": [
        {
          "name": "D",
          "selected": true
        },
        {
          "name": "E"
        }
      ]
    },
    {
      "name": "C",
      "children": [
        {
          "name": "D",
          "selected": true
        },
        {
          "name": "E"
        },
        {
          "name": "D",
          "selected": true
        },
        {
          "name": "E"
        }
      ]
    }
  ]
}

And produces a PNG that looks like this:

I added a make-figures command to my build system, which passes over a directory of JSON files and builds diagrams from them, ready to embed within the report. Now all it takes is a tweak to the appropriate JSON file, a pass of make-figures and a new diagram is generated.
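The looping part of a make-figures step can be sketched roughly as below. The tree-drawing tool itself isn't shown here, so the renderer is passed in as an argument; the name render-tree in the usage note and the "input.json output.png" calling convention are assumptions made for this example.

```shell
# Sketch of a make-figures step (hypothetical helper, not my exact script).
# Usage: make_figures <json-dir> <output-dir> <renderer command...>
make_figures() {
  local src_dir="$1" out_dir="$2" json name
  shift 2
  mkdir -p "$out_dir"
  for json in "$src_dir"/*.json; do
    [ -e "$json" ] || continue            # directory has no JSON files
    name="$(basename "$json" .json)"
    # Assumed convention: the renderer takes the input JSON, then the output PNG path.
    "$@" "$json" "$out_dir/$name.png"
  done
}
```

Something like make_figures figures/json figures/trees render-tree would then rebuild every diagram in one pass, with each PNG named after its JSON file.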

TODO look at reproducible ascii art tools

LaTeX Packages

Initially, I looked at Gnuplot, then pyplot, for creating reproducible graphs. However, the solution that worked best was LaTeX's own pgfplots package.
