rauschma/pandoc-experiences.md

# Using Pandoc to publish a book in multiple file formats: experiences and wishes

Pandoc was essential for publishing my book “JavaScript for impatient programmers”. The book exists in several versions:

- Printable PDF (for a print-on-demand book on Amazon)
- Screen PDF
- Multi-page HTML
- EPUB
- MOBI

The homepage of “JavaScript for impatient programmers” contains previews of all artifacts.
In this document, I describe some of the challenges I’ve encountered while working on the book.
## Limitations of HTML output

HTML output is missing several important features. All of them are supported when using LaTeX via Pandoc:

- Top-level parts
  - Issue: jgm/pandoc#6411
- Chapter TOCs
  - Related discussion: https://groups.google.com/forum/#!topic/pandoc-discuss/KEfzxqqueBU
- Index generation
  - Issue: jgm/pandoc#6415
  - My Lua filter for cross-format indices: https://gist.github.com/rauschma/bfacbe6f2e8461b4a62c0cc1a288188e
- Frontmatter (unnumbered chapters without a part) and appendices

I wrote Lua filters to work around these limitations (excluding frontmatter), but it wasn’t easy.
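
Chapter TOCs are the easiest of these workarounds to sketch. The following Python version is not the Lua filter used for the book. It assumes that the document has been converted to Pandoc's JSON AST (`pandoc -t json`) and that chapters are level-1 headings. After each chapter heading, it inserts a bullet list that links to the chapter's level-2 sections. The function names are hypothetical, and parts, frontmatter, and appendices are not handled.

```python
def is_header(block, level):
    """True if block is a Pandoc Header element of the given level."""
    return block["t"] == "Header" and block["c"][0] == level


def toc_item(header):
    """One TOC entry: a Plain block that holds a link to the header's id."""
    _level, (ident, _classes, _attrs), inlines = header["c"]
    link = {"t": "Link", "c": [["", [], []], inlines, ["#" + ident, ""]]}
    return [{"t": "Plain", "c": [link]}]


def add_chapter_tocs(blocks):
    """Insert a bullet list of links to a chapter's sections after each level-1 header."""
    out = []
    for i, block in enumerate(blocks):
        out.append(block)
        if not is_header(block, 1):
            continue
        items = []
        for later in blocks[i + 1:]:
            if is_header(later, 1):
                break  # the next chapter starts here
            if is_header(later, 2):
                items.append(toc_item(later))
        if items:  # chapters without sections get no TOC
            out.append({"t": "BulletList", "c": items})
    return out
```

A filter would read the JSON from stdin, apply `add_chapter_tocs` to the top-level `blocks` array, and write the result back out.
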
## Multi-file output

For HTML, I needed Pandoc to produce multiple files (kind of like the internals of the EPUBs that it produces).
The easiest workaround was to generate a single long HTML file and split it up, while updating cross-file links so that they also include filenames.
Related:

- Forum: https://groups.google.com/forum/#!topic/pandoc-discuss/rliZN-GuEr4
- Issue: jgm/pandoc#6122
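
The splitting step can be sketched in Python using only the standard library. The sketch assumes a hypothetical marker comment, such as `<!-- SPLIT: ch_strings.html -->`, that a filter emits as a raw HTML block at the start of each chapter. It also assumes that attributes are written with double quotes, as in `id="..."`. The function splits the long file at the markers, records which file contains each `id`, and rewrites `href="#id"` to `href="file.html#id"` whenever the target lives in another file. Wrapping each part in a complete HTML page (head, navigation) is left out.

```python
import re

# Hypothetical marker that a filter could emit at the start of each chapter,
# naming the file that the chapter should end up in.
SPLIT_RE = re.compile(r"<!-- SPLIT: ([\w.-]+) -->")
ID_RE = re.compile(r'\sid="([^"]+)"')
HREF_RE = re.compile(r'href="#([^"]+)"')


def split_html(html):
    """Split one long HTML document into {filename: html} and make links to
    anchors in other files include the filename."""
    pieces = SPLIT_RE.split(html)
    # With one capturing group, re.split returns
    # [text before first marker, name1, body1, name2, body2, ...].
    # This sketch drops the text before the first marker.
    parts = dict(zip(pieces[1::2], pieces[2::2]))

    # Which file contains which id?
    owner = {}
    for filename, body in parts.items():
        for ident in ID_RE.findall(body):
            owner[ident] = filename

    def fix_links(filename, body):
        def replace(match):
            ident = match.group(1)
            target = owner.get(ident)
            if target is None or target == filename:
                return match.group(0)  # same file or unknown id: leave as is
            return f'href="{target}#{ident}"'

        return HREF_RE.sub(replace, body)

    return {name: fix_links(name, body) for name, body in parts.items()}
```
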

## `--extract-media` paths: optionally relative to output (vs. relative to input)?

### Problem


- On one hand, we need to be in the same directory as the content so that `\includepdf{}` works with relative paths.
  - Additional important benefit: Relative paths avoid spaces, which that command can’t handle (and which are common in absolute paths on macOS). That’s a bug that’s new in the latest version of XeLaTeX: https://github.com/ho-tex/oberdiek/issues/31
  - Alas, Pandoc’s intermediate LaTeX output also has to sit next to the content. That’s a weakness of LaTeX, not of Pandoc.
- On the other hand, the `--extract-media` path assumes that we are inside the output directory:
  - Command: `pandoc --standalone -o ../out/chapter.html --extract-media=../out/img chapter.md`
  - Input: `![](img/diagram.svg)`
  - Actual output: `<img src="../out/img/08830082d8bd4c323da4ec4f51fb2a20b2dcaae7.svg" />`
  - Desired output: `<img src="img/08830082d8bd4c323da4ec4f51fb2a20b2dcaae7.svg" />`


```
proj/
  content/
    chapter.md
    img/
      diagram.svg
  out/
    chapter.html
    img/
      08830082d8bd4c323da4ec4f51fb2a20b2dcaae7.svg
```

### The workaround that I have used


1. Choose a long, unique name for the extracted media directory.
2. In the produced HTML output, search for that name and replace it with the desired path. Then copy the extracted directory to the output location.
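
Here is a minimal Python sketch of this workaround. The directory name `extracted-media-5f2c9d1e` and both function names are hypothetical. The sketch assumes that Pandoc runs inside the content directory with `--extract-media=extracted-media-5f2c9d1e`, so that every generated `src` attribute starts with that name.

```python
import shutil
from pathlib import Path

# Hypothetical directory name passed to --extract-media. It only has to be
# unusual enough that it cannot occur anywhere else in the generated HTML.
UNIQUE_MEDIA_DIR = "extracted-media-5f2c9d1e"


def rewrite_media_paths(html: str, new_prefix: str) -> str:
    """Point image paths at new_prefix instead of the unique extraction directory."""
    return html.replace(UNIQUE_MEDIA_DIR + "/", new_prefix + "/")


def relocate_media(content_dir: Path, html_file: Path, new_prefix: str = "img") -> None:
    """Fix the paths inside html_file and copy the extracted media next to it."""
    html = html_file.read_text(encoding="utf-8")
    html_file.write_text(rewrite_media_paths(html, new_prefix), encoding="utf-8")
    shutil.copytree(
        content_dir / UNIQUE_MEDIA_DIR,
        html_file.parent / new_prefix,
        dirs_exist_ok=True,  # requires Python 3.8+
    )
```

For the layout shown earlier, you would run `pandoc --standalone -o ../out/chapter.html --extract-media=extracted-media-5f2c9d1e chapter.md` in `content/` and then call `relocate_media(Path("."), Path("../out/chapter.html"))`. That rewrites the paths so they start with `img/` and copies the images to `out/img/`.
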

### How Pandoc could be changed to fix this problem

Introduce a different mode for `--extract-media` in which paths to extracted files are relative to the output file. This mode could be switched on via:

- An option that otherwise does the same as `--extract-media`, but computes paths differently: `--extract-media-relative-to-output`
- A separate option for specifying how `--extract-media` computes its paths:
  - `--extract-media-mode=relative-to-input`
  - `--extract-media-mode=relative-to-output`
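
The path computation behind a relative-to-output mode is small. The following Python sketch is only an illustration: Pandoc itself is written in Haskell, and the function name is hypothetical. It takes the path that `--extract-media` currently writes, which is relative to the working directory, and re-expresses it relative to the directory of the output file.

```python
import os.path


def relative_to_output(media_path: str, output_file: str) -> str:
    """Return media_path (relative to the working directory, as
    --extract-media writes it today) relative to the output file's directory."""
    output_dir = os.path.dirname(output_file) or "."  # "" means the working directory
    relative = os.path.relpath(media_path, start=output_dir)
    return relative.replace(os.sep, "/")  # paths in HTML always use "/"
```

With the example from above, `relative_to_output("../out/img/08830082d8bd4c323da4ec4f51fb2a20b2dcaae7.svg", "../out/chapter.html")` returns `"img/08830082d8bd4c323da4ec4f51fb2a20b2dcaae7.svg"`, which is the desired output.
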


If other options work similarly to `--extract-media`, it may make sense to introduce an option that works for all of them (instead of just for `--extract-media`).
Related:

- Forum: https://groups.google.com/forum/#!topic/pandoc-discuss/MLNKv2sENNo
- Issue: jgm/pandoc#6410

## Working with the filter API

Wishes:

- At the moment, filters visit inline elements in a separate pass. (This is a known problem and is being worked on.)
- Should Pandoc ever support cross-format numbering of headings, filters would benefit from having access to the numbers of headings.
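
To illustrate why document order matters, here is a Python sketch over Pandoc's JSON AST (`pandoc -t json`). It does not use Pandoc's filter API. Instead, it walks the top-level blocks once, in order, and numbers headings as it goes (skipping those with the class `unnumbered`). It passes each `Str` in a paragraph to a callback, together with the number of the section that contains it. The function name is hypothetical, and nested blocks (lists, divs) are not visited.

```python
def walk_in_document_order(blocks, on_str):
    """Visit top-level blocks in document order. For every Str inside a Para or
    Plain, call on_str(text, number), where number is the dotted number of the
    most recent numbered heading (e.g. "12.7.1"), or "" before the first one."""
    counters = [0] * 6  # Pandoc headings have levels 1 to 6
    number = ""
    for block in blocks:
        if block["t"] == "Header":
            level, (_ident, classes, _attrs), _inlines = block["c"]
            if "unnumbered" in classes:
                continue
            counters[level - 1] += 1
            for deeper in range(level, 6):
                counters[deeper] = 0  # a new section restarts its subsections
            number = ".".join(str(c) for c in counters[:level])
        elif block["t"] in ("Para", "Plain"):
            for inline in block["c"]:
                if inline["t"] == "Str":
                    on_str(inline["c"], number)
```
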

I’ve found Lua difficult to work with (tables and output are frustratingly limited, etc.). I originally wanted to publish my Lua filters, but they don’t feel robust enough for me to do so. The solution will be to eventually rewrite the filters in Haskell, Rust, or TypeScript. Then I can publish them.
## Various other wishes


- The filter pandoc-crossref is important for supporting LaTeX’s floating images and tables in all output formats. It allows you to refer to them elegantly. It’d be great if this functionality could be built into Pandoc.

- For images, I’m making a distinction:
  - Bitmap graphics (same across all file formats): `.jpg`, `.png`
  - Vector graphics (format-specific): no filename extension. The filename extension is then specified via `--default-image-extension`:
    - PDF: `.pdf`
    - EPUB, HTML: `.svg`
    - MOBI (via intermediate EPUB): `.jpg`
  - Minor inconvenience: When previewing the Markdown in an editor, you don’t see the vector graphics. I’m not sure how best to fix this. Maybe with a mapping of image extensions: `--image-extension-replace=svg/pdf` (i.e., use `.svg` in the Markdown input, but `.pdf` in PDF output).
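
The per-format extensions can be wired up in a build script. This Python sketch only assembles argument lists, and the function and dictionary names are hypothetical. A real build needs more options, such as the PDF engine and templates.

```python
# Extension that --default-image-extension adds to extension-less image paths,
# per target format (values from the list above).
VECTOR_EXTENSION = {
    "pdf": ".pdf",
    "epub": ".svg",
    "html": ".svg",
    "mobi": ".jpg",  # applies to the intermediate EPUB that is converted to MOBI
}


def pandoc_command(fmt, source, output):
    """Build the pandoc argument list for one target format. For "mobi",
    `output` is the intermediate EPUB; converting it to MOBI happens elsewhere."""
    return [
        "pandoc",
        source,
        "-o",
        output,
        f"--default-image-extension={VECTOR_EXTENSION[fmt]}",
    ]
```

Each list can be passed to `subprocess.run(..., check=True)`.
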


## Additional filters that I wrote


- A filter that converts links to page numbers (use case: print PDF):
  - Input: `This phenomenon is called [_hoisting_](#hoisting).`
  - Output: This phenomenon is called *hoisting*.
  - Print (no link, page number via LaTeX): This phenomenon is called *hoisting* (page 392).

- References that mention the section number and section title:
  - Input: `For more information, see [$full](#section-on-unicode).`
  - Output: For more information, see §12.7.1 “JavaScript and Unicode”.

- Inserting breaks into inline code (to fix overflow problems in LaTeX):
  - Example: `` `Desc•.[[Con•fig•urable]]`{.break} ``

- Linking to inline IDs doesn’t work in LaTeX. Workaround supported by my filter:
  - Example: `[Hoisting]{#hoisting .texlabel} is an important term in this context.`
  - UPDATE: fixed in Pandoc’s master branch

- Information boxes (“tip”, “warning”, etc.). Examples: https://exploringjs.com/impatient-js/ch_faq-book.html#notations-and-conventions
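
The following Python sketch shows how four of these filters (all except the inline-ID workaround) could look when written against Pandoc's JSON AST (`pandoc -t json`). It is not the code used for the book. It makes these assumptions:

- Internal link targets have matching LaTeX `\label`s.
- The `$full` filter receives a table `sections` that maps ids to `(number, title)`. Collecting that table is not shown.
- The LaTeX environments `tip` and `warning` are defined in the template.

Each function rewrites a flat list of inlines or blocks. A complete filter would also walk nested elements.

```python
"""Sketches of four filters over Pandoc's JSON AST (names are hypothetical)."""

# Characters that must be escaped inside LaTeX's \texttt{}.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}", "{": r"\{", "}": r"\}", "$": r"\$", "&": r"\&",
    "#": r"\#", "_": r"\_", "%": r"\%", "^": r"\textasciicircum{}", "~": r"\textasciitilde{}",
}
BREAK_MARK = "\u2022"  # "•" marks where inline code may break


def raw_latex(text):
    return {"t": "RawInline", "c": ["latex", text]}


def links_to_page_refs(inlines):
    """Print PDF: replace each internal link by its text plus "(page N)"."""
    out = []
    for el in inlines:
        if el["t"] == "Link" and el["c"][2][0].startswith("#"):
            label = el["c"][2][0][1:]
            out.extend(el["c"][1])  # keep the link text, drop the link
            # ~ is a non-breaking space; \pageref needs a matching \label
            out += [{"t": "Space"}, raw_latex(f"(page~\\pageref{{{label}}})")]
        else:
            out.append(el)
    return out


def expand_full_refs(inlines, sections):
    """Replace the text of [$full](#id) with '§number “title”'.
    An id missing from `sections` raises KeyError, flagging a broken reference."""
    out = []
    for el in inlines:
        if el["t"] == "Link" and el["c"][1] == [{"t": "Str", "c": "$full"}]:
            attr, _content, (url, title) = el["c"]
            number, heading = sections[url.lstrip("#")]
            words = f"§{number} “{heading}”".split(" ")
            content = [{"t": "Str", "c": words[0]}]
            for word in words[1:]:
                content += [{"t": "Space"}, {"t": "Str", "c": word}]
            el = {"t": "Link", "c": [attr, content, [url, title]]}
        out.append(el)
    return out


def break_code(inlines, fmt):
    """Code with class "break": "•" allows a line break in LaTeX and is dropped elsewhere."""
    out = []
    for el in inlines:
        if el["t"] == "Code" and "break" in el["c"][0][1]:
            text = el["c"][1]
            if fmt == "latex":
                body = "".join(
                    r"\allowbreak{}" if ch == BREAK_MARK else LATEX_SPECIALS.get(ch, ch)
                    for ch in text
                )
                el = raw_latex(r"\texttt{" + body + "}")
            else:
                el = {"t": "Code", "c": [el["c"][0], text.replace(BREAK_MARK, "")]}
        out.append(el)
    return out


def latex_info_boxes(blocks, kinds=("tip", "warning")):
    """Wrap Divs with class tip/warning in a LaTeX environment of the same name."""
    out = []
    for block in blocks:
        if block["t"] == "Div":
            classes = block["c"][0][1]
            kind = next((k for k in kinds if k in classes), None)
            if kind:
                out.append({"t": "RawBlock", "c": ["latex", f"\\begin{{{kind}}}"]})
                out.extend(block["c"][1])
                out.append({"t": "RawBlock", "c": ["latex", f"\\end{{{kind}}}"]})
                continue
        out.append(block)
    return out
```
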


## Conclusion

In general, I loved working with Pandoc. Its filters, in particular, make it a flexible and powerful tool. It’s impressive how well they work.

The following features helped with creating the print PDF:

- Black & white syntax highlighting
- The option to convert links into footnotes

## Further reading


- Discussion of this document: https://groups.google.com/forum/#!topic/pandoc-discuss/_WU9R4i2Y2M
- Blog post “Behind the scenes of my latest book on JavaScript”