@tarleb
Last active May 1, 2023 14:12
Filter to include Markdown files via code blocks
--- Pandoc Lua filter to include other Markdown files
---
--- Usage: Use a special code block with class `include` to
--- include Markdown files. Each code line is treated as the
--- filename of a Markdown file, parsed as Markdown, and
--- included. Metadata from include files is discarded.
--- Lines starting with `#` are treated as comments and skipped.
---
--- Example:
---
--- ``` {.include}
--- chapters/introduction.md
--- chapters/methods.md
--- chapters/results.md
--- chapters/discussion.md
--- ```
local List = require 'pandoc.List'

function CodeBlock(cb)
  if cb.classes:includes 'include' then
    local blocks = List:new()
    for line in cb.text:gmatch('[^\n]+') do
      -- Lines starting with '#' are comments; every other line is a path.
      if line:sub(1,1) ~= '#' then
        -- assert() reports a missing file instead of failing on a nil handle
        local fh = assert(io.open(line))
        blocks:extend(pandoc.read(fh:read '*a').blocks)
        fh:close()
      end
    end
    return blocks
  end
end
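The line rules of the filter (one path per line, empty lines ignored, lines starting with `#` skipped) can be exercised in plain Lua without pandoc. The sketch below pulls them into a hypothetical `include_paths` helper. Unlike the filter, it also trims surrounding whitespace, so a trailing `\r` from a file with Windows line endings does not end up in the path, and whitespace-only lines are skipped; treat that as an assumption about what you want, not as the filter's behavior.

```lua
-- Hypothetical helper: split the text of an `include` code block into
-- file paths. One path per line; lines starting with '#' are comments.
-- Whitespace (including a stray '\r') is trimmed before the '#' check,
-- so an indented "  # note" also counts as a comment.
local function include_paths(text)
  local paths = {}
  for line in text:gmatch('[^\n]+') do
    local path = line:match('^%s*(.-)%s*$')
    if path ~= '' and path:sub(1, 1) ~= '#' then
      paths[#paths + 1] = path
    end
  end
  return paths
end

local paths = include_paths('chapters/introduction.md\r\n# draft, skip\n  \nchapters/methods.md\n')
for _, p in ipairs(paths) do
  print(p)
end
```

Using it in the filter would mean looping over `include_paths(cb.text)` instead of calling `gmatch` directly.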
@mslinn

mslinn commented May 17, 2019

Thanks so much for making this available!

@mslinn

mslinn commented May 19, 2019

The filter works great when Pandoc renders to HTML, but when rendering to a man page, the name of the file being included (scripts/googleSearch.js; see this page) appears:

cad(1)

              scripts/googleSearch.js

NAME
       cad - Cadenza Client™ command line interface for authoring, administering, editing and deploying content to Cadenza instances.

I don't want the include to do anything when rendered to a man page - it should only perform the include when Pandoc is rendering to HTML.

I used the filter like this in the man source:

``` {.include}
scripts/googleSearch.js
```

The bash scripts to run Pandoc are:

function pandocHtml {
  # Reads Pandoc markdown version, writes HTML page
  # $1 - filename, including file type, which should be .md
  # $2 - uncompressed output filename, with file type
  sudo bash -c "pandoc $1 -s \
    --lua-filter=scripts/include.lua \
    --lua-filter=scripts/links-to-html.lua \
    --css=http://cadenza.micronauticsresearch.com/css/main.css \
    --css=http://cadenza.micronauticsresearch.com/css/man.css \
    --from markdown \
    --to html > \"$2\""
}

function pandocMan {
  # Reads Pandoc markdown version, writes man page
  # $1 - filename, including file type, which should be .md
  # $2 - uncompressed output filename, with file type

  sudo bash -c "pandoc $1 -s \
    --filter scripts/delink.hs \
    --from markdown \
    --to man > \"$2\""
}

What should I do to prevent the name of the include file from appearing on the man page?

Thanks, Mike

@tarleb
Author

tarleb commented May 20, 2019

If I understand this correctly, you are trying to include the file as raw HTML/JavaScript. Did I get that right? The above filter assumes that the included file is Markdown. We could think about allowing additional instructions to define the format.

However, that doesn't explain why the filter does not seem to work with man output. I cannot see how this would happen.
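One way to realize such an instruction is a `format` attribute on the include block, for example `{.include format=html}`. The sketch below is an assumption, not part of the published filter: when `format` is set, each file is inserted unchanged as a `pandoc.RawBlock`, which pandoc writes out only for that output format and omits from others such as man. Without the attribute, files are parsed as Markdown as before. The file contents are passed through verbatim, so for HTML output they must already be valid HTML (JavaScript wrapped in `<script>` tags).

```lua
-- Hypothetical extension of include.lua: an optional `format` attribute,
-- e.g. {.include format=html}, inserts each file verbatim as a raw block
-- instead of parsing it as Markdown.
function CodeBlock(cb)
  if not cb.classes:includes('include') then
    return nil -- leave ordinary code blocks untouched
  end
  local raw_format = cb.attributes.format -- nil when the attribute is absent
  local blocks = {}
  for line in cb.text:gmatch('[^\n]+') do
    if line:sub(1, 1) ~= '#' then -- '#' lines are comments
      local fh = assert(io.open(line))
      local contents = fh:read('*a')
      fh:close()
      if raw_format then
        -- Writers for other formats (e.g. man) omit this raw block.
        blocks[#blocks + 1] = pandoc.RawBlock(raw_format, contents)
      else
        for _, block in ipairs(pandoc.read(contents).blocks) do
          blocks[#blocks + 1] = block
        end
      end
    end
  end
  return blocks
end
```

For this to keep the file name out of the man page, the man build has to run the filter as well; an include block that no filter touches is rendered as an ordinary code block showing its text.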

@mslinn

mslinn commented May 20, 2019

The filter you show above includes JavaScript without any problem when rendering to HTML. I want the filter to do nothing when rendering to a man page. Currently it does not include anything in man pages, but the name of the included file appears instead, as I show above.

I edited my post to add a link to the JavaScript, in case you were curious.

@tarleb
Author

tarleb commented May 20, 2019

Does this work?

function CodeBlock(cb)
  if cb.classes:includes 'include' then
    -- Drop include blocks entirely unless the output format is HTML
    if not FORMAT:match 'html' then
      return {}
    end
    local blocks = List:new()
    for line in cb.text:gmatch('[^\n]+') do
      if line:sub(1,1) ~= '#' then
        local fh = assert(io.open(line))
        blocks:extend(pandoc.read(fh:read '*a').blocks)
        fh:close()
      end
    end
    return blocks
  end
  -- Returning nothing leaves all other code blocks unchanged
end

It includes files only if the target format is HTML and deletes the block otherwise.
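The condition `FORMAT:match 'html'` is an ordinary Lua pattern match on the name of the output format, so it accepts `html`, `html4` and `html5`, but not HTML-based formats whose names lack that substring, such as `revealjs` or `epub`; for those the include block would be deleted. A small stand-alone check of that behavior, using a hypothetical `includes_for` function and runnable without pandoc:

```lua
-- Mirrors the filter's test: true when the format name contains "html".
local function includes_for(format)
  return format:match('html') ~= nil
end

for _, format in ipairs({'html', 'html5', 'man', 'revealjs', 'epub'}) do
  print(format, includes_for(format))
end
```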

@mslinn

mslinn commented May 20, 2019

I added this in front of the code:

local List = require 'pandoc.List'

I don't see any change in the output for HTML or man pages.

@tarleb
Author

tarleb commented May 20, 2019

Could you send me a reproducible example as a zip file? My email address is on my profile page.

@mslinn

mslinn commented May 20, 2019

Will do tomorrow; I'm headed out the door now for the rest of the day. Thanks!

@mslinn

mslinn commented May 22, 2019

Concluding this thread for posterity: @tarleb and I talked offline, and he came up with a modification to his Lua filter and a couple more Lua filters.

include.lua

--- Pandoc Lua filter to include other Markdown files
---
--- Usage: Use a special code block with class `include` to
--- include Markdown files. Each code line is treated as the
--- filename of a Markdown file, parsed as Markdown, and
--- included. Metadata from include files is discarded.
---
--- Example:
---
---     ``` {.include}
---     chapters/introduction.md
---     chapters/methods.md
---     chapters/results.md
---     chapters/discussion.md
---     ```

local List = require 'pandoc.List'

function CodeBlock(cb)
  if cb.classes:includes'include' then
    local blocks = List:new()
    for line in cb.text:gmatch('[^\n]+') do
      if line:sub(1,1)~='#' then
        local fh = assert(io.open(line))
        blocks:extend(pandoc.read (fh:read '*a').blocks)
        fh:close()
      end
    end
    return blocks
  end
end

delink.lua

delink.lua is shorter and faster than delink.hs, and doesn't require Haskell to be installed.

function Link (link)
  return link.content
end

no-includes.lua

This no-includes.lua filter removes all include blocks. The filter is used when targeting man output.
Using include.lua would also work, but this filter is faster.

function CodeBlock (cb)
  if cb.classes:includes 'include' then
    return {}
  end
end

I also updated my bash script to use the plugins:

#!/bin/bash

function pandocHtml {
  # Reads Pandoc markdown version, writes HTML page
  # $1 - filename, including file type, which should be .md
  # $2 - uncompressed output filename, with file type
  sudo bash -c "pandoc $1 -s \
    --lua-filter=scripts/include.lua \
    --lua-filter=scripts/links-to-html.lua \
    --css=http://cadenza.micronauticsresearch.com/css/main.css \
    --css=http://cadenza.micronauticsresearch.com/css/man.css \
    --from markdown \
    --to html > \"$2\""
}

function pandocMan {
  # Reads Pandoc markdown version, writes man page
  # $1 - filename, including file type, which should be .md
  # $2 - uncompressed output filename, with file type

  sudo bash -c "pandoc $1 -s \
    --lua-filter=scripts/no-includes.lua \
    --lua-filter=scripts/delink.lua \
    --from markdown \
    --to man > \"$2\""
}

# Make the script directory current
cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1

pandocHtml cad.md cad.html
pandocMan  cad.md cad.1.man
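The `cd` matters because `io.open` in include.lua resolves relative paths against the current working directory. An alternative, sketched here under the assumption of `/`-separated paths, is to resolve each include relative to the directory of the Markdown file that contains it. `dirname` and `resolve` are hypothetical helpers; inside a filter, `PANDOC_STATE.input_files[1]` could supply the document path when the input is a named file.

```lua
-- Hypothetical helpers: resolve include paths relative to the Markdown
-- file containing the include block rather than the working directory.
-- Assumes '/' as the path separator.
local function dirname(path)
  -- Everything before the last '/', or '.' when there is no '/'
  return path:match('^(.*)/[^/]*$') or '.'
end

local function resolve(doc_path, include_path)
  if include_path:sub(1, 1) == '/' then
    return include_path -- absolute paths are used as-is
  end
  return dirname(doc_path) .. '/' .. include_path
end

print(resolve('docs/cad.md', 'scripts/googleSearch.js'))
print(resolve('cad.md', 'scripts/googleSearch.js'))
```

Recent pandoc versions also ship a `pandoc.path` module (with `directory` and `join`) that could take the place of these helpers inside a filter.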

@ricopicone

This is a very nice little filter -- thanks!
