@phaer
Created January 3, 2021 12:42
Workaround for pandoc's Org parser not handling generic multi-line metadata keys.
#!/usr/bin/env python3
"""
Pandoc's Org parser does not handle generic multi-line metadata keys. Only
the last line of each repeated key ends up in the AST, so this can't (easily)
be solved with Lua filters.

We use Python to extract Org metadata keys before the first non-metadata line
and output them as YAML for use with pandoc's --metadata-file. E.g.

    #+TITLE: Letter
    #+from: Your Name
    #+from: Your Address
    Hello,

should end up as

    title: "Letter"
    from: ["Your Name", "Your Address"]
"""
import sys
import re
import yaml
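# Raw string avoids the invalid "\+" escape; the non-greedy key group stops at
# the first colon, so values that contain colons stay intact.
metadata_regex = re.compile(r"^#\+(.*?):(.*)$")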
metadata = dict()
for line in sys.stdin:
    line = line.strip()  # strip leading and trailing whitespace
    if not line:  # skip empty lines
        continue
    if match := metadata_regex.match(line):
        key, value = match.groups()
        key = key.lower()
        value = value.strip()
        if key in metadata:
            # Repeated key: collect all of its values in a list.
            if isinstance(metadata[key], list):
                metadata[key].append(value)
            else:
                metadata[key] = [metadata[key], value]
        else:
            metadata[key] = value
    else:
        break  # stop at first non-empty, non-metadata line.
yaml.safe_dump(metadata, sys.stdout)
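
The grouping logic above can be tried without PyYAML or pandoc. The following is a minimal sketch, not part of the original script: the function name `extract_org_metadata` and the sample text are made up for illustration, and the regex uses a non-greedy key group on the assumption that Org keywords themselves never contain a colon.

```python
import re

# Hypothetical helper for illustration; mirrors the script's loop but takes a
# string and returns a dict instead of reading stdin and dumping YAML.
METADATA_RE = re.compile(r"^#\+(.*?):(.*)$")


def extract_org_metadata(text):
    collected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:  # skip empty lines
            continue
        match = METADATA_RE.match(line)
        if match is None:
            break  # first non-empty, non-metadata line ends the header
        key, value = match.groups()
        collected.setdefault(key.lower(), []).append(value.strip())
    # Keys seen once become plain strings, repeated keys stay lists.
    return {k: v[0] if len(v) == 1 else v for k, v in collected.items()}


example = """#+TITLE: Letter
#+from: Your Name
#+from: Your Address

Hello,
#+ignored: appears after the body started
"""

result = extract_org_metadata(example)
print(result)
```

Collecting every value in a list first and collapsing single-element lists at the end avoids the `isinstance` check in the loop; the resulting structure is the same as the one the script passes to `yaml.safe_dump`.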