@roycoding
Created November 20, 2022 17:20
A (crappy) Markua 0.10 to LaTeX conversion script


This gist has a Python script I created to convert my book, Zefs Guide to Deep Learning, from Markua 0.10 to LaTeX. It works, but maybe not for you. But maybe if you mess with it, it can suit your needs.

I wrote my book in Leanpub's Markua 0.10 markup language and am publishing the ebook there, but I decided that I wanted to publish a paperback in a smaller size via Amazon's Kindle Direct Publishing. Unfortunately, none of the smaller print sizes that Leanpub supports are the same as the smaller print sizes that KDP will print, so in order to create a "pocket-sized" book, I needed to generate my own PDF file.

The most recommended way to do this is via pandoc. Unfortunately, pandoc does not support converting from Markua to LaTeX (although it supposedly supports the other direction). While pandoc does support Markdown to LaTeX, I found that there were enough details I'd need to fix that it was not worth it for me.

Thus this script!

The way I've used this script is to first create a skeleton LaTeX book (I used JenniferMack's latex-kdp document class as the basis) and include the chapters as separate LaTeX files, themselves containing minimal styling. Once I had the overall styling, margins, etc. that I wanted, I converted each of my Markua chapter files into LaTeX files.

Buyer beware!

Assuming you are familiar enough with Python, (Python) regexing, and LaTeX, you can probably get this to work for you -- or at least serve as a good jumping off point.

Some things to note:

  • The script I created generally handles most of the features of Markua 0.10, but not all, as I did not use all of the features (e.g. I didn't mess with tables).
  • I defined several new LaTeX commands to do exactly what I wanted (e.g. the section headers and the aside environment). You will either need to change these to standard commands, change the name of your LaTeX commands to mine (🤔), or substitute your own custom commands in.
  • Because of the way the original Markua 0.10 text is modified during this process (the original file is left as is, of course [hopefully]), my script probably only works if you run the functions in the order in which they are called in the main convert function. This is because I wrote the regexes against the unaltered source Markua, so if the Markua has already been altered, unpredictable things sometimes happen.
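
To illustrate why the call order matters, here is a minimal sketch using two simplified stand-ins for the script's functions (a chapter-only header conversion and a `#` escaper, the latter named `escape_hashes` just for this example). Escaping `#` before converting headers hides the header from the header regex:

```python
import re


def sections(text: str) -> str:
    # Simplified stand-in: convert only top-level "# Title" headers
    return re.sub(r"^# (.+)", r"\\chapter{\g<1>}", text, flags=re.M)


def escape_hashes(text: str) -> str:
    # Simplified stand-in for the "#" handling in special_chars
    return re.sub(r"#", r"\\#", text)


source = "# Introduction\n\nIssue #42 is fixed.\n"

# Headers first, then escaping: the header is converted as intended
right = escape_hashes(sections(source))

# Escaping first: the line now starts with "\#", so the header regex
# no longer matches and the chapter is never created
wrong = sections(escape_hashes(source))

print(right)
print(wrong)
```

In the first ordering the chapter line becomes `\chapter{Introduction}`; in the second it stays as `\# Introduction`, which LaTeX would typeset as literal text.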

How to run this

I wrote this script in Python 3.9.7. The only external dependency is click, which can be removed if you prefer not to use it.

The command to convert a single Markua 0.10 file to LaTeX is

python zg_convert.py input_file.md output_file.tex

and of course input_file.md and output_file.tex should be replaced with the name of your Markua 0.10 input file and the name of your desired LaTeX file, respectively.
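
If you would rather drop the click dependency, something like the following sketch could stand in for the command-line handling. The names `convert_text` and `main` are hypothetical and not part of the script; `convert_text` is a placeholder for the chain of conversion functions.

```python
import sys
from pathlib import Path


def convert_text(markua: str) -> str:
    # Hypothetical placeholder: call the conversion functions here,
    # in the same order as the script's convert function
    return markua


def main(argv: list[str]) -> int:
    # Expect exactly: script name, input file, output file
    if len(argv) != 3:
        print(
            f"usage: python {argv[0]} input_file.md output_file.tex",
            file=sys.stderr,
        )
        return 2
    in_path, out_path = Path(argv[1]), Path(argv[2])
    latex = convert_text(in_path.read_text(encoding="utf-8"))
    out_path.write_text(latex, encoding="utf-8")
    return 0
```

Calling `sys.exit(main(sys.argv))` under the usual `if __name__ == "__main__":` guard would keep the same command line as above.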

Have fun

Maybe I am the only one who will ever need this, but I figured I'd share it with the world in case some other person (most likely my future self) could make use of it.

# Convert Markua 0.10 to LaTeX for Zefs Guides
# Things this script handles
# - Remove sample related stuff
# - Convert chapter, section, subsection, etc
# - Footnotes
# - Links
# - Links in footnotes
# - Lists
# - Images
# - Asides
# - Bold
# - Italics
# - Quotation marks
# - LaTeX math
# - Quoted text
# - Other characters like $ and &

import re

import click


def sections(nfile: str) -> str:
    "Convert a markua header to a ZG LaTeX header"
    # Chapters
    _nfile = re.sub(r"^#{1} (.+)", r"\\chapter{\g<1>}", nfile, flags=re.M)
    # Sections
    _nfile = re.sub(r"^#{2} (.+)", r"\\sect{\g<1>}", _nfile, flags=re.M)
    # Subsections
    _nfile = re.sub(r"^#{3} (.+)", r"\\ssect{\g<1>}", _nfile, flags=re.M)
    # Subsubsections
    _nfile = re.sub(r"^#{4} (.+)", r"\\sssect{\g<1>}", _nfile, flags=re.M)
    return _nfile


def footnotes(nfile: str) -> str:
    # As of now, this should be run BEFORE the links function
    # Assumes footnote text is correctly formatted
    # List of footnotes
    fnlist = []
    for fn in set(re.findall(r"\[\^(.+?)\]", nfile)):
        fnlist.append(
            {
                "marker": fn,
                # re.escape so regex metacharacters in a marker match literally
                "note": re.findall(
                    rf"^\[\^{re.escape(fn)}\]: (.+)$", nfile, flags=re.M
                )[0],
            }
        )
    # Convert marker to latex note and remove markua note
    for fn in fnlist:
        # A function replacement keeps backslashes in the note text from
        # being interpreted as regex escapes
        nfile = re.sub(
            rf"\[\^{re.escape(fn['marker'])}\]",
            lambda _m, note=fn["note"]: rf"\footnote{{{note}}}",
            nfile,
            count=1,
        )
        nfile = nfile.replace(rf"[^{fn['marker']}]: {fn['note']}", "").strip()
    return nfile


def links(nfile: str) -> str:
    # As of now, this should be run AFTER the footnotes function
    # Find text and URL
    found = re.findall(r"[^!]\[([^\^].+?)\]\((.+?)\)", nfile)
    _nfile = nfile
    # Replace markdown links with LaTeX links
    # Using my custom footnote link latex command
    for text, url in found:
        if text == url:
            # Explicit URL
            _nfile = _nfile.replace(f"[{text}]({url})", rf"\url{{{url}}}")
        else:
            # Implicit URL
            # Plain string replacement, so parentheses, dots, and question
            # marks in the link text or URL need no regex escaping
            _nfile = _nfile.replace(
                f"[{text}]({url})", rf"\footurl{{{text}}}{{{url}}}"
            )
    return _nfile


def images(nfile: str) -> str:
    # Find all images
    imgs = re.findall(r"!\[([^\^].+)\]\((.+)\)", nfile)
    _nfile = nfile
    for img in imgs:
        # Remove alt caption
        # str.replace pattern
        # pat = rf'{{alt: "{img[0]}", width: 100%}}'
        # _nfile = _nfile.replace(pat, "")
        _nfile = re.sub(r"\{alt:.+?\}", "", _nfile, flags=re.M)
        # Replace markdown image link with LaTeX figure env
        # str.replace pattern
        img_format = img[1][-3:]
        if img_format == "jpg":
            include = "graphics"
            width = 0.3
        else:
            # Everything that is not a .jpg is treated as an SVG
            include = "svg"
            width = 1.0
        fig = rf"""\begin{{figure}}[htb]
\centering
\include{include}[width={width}\textwidth]{{../{img[1]}}}
\caption*{{{img[0]}}}
\end{{figure}}
"""
        _nfile = _nfile.replace(rf"![{img[0]}]({img[1]})", fig)
    return _nfile


def asides(nfile: str) -> str:
    return nfile.replace("{aside}", r"\begin{aside}").replace(
        "{/aside}", r"\end{aside}"
    )


def metatags(nfile: str) -> str:
    # Remove Leanpub meta tags
    _nfile = re.sub(r"\{sample: .+\}", "", nfile)
    _nfile = _nfile.replace(r"{frontmatter}", "")
    _nfile = _nfile.replace(r"{mainmatter}", "")
    return _nfile


def styles(nfile: str) -> str:
    # Convert italics and bold
    # Italics
    _nfile = re.sub(r"(?<!\*)\*{1}([^*]+?)\*{1}", r"\\emph{\g<1>}", nfile, flags=re.M)
    # Bold
    _nfile = re.sub(
        r"(?<!\*)\*{2}([^*]+?)\*{2}", r"\\textbf{\g<1>}", _nfile, flags=re.M
    )
    return _nfile


def quote_marks(nfile: str) -> str:
    # Convert quotation marks
    # As of now this must be run AFTER image conversion
    return re.sub(r"\"(.+?)\"", r"``\g<1>''", nfile)


def math(nfile: str) -> str:
    # Convert inline and stand alone math
    # Inline
    _nfile = re.sub(r"`{1}([^`]+?)`{1}\$", r"$\g<1>$", nfile)
    # Stand alone
    _nfile = re.sub(r"^`{3}\$\n((.+\n)+)`{3}", r"\\[\n\g<1>\\]", _nfile, flags=re.M)
    # _nfile = _nfile.replace()
    return _nfile


def blockquotes(nfile: str) -> str:
    # Convert single line and multi-line blockquotes
    # Single line
    _nfile = re.sub(
        r"\n>(.+?)\n", r"\n\\begin{quote}\n \g<1>\n\\end{quote}\n", nfile
    )
    # Multi-line
    _nfile = _nfile.replace("{blockquote}", r"\begin{quote}").replace(
        "{/blockquote}", r"""
\end{quote}"""
    )
    return _nfile


def special_chars(nfile: str) -> str:
    # Handle special chars, such as $, %, and &
    # $, but not inline math $ followed by a single digit ## super hack
    _nfile = re.sub(r"\$(?=\d{2,}|\d,)", r"\\$", nfile)
    # %
    _nfile = re.sub(r"(\d+)%(?!\})", r"\g<1>\%", _nfile)
    # &. Don't pick up the ampersands in multi-line LaTeX math
    _nfile = re.sub(r"(?<!^)(?<!\s{2})&", r"\\&", _nfile)
    # #
    _nfile = re.sub(r"#", r"\\#", _nfile)
    return _nfile


def lists(nfile: str) -> str:
    # Convert lists of various types
    # This seems to work better earlier in the order of functions called
    # Unordered lists
    lists = [x[0] for x in re.findall(r"^(- (.+\n)+)", nfile, flags=re.M)]
    _nfile = nfile
    for l in lists:
        # Strip only the leading "- " marker of each line, not every "- "
        items = re.sub(r"^- ", "", l.strip(), flags=re.M).split("\n")
        latex = f"\\begin{{itemize}}[noitemsep]\n"
        for i in items:
            latex += f" \\item{{{i}}}\n"
        latex += f"\\end{{itemize}}\n"
        _nfile = _nfile.replace(l, latex)
    # Ordered lists
    lists = [x[0] for x in re.findall(r"^(\d\. (.+\n)+)", _nfile, flags=re.M)]
    for l in lists:
        # Strip only the leading "1. " style marker of each line
        items = re.sub(r"^\d+\. ", "", l.strip(), flags=re.M).split("\n")
        latex = f"\\begin{{enumerate}}[noitemsep]\n"
        for i in items:
            latex += f" \\item{{{i}}}\n"
        latex += f"\\end{{enumerate}}\n"
        _nfile = _nfile.replace(l, latex)
    return _nfile


@click.command()
@click.argument("in_file", type=click.File("r"))
@click.argument("out_file", type=click.File("w"))
def convert(in_file, out_file) -> None:
    # click.File passes open file objects, not path strings
    nfile = in_file.read()
    # Convert blockquotes
    nfile = blockquotes(nfile)
    # Convert lists
    nfile = lists(nfile)
    # Remove Leanpub meta tags
    nfile = metatags(nfile)
    # Convert footnotes
    nfile = footnotes(nfile)
    # Convert text styles
    nfile = styles(nfile)
    # Convert links
    nfile = links(nfile)
    # Convert images
    nfile = images(nfile)
    # Convert asides
    nfile = asides(nfile)
    # Convert quotation marks
    nfile = quote_marks(nfile)
    # Convert math environment markers
    nfile = math(nfile)
    # Convert section headers
    nfile = sections(nfile)
    # Escape special characters
    nfile = special_chars(nfile)
    out_file.write(nfile)


if __name__ == "__main__":
    convert()