@rattletat
Last active July 8, 2022 10:47
#!/bin/bash
# This is heavily based on this code here:
# https://gist.github.com/maikeldotuk/54a91c21ed9623705fdce7bab2989742
# Which is heavily based on this code here:
# https://gist.github.com/enpassant/0496e3db19e32e110edca03647c36541
# Special thanks to the user enpassant for starting it: https://github.com/enpassant
# ARGUMENT PARSING
# Do not overwrite (0) or overwrite (1)
OVERWRITE="$1"
# Syntax chosen for the wiki
SYNTAX="$2"
# File extension for the wiki
EXTENSION="$3"
# Full path of the output directory
OUTPUTDIR="$4"
# Full path of the wiki page
INPUT="$5"
# Name of the CSS file for this wiki (basename of the full path passed in)
CSSFILENAME=$(basename "$6")
# Full path to the wiki's template
TEMPLATE_PATH="$7"
# The default template name
TEMPLATE_DEFAULT="$8"
# The extension of template files
TEMPLATE_EXT="$9"
# Relative path back to the wiki root ('../' once per subdirectory level)
ROOT_PATH="${10}"
# If file is in vimwiki base dir, the root path is '-'
[[ "$ROOT_PATH" = "-" ]] && ROOT_PATH=''
# Example: index.md
FILE=$(basename "$INPUT")
# Example: index
FILENAME=$(basename "$INPUT" ."$EXTENSION")
# Example: /home/rattletat/wiki/text/uni/
FILEPATH=${INPUT%"$FILE"}
# Example: /home/rattletat/wiki/html/uni/index
OUTPUT=$OUTPUTDIR$FILENAME
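# PANDOC ARGUMENTS
# Remote MathJax (cdn.mathjax.org has since been shut down, so point this at a current CDN URL if you want a remote copy):
# MATHJAX="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"
# If you have MathJax locally, use this:
MATHJAX="/usr/share/mathjax/MathJax.js?config=TeX-AMS-MML_HTMLorMML"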
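# PREPANDOC PROCESSING AND PANDOC
# The pandoc command is kept in an array so that paths containing spaces
# (or glob characters such as the '?' in $MATHJAX) are passed through intact
pandoc_cmd=(
    pandoc
    --mathjax="$MATHJAX"
    --template="$TEMPLATE_PATH$TEMPLATE_DEFAULT$TEMPLATE_EXT"
    -f "$SYNTAX"
    -t html
    -c "$CSSFILENAME"
    -M "root_path:$ROOT_PATH"
)
# Searches for markdown links (without extension or with .md) and appends .html.
# Caveat: [^!()[] is a one-character bracket expression followed by a literal ]*,
# so the pattern consumes the single character before '[' (it is dropped by
# \1(\2.html)) and never matches a link at the very start of a line.
regex1='s/[^!()[]]*(\[[^]]+\])\(([^.)]+)(\.md)?\)/\1(\2.html)/g'
# [^!\[\])(]*(\[[^\]]+\])\(([^).]+)(\.md)?\)
# Removes placeholder title from vimwiki markdown file. Not needed if you use a
# correct YAML header.
# regex2='s/^%title (.+)$/---\ntitle: \1\n---/'
pandoc_input=$(sed -r "$regex1" < "$INPUT")
pandoc_output=$(printf '%s\n' "$pandoc_input" | "${pandoc_cmd[@]}")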
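# POSTPANDOC PROCESSING
# Removes "file:" from ![pic of sharks](file:../sharks.jpg), but only at the start
# of src/href attribute values, so words like "profile:" in the text stay intact
regex3='s/(src|href)="file:/\1="/g'
printf '%s\n' "$pandoc_output" | sed -r "$regex3" > "$OUTPUT.html"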
# With this older variant you can have ![pic of sharks](file:../sharks.jpg) in your markdown file: it removes "file:"
# and the unnecessary ".html" that the previous command added to the image.
# sed 's/file://g' < /tmp/crap.html | sed 's/\(png\|jpg\|pdf\).html/\1/g' | sed -e 's/\(href=".*\)\.html/\1/g' > "$OUTPUT.html"
# Copy images to the output directory, keeping their relative paths (disabled; needs GNU find and cp):
# destination=$(cd -- "$4" && pwd) # make it an absolute path
# cd -- "/home/rattletat/wiki/text/" &&
# find . -type f -regex ".*\.\(jpg\|gif\|png\)" -exec cp --parents {} "$destination" \;
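The disabled copy step above relies on GNU find (`-regex` with `\|`). A portable sketch of the same idea, assuming POSIX `sh`, `find` and `cp`; the function name `copy_wiki_images` is hypothetical and not part of the original script. It swaps the regex for `-name` patterns and recreates each subdirectory with `mkdir -p` before copying.

```shell
# copy_wiki_images SRC DEST (hypothetical helper): copy every
# .jpg/.jpeg/.gif/.png below SRC into DEST, recreating the subdirectory
# layout (SRC/uni/a.png ends up in DEST/uni/a.png). POSIX tools only.
copy_wiki_images() {
    mkdir -p -- "$2" || return 1
    # Absolute destination, because we cd into SRC below
    _cwi_dest=$(cd -- "$2" && pwd) || return 1
    (
        cd -- "$1" || exit 1
        find . -type f \( -name '*.jpg' -o -name '*.jpeg' -o -name '*.gif' -o -name '*.png' \) \
            -exec sh -c '
                dest=$1; shift
                for f in "$@"; do
                    # ${f%/*} is the directory part, relative to SRC
                    mkdir -p -- "$dest/${f%/*}" && cp -- "$f" "$dest/$f"
                done
            ' sh "$_cwi_dest" {} +
    )
}
```

For example, `copy_wiki_images /home/rattletat/wiki/text /home/rattletat/wiki/html` would mirror the images of the example wiki paths used in the script.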
@rattletat (author)

@staffan7s
Sorry for the late response. You have probably solved your problem already.
I think you need to change the regex from
regex1='s/[^!()[]]*(\[[^]]+\])\(([^.)]+)(\.md)?\)/\1(\2.html)/g'
to
regex1='s/([^!()[]]*)(\[[^]]+\])\(([^.)]+)(\.md)?\)/\1(\2.html)/g'

But I haven't tested it yet, so maybe you could give me feedback. Unfortunately, I switched to wiki.vim.
You can always test your regexes on pages like this.
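Why the original regex drops a character: in `[^!()[]]*` the bracket expression `[^!()[]` ends at the first `]`, so the prefix matches exactly one character that is not `!`, `(`, `)` or `[`, followed by a literal `]*`. That character is consumed and not written back by `\1(\2.html)`, and a link at the very start of a line never matches. The suggestion above captures the prefix as group 1 but keeps the old replacement, so the groups shift: `see [foo](foo)` becomes `see ([foo].html)`. A minimal sketch that anchors on `(^|[^!])` and renumbers the groups, assuming a sed with `-E` (GNU or BSD); `rewrite_links` is a hypothetical helper name:

```shell
# rewrite_links (hypothetical helper): read markdown on stdin and rewrite
# internal links [text](page) or [text](page.md) to [text](page.html).
# Group 1 is the character before '[' (or the empty start of the line) and is
# written back, so nothing is lost; [^!] still skips images like ![alt](pic).
rewrite_links() {
    sed -E 's/(^|[^!])(\[[^]]+\])\(([^.)]+)(\.md)?\)/\1\2(\3.html)/g'
}

printf '%s\n' 'see [foo](foo)' '[bar](bar.md) at line start' '![pic](pic)' |
    rewrite_links
# see [foo](foo.html)
# [bar](bar.html) at line start
# ![pic](pic)
```

Targets containing a dot or a `:` (file extensions, URLs) still fall through unchanged, as in the original pattern.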

@rattletat (author)

@Just-A-Visitor
I am rather clueless. Since I'm not using vimwiki anymore (see my last comment), some API changes could have happened, but this is rather unlikely. To debug this, I would replace echo "$pandoc_output ... with echo "$pandoc_input ... to see the preprocessed text before it is sent to pandoc. Otherwise, I would recommend trying the Python tool offered here (1).

Relevant links:

  1. Python Script
  2. Previous thread
  3. Relevant issue
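Instead of swapping the echo line back and forth, the preprocessed markdown could be inspected with a small pass-through filter between sed and pandoc. A sketch, assuming a hypothetical `WIKI2HTML_DEBUG` environment variable and a hypothetical `debug_tee` helper, both of which are not part of the original script; it is most useful when running the script by hand with the ten arguments vimwiki would pass.

```shell
# debug_tee (hypothetical helper): pass stdin to stdout unchanged; when the
# hypothetical WIKI2HTML_DEBUG variable is non-empty, also print a copy to
# stderr so the text pandoc receives can be inspected.
# Note: $(cat) drops trailing blank lines, which is harmless for markdown.
debug_tee() {
    if [ -n "${WIKI2HTML_DEBUG:-}" ]; then
        _debug_buf=$(cat)
        printf '%s\n' "$_debug_buf" >&2
        printf '%s\n' "$_debug_buf"
    else
        cat
    fi
}

# Possible placement inside wiki2html.sh (pandoc arguments elided):
#   sed -r "$regex1" < "$INPUT" | debug_tee | pandoc ... > "$OUTPUT.html"
```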

@staffan7s

@rattletat
I'm sorry to hear you're abandoning this fork, but thanks again for your input. This now works reasonably well. I just had to add a space before the first \1:
regex1='s/([^!()[]]*)(\[[^]]+\])\(([^.)]+)(\.md)?\)/\1(\2.html)/g'
to
regex1='s/[^!()[]]*(\[[^]]+\])\(([^.)]+)(\.md)?\)/ \1(\2.html)/g'

In fact, my wiki2html.sh is now rather minimal: after the HAS_MATH bit, I just kept this:

sed -r 's/[^!()[]]*(\[[^]]+\])\(([^.)]+)(\.md)?\)/ \1(\2.html)/g' <"$INPUT" | pandoc $MATH -s -f $SYNTAX -t html -c $CSSFILENAME >"$OUTPUT.html"

This covers all internal links to files in the same folder, but for a folder change (e.g. to ../index) I have to fall back to HTML links (<a href ... etc.). All external links starting with "http" work as well.

[2020-04-06](2020-04-06) -- working
[indexpage](../index) -- not working
[DN1](www.dn.se) -- not working
[DN2](http://www.dn.se) -- working
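The two failing cases have different causes. `www.dn.se` has no scheme, so even left untouched it is a relative link; writing `http://www.dn.se` avoids that. `../index` fails because the target class `[^.)]` rejects every dot, including the ones in `../`. A sketch of a variant that allows any number of leading `../` segments, assuming a sed with `-E`; `rewrite_links_up` is a hypothetical name and the pattern is only checked against the four cases above plus a nested `.md` link.

```shell
# rewrite_links_up (hypothetical helper): rewrite [text](target) to
# [text](target.html) when target is a wiki page, optionally preceded by
# '../' segments and optionally ending in .md. Any other '.' (extensions,
# domain names) or ':' (URL schemes) leaves the link untouched.
# (^|[^!]) keeps the character before '[' and skips images like ![alt](pic).
rewrite_links_up() {
    sed -E 's/(^|[^!])(\[[^]]+\])\(((\.\.\/)*[^.):]+)(\.md)?\)/\1\2(\3.html)/g'
}

printf '%s\n' \
    '[2020-04-06](2020-04-06)' \
    '[indexpage](../index)' \
    '[DN1](www.dn.se)' \
    '[DN2](http://www.dn.se)' | rewrite_links_up
# [2020-04-06](2020-04-06.html)
# [indexpage](../index.html)
# [DN1](www.dn.se)
# [DN2](http://www.dn.se)
```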

@fcsm1922 commented Jul 8, 2022

@rattletat Thank you for this wonderful script. However, I'm stuck on an issue. When I use :VimwikiAll2HTML, it creates the HTML version of every markdown file recursively (which is what I would expect), but it doesn't seem to link them properly.

For example, [foo](foo) in the original document points to just foo in the HTML file (instead of foo.html, which is already present in the relevant directory).

Similarly, relative links seem to be broken. [bar](foo/bar) points to foo/bar.md instead of foo/bar.html (even though the file exists).

Do you have any idea what the issue might be? (I used your script/vimrc verbatim so as not to introduce any errors, apart from some minor changes such as MathJax loading, a --quiet parameter in the pandoc call, etc.) As for the HTML template, I just downloaded Github.html5 and placed it inside the relevant directory. Do I need to copy any other script too?

Hi, I faced the same problem. Did you find a way to solve it?
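One plausible cause for `[foo](foo)` staying `foo`: if the link starts a line, the script's `regex1` never matches it, because its prefix needs one character before `[`. To see which links escaped the rewrite, the generated pages can be scanned for `<a href>` targets that neither end in `.html` nor look like URLs. A sketch, assuming pandoc writes `href` as the first attribute of `<a>` and that `grep -o`/`-h` are available (GNU and BSD grep both have them); `list_unconverted_links` is a hypothetical helper.

```shell
# list_unconverted_links FILE... (hypothetical helper): print <a href> targets
# that do not end in .html (optionally followed by #anchor), are not in-page
# #anchors, and contain no ':' (so http:, mailto: etc. are skipped).
# Each printed target is a link the markdown-to-html rewrite probably missed.
list_unconverted_links() {
    grep -ho '<a href="[^"]*"' "$@" |
        sed 's/^<a href="//; s/"$//' |
        grep -Ev '\.html(#|$)|:|^#'
}
```

For the example paths in the script this would be `list_unconverted_links /home/rattletat/wiki/html/*.html`; a hit such as `foo/bar.md` points at the page whose markdown link was not rewritten.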
