@thomasn
Created November 12, 2008 19:35

Legacy content

The pages on the old wiki had critical pieces of hard-won information embedded in often-obsolete articles. Copying the whole unstructured archive seems crazy, but so does throwing it all away and keeping only new content. AllenJB's suggestion of a template to flag these pages sounds ideal: when we hit a page, we know whether the content is all shiny and new, or captured from the old wiki and of "variable quality".

Automation

  • Capture existing (possibly very messy) HTML to Markdown, either with Pandoc or the Make.text bookmarklet.
  • Edit Markdown (if you prefer it to editing Mediawiki)
  • Convert to Mediawiki syntax with Pandoc

This converts the bulk of the document automatically. For example:

Setup

layman -a haskell
emerge -av pandoc    # needs >=pandoc-0.47

Install the Make.text bookmarklet if you want it.

Usage

Method 1: convert directly to Mediawiki

$ wget http://scratch.gentoo-wiki.com/wiki/ALSA_sound_system
$ pandoc -r html -w mediawiki -o alsa-sound.wiki ALSA_sound_system
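
To convert several pages in one go, Method 1 can be wrapped in a small shell function. This is only a sketch: the helper names `wiki_file` and `convert_page` are made up for illustration, it assumes every page lives under the same base URL as the example above, and it names the output after the page rather than picking a shorter name by hand.

```shell
#!/bin/sh
# Sketch only: wiki_file and convert_page are hypothetical helper names.
# Assumption: all pages share the base URL used in the example above.
BASE_URL="http://scratch.gentoo-wiki.com/wiki"

# Map a page name to an output file name,
# e.g. ALSA_sound_system -> ALSA_sound_system.wiki
wiki_file() {
    printf '%s.wiki\n' "$1"
}

# Fetch one page and convert it from HTML to Mediawiki syntax.
convert_page() {
    wget -q -O "$1" "$BASE_URL/$1" &&
        pandoc -r html -w mediawiki -o "$(wiki_file "$1")" "$1"
}

# Usage (page names are passed as script arguments):
#   for page in "$@"; do convert_page "$page"; done
```

The converted output will probably still need a manual pass, since this only automates the fetch and convert steps.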

Method 2: convert to Markdown with Make.text, tweak Markdown, export Mediawiki

# Capture the page to alsa-sound.md with Make.text, then edit it
$ pandoc -r markdown -w mediawiki -o alsa-sound.wiki alsa-sound.md
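
If a batch of captured pages has been edited as Markdown, the export step can be looped over all of them. The following is a minimal sketch, assuming the files end in `.md`; `wiki_name` and `convert_all` are hypothetical helper names, not part of Pandoc.

```shell
#!/bin/sh
# Sketch only: wiki_name and convert_all are hypothetical helper names.

# Swap a trailing .md for .wiki, e.g. alsa-sound.md -> alsa-sound.wiki
wiki_name() {
    printf '%s.wiki\n' "${1%.md}"
}

# Convert every Markdown file given as an argument; stop at the first failure.
convert_all() {
    for md in "$@"; do
        pandoc -r markdown -w mediawiki -o "$(wiki_name "$md")" "$md" || return 1
    done
}

# Usage:
#   convert_all *.md
```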

Further info

Try Wikipedia - Showdown is quite fun :-)


Markdown source: http://gist.github.com/24243
