Skip to content

Instantly share code, notes, and snippets.

@est31
Last active January 28, 2016 04:08
Show Gist options
  • Save est31/c1bc6d20aa89f04e60f0 to your computer and use it in GitHub Desktop.
Save est31/c1bc6d20aa89f04e60f0 to your computer and use it in GitHub Desktop.
How to extract wiki pages from openproject and to bring them into more suitable format.

Exporting openproject wiki files

The following guide explains how to export an openproject wiki to somewhat readable openproject syntax-ed files.

The guide assumes you are running in bash shell, and can use GNU sed, and that you have the sqlite3 command utility installed.

Following UN*X philosophy, all commands should be silent and have no stdout. If you get something printed to stdout there is an error.

Part 0: Preparation

Make sure you have a wiki/ directory with .sql exports from mysql.

Part 1: Loading into sqlite

Open a bash shell, and cd to a directory, so that the wiki/ subdir contains all important .sql files.

Now declare the utility function sed_fun. It does most of the filtering to convert to sqlite readable format:

function sed_fun { sed '/^\(UN\)\?LOCK/d;s/AUTO_INCREMENT\(=[1234567890]*\)\?//;s/ENGINE=[^ ]\+//;s/DEFAULT CHARSET=[^ ;]\+//;/  KEY/d' | sed "s/\\\'/''/g"; };

Then do the actual importing:

for fnam in s _pages _redirects _contents _content_versions _content_journals; do cat "wikis/wiki${fnam}.sql" | sed_fun | sed 's/\((`id`)\),/\1/' | sqlite3 db2; done

It should run without errors and create a file "db2" that contains the loaded sqlite database.

Part 2: Exporting again into separate files

First create a dump file in a format more close to bash:

echo "SELECT wiki_id, title, text FROM wiki_pages INNER JOIN wiki_contents ON wiki_pages.id = wiki_contents.page_id;" | sqlite3 db2 > pages_dump

Second split the dump up to several files:

cat pages_dump | while read -r line; do LNG=`printf "%s" "$line" | cut -d"|" -f1`; TIT=`printf "%s" "$line" | cut -d"|" -f2`; CONT=`printf "%s" "$line" | cut -d"|" -f3-`; mkdir -p "op/$LNG"; printf "%s" "$CONT" | sed $'s/\\\[r]\\\[n]/\\\n/g' > "op/${LNG}/${TIT}.op"; done

The directory op will then contain language code based subdirectories, where each contains .op files, named after the page's title's. Inside those files, the string "\r\n" got replaced with proper newline characters.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment