est31/openproject_wiki_extractor.md

## openproject_wiki_extractor.md

      
    Raw
  

              openproject_wiki_extractor.md
            
          
    Exporting openproject wiki files

The following guide explains how to export an openproject wiki to somewhat readable openproject syntax-ed files.
The guide assumes you are running in bash shell, and can use GNU sed, and that you have the sqlite3 command utility installed.
Following UN*X philosophy, all commands should be silent and have no stdout. If you get something printed to stdout there is an error.
Part 0: Preparation

Make sure you have a wiki/ directory with .sql exports from mysql.
Part 1: Loading into sqlite

Open a bash shell, and cd to a directory, so that the wiki/ subdir contains all important .sql files.
Now declare the utility function sed_fun. It does most of the filtering to convert to sqlite readable format:
function sed_fun { sed '/^\(UN\)\?LOCK/d;s/AUTO_INCREMENT\(=[1234567890]*\)\?//;s/ENGINE=[^ ]\+//;s/DEFAULT CHARSET=[^ ;]\+//;/  KEY/d' | sed "s/\\\'/''/g"; };
Then do the actual importing:
for fnam in s _pages _redirects _contents _content_versions _content_journals; do cat "wikis/wiki${fnam}.sql" | sed_fun | sed 's/\((`id`)\),/\1/' | sqlite3 db2; done
It should run without errors and create a file "db2" that contains the loaded sqlite database.
Part 2: Exporting again into separate files

First create a dump file in a format more close to bash:
echo "SELECT wiki_id, title, text FROM wiki_pages INNER JOIN wiki_contents ON wiki_pages.id = wiki_contents.page_id;" | sqlite3 db2 > pages_dump
Second split the dump up to several files:
cat pages_dump | while read -r line; do LNG=`printf "%s" "$line" | cut -d"|" -f1`; TIT=`printf "%s" "$line" | cut -d"|" -f2`; CONT=`printf "%s" "$line" | cut -d"|" -f3-`; mkdir -p "op/$LNG"; printf "%s" "$CONT" | sed $'s/\\\[r]\\\[n]/\\\n/g' > "op/${LNG}/${TIT}.op"; done
The directory op will then contain language code based subdirectories, where each contains .op files, named after the page's title's. Inside those files, the string "\r\n" got replaced with proper newline characters.