@vovan888
Last active Aug 29, 2015
Split a MediaWiki XML export file into separate pages, one file per page.
#!/usr/bin/perl -w
# Output files can be converted to other wiki formats with pandoc, e.g.:
# find . -type f -exec pandoc -f mediawiki -t markdown_github -o "{}".mark "{}" \;
# (pandoc 2.x deprecates markdown_github; use "-t gfm" there instead.)
use strict;
use Parse::MediaWikiDump;
my $file = shift(@ARGV) or die "must specify a Mediawiki dump file";
my $pages = Parse::MediaWikiDump::Pages->new($file);
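binmode(STDOUT, ':encoding(UTF-8)');
while (defined(my $page = $pages->next)) {
    my $title = $page->title;
    print $title, "\n";
    # Page titles may contain "/" (subpages), which cannot appear in a file name.
    (my $name = $title) =~ s{/}{_}g;
    open(my $fh, '>:encoding(UTF-8)', $name) or die "cannot open '$name': $!";
    # $page->text returns a reference to the page's wikitext.
    print $fh ${ $page->text };
    close($fh) or die "cannot close '$name': $!";
}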