@hiroshi-manabe
Last active August 29, 2015 14:06
Convert a Wikipedia SQL dump file to TSV
perl -nle 'next unless s{^insert\b.+?\bvalues\s*\(}{}i; tr/\t//d; while (m{([^\x27]+?|\x27(?:[^\\]|\\.)*?\x27)(,|\),\(|\);)}g) { $val=$1; $delim=$2; $val=~s/^\x27//; $val=~s/\x27$//; $val=~s/\\n/\\\\n/g; $val=~s/\\(.)/$1/g; push @elems,$val; if ($delim=~m{\)}) { print join("\t", @elems); @elems=(); } }'