Skip to content

Instantly share code, notes, and snippets.

@RobinDaugherty
Last active December 20, 2015 15:09
Show Gist options
  • Save RobinDaugherty/6152142 to your computer and use it in GitHub Desktop.
Save RobinDaugherty/6152142 to your computer and use it in GitHub Desktop.
Load mysqldump into postgres using as many threads as possible.

psql is CPU-bound reading mysqldump output, so it's extremely slow to load records. This script runs as many instances of psql in parallel as possible.

Determines the number of CPUs, which has been tested in Ubuntu.

Assumes data loaded from mysql has timestamps in EST5EDT time zone and has been dumped using the following command line:

mysqldump -f --skip-tz-utc --skip-add-locks --skip-lock-tables --no-create-db --no-create-info --complete-insert --default-character-set=utf8 --max_allowed_packet=1M --net_buffer_length=1M --compatible=postgresql $database $table --result-file=$file;
#!/bin/bash
database_name=$1
source_file=$(pwd)/$2
thread_count=$(cat /proc/cpuinfo | grep processor | wc -l)
total_lines=$(cat $source_file | wc -l)
lines_per_file=$((($total_lines+$thread_count-1)/$thread_count)) # Ceiling
dir="$source_file.split"
mkdir $dir
cd $dir
split -l $lines_per_file $source_file
for i in $(ls -1 xa*); do
file="$i"
echo "$file:"
sed -i '' -e "1i\\
SET TIME ZONE 'EST5EDT';\\
SET standard_conforming_strings = \'off\';\\
SET backslash_quote = \'on\';\\
" "$file";
psql --quiet $database_name -f "$file" 2>&1 | pcregrep -v -M 'psql:[^\n]+nonstandard use of (escape|\\.) in a string literal\nLINE[^\n]+\n[^\n]*\nHINT:[^\n]+' &
done
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment