migrate and sync a live mongo database to a new cluster deployment

coyote

a systemd service for migrating a live atlas cluster to a self-hosted cluster

coyote uses mongodump and mongorestore to perform:

  • a point-in-time restore (pitr) of a live source cluster to a new target cluster
  • subsequent replays of the source operation log (oplog) on the target, in perpetuity, to keep the target synced with the source until the source goes offline (a minimal sketch of this flow follows the list)
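
a minimal sketch of the initial pass, with ${source_uri} and ${target_uri} standing in for real connection strings (coyote.sh, below, is the actual implementation, with more flags and a windowed oplog query):

# initial point-in-time copy: dump source together with its oplog, replay onto target
mongodump --uri "${source_uri}" --oplog --gzip --out /var/lib/coyote/init
mongorestore --uri "${target_uri}" --oplogReplay --gzip /var/lib/coyote/init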

prerequisites

coyote requires jq. if jq isn't installed, the install script assumes it can be installed with sudo apt-get install -y jq. if that isn't the case (eg: on a non-debian system), make sure jq is already installed at a location included in ${PATH}.
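
a pre-flight check, eg:

# install jq only if it isn't already on ${PATH}
command -v jq > /dev/null || sudo apt-get install -y jq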

installation

environment file

clone the gist and create an environment file containing variables like those seen in atlas-hetzner.env. name it something simple, like source-target.env, and do ensure it has a .env extension. the variables it must define are described below, followed by a minimal example.

  • dump_path is the folder that will hold the initial database dump (${dump_path}/init) and subsequent oplog dumps (${dump_path}/${datetime})
  • source_uri is the connection string for the source cluster, including credentials or certificate paths. it must not contain a database specifier
  • target_uri is the connection string for the target cluster, including credentials or certificate paths. it must not contain a database specifier
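
a minimal source-target.env sketch, with placeholder hosts and credentials (the bundled atlas-hetzner.env, below, shows a real-world shape):

dump_path=/var/lib/coyote
source_uri='mongodb+srv://user:password@source.example.com/?authSource=admin&tls=true'
target_uri='mongodb://user:password@target.example.com:27017/?authSource=admin&tls=true&replicaSet=rs0'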

installation script

run the install script, supplying positional arguments for the following (an example invocation follows the list):

  • environment_file: the name, without folder or extension, of the environment file you created earlier. eg: source-target
  • target_admin_username: the admin user on the target cluster. eg: admin
  • target_admin_password: the password for the admin user on the target cluster. eg: password
  • service_user: the name of the user the service will run under. omit this parameter to use the default, coyote
  • service_name: the name of the systemd service. omit this parameter to use the default, coyote
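
eg, piping the install script straight from the gist (this mirrors the usage comment in coyote-install.sh, below; the credentials here are placeholders):

curl -sLH 'Cache-Control: no-cache, no-store' https://gist.github.com/grenade/cc51aa744834e1ca6311ed1fe84fff7a/raw/coyote-install.sh | bash -s source-target admin 'changeme' coyote coyote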

monitoring

tail the system journal to observe the running service logs, eg:

journalctl -fu coyote.service
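
or filter for just the per-collection delta lines (this log format is emitted by coyote.sh, below):

journalctl -fu coyote.service | grep 'delta:'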

migrating applications

coyote log output contains a delta value for each collection and database. if your application design is such that documents are never modified, you can take a delta of zero to mean that your source and target are in sync. if your application design includes document modification, you will need to implement an application-specific mechanism for determining the true delta. a sketch for checking deltas by hand follows the list below.

  • a delta value of 0 indicates that source and target hold an equal number of documents.
  • a positive delta value indicates that the source contains delta more documents than the target: either your target sync is incomplete or your application layer is still adding documents to the source.
  • a negative delta value indicates that the source contains abs(delta) fewer documents than the target: your application layer has begun adding documents to the target. coyote will terminate on detection of a negative delta, since the target has more documents than the source.
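
deltas can also be checked by hand from the count snapshots coyote writes to ${dump_path}/source.json and ${dump_path}/target.json. a sketch, assuming the default dump path and hypothetical database and collection names:

dump_path=/var/lib/coyote         # default from the install script
database=mydb; collection=mycol   # hypothetical names
filter='.[] | select(.database == $d) | .collections[] | select(.collection == $c) | .size'
source_count=$(jq -r --arg d ${database} --arg c ${collection} "${filter}" ${dump_path}/source.json)
target_count=$(jq -r --arg d ${database} --arg c ${collection} "${filter}" ${dump_path}/target.json)
echo "delta: $(( source_count - target_count ))"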

to migrate your applications without data loss:

  • wait for delta values to settle at the smallest values they are likely to reach
  • pause/disable/stop all applications which write to source
    • disable a lambda by destroying its connection string:
      aws lambda update-function-configuration \
        --profile manta-service \
        --region eu-central-1 \
        --function-name calamari-faucet-prod-shortlist \
        --description "-" \
        --environment '{
            "Variables": {
              "db_readwrite": "mongodb://undefined"
            }
          }'
      
  • wait for all delta values to reach zero
  • stop, disable and remove the coyote service
    sudo systemctl stop coyote.service
    sudo systemctl disable coyote.service
    sudo rm /etc/systemd/system/coyote.service
  • modify the connection strings of all applications to write to target rather than source
    • re-enable a lambda by restoring its connection string:
      aws lambda update-function-configuration \
        --profile manta-service \
        --region eu-central-1 \
        --function-name calamari-faucet-prod-shortlist \
        --description "-" \
        --environment '{
            "Variables": {
              "db_readwrite": "mongodb://username:password@example.com:27017/?ssl=true&replicaSet=rs0&authSource=admin"
            }
          }'
      
  • resume/enable/restart all applications which now write to target

troubleshooting

if something goes wrong with one or more of the continuous oplog replays, causing some operations to be skipped, manually delete the initial dump folder (${dump_path}/init) to force a fresh init dump and resync.
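
eg, assuming the default dump path, and stopping the service first so the folder isn't removed mid-dump:

sudo systemctl stop coyote.service
sudo rm -rf /var/lib/coyote/init
sudo systemctl start coyote.service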

atlas-hetzner.env

dump_path=${dump_path}
source_uri='mongodb+srv://chaincluster.oulrzox.mongodb.net/?authSource=%24external&authMechanism=MONGODB-X509&tls=true&tlsCertificateKeyFile=%2Fetc%2Fssl%2Fmigrator-source.pem'
target_uri='mongodb://${target_admin_username}:${target_admin_password}@alpha.temujin.pelagos.systems:27017,beta.temujin.pelagos.systems:27017,gamma.temujin.pelagos.systems:27017/?authSource=admin&tls=true&replicaSet=s0'
coyote-install.sh

#!/bin/bash
# usage: curl -sLH 'Cache-Control: no-cache, no-store' https://gist.github.com/grenade/cc51aa744834e1ca6311ed1fe84fff7a/raw/coyote-install.sh | bash -s atlas-hetzner admin $(pass manta/aws/pulse/mongo-admin-password) coyote coyote

environment_file=${1}.env
target_admin_username=${2}
target_admin_password=${3}
service_user=${4:-coyote}
service_name=${5:-coyote}
environment_file_path=/usr/share/${service_user}/${environment_file}
dump_path=/var/lib/${service_user}

# create a dedicated system user for the service if one doesn't already exist
if ! getent passwd ${service_user} &> /dev/null; then
  sudo useradd \
    --system \
    --create-home \
    --home-dir /var/lib/${service_user} \
    --user-group \
    ${service_user}
fi

# install the jq prerequisite if it's missing
if [ ! -x /usr/bin/jq ]; then
  sudo apt-get install -y jq
fi

# fetch the environment file and substitute credentials and the dump path
sudo mkdir -p $(dirname ${environment_file_path})
sudo curl \
  --silent \
  --location \
  --header 'Cache-Control: no-cache, no-store' \
  --output ${environment_file_path} \
  --url https://gist.github.com/grenade/cc51aa744834e1ca6311ed1fe84fff7a/raw/${1}.env
sudo sed -i "s/\${target_admin_username}/${target_admin_username}/" ${environment_file_path}
sudo sed -i "s/\${target_admin_password}/${target_admin_password}/" ${environment_file_path}
sudo sed -i "s#\${dump_path}#${dump_path}#" ${environment_file_path}

# stop any running instance before replacing its script and unit
if systemctl is-active ${service_name}.service; then
  sudo systemctl stop ${service_name}.service
fi

# install the sync script
sudo curl \
  --silent \
  --location \
  --header 'Cache-Control: no-cache, no-store' \
  --output /usr/local/bin/${service_name}.sh \
  --url https://gist.github.com/grenade/cc51aa744834e1ca6311ed1fe84fff7a/raw/coyote.sh
sudo chown ${service_user}:${service_user} /usr/local/bin/${service_name}.sh
sudo chmod +x /usr/local/bin/${service_name}.sh

# install the systemd unit and substitute service name, user and environment file path
sudo curl \
  --silent \
  --location \
  --header 'Cache-Control: no-cache, no-store' \
  --output /etc/systemd/system/${service_name}.service \
  --url https://gist.github.com/grenade/cc51aa744834e1ca6311ed1fe84fff7a/raw/coyote.service
sudo sed -i "s/\${service_name}/${service_name}/g" /etc/systemd/system/${service_name}.service
sudo sed -i "s/\${service_user}/${service_user}/g" /etc/systemd/system/${service_name}.service
sudo sed -i "s#\${environment_file_path}#${environment_file_path}#" /etc/systemd/system/${service_name}.service

# enable and start the service
sudo systemctl daemon-reload
if ! systemctl is-enabled ${service_name}.service; then
  sudo systemctl enable ${service_name}.service
fi
if ! systemctl is-active ${service_name}.service; then
  sudo systemctl start ${service_name}.service
fi
coyote.service

[Unit]
Description=${service_name} mongodb live migrator
Wants=mongod.service
After=mongod.service

[Service]
Type=simple
User=${service_user}
Group=${service_user}
EnvironmentFile=${environment_file_path}
# refresh the sync script from the gist before each (re)start
ExecStartPre=/usr/bin/curl \
  --silent \
  --location \
  --header 'Cache-Control: no-cache, no-store' \
  --output /usr/local/bin/${service_name}.sh \
  --url https://gist.github.com/grenade/cc51aa744834e1ca6311ed1fe84fff7a/raw/coyote.sh
ExecStart=/usr/local/bin/${service_name}.sh
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
coyote.sh

#!/bin/bash
# if an initial dump already exists, this is a continuous oplog-replay pass;
# otherwise perform the initial point-in-time dump and restore
if [ -d ${dump_path}/init ]; then
  is_init=false
  if [ -s ${dump_path}/to ]; then
    from=$(head -n 1 ${dump_path}/to)
  else
    unset from
  fi
else
  is_init=true
  from=$(date --utc +'%Y-%m-%dT%T.%3NZ')
  mongodump \
    --uri ${source_uri} \
    --forceTableScan \
    --oplog \
    --gzip \
    --verbose \
    --out ${dump_path}/init
  mongorestore \
    --uri ${target_uri} \
    --oplogReplay \
    --numParallelCollections 8 \
    --numInsertionWorkersPerCollection 8 \
    --noIndexRestore \
    --gzip \
    --verbose \
    ${dump_path}/init
fi
# record the upper bound of this pass; the next pass replays from here
to=$(date --utc +'%Y-%m-%dT%T.%3NZ')
echo ${to} > ${dump_path}/to
# accumulator for the windowed oplog query built in the loop below
dump_query='{ "$or": [] }'
# snapshot per-collection document counts on source and target
mongosh \
  --quiet \
  --eval '
    JSON.stringify(
      db.getMongo().getDBNames().filter((d) => !["admin", "config", "local"].includes(d)).map(
        (database) => {
          const sibling = db.getSiblingDB(database);
          return {
            database,
            collections: sibling.getCollectionNames().map((collection) => ({
              collection,
              size: sibling.getCollection(collection).countDocuments()
            }))
          };
        }
      )
    );
  ' \
  "${source_uri}" \
  > ${dump_path}/source.json
mongosh \
  --quiet \
  --eval '
    JSON.stringify(
      db.getMongo().getDBNames().filter((d) => !["admin", "config", "local"].includes(d)).map(
        (database) => {
          const sibling = db.getSiblingDB(database);
          return {
            database,
            collections: sibling.getCollectionNames().map((collection) => ({
              collection,
              size: sibling.getCollection(collection).countDocuments()
            }))
          };
        }
      )
    );
  ' \
  "${target_uri}" \
  > ${dump_path}/target.json
# queue a full resync if the source and target database lists have diverged
source_databases=( $(jq -r '.[].database' ${dump_path}/source.json) )
target_databases=( $(jq -r '.[].database' ${dump_path}/target.json) )
diff_databases=( $(echo ${source_databases[@]} ${target_databases[@]} | tr ' ' '\n' | sort | uniq -u | sort) )
if [ "${is_init}" = "false" ] && [ "${#diff_databases[@]}" != "0" ]; then
  echo "error: source and target database lists differ. resync queued..."
  echo "source: ${source_databases[*]}"
  echo "target: ${target_databases[*]}"
  echo "diff: ${diff_databases[*]}"
  rm -rf ${dump_path}/init
  exit 1
fi
for database in ${target_databases[@]}; do
  database_delta=0
  source_collections=($(jq \
    --arg database ${database} \
    -r \
    '.[] | select(.database == $database) | .collections[] | .collection' \
    ${dump_path}/source.json))
  target_collections=($(jq \
    --arg database ${database} \
    -r \
    '.[] | select(.database == $database) | .collections[] | .collection' \
    ${dump_path}/target.json))
  diff_collections=( $(echo ${source_collections[@]} ${target_collections[@]} | tr ' ' '\n' | sort | uniq -u | sort) )
  # queue a full resync if the collection lists have diverged
  if [ "${is_init}" = "false" ] && [ "${#diff_collections[@]}" != "0" ]; then
    echo "error: source and target collection lists differ in ${database} database. resync queued..."
    echo "source: ${source_collections[*]}"
    echo "target: ${target_collections[*]}"
    echo "diff: ${diff_collections[*]}"
    rm -rf ${dump_path}/init
    exit 1
  fi
  for collection in ${target_collections[@]}; do
    # without a recorded lower bound, derive one from the newest oplog entry already applied on target
    if [ -z ${from+x} ]; then
      from=$(mongosh \
        --quiet \
        --eval 'use local;' \
        --eval "JSON.stringify(db.oplog.rs.aggregate([{ \$match: { ns: \"${database}.${collection}\" } }, { \$group : { _id: null, max: { \$max : \"\$wall\" }}}]).toArray())" \
        "${target_uri}" \
        | jq -r '.[0].max')
    fi
    # extend the oplog dump query to cover this collection in the (from, to] window
    collection_dump_query=$(jq -c -n --arg from ${from} --arg to ${to} --arg ns "${database}.${collection}" '{ "ns": $ns, "wall": { "$gt": { "$date": $from }, "$lte": { "$date": $to } } }')
    dump_query=$(echo ${dump_query} | jq \
      -c \
      --argjson query "${collection_dump_query}" \
      '. | ."$or" += [ $query ]')
    source_document_count=$(jq \
      --arg database ${database} \
      --arg collection ${collection} \
      -r \
      '.[] | select(.database == $database) | .collections[] | select(.collection == $collection) | .size' \
      ${dump_path}/source.json)
    target_document_count=$(jq \
      --arg database ${database} \
      --arg collection ${collection} \
      -r \
      '.[] | select(.database == $database) | .collections[] | select(.collection == $collection) | .size' \
      ${dump_path}/target.json)
    collection_delta=$(( source_document_count - target_document_count ))
    database_delta=$(( database_delta + collection_delta ))
    echo "from: ${from}, to: ${to}, delta: ${collection_delta}, source: ${source_document_count}, target: ${target_document_count}, collection: ${database}.${collection}"
    if (( collection_delta < 0 )); then
      echo "terminating sync. collection delta is negative. another process may be populating target..."
      exit 1
    fi
  done
  echo "database: ${database}, delta: ${database_delta}"
done
# dump the matched oplog window from source and replay it on target
mongodump \
  --uri ${source_uri} \
  --db local \
  --collection oplog.rs \
  --query "${dump_query}" \
  --gzip \
  --verbose \
  --out ${dump_path}/${to}
mongorestore \
  --uri ${target_uri} \
  --oplogReplay \
  --gzip \
  --verbose \
  ${dump_path}/${to}
target cluster reset helper

a companion script, run from an operator workstation, that stops coyote, wipes and re-initialises the target replica set and its users, and restarts coyote to trigger a fresh sync:

#!/bin/bash
db_admin_password=$(pass manta/aws/pulse/mongo-admin-password)
db_metrics_password=$(pass manta/aws/pulse/mongo-metrics-password)
domain=temujin.pelagos.systems

# stop the migrator and discard its dumps so a fresh init dump is performed on restart
ssh -o ConnectTimeout=1 alpha.${domain} "
  systemctl is-active --quiet coyote.service && sudo systemctl stop coyote.service;
  sudo rm -rf /var/lib/coyote/init;
  sudo rm -rf /var/lib/coyote/2023*;
"

for fqdn in {alpha,beta,gamma}.${domain}; do
  echo ${fqdn}
  ssh -o ConnectTimeout=1 ${fqdn} "
    # raise resource limits for mongod
    echo 'mongodb soft nofile 1048576' | sudo tee /etc/security/limits.d/mongodb-soft-nofile.conf;
    echo 'mongodb soft nproc 514429' | sudo tee /etc/security/limits.d/mongodb-soft-nproc.conf;
    echo 'mongodb soft stack 1048576' | sudo tee /etc/security/limits.d/mongodb-soft-stack.conf;
    echo 'vm.max_map_count=131072' | sudo tee /etc/sysctl.d/vm-max-map-count.conf;

    # wipe data and logs and strip tls, auth and replication from the config
    sudo systemctl stop mongod.service;
    sudo rm -rf /var/log/mongodb/mongod.log;
    sudo touch /var/log/mongodb/mongod.log;
    sudo chown -R mongodb:mongodb /var/log/mongodb;
    sudo rm -rf /var/lib/mongodb/*;
    sudo sed -i '/tls/d' /etc/mongod.conf;
    sudo sed -i '/TLS/d' /etc/mongod.conf;
    sudo sed -i '/certificateKeyFile/d' /etc/mongod.conf;
    sudo sed -i '/CAFile/d' /etc/mongod.conf;
    sudo sed -i '/allowConnectionsWithoutCertificates/d' /etc/mongod.conf;
    sudo sed -i '/security/d' /etc/mongod.conf;
    sudo sed -i '/authorization/d' /etc/mongod.conf;
    sudo sed -i '/keyFile/d' /etc/mongod.conf;
    sudo sed -i '/replication/d' /etc/mongod.conf;
    sudo sed -i '/replSetName/d' /etc/mongod.conf;
    cat /etc/mongod.conf;

    # start unauthenticated, create the admin user, then restore the full config
    sudo systemctl start mongod.service;
    sleep 10;
    mongosh --quiet --eval 'JSON.stringify(db.createUser({user: \"admin\", pwd: \"${db_admin_password}\", roles: [{role: \"root\", db: \"admin\"}]}))' mongodb://127.0.0.1:27017/admin | jq .;
    sudo systemctl stop mongod.service;
    sudo curl -sLo /etc/mongod.conf https://gist.githubusercontent.com/grenade/6c1d4fb5d3756042803dc4c569624c46/raw/bdcc390dcdda0cb6b410f60504df893ee11f7972/mongod.conf;
    sudo sed -i 's/lets-encrypt-r3/dst-root-x3/' /etc/mongod.conf;
    cat /etc/mongod.conf;
    sudo systemctl start mongod.service;
    sleep 10;

    # initiate the replica set on the first node; add subsequent nodes to it. the connection
    # strings are quoted so the remote shell doesn't treat query-string ampersands as job control
    if [ ${fqdn} = alpha.temujin.pelagos.systems ]; then
      mongosh --quiet --eval 'JSON.stringify(rs.initiate({_id: \"s0\", members: [{ _id: 0, host: \"${fqdn}\", priority: 1 }]}))' 'mongodb://admin:${db_admin_password}@${fqdn}:27017/admin?tls=true&replicaSet=s0&authSource=admin&retryWrites=true&w=majority' | jq .;
    else
      mongosh --quiet --eval 'JSON.stringify(rs.add(\"${fqdn}:27017\"))' 'mongodb://admin:${db_admin_password}@alpha.temujin.pelagos.systems:27017/admin?tls=true&replicaSet=s0&authSource=admin&retryWrites=true&w=majority' | jq .;
    fi
  "
done

# create an x509 root user and a metrics user on the new replica set
mongosh \
  --eval 'JSON.stringify(db.getSiblingDB("$external").runCommand({ createUser: "CN=quadbrat.audac.io", roles: [ { role: "root", db: "admin" } ] }))' \
  "mongodb://admin:${db_admin_password}@alpha.temujin.pelagos.systems:27017,beta.temujin.pelagos.systems:27017,gamma.temujin.pelagos.systems:27017/admin?tls=true&replicaSet=s0&authSource=admin&retryWrites=true&w=majority"
mongosh \
  --eval "JSON.stringify(db.createUser({user: \"metrics\", pwd: \"${db_metrics_password}\", roles: [{role: \"clusterMonitor\", db: \"admin\"},{role: \"read\", db: \"local\"}]}))" \
  "mongodb://admin:${db_admin_password}@alpha.temujin.pelagos.systems:27017,beta.temujin.pelagos.systems:27017,gamma.temujin.pelagos.systems:27017/admin?tls=true&replicaSet=s0&authSource=admin&retryWrites=true&w=majority"

# restart the migrator; with the init dump removed, it will perform a fresh init dump and sync
ssh -o ConnectTimeout=1 alpha.${domain} sudo systemctl start coyote.service