@tonosaman
Last active March 7, 2023 02:47
MongoDB: Synchronize collections on different databases

Build the shell script step by step.

Step1: Use mongoexport(1) to export target documents as line-by-line JSON

  • Extract only the documents that fall within the synchronization interval.
    • In this sample case, we assume that cron(8) triggers the script every hour.
    • Add 5 minutes of margin to the window to avoid missing documents.
  • Build the query with an ISO 8601 timestamp generated by date(1).
  • mongoexport(1) does not recognize the shell helper ISODate("2023-02-23"); use the Extended JSON form {"$date": "..."} instead.
  • Field names in the query must be quoted ("); field names in the sort specification need not be.
AUTH="--username=root --password=example --authenticationDatabase=admin"
# Documents created within the last 65 minutes (1 hour + 5 minutes of margin)
Q=$(date -d '-65 minutes' +'{"createdAt": {"$gt": {"$date": "%Y-%m-%dT%H:%M:%S%z"}}}')
sudo podman exec -i mongodb mongoexport --quiet $AUTH --db=db1 --collection=items --query="$Q" --sort='{createdAt: 1}'
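
The time window in the query can be checked without touching MongoDB. The following is only a sketch under stated assumptions: build_query is a hypothetical helper that is not part of the original script, it takes the field name, a reference time in epoch seconds, and an optional window in minutes, and it relies on GNU date(1) for the -d @EPOCH syntax. It prints the timestamp in UTC with a literal Z suffix instead of a numeric offset.

```shell
#!/bin/sh
# build_query FIELD NOW_EPOCH [WINDOW_MINUTES]
# Hypothetical helper: prints an Extended JSON query matching documents whose
# FIELD is newer than NOW_EPOCH minus the window (default: 65 minutes).
build_query() {
  field=$1
  since=$(( $2 - ${3:-65} * 60 ))
  # GNU date: "@N" means N seconds since the epoch; -u prints UTC
  since_iso=$(date -u -d "@$since" +%Y-%m-%dT%H:%M:%SZ)
  printf '{"%s":{"$gt":{"$date":"%s"}}}\n' "$field" "$since_iso"
}

# Window ending now
build_query updatedAt "$(date +%s)"
# Window ending at a fixed time (2023-02-23T12:00:00Z)
build_query createdAt 1677153600
# {"createdAt":{"$gt":{"$date":"2023-02-23T10:55:00Z"}}}
```

Passing the reference time explicitly makes the window easy to test, and computing it in seconds avoids any ambiguity in how date(1) combines several relative items.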

Step2: Use diff(1) to take the difference of collections

  • In my case, MongoDB runs in a container managed by podman(1). If yours does not, drop the podman(1) wrapper and run the commands directly.
sudo podman exec -i mongodb bash <<'EOF'
AUTH="--username=root --password=example --authenticationDatabase=admin"
# Export documents updated within the last 65 minutes (1 hour + 5 minutes of margin)
export () {
  Q="$(date -d '-65 minutes' +'{"updatedAt":{"$gt":{"$date":"%Y-%m-%dT%H:%M:%S%z"}}}')"
  mongoexport --quiet $AUTH --db="$1" --collection=items --query="$Q" --sort='{updatedAt:1}'
}
# Lines prefixed with "<" come from db1, lines prefixed with ">" come from db2
diff --suppress-common-lines <(export db1) <(export db2)
EOF
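
To see what the diff(1) output looks like before wiring it to MongoDB, here is a small sketch with made-up data. The two files are hypothetical stand-ins for the mongoexport output of db1 and db2, and temporary files are used instead of process substitution so that the sketch runs in any POSIX shell.

```shell
#!/bin/sh
# Hypothetical sample exports standing in for mongoexport output (one JSON document per line)
workdir=$(mktemp -d)
printf '%s\n' '{"_id":1,"name":"apple"}' '{"_id":2,"name":"banana"}' > "$workdir/db1.json"
printf '%s\n' '{"_id":1,"name":"apple"}' '{"_id":3,"name":"cherry"}' > "$workdir/db2.json"

# diff exits with status 1 when the inputs differ, so guard it in "set -e" scripts
diff_out=$(diff "$workdir/db1.json" "$workdir/db2.json" || true)
rm -rf "$workdir"
printf '%s\n' "$diff_out"
# 2c2
# < {"_id":2,"name":"banana"}
# ---
# > {"_id":3,"name":"cherry"}
```

Only the lines starting with "< " or "> " carry documents; the "2c2" and "---" lines are diff bookkeeping and have to be filtered out before importing.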

Step3: Use awk(1) to dispatch differences to collections

  • Use mongoimport(1) with --mode=upsert to write each difference into the other database.
sudo podman exec -i mongodb bash <<'EOF'
AUTH="--username=root --password=example --authenticationDatabase=admin"
export () {
  Q="$(date -d '-65 minutes' +'{"updatedAt":{"$gt":{"$date":"%Y-%m-%dT%H:%M:%S%z"}}}')"
  mongoexport --quiet $AUTH --db="$1" --collection=items --query="$Q" --sort='{updatedAt:1}'
}
diff --suppress-common-lines <(export db1) <(export db2) \
| awk -v sync_to="mongoimport $AUTH --collection=items --type=json --mode=upsert --db=" '
/^< / { l2r = l2r substr($0, 3) "\n" }  # db1 side: send to db2
/^> / { r2l = r2l substr($0, 3) "\n" }  # db2 side: send to db1
END {
  printf "%s", l2r | (sync_to "db2"); close(sync_to "db2")
  printf "%s", r2l | (sync_to "db1"); close(sync_to "db1")
}'
EOF
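
The awk(1) dispatch can be tried without a database. In this sketch the diff output is hard-coded sample data, and "cat > FILE" is a hypothetical stand-in for the mongoimport command, so each file ends up holding what would be imported into the database of the same name.

```shell
#!/bin/sh
workdir=$(mktemp -d)
# Hypothetical diff output; "cat > FILE" stands in for "mongoimport ... --db=NAME"
printf '%s\n' '2c2' '< {"_id":2,"name":"banana"}' '---' '> {"_id":3,"name":"cherry"}' |
awk -v sync_to="cat > $workdir/" '
/^< / { l2r = l2r substr($0, 3) "\n" }  # db1 side: goes to db2
/^> / { r2l = r2l substr($0, 3) "\n" }  # db2 side: goes to db1
END {
  # Parenthesize the concatenated command so every awk parses it the same way
  printf "%s", l2r | (sync_to "db2"); close(sync_to "db2")
  printf "%s", r2l | (sync_to "db1"); close(sync_to "db1")
}'
to_db2=$(cat "$workdir/db2")
to_db1=$(cat "$workdir/db1")
rm -rf "$workdir"
echo "to db2: $to_db2"
echo "to db1: $to_db1"
```

Calling close() on each command makes awk wait for it to finish, which matters once the stand-in is replaced by a real import.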

This script works reasonably well. However, if the same document is changed in both databases within the synchronization window, the later change should win, and this script does not handle that: each database simply receives the other's version. Let's remedy that in the next step.

Step4: Use jq(1) to resolve conflicts where the same document was changed on both sides

  • Sort by updatedAt so that, when the same _id was changed in both db1.items and db2.items, the later update is imported last and overwrites the earlier one.

  • jq's --slurp option collects the items into a single JSON array, which mongoimport(1) accepts with --jsonArray.

sudo podman exec -i mongodb bash <<'EOF'
AUTH="--username=root --password=example --authenticationDatabase=admin"
export () {
  Q="$(date -d '-65 minutes' +'{"updatedAt":{"$gt":{"$date":"%Y-%m-%dT%H:%M:%S%z"}}}')"
  mongoexport --quiet $AUTH --db="$1" --collection=items --query="$Q" --sort='{updatedAt:1}'
}
import () {
  # --maintainInsertionOrder applies the documents in input order, so the latest update wins
  mongoimport $AUTH --db="$1" --collection=items --jsonArray --mode=upsert --maintainInsertionOrder
}
# Merge both sides of the diff, order by updatedAt, and import the result into both databases
diff --suppress-common-lines <(export db1) <(export db2) \
| sed -n 's/^[<>] //p' | jq --slurp 'sort_by(.updatedAt)' \
| tee >(import db1) >(import db2) >/dev/null
EOF
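
Why sorting by updatedAt resolves the conflict can be shown with a toy model. This sketch does not use jq(1) or mongoimport(1): replay_upserts is a hypothetical helper in which sort(1) plays the role of sort_by(.updatedAt) and an awk(1) array plays the role of the upserted collection, assuming documents are applied in input order.

```shell
#!/bin/sh
# replay_upserts: reads "_id updatedAt value" lines, sorts them by timestamp,
# and replays them as upserts so that later lines overwrite earlier ones.
# Hypothetical helper; the awk array stands in for the target collection.
replay_upserts() {
  LC_ALL=C sort -k2,2 |
  awk '{ value[$1] = $3 } END { for (id in value) print id, value[id] }' |
  LC_ALL=C sort -n
}

# _id 42 was changed in both databases; the change made in db2 is newer.
printf '%s\n' \
  '42 2023-02-23T10:30:00Z from-db2' \
  '42 2023-02-23T10:10:00Z from-db1' \
  '7 2023-02-23T10:20:00Z only-db1' | replay_upserts
# 7 only-db1
# 42 from-db2
```

Because both databases receive the same ordered list, both end up with the newer value for _id 42.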

A side effect of this script is that it also re-imports documents that a database already has, updating them with identical content, but that is an acceptable trade-off for simplicity.

Step5: Schedule the script to run periodically

  • On NixOS, add a systemd timer and service to configuration.nix(5).
  • Note: The here-document terminator (EOF) must not be indented in the script, or the shell will not recognize it. The other lines may be indented, but the EOF line may not.
  systemd = {
    timers."mongodbsync" = {
      wantedBy = [ "timers.target" ];
      partOf = [ "mongodbsync.service" ];
      timerConfig.OnCalendar = [ "*-*-* *:00:00" ];
    };
    services."mongodbsync" = {
      serviceConfig.Type = "oneshot";
      serviceConfig.User = "root";
      script = ''
${pkgs.podman}/bin/podman exec -i mongodb bash <<-'EOF'
AUTH="--username=root --password=example --authenticationDatabase=admin"
export () {
  Q="$(date -d '-65 minutes' +'{"updatedAt":{"$gt":{"$date":"%Y-%m-%dT%H:%M:%S%z"}}}')"
  mongoexport --quiet $AUTH --db="$1" --collection=items --query="$Q" --sort='{updatedAt:1}'
}
import () {
  mongoimport --quiet $AUTH --db="$1" --collection=items --jsonArray --mode=upsert --maintainInsertionOrder
}
diff --suppress-common-lines <(export db1) <(export db2) \
| sed -n 's/^[<>] //p' | jq --slurp 'sort_by(.updatedAt)' \
| tee >(import db1) >(import db2) >/dev/null
EOF'';
    };
  };

Remember to run nixos-rebuild switch to apply the configuration.
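
The note about the here-document terminator can be checked in isolation. With the <<- form, the shell strips leading tab characters (and only tabs) from the body and from the terminator line, so a terminator indented with spaces is not recognized. The sketch below builds a tiny script with printf(1) so that the tabs are explicit as \t; the variable names are arbitrary.

```shell
#!/bin/sh
# With "<<-", leading TABs are stripped from the body and from the terminator line.
tab_script=$(printf 'cat <<-EOF\n\thello\n\tEOF\necho done\n')
tab_out=$(sh -c "$tab_script")
printf '%s\n' "$tab_out"
# hello
# done
```

Keeping the EOF line at column 0, as in the configuration above, sidesteps the tab-versus-space question entirely.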
