
@eladnava
Last active March 11, 2024 10:21
Automatically backup a MongoDB database to S3 using mongodump, tar, and awscli (Ubuntu 14.04 LTS)
#!/bin/sh
# Make sure to:
# 1) Name this file `backup.sh` and place it in /home/ubuntu
# 2) Run sudo apt-get install awscli to install the AWSCLI
# 3) Run aws configure (enter s3-authorized IAM user and specify region)
# 4) Fill in DB host + name
# 5) Create S3 bucket for the backups and fill it in below (set a lifecycle rule to expire files older than X days in the bucket)
# 6) Run chmod +x backup.sh
# 7) Test it out via ./backup.sh
# 8) Set up a daily backup at midnight via `crontab -e`:
# 0 0 * * * /home/ubuntu/backup.sh > /home/ubuntu/backup.log
# DB host (secondary preferred so as to avoid impacting primary performance)
HOST=db.example.com
# DB name
DBNAME=my-db
# S3 bucket name
BUCKET=s3-bucket-name
# Linux user account
USER=ubuntu
# Current time
TIME=`/bin/date +%d-%m-%Y-%T`
# Backup directory
DEST=/home/$USER/tmp
# Tar file of backup directory
TAR=$DEST/../$TIME.tar
# Create backup dir (-p to avoid warning if already exists)
/bin/mkdir -p $DEST
# Log
echo "Backing up $HOST/$DBNAME to s3://$BUCKET/ on $TIME";
# Dump from mongodb host into backup directory
/usr/bin/mongodump -h $HOST -d $DBNAME -o $DEST
# Create tar of backup directory
/bin/tar cvf $TAR -C $DEST .
# Upload tar to s3
/usr/bin/aws s3 cp $TAR s3://$BUCKET/
# Remove tar file locally
/bin/rm -f $TAR
# Remove backup directory
/bin/rm -rf $DEST
# All done
echo "Backup available at https://s3.amazonaws.com/$BUCKET/$TIME.tar"
@francisdb

You probably want to add set -e as the second line of the script, to avoid it happily continuing on errors.
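Applied to the top of the script, that would look like:

#!/bin/sh
# Abort on the first failed command, so a failed mongodump can
# never result in an empty or partial archive being uploaded:
set -e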

@francisdb

and a simpler way to do all this is: mongodump --archive --gzip | aws s3 cp - s3://my-bucket/some-file
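And the matching restore is the same pipe in reverse (a sketch; the bucket and file name are placeholders):

aws s3 cp s3://my-bucket/some-file - | mongorestore --archive --gzip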

@muhsin-k

muhsin-k commented Jun 3, 2017

I am getting an error.
A client error (InvalidRequest) occurred when calling the CreateMultipartUpload operation: The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256.

@cjgordon

Small mods to make this work with an array of db names:

#!/bin/bash

# Make sure to:
# 1) Name this file `backup.sh` and place it in /home/ubuntu
# 2) Run sudo apt-get install awscli to install the AWSCLI
# 3) Run aws configure (enter s3-authorized IAM user and specify region)
# 4) Fill in DB host + name
# 5) Create S3 bucket for the backups and fill it in below (set a lifecycle rule to expire files older than X days in the bucket)
# 6) Run chmod +x backup.sh
# 7) Test it out via ./backup.sh
# 8) Set up a daily backup at midnight via `crontab -e`:
#    0 0 * * * /home/ubuntu/backup.sh > /home/ubuntu/backup.log

# DB host (secondary preferred so as to avoid impacting primary performance)
HOST=localhost

# DB names
DBNAMES=("db1" "db2" "db3")

# S3 bucket name
BUCKET=bucket

# Linux user account
USER=ubuntu

# Current time
TIME=`/bin/date +%d-%m-%Y-%T`

# Backup directory
DEST=/home/$USER/tmp

# Tar file of backup directory
TAR=$DEST/../$TIME.tar

# Create backup dir (-p to avoid warning if already exists)
/bin/mkdir -p $DEST

# Log
echo "Backing up $HOST/$DBNAME to s3://$BUCKET/ on $TIME";

# Dump from mongodb host into backup directory
for DBNAME in "${DBNAMES[@]}"
do
   /usr/bin/mongodump -h $HOST -d $DBNAME -o $DEST
done

# Create tar of backup directory
/bin/tar cvf $TAR -C $DEST .

# Upload tar to s3
/usr/bin/aws s3 cp $TAR s3://$BUCKET/

# Remove tar file locally
/bin/rm -f $TAR

# Remove backup directory
/bin/rm -rf $DEST

# All done
echo "Backup available at https://s3.amazonaws.com/$BUCKET/$TIME.tar"

@iamtankist

I'd propose /bin/date -u +"%Y-%m-%dT%H%M%SZ" as a time format. It's sortable and doesn't contain : character
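Applied to the script above, that would be (the comment shows a sample of the resulting format):

# Current time in UTC; sortable, and contains no `:` characters
# (e.g. 2017-08-16T142233Z)
TIME=`/bin/date -u +"%Y-%m-%dT%H%M%SZ"`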

@eladnava
Author

@calvinh8 Yes, this script only backs up a single database. Create multiple scripts for multiple databases, or check out @cjgordon's answer. And no, your replica set members should contain the same data, so it is only necessary to back up from one member. By default, only the primary will accept reads, so specify the primary in the script.

@andfilipe1 Please give it a try and let us know.

@francisdb Thanks for the tips 👍

@muhzi4u You probably have an outdated awscli package; update it using apt-get.
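If updating is not an option, forcing Signature Version 4 should also clear that error (assuming your configured region matches the bucket's region):

aws configure set default.s3.signature_version s3v4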

@cjgordon Nicely done. 💯

@roysG

roysG commented Aug 16, 2017

@andfilipe1, this will answer your question better :).

According to mongo site:

"mongodump reads data from a MongoDB database and creates high fidelity BSON files which the mongorestore tool can use to populate a MongoDB database. mongodump and mongorestore are simple and efficient tools for backing up and restoring small MongoDB deployments, but are not ideal for capturing backups of larger systems."

https://docs.mongodb.com/manual/core/backups/

@sreekanth1990

Hi, how do I solve this error: Invalid endpoint: "https://s3.US East.amazonaws.com"?

@eladnava
Author

eladnava commented May 6, 2018

@sreekanth1990 Please double-check your S3 URL and your configured AWS region: aws configure expects a region code such as us-east-1, and the "US East" in that endpoint suggests the region name was entered incorrectly.

@p-subudhi

@eladnava How can I back up just part of a collection? The condition is that documents older than 3 months need to be backed up.
How to do that?

@eladnava
Author

eladnava commented May 6, 2018

@ps-34 You would need to write a custom script for that. Query the database in the script and save the result set to local disk or S3. You would then need a restore script as well.
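For instance, mongodump's --query flag can restrict a dump to matching documents of a single collection. A sketch only: the collection name and the createdAt field are placeholders, --query also requires -d and -c, and the exact extended-JSON syntax varies between mongodump versions.

# Dump only documents older than roughly 3 months from one collection.
CUTOFF=`/bin/date -u -d "3 months ago" +%Y-%m-%dT%H:%M:%SZ`
/usr/bin/mongodump -h $HOST -d $DBNAME -c my-collection \
  --query "{\"createdAt\": {\"\$lt\": {\"\$date\": \"$CUTOFF\"}}}" \
  -o $DEST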

@stanvarlamov

stanvarlamov commented May 8, 2018

I think the one-liner suggested by @francisdb a year ago is the way to go. The "versioning" and lifecycle should be auto-managed by S3: use a pre-set, hard-coded archive name in the script. The restore script would then take the object's version as its argument (instead of the timestamp), and a simple parsing step could be coded to look up the version by date from the list.
This approach also takes care of accidental multiple or parallel runs, as each invocation creates a separate version of the S3 object. Multipart-upload "emulation" can be achieved by splitting the dump by DB name, as suggested above.
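As a sketch of that approach (the bucket name and key are placeholders):

# One-time setup: keep every upload as a separate version of the same key.
aws s3api put-bucket-versioning --bucket my-bucket \
  --versioning-configuration Status=Enabled
# Each backup run overwrites the same object name:
mongodump --archive --gzip | aws s3 cp - s3://my-bucket/backup.archive.gz
# To restore, list the versions and fetch one by its ID:
aws s3api list-object-versions --bucket my-bucket --prefix backup.archive.gz
aws s3api get-object --bucket my-bucket --key backup.archive.gz \
  --version-id VERSION_ID backup.archive.gz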

@adityaachaubey

adityaachaubey commented May 30, 2018

Works fine on Ubuntu 16.04 as well! Thanks

@piggydoughnut

works like a charm 😸 ❤️ Thank you

@vmacielll

Thank you man!

@nusu

nusu commented Oct 23, 2018

The script is great, yet I had a problem with uploading to S3: the backup is nearly 80 MB and the connection kept breaking, so I switched from awscli to s3cmd and it's working perfectly fine.

wget -O- -q http://s3tools.org/repo/deb-all/stable/s3tools.key | sudo apt-key add -
sudo wget -O/etc/apt/sources.list.d/s3tools.list http://s3tools.org/repo/deb-all/stable/s3tools.list
sudo apt-get update
sudo apt-get install s3cmd

configure:

s3cmd --configure

in shell file:

#/usr/bin/aws s3 cp $TAR s3://$BUCKET/
s3cmd put $TAR s3://$BUCKET/$TIME.tar
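(If you would rather stay on awscli, tuning its multipart settings may also fix dropped uploads on larger files; the values below are only examples.)

aws configure set default.s3.multipart_threshold 64MB
aws configure set default.s3.multipart_chunksize 16MB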

@jainmnsh

It worked great. This is great work.

@jainmnsh

Backup and restore work great for me. I just have one question: I have many databases, and I was wondering if there is a way to back up all of them without defining a list of database names? Or is the recommendation to back them up separately for restore purposes? What's the best practice for backing up 10-12 databases via one script?

@eladnava
Author

@jainmnsh Modify the script so it accepts a list of databases instead of a single one, and have it loop over the list, mongodumping each database in a for loop and uploading it to S3.

@jainmnsh

thanks @eladnava, yep it worked out really well.

The second part of the question: in general, do you prefer a separate tar for each database, or one tar bundling all databases? I know the script can do it either way, but I am thinking it may be better to save each database's backup separately in S3?

@eladnava
Author

Yes @jainmnsh, it's better to separate the tars: if you ever need to restore just one database in an emergency, it would take too long to untar the full archive just to restore that one database, depending on the size of the others.
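As a sketch, the per-database variant of the loop would be (assuming bash and the HOST, BUCKET, DEST, and TIME variables from the original script; the database names are placeholders):

DBNAMES=("db1" "db2" "db3")
for DBNAME in "${DBNAMES[@]}"; do
    # Dump, tar, and upload each database as its own archive:
    /usr/bin/mongodump -h $HOST -d $DBNAME -o $DEST/$DBNAME
    /bin/tar cvf $DEST/../$DBNAME-$TIME.tar -C $DEST/$DBNAME .
    /usr/bin/aws s3 cp $DEST/../$DBNAME-$TIME.tar s3://$BUCKET/
    /bin/rm -f $DEST/../$DBNAME-$TIME.tar
done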

@nzarif

nzarif commented Mar 3, 2019

When running aws configure, shall I set the default output format to zip or tar.gz? By default it is json.

@TopHatMan

I am using Debian on an EC2 container. I keep getting tar's "cowardly refusing" error when archiving the database. The data is there, and I got it to tar by hand without the -C flag, but any time I use the script it fails.

@eladnava
Author

eladnava commented Jun 2, 2019

@TopHatMan What is the exact error message you are facing?

@SuperStar518

@eladnava
Will it be okay if the mongodump size is over 11 GB?

@eladnava
Author

Hi @alidavid0418,
Should be fine; you will need at least 23 GB of free space on your / mounted partition (room for the dump plus its tar). S3 definitely supports large files. :)

@SuperStar518

@eladnava
Thank you for your kind attention and confirmation. +1

@kshitijjind

error parsing command line options: expected argument for flag `-h, --host', but got option `-d'

@siddheshjayawantv3it

How to back up MongoDB (ECS container) data to an S3 bucket?
How do I run the backup script, and where?

@borekbb

borekbb commented Mar 5, 2024

and a simpler way to do all this is: mongodump --archive --gzip | aws s3 cp - s3://my-bucket/some-file

clean and simple! thanks
