⇐ back to the gist-blog at jrw.fi

Simple, semi-anonymous backups with S3 and curl

Backing stuff up is a bit of a hassle, to set up and to maintain. While full-blown backup suites such as duplicity or CrashPlan will do all kinds of clever things for you (and I'd recommend either for more complex setups), sometimes you just want to put that daily database dump somewhere off-site and be done with it. This is what I've done, with an Amazon S3 bucket and curl. Hold onto your hats, there's some Bucket Policy acrobatics ahead.

There's also a tl;dr at the very end if you just want the delicious copy-pasta.

Bucket setup

Go and create a bucket on S3. Let's assume your bucket is called db-backups-739a79f7d0c8d196252026ea0ba367e1. I've added a random hash to the name to make sure it's not trivial to guess. This becomes important later. Also, bucket names are globally unique, so a bucket called backups is probably already taken.
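
The Console works fine for this, but if you prefer the command line, something along these lines with the AWS CLI should do (eu-west-1 is just the region used in the examples below; openssl is one way to generate the random suffix):

$ openssl rand -hex 16   # e.g. 739a79f7d0c8d196252026ea0ba367e1
$ aws s3api create-bucket \
    --bucket db-backups-739a79f7d0c8d196252026ea0ba367e1 \
    --region eu-west-1 \
    --create-bucket-configuration LocationConstraint=eu-west-1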

Bucket Policy

To make the process as hassle-free as possible, we want to allow anonymous PUTs to our bucket. Unfortunately, that leaves ownership of created objects (that is, anything you upload to S3) with the "anonymous" user. This means you can't e.g. download them from your S3 Console anymore, while anonymous users are allowed to do dangerous things such as download and delete them. The following Bucket Policy statement implements this incomplete behaviour:

{
  "Sid": "allow-anon-put",
  "Effect": "Allow",
  "Principal": {
    "AWS": "*"
  },
  "Action": "s3:PutObject",
  "Resource": "arn:aws:s3:::db-backups-739a79f7d0c8d196252026ea0ba367e1/*"
}

You can think of the * principal as "any user", including anonymous ones. The resource this statement applies to is any object in your bucket.

When deciding whether or not to authorize a request, S3 consults the Bucket Policy first, regardless of object permissions. If the Bucket Policy denies the request, that's the end of it. We can use this to make sure an anonymous user can only upload objects to the bucket, but not e.g. download or delete them. The following additional Bucket Policy statement implements this restriction:

{
  "Sid": "deny-other-actions",
  "Effect": "Deny",
  "Principal": {
    "AWS": "*"
  },
  "NotAction": "s3:PutObject",
  "Resource": "arn:aws:s3:::db-backups-739a79f7d0c8d196252026ea0ba367e1/*"
}

That is, after Allowing s3:PutObject for anyone, we explicitly Deny all other actions with a NotAction. This is all well and good, but now:

  1. Authorized access (with your credentials) to the objects in the bucket is denied, since the objects belong to the "anonymous" user, and
  2. Anonymous access to the objects is denied because of the Bucket Policy

This isn't nice, since eventually we'll also want to get the backups back. You could just stop here and figure the rest out when you have to (the data will be safe until then), but as folklore has it, backups that aren't regularly restored are like no backups at all. So we'd like a way to pull the objects back as easily as we upload them.

It might seem intuitive to add an Allow statement for your AWS user account ARN, but due to how requests to objects are authorized, it wouldn't have any effect: even if the Bucket Policy allows the authorized request, the object still belongs to the "anonymous" user, and you can't just go ahead and access objects belonging to another user.

Luckily, there's a simpler solution: making the Deny rule conditional. To what? A Condition can be used to assert all kinds of things, such as the source IP of the request or its timestamp. To keep things as simple and portable as possible, I've used a secret string in the User-Agent header, as it's super simple to override with curl. The following addition to the Deny statement implements this:

{
  "Condition": {
    "StringNotEquals": {
      "aws:UserAgent": "acfe927bfbc080d001c5852b40ede4cb"
    }
  }
}

That is, the Deny statement is only enforced when the User-Agent header is something other than the secret value we invented.

The complete Bucket Policy is at the very end.
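
If you manage the bucket from the command line, the completed policy can be applied with the AWS CLI as well; policy.json here is just whatever file you saved the policy into:

$ aws s3api put-bucket-policy \
    --bucket db-backups-739a79f7d0c8d196252026ea0ba367e1 \
    --policy file://policy.json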

Versioning & lifecycle rules

Since we allow anyone (well, not exactly; see below) to upload backups into our bucket, and since it's nice to hold on to backups for a while, go ahead and enable the following for your bucket:

  1. Versioning, so that whenever an object is overwritten its past revisions are left in its version history.
  2. Lifecycle rules, so that objects older than a certain number of days (say, 30) are automatically cleaned out. Note that you can also set up a lifecycle rule to automatically move your objects to Glacier, if they're large enough to make that cost-effective.

These are both essential to being able to just automate & forget about the backups until you need them. Limiting the number of past versions you store also keeps your costs down, though S3 (and especially Glacier) is fairly inexpensive for small-to-medium amounts of data.
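
Both can be flipped on in the S3 Console; for reference, roughly the same thing with the AWS CLI would look like this (the 30-day expiry of noncurrent versions is just an example, pick whatever retention you're comfortable with):

$ aws s3api put-bucket-versioning \
    --bucket db-backups-739a79f7d0c8d196252026ea0ba367e1 \
    --versioning-configuration Status=Enabled
$ cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "expire-old-backup-versions",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "NoncurrentVersionExpiration": { "NoncurrentDays": 30 }
    }
  ]
}
EOF
$ aws s3api put-bucket-lifecycle-configuration \
    --bucket db-backups-739a79f7d0c8d196252026ea0ba367e1 \
    --lifecycle-configuration file://lifecycle.json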

Usage examples

Assuming you've already managed to dump your database backup (or whatever you want to back up) into a file called my-backup.tar.gz, uploading it to your S3 bucket would be as simple as:

$ curl --request PUT --upload-file "my-backup.tar.gz" "https://s3-eu-west-1.amazonaws.com/db-backups-739a79f7d0c8d196252026ea0ba367e1/"

You can repeat the command as many times as you like, and S3 will retain each version for you. So this is something that's very simple to put in an @hourly or @daily cron rule. You can also upload multiple files; man curl knows more.
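
As a sketch, such a cron setup could look something like this; the pg_dump command, the paths and the temp files are placeholders for whatever you're actually backing up:

#!/bin/sh
# backup.sh: dump the database, archive it and push it to the bucket
set -e
pg_dump mydb > /tmp/dump.sql
tar czf /tmp/my-backup.tar.gz -C /tmp dump.sql
curl --request PUT --upload-file /tmp/my-backup.tar.gz \
  "https://s3-eu-west-1.amazonaws.com/db-backups-739a79f7d0c8d196252026ea0ba367e1/"

...plus a crontab entry along the lines of @daily /path/to/backup.sh to run it.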

Restoring the same file requires the secret:

$ curl --user-agent acfe927bfbc080d001c5852b40ede4cb https://s3-eu-west-1.amazonaws.com/db-backups-739a79f7d0c8d196252026ea0ba367e1/my-backup.tar.gz > my-backup.tar.gz

Note that since we're using the HTTPS endpoint for the bucket, we need to specify the region (in my case it would be Ireland, or eu-west-1). The plain HTTP endpoint doesn't need this, but is obviously less secure.

Also, as S3 may sometimes fail a few requests here and there, some combination of --retry switches is probably a good idea; consult man curl. (thanks for the suggestion @Para)
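
For example (the exact retry count and delay here are just a guess at something reasonable, not a recommendation):

$ curl --retry 5 --retry-delay 15 --request PUT --upload-file "my-backup.tar.gz" "https://s3-eu-west-1.amazonaws.com/db-backups-739a79f7d0c8d196252026ea0ba367e1/"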

Security considerations

A few things to keep in mind:

  1. Even though we talk about "anonymous uploads", that's not strictly the case: unless you advertise your bucket name to a third party, only you and Amazon will know of its existence (there's no public listing of buckets). So it's not that anyone can upload to the bucket, but rather anyone who knows the bucket name. Keep it secret. Keep it safe.
  2. The above is comparable to the security many automated backup systems have: a plaintext/trivially encrypted password stored in a dotfile or similar.
  3. Using the HTTPS endpoint makes sure a man-in-the-middle won't learn about the bucket name. While the HTTP endpoint uses the bucket name as part of the domain, the HTTPS one doesn't, so the resulting DNS query won't disclose the bucket name either.
  4. The "restore secret" is different from the "upload secret". That's convenient, since only the latter needs to be stored on the server that's sending backups to your bucket. The former can be kept only in your Bucket Policy (and perhaps in your password manager). Keep it secret. Keep it safe.
  5. Anyone holding only the upload secret can only upload new backups, not retrieve or delete old ones. With versioning enabled, an overwrite is not a delete.
  6. Due to the above, the only obvious attack a malicious user could mount with the upload secret is uploading a bunch of large files to your bucket. While that won't mean data loss, it may mean significant charges on your next bill. While AWS doesn't have "billing limits" built in, you can set up alerts on the billing estimates if you want to.
  7. None of this requires setting up or modifying any Amazon AWS credentials, which is convenient. That said, since your AWS credentials hold the power to update the Bucket Policy, they can still be used to recover (or destroy) any data that's stored in the bucket.

Summary

Assuming a bucket called db-backups-739a79f7d0c8d196252026ea0ba367e1, you can use the following Bucket Policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "allow-anon-put",
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::db-backups-739a79f7d0c8d196252026ea0ba367e1/*"
    },
    {
      "Sid": "deny-other-actions",
      "Effect": "Deny",
      "Principal": {
        "AWS": "*"
      },
      "NotAction": "s3:PutObject",
      "Resource": "arn:aws:s3:::db-backups-739a79f7d0c8d196252026ea0ba367e1/*",
      "Condition": {
        "StringNotEquals": {
          "aws:UserAgent": "acfe927bfbc080d001c5852b40ede4cb"
        }
      }
    }
  ]
}

Make sure versioning and lifecycle rules are enabled on the bucket.

This allows you to back up with:

$ curl --request PUT --upload-file "my-backup.tar.gz" "https://s3-eu-west-1.amazonaws.com/db-backups-739a79f7d0c8d196252026ea0ba367e1/"

And restore with:

$ curl --user-agent acfe927bfbc080d001c5852b40ede4cb https://s3-eu-west-1.amazonaws.com/db-backups-739a79f7d0c8d196252026ea0ba367e1/my-backup.tar.gz > my-backup.tar.gz

The bucket name and the "user agent" are secrets.

You can see the version history of your objects - among other things - in the AWS Management Console.
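
If you'd rather check from the command line, the AWS CLI (using your own credentials, not the anonymous upload path) can list them too:

$ aws s3api list-object-versions --bucket db-backups-739a79f7d0c8d196252026ea0ba367e1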

The end

If you have questions/comments, leave them here, or ask @Jareware. :)
