
Simple, semi-anonymous backups with S3 and curl

Backing stuff up is a bit of a hassle, to set up and to maintain. While full-blown backup suites such as duplicity or CrashPlan will do all kinds of clever things for you (and I'd recommend either for more complex setups), sometimes you just want to put that daily database dump somewhere off-site and be done with it. This is what I've done, with an Amazon S3 bucket and curl. Hold onto your hats, there's some Bucket Policy acrobatics ahead.

There's also a tl;dr at the very end if you just want the delicious copy-pasta.

Bucket setup

Go and create a bucket on S3. Let's assume your bucket is called db-backups-739a79f7d0c8d196252026ea0ba367e1. I've added a random hash to the name to make sure it's not trivial to guess. This becomes important later. Also, bucket names are globally unique, so a bucket called backups is probably already taken.

Bucket Policy

To make the process as hassle-free as possible, we want anonymous PUT to our bucket. Unfortunately, that leaves ownership of the created objects (that is, anything you upload to S3) with the "anonymous" user. This means you can no longer e.g. download them from your S3 Console, while anonymous users are allowed to do dangerous things such as download and delete them. The following Bucket Policy statement implements this incomplete behaviour:

{
  "Sid": "allow-anon-put",
  "Effect": "Allow",
  "Principal": {
    "AWS": "*"
  },
  "Action": "s3:PutObject",
  "Resource": "arn:aws:s3:::db-backups-739a79f7d0c8d196252026ea0ba367e1/*"
}

You can think of the * principal as "any user", including anonymous ones. The resource this statement applies to is any object in your bucket.

When deciding whether or not to authorize a request, regardless of object permissions, S3 consults Bucket Policy first. If the Bucket Policy denies the request, that's the end of it. We can use this to make sure an anonymous user can only upload objects to the bucket, but not e.g. download and delete them. The following additional Bucket Policy statement implements this restriction:

{
  "Sid": "deny-other-actions",
  "Effect": "Deny",
  "Principal": {
    "AWS": "*"
  },
  "NotAction": "s3:PutObject",
  "Resource": "arn:aws:s3:::db-backups-739a79f7d0c8d196252026ea0ba367e1/*"
}

That is, after Allowing s3:PutObject for anyone, we explicitly Deny all other actions with a NotAction. This is all well and good, but now:

  1. Authorized access (with your credentials) to the objects in the bucket is denied, since the objects belong to the "anonymous" user, and
  2. Anonymous access to the objects is denied because of the Bucket Policy

This isn't nice, since eventually we'll also want to get the backups back. You could just stop here and figure the rest out when you have to (the data will be safe until then), but as folklore has it, backups that aren't regularly restored are like no backups at all. So we'd like a way to pull the objects back as easily as we upload them.

It might seem intuitive to add an Allow statement for your AWS user account ARN, but due to how requests to objects are authorized, it wouldn't have any effect: even if the Bucket Policy allows the authorized request, the object still belongs to the "anonymous" user, and you can't just go ahead and access objects belonging to another user.

Luckily, there's a simpler solution: making the Deny rule conditional. To what? A Condition can be used to assert all kinds of things, such as the source IP of the request or its timestamp. To keep things as simple and portable as possible, I've used a secret string in the User-Agent header, as it's super simple to override with curl. The following addition to the Deny statement implements this:

{
  "Condition": {
    "StringNotEquals": {
      "aws:UserAgent": "acfe927bfbc080d001c5852b40ede4cb"
    }
  }
}

That is, the Deny statement is only enforced when the User-Agent header is something other than the secret value we invented.

The complete Bucket Policy is at the very end.

Versioning & lifecycle rules

Since we allow anyone (well, not exactly; see below) to upload backups into our bucket, and since it's nice to hold on to backups for a while, go ahead and enable the following for your bucket:

  1. Versioning, so that whenever an object is overwritten its past revisions are left in its version history.
  2. Lifecycle rules, so that objects older than a certain amount of days (say, 30) are automatically cleaned out. Note that you can also set up a lifecycle rule to automatically move your objects to Glacier if they're large enough to make it cost-effective.

These are both essential to being able to just automate & forget about the backups until you need them. Limiting the amount of past versions you store also keeps your costs down, though S3 (and especially Glacier) is fairly inexpensive for small-to-medium amounts of data.
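If you prefer the command line over the Console, both settings can also be applied with the AWS CLI. This is a sketch under the assumption that you have the AWS CLI installed and configured with your (owner) credentials; the bucket name and the 30-day window are the examples from above:

```shell
BUCKET=db-backups-739a79f7d0c8d196252026ea0ba367e1

# Enable versioning, so overwrites keep past revisions around
aws s3api put-bucket-versioning \
  --bucket "$BUCKET" \
  --versioning-configuration Status=Enabled

# Expire objects (and their old versions) after 30 days
aws s3api put-bucket-lifecycle-configuration \
  --bucket "$BUCKET" \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "expire-old-backups",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Expiration": { "Days": 30 },
      "NoncurrentVersionExpiration": { "NoncurrentDays": 30 }
    }]
  }'
```

The `NoncurrentVersionExpiration` part matters with versioning enabled: without it, old versions of overwritten objects would pile up indefinitely.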

Usage examples

Assuming you've already managed to dump your database backup (or whatever you want to back up) into a file called my-backup.tar.gz, uploading it to your S3 bucket would be as simple as:

$ curl --request PUT --upload-file "my-backup.tar.gz" "https://s3-eu-west-1.amazonaws.com/db-backups-739a79f7d0c8d196252026ea0ba367e1/"

You can repeat the command as many times as you like, and S3 will retain each version for you. So this is something that's very simple to put in a @hourly or @daily cron rule. You can also upload multiple files; man curl knows more.
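As a concrete sketch, a @daily crontab entry could look something like the following. Note that `pg_dump mydb` is just a placeholder for whatever produces your backup file; the bucket URL is the example one from above:

```shell
# In your crontab (crontab -e): dump the database daily and PUT it to S3.
# "pg_dump mydb" is a placeholder for whatever generates your backup.
@daily pg_dump mydb | gzip > /tmp/my-backup.tar.gz && curl --silent --request PUT --upload-file "/tmp/my-backup.tar.gz" "https://s3-eu-west-1.amazonaws.com/db-backups-739a79f7d0c8d196252026ea0ba367e1/"
```

Since the whole pipeline runs unattended, `--silent` keeps curl's progress meter out of your cron mail.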

Restoring the same file requires the secret:

$ curl --user-agent acfe927bfbc080d001c5852b40ede4cb https://s3-eu-west-1.amazonaws.com/db-backups-739a79f7d0c8d196252026ea0ba367e1/my-backup.tar.gz > my-backup.tar.gz

Note that since we're using the HTTPS endpoint for the bucket, we need to specify the region (in my case it would be Ireland, or eu-west-1). The plain HTTP endpoint doesn't need this, but is obviously less secure.

Also, as S3 may sometimes fail a few requests here and there, some combination of --retry switches is probably a good idea; consult man curl. (thanks for the suggestion @Para)
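One possible combination of those retry switches for the upload command (the exact numbers are just a suggestion, not gospel):

```shell
# Retry transient failures up to 5 times, waiting 10 seconds between
# attempts, but give up entirely after 10 minutes
curl --retry 5 --retry-delay 10 --retry-max-time 600 \
  --request PUT --upload-file "my-backup.tar.gz" \
  "https://s3-eu-west-1.amazonaws.com/db-backups-739a79f7d0c8d196252026ea0ba367e1/"
```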

Security considerations

A few things to keep in mind:

  1. Even though we talk about "anonymous uploads", that's not strictly the case: unless you advertise your bucket name to a third party, only you and Amazon will know of its existence (there's no public listing of buckets). So not anyone can upload to the bucket, but only anyone who knows the bucket name. Keep it secret. Keep it safe.
  2. The above is comparable to the security many automated backup systems have: a plaintext/trivially encrypted password stored in a dotfile or similar.
  3. Using the HTTPS endpoint makes sure a man-in-the-middle won't learn about the bucket name. While the HTTP endpoint uses the bucket name as part of the domain, the HTTPS endpoint doesn't, so the resulting DNS query won't disclose the bucket name either.
  4. The "restore secret" is different from the "upload secret". That's convenient, since only the latter needs to be stored on the server that's sending backups to your bucket. The former can be kept only in your Bucket Policy (and perhaps in your password manager). Keep it secret. Keep it safe.
  5. Anyone holding only the upload secret can only upload new backups, not retrieve or delete old ones. With versioning enabled, an overwrite is not a delete.
  6. Due to the above, the only obvious attack a malicious user could mount with the upload secret is uploading a bunch of large files to your bucket. While that won't mean data loss, it may mean significant charges on your next bill. While AWS doesn't have "billing limits" built in, you can set up alerts on the billing estimates if you want to.
  7. None of this requires setting up or modifying any Amazon AWS credentials, which is convenient. That said, since your AWS credentials hold the power to update the Bucket Policy, they can still be used to recover (or destroy) any data that's stored in the bucket.

Summary

Assuming a bucket called db-backups-739a79f7d0c8d196252026ea0ba367e1, you can use the following Bucket Policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "allow-anon-put",
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::db-backups-739a79f7d0c8d196252026ea0ba367e1/*"
    },
    {
      "Sid": "deny-other-actions",
      "Effect": "Deny",
      "Principal": {
        "AWS": "*"
      },
      "NotAction": "s3:PutObject",
      "Resource": "arn:aws:s3:::db-backups-739a79f7d0c8d196252026ea0ba367e1/*",
      "Condition": {
        "StringNotEquals": {
          "aws:UserAgent": "acfe927bfbc080d001c5852b40ede4cb"
        }
      }
    }
  ]
}

Make sure versioning and lifecycle rules are enabled on the bucket.

This allows you to back up with:

$ curl --request PUT --upload-file "my-backup.tar.gz" "https://s3-eu-west-1.amazonaws.com/db-backups-739a79f7d0c8d196252026ea0ba367e1/"

And restore with:

$ curl --user-agent acfe927bfbc080d001c5852b40ede4cb https://s3-eu-west-1.amazonaws.com/db-backups-739a79f7d0c8d196252026ea0ba367e1/my-backup.tar.gz > my-backup.tar.gz

The bucket name and the "user agent" are secrets.

You can see the version history of your objects - among other things - in the AWS Management Console.

The end

If you have questions/comments, leave them here, or ask @Jareware. :)

nodge87 commented Jan 17, 2015

Excellent article! I have a question though. We have created a bucket and we have anonymous users creating folders in our bucket and uploading objects. We set this up before engaging in any bucket policy or ACL activities, etc.

Now we cannot download the objects or folders from the S3 console.

  1. Will this new bucket policy allow us to access already uploaded objects by the anonymous user?
  2. If not, is there a way that we can gain access to these objects now? Currently I feel as though the data is locked into S3 and there is no way for us to access/download it. :(

Thanks for the great article! I hope you can answer these questions. 😄


jareware commented Jan 18, 2015

@nodge87, glad you liked it. :)

I would assume you do have an existing Bucket Policy in place, as by default S3 won't allow anonymous uploads...? Anyway, the thing is that since those objects were uploaded by the user "anonymous", only that user can modify them, even if you own the bucket.

The annoying part is that most of your standard tools (like the S3 Console, s3cmd, etc.) only allow authenticated operations, not anonymous ones, so they won't be able to touch those objects. But unless there's some other policy at play, you should be able to use any HTTP client (such as curl) to GET, DELETE, etc. them, since then you'll be an anonymous user as well.
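For example (a sketch with placeholder bucket and object names, assuming no other policy blocks anonymous access):

```shell
# Anonymous GET: no credentials or special headers needed
curl "https://s3-eu-west-1.amazonaws.com/your-bucket-name/your-object" > your-object

# Anonymous DELETE of the same object
curl --request DELETE "https://s3-eu-west-1.amazonaws.com/your-bucket-name/your-object"
```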

Hope this helps!

nodge87 commented Jan 18, 2015

Thanks @jareware!

This certainly sheds some light on things for me! I am going to go and play with our upload feature now!

If I can figure out how to allow browser-based uploads from an HTTP page as a specific user (I presume we just set it up with a user key?) then I think I can create the appropriate bucket policy to match.

Thank you for the help!

What about using the Amazon-specific headers to sign the command? I am trying to make this work (no luck yet), but it seems similar, just more secure.
http://tmont.com/blargh/2014/1/uploading-to-s3-in-bash

Thanks

Ok, I found out that the blog post is using version 1 for the headers; now it is necessary to use version 4, which is really complicated. There is some great code here, but this is too much for me, I am new to all this, and I am not sure I understand everything.
http://geek.co.il/2014/11/19/script-day-amazon-aws-signature-version-4

Any help?
Thanks!

I ended up using your solution, which is not bad at all and much easier. Thanks

EDIT: as a suggestion, this could be a good alternative instead of using the useragent:
https://pete.wtf/2012/05/01/how-to-setup-aws-s3-access-from-specific-ips/


jareware commented May 9, 2015

Thanks for the input @planetahuevo, and glad you got yours set up in the end! (gosh it's annoying how you get no notifications from gist comments, that's why it took me months to reply...)

The IP restriction looks like a handy option indeed if you're confident your point of access will remain behind a consistent IP. From a portability point of view, though, it might be annoying to have S3 access break because of a spontaneous IP change.

Cheers! :)
