@noteed
Last active January 4, 2016 00:18
Introducing Rees (blog post)

Introducing the rees command-line tool

I have been interested in creating a simple service to back up files. At first, the service was mainly intended to save PostgreSQL WAL segment files, and I thought I would simply use the scp command to upload the files to a remote machine. After giving it some more thought, it felt like a nice, well-defined project and possibly a minimum viable product.

More precisely, I decided to expose a service, scp@reesd.com (think about the similarity with git@github.com or git@bitbucket.org when using the SSH transport), that would allow users to use the regular scp command to save and restore files. When scp completes an upload to Reesd, it means that Reesd has actually streamed the content to multiple backend machines.

Although usable, Reesd is not yet ready for commercial use. I still have some tools to write to automate its deployment and, more importantly, to make sure I can restore data quickly in the event a replica fails. That being said, I have been able to implement a feature that I thought was nice: when uploading a file to a path where the parent directories don't exist, those are created on the fly. This is something that scp normally doesn't do, but since I have a frontend that sits between the scp client and the backends, I have been able to create that behavior.

Now, just as I was starting to write the restoration tool, I again wanted that on-the-fly path creation feature. But this time, I wanted it client-side, i.e. without requiring any change to the remote scp program. This was the perfect occasion to create rees, which does just that. Hopefully rees will evolve into a full-fledged client for Reesd, but for now that feature already comes in handy.

rees is open-source and is available on GitHub and Hackage (Reesd is mainly closed-source but some code has been made available as, or contributed to, open source projects).

In both cases (scp@reesd.com as proxy to the Reesd backends, and the rees client), the feature is implemented by using the SCP protocol. The SCP protocol is just a stream of commands and file content sent through SSH. When you want to transfer a directory, you have to use the recursive mode with the -r flag. In that case, the directories are created before the file contents are transferred. To do so, the client scp process will send commands akin to the pushd and popd shell commands: the remote scp process will "enter" directories, creating them if necessary. Then, when a file must be transferred, the client will issue a "file content" command, including the file size and permissions, followed by the file content itself.
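To make the shape of that stream concrete, here is a minimal Python sketch of the bytes a client would send in recursive mode. It is an illustration, not code taken from rees or Reesd: the function names and the in-memory tree representation (a dict for a directory, bytes for a file) are made up for the example, and it only shows the client-to-server direction. In a real session, the remote scp acknowledges each message with a zero byte, which this sketch ignores.

```python
# Sketch of the client-to-server byte stream for a recursive scp upload.
# Helper names and the tree representation are hypothetical.

def enter_directory(name: str, mode: int = 0o755) -> bytes:
    # "D<mode> 0 <name>\n": enter a directory, creating it if necessary
    # (the "pushd"-like command).
    return f"D{mode:04o} 0 {name}\n".encode()


# "E\n": leave the current directory (the "popd"-like command).
LEAVE_DIRECTORY = b"E\n"


def file_message(name: str, content: bytes, mode: int = 0o644) -> bytes:
    # "C<mode> <size> <name>\n", then the raw content, then a zero byte.
    header = f"C{mode:04o} {len(content)} {name}\n".encode()
    return header + content + b"\x00"


def encode_tree(name: str, node) -> bytes:
    # A node is either bytes (a file) or a dict mapping names to nodes
    # (a directory).
    if isinstance(node, bytes):
        return file_message(name, node)
    body = b"".join(encode_tree(child, sub) for child, sub in node.items())
    return enter_directory(name) + body + LEAVE_DIRECTORY


if __name__ == "__main__":
    print(encode_tree("a", {"b": {"c.txt": b"hi"}}))
```

Every directory entry is matched by exactly one leave command, so the stream is properly nested in the same way a sequence of pushd and popd calls would be.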

When a single file is copied, on the other hand, scp doesn't work in recursive mode and the target directory must exist. To allow on-the-fly directory creation, the Reesd proxy simply inserts the push/pop commands (and passes the -r flag to the backend scp processes). And rees does exactly the same thing.
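As a rough sketch of that trick (again in Python, and again not the actual rees or Reesd code), a single-file upload can be wrapped in one "enter directory" command per parent component and the same number of "leave directory" commands. The function name, the default permissions, and the choice to split the path on "/" are assumptions made for the example; acknowledgements from the remote side are left out.

```python
import posixpath

# Sketch: wrap a single-file upload in directory push/pop commands so the
# remote scp (assumed to run in recursive mode) creates missing parents.
# The function name and default modes are hypothetical.

def upload_with_parents(remote_path: str, content: bytes,
                        dir_mode: int = 0o755,
                        file_mode: int = 0o644) -> bytes:
    directories, filename = posixpath.split(remote_path)
    parts = [part for part in directories.split("/") if part]

    # One "D" (enter directory) command per parent component.
    pushes = b"".join(
        f"D{dir_mode:04o} 0 {part}\n".encode() for part in parts
    )
    # The usual "C" (file content) command, the content, and a zero byte.
    file_message = (
        f"C{file_mode:04o} {len(content)} {filename}\n".encode()
        + content
        + b"\x00"
    )
    # One "E" (leave directory) command per directory entered above.
    pops = b"E\n" * len(parts)
    return pushes + file_message + pops


if __name__ == "__main__":
    print(upload_with_parents("a/b/c.txt", b"hi"))
```

For a path without any parent directory, the function degenerates to the plain single-file message, so the same code path can serve both cases.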
