@rlmcpherson
Last active September 2, 2015 21:41
2 Years of Go at CodeGuard

At CodeGuard, we've been using Go (link) to build the premier website backup product for just over two years. Our first Go code entered production on August 14, 2013: the open source s3gof3r tool, which transfers our backup data in and out of S3. Running 30 times faster than the Ruby code it replaced, it made an immediate impact on the efficiency of our backup process, enabling us to rapidly serve customers with larger sites.

Since that initial deployment, Go has made up an ever-larger share of our production code. As we've scaled our architecture to run more than a quarter million website backups per day, we've had to build new systems and rearchitect old ones. In this post we'll look at some of the systems we've built using Go and how Go has helped us to scale while also increasing the reliability of our product.

Small Services

Key to scaling our product architecture to handle hundreds of thousands of backups per day has been building small services, both for new product features and for breaking features out of the monolithic legacy architecture. As other companies have documented (link), Go is well-suited for these kinds of services.

Our common experience building these small services is that they are easily deployed, efficient, and -- most importantly -- reliable. Let's look at why Go services tend to have these properties.

Deployment

CodeGuard is an engineering-driven startup, and engineering resources are usually the limiting factor in product development. Efficient use of engineers' time directly drives ROI and revenue growth. We also deploy frequently to our production systems, so when we build a new service, the cost of deployment is a critical factor in its development and ongoing cost. Deployment cost is driven both by the cost of building or setting up deployment tooling and by the time and complexity of deploying new code to staging or production environments. We always use tooling to automate deployment. With Go, deployment automation is far simpler than for many other technologies:

  • static binaries: a Go service is deployed as a single statically-linked binary. It has no dependencies on other installed software, which simplifies the deployment process.

  • easy cross-compilation: a statically-linked binary is great, but it must be built for the architecture of the server where it will run. A binary built for OS X, for instance, will not run on 64-bit Linux. CodeGuard engineers use Apple OS X for development, but all code is deployed and run in AWS on Linux servers. Cross-compiling a Go binary for Linux on OS X is a single command: env GOOS=linux GOARCH=amd64 go build

Reliability

Beyond ease of deployment, Go services have proven reliable in production. Several language properties contribute to this:

  • error model: errors in Go are ordinary values returned from functions, so every failure path is visible in the code and must be handled explicitly at the call site rather than in a distant exception handler.

  • statically typed and compiled: whole classes of errors that would only surface at runtime in a dynamically-typed language are caught at compile time, before the code ever reaches production.

  • standard lib: the standard library covers most of what a network service needs, reducing reliance on third-party dependencies of varying quality.
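To make the error model concrete, here is a minimal sketch; listDirectory is a hypothetical helper, not our production code:

```go
package main

import (
	"errors"
	"fmt"
)

// listDirectory returns an error value instead of raising an
// exception, so the caller must decide how to handle every failure
// explicitly at the call site.
func listDirectory(path string) ([]string, error) {
	if path == "" {
		return nil, errors.New("empty path")
	}
	return []string{path + "/index.html"}, nil
}

func main() {
	entries, err := listDirectory("")
	if err != nil {
		// The failure is handled right here, not in a distant
		// exception handler several stack frames away.
		fmt.Println("listing failed:", err)
		return
	}
	fmt.Println(entries)
}
```

Because ignoring a returned error is a visible, deliberate act, failure handling tends to be thorough rather than accidental.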

From Serial to Parallel

CodeGuard is a website backup company, and in the process of completing millions of backups per month, we move petabytes of data between customer servers, our servers, and Amazon S3. To operate cost-effectively and quickly, these backup systems must be efficient. Thanks to the architecture of Go and its optimized standard library, Go is far more efficient than most dynamically-typed languages such as Ruby or Python. The memory model (link to memory model) also ensures efficient memory usage: compared to a similar dynamically-typed, interpreted language, memory usage can often be an order of magnitude less.

The most important way that Go makes code more efficient, though, is through concurrency. With the imminent death of Moore's law and the stalling of CPU clock speed increases a decade ago, concurrency is critical to efficient use of modern servers. (link to DC post) At CodeGuard, Go allows us to use concurrency to parallelize formerly serial processes. If two steps in a process are not directly dependent on one another, they can be run at the same time, so that the time to complete them is the maximum of the two run times rather than the sum. If a network operation can be parallelized by sending or receiving chunks of a file at the same time, Go makes that easy. To illustrate the power of concurrency to increase speed and efficiency, let's take a closer look at a Go service that CodeGuard recently deployed.

Website Listing Service

CodeGuard website backups are incremental, meaning that each backup detects and downloads only the changes since the previous backup. This reduces both load on customer servers and backup size. To detect the changes, all website files must be inspected and compared to the previous backup. Since CodeGuard connects to customer sites over SFTP and FTP, this comparison is done by listing the files and directories on each website. The vast majority of CodeGuard's customer sites are backed up via SFTP, since it is more secure and reliable than FTP, so the listing service currently supports only SFTP.

Serial Listing

Previously, listing was entirely serial. Only one directory listing was requested at a time. Each listing was parsed on receipt, and any directories it contained were placed on a queue to be listed. This process continued until the queue was empty. Since only one directory was listed at a time, listing speed was limited by the round-trip time of the connection plus the time to parse each listing.
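A minimal sketch of that serial process, using an in-memory map as a stand-in for the remote site (the directory names are illustrative):

```go
package main

import "fmt"

// fs stands in for a remote site: it maps each directory to its
// subdirectories (file entries are omitted for brevity).
var fs = map[string][]string{
	"/":           {"/wp-content", "/wp-includes"},
	"/wp-content": {"/wp-content/uploads"},
}

// listSerial walks the tree one directory at a time: each listing is
// requested, parsed, and its subdirectories enqueued, so total time
// is the sum of every round trip.
func listSerial(root string) []string {
	queue := []string{root}
	var listed []string
	for len(queue) > 0 {
		dir := queue[0]
		queue = queue[1:]
		listed = append(listed, dir)
		// In the real service this is a network round trip; here we
		// just read the map and enqueue any subdirectories found.
		queue = append(queue, fs[dir]...)
	}
	return listed
}

func main() {
	fmt.Println(listSerial("/"))
}
```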

Concurrent Listing

To improve the speed and efficiency of listing, the clear solution is to request multiple listings concurrently. This process is similar to the serial listing process, but instead of requesting only one listing at a time, a listing is immediately requested whenever a directory is found in a listing.

SFTP runs on top of SSH, and SSH connections have a real cost on both the client and the remote server; since it is unacceptable for listings to cause high load on customer servers, an SSH connection clearly cannot be opened for every directory encountered. Many customers also limit the number of SSH connections they will accept, either enforced by their hosting provider or to reduce load. To mitigate this, listing requests are multiplexed: multiple SFTP sessions are created over each underlying SSH connection, and because SFTP allows multiple concurrent requests per session, multiple requests are then made on each SFTP session. This allows a large increase in concurrency without a correspondingly large increase in server load.
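The shape of that bounded concurrency can be sketched with a buffered channel acting as a semaphore to cap in-flight requests. The in-memory map again stands in for the remote site, and a real implementation would issue an SFTP directory read where noted:

```go
package main

import (
	"fmt"
	"sync"
)

// fs stands in for a remote site (illustrative directory names).
var fs = map[string][]string{
	"/":   {"/a", "/b"},
	"/a":  {"/a/x"},
}

// listConcurrent requests a listing as soon as a directory is
// discovered, but the buffered channel caps in-flight requests --
// analogous to limiting requests per SFTP session and sessions per
// SSH connection.
func listConcurrent(root string, maxInFlight int) []string {
	sem := make(chan struct{}, maxInFlight)
	var mu sync.Mutex
	var listed []string
	var wg sync.WaitGroup

	var walk func(dir string)
	walk = func(dir string) {
		defer wg.Done()
		sem <- struct{}{}  // acquire a request slot
		subdirs := fs[dir] // a real service would issue an SFTP read here
		<-sem              // release the slot
		mu.Lock()
		listed = append(listed, dir)
		mu.Unlock()
		for _, d := range subdirs {
			wg.Add(1)
			go walk(d) // list subdirectories as soon as they are found
		}
	}
	wg.Add(1)
	go walk(root)
	wg.Wait()
	return listed
}

func main() {
	fmt.Println(len(listConcurrent("/", 4)), "directories listed")
}
```

The semaphore keeps load on the remote server bounded no matter how many directories are discovered at once.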

In addition to multiple concurrent listings per website, multiple sites are listed concurrently, allowing CodeGuard to maximize our server utilization while still minimizing load on customer servers.

Performance

Concurrency is nice, but how much of a performance improvement does it give us over a serial process? Under ideal conditions (low latency, reliable connections), the serial process had a peak performance of 5 directory listings per second. The new listing service, in contrast, averages approximately 75 directories per second. For some large sites, where more connections are opened, performance can exceed 400 directories per second. In addition to this 15x average speed increase, the processing efficiency of Go allows more websites to be listed per server, on smaller servers than we used previously. This translates into significant backup speed improvements while also lowering the cost of serving our customers.

The Future of Go at CodeGuard

Using Go has been instrumental in scaling the CodeGuard infrastructure to millions of monthly backups, and it has allowed us to serve our customers better through the increased performance, reliability, and efficiency that let us provide an affordable website backup product. It will certainly be a part of our engineering-driven, customer-focused approach to making the best website backup service even better.
