@sploiselle
Last active November 21, 2016 22:28
---
title: Ahoy, Matey! CockroachDB on Digital Ocean
summary: Deploy an insecure three-node CockroachDB cluster on Digital Ocean Droplets using doctl.
toc: false
---

Digital Ocean aims to simplify web infrastructure. CockroachDB aims to simplify database operations. Putting the two together is a natural fit––an easy-to-deploy database on easy-to-deploy servers. For anyone who's suffered the pains of trying to scale a relational database across a distributed deployment, this synergy should be pretty obvious (and an enormous relief).

Want more proof of how well the two play together? Look no further than our office mascot Carl who looks adorable in a costume clearly inspired by Digital Ocean's Sammy the Shark.

To demo how easy it is, we're going to show you how to stand up a 3-node cluster through Digital Ocean's command-line tool, doctl. (If you prefer straight command-line, we've got those instructions here.)

Because this is a quick demo, we'll show you how to use an insecure cluster communicating over external IP addresses, neither of which we'd recommend for production. For a production-ready deployment guide, see Deploy CockroachDB on Digital Ocean.

Install CockroachDB

Create A Startup Script

Because all of the servers we're going to create need to install CockroachDB, we're going to create a startup script that downloads the latest CockroachDB binary and then makes it accessible from the command line.

  1. Create a file to store the startup script:

    $ sudo nano doctl.sh
  2. Enter the following contents into the file:

    #!/bin/bash
    wget https://binaries.cockroachdb.com/cockroach-latest.linux-amd64.tgz
    tar -xf cockroach-latest.linux-amd64.tgz --strip=1 cockroach-latest.linux-amd64/cockroach
    sudo mv cockroach /usr/local/bin
    
  3. Write the file (^O) and exit nano (^X).
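If you'd rather skip the editor entirely, the same script can be written in one shot with a heredoc (this is equivalent to the three steps above):

```shell
# Write the startup script in a single command instead of using nano.
cat > doctl.sh <<'EOF'
#!/bin/bash
wget https://binaries.cockroachdb.com/cockroach-latest.linux-amd64.tgz
tar -xf cockroach-latest.linux-amd64.tgz --strip=1 cockroach-latest.linux-amd64/cockroach
sudo mv cockroach /usr/local/bin
EOF
```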

Create Your Droplets

With our startup script to install CockroachDB prepared, we can create our Droplets.

  1. Find your SSH key's FingerPrint, which you'll need when creating Droplets:

    $ doctl compute ssh-key list
  2. Create your Droplets with the following options:

    • --size: We recommend at least 2GB
    • --image: We're using Ubuntu 16.10, but you should be able to use any contemporary Linux distro
    • --user-data-file: Use the startup script we created, doctl.sh
    • --ssh-keys: Use the SSH key fingerprint you found in the last step.
# Create the first node
$ doctl compute droplet create node1 \
--size 2gb \
--image ubuntu-16-10-x64 \
--region nyc1 \
--user-data-file doctl.sh \
--ssh-keys <SSH key FingerPrint>

# Create the second and third nodes (only name differs)

$ doctl compute droplet create node2 --size 2gb --image ubuntu-16-10-x64 --region nyc1 --user-data-file doctl.sh --ssh-keys <SSH key FingerPrint>

$ doctl compute droplet create node3 --size 2gb --image ubuntu-16-10-x64 --region nyc1 --user-data-file doctl.sh --ssh-keys <SSH key FingerPrint>

It can take a few minutes for your Droplets to finish being created. You can check on them using doctl compute droplet list.
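Since the three create commands differ only by the Droplet name, you can also wrap them in a short loop. This is just a sketch: it's packaged as a helper function (our own name, not a doctl feature) so nothing runs until you call it, and the fingerprint placeholder is still yours to fill in:

```shell
# Sketch: create node1 through node3 with identical settings; only the name varies.
# Replace the fingerprint argument with the value from `doctl compute ssh-key list`.
create_nodes() {
  local fingerprint="$1"
  for name in node1 node2 node3; do
    doctl compute droplet create "$name" \
      --size 2gb \
      --image ubuntu-16-10-x64 \
      --region nyc1 \
      --user-data-file doctl.sh \
      --ssh-keys "$fingerprint"
  done
}
# Usage: create_nodes <SSH key FingerPrint>
```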

Get Your Droplets' Details

Once the Droplets have been created, we'll need to get some information about them:

$ doctl compute droplet list

Copy down the following values from each Droplet:

  • ID
  • Name
  • Public IPv4
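If you'd rather not pick those values out of the full table by hand, doctl's --format and --no-header flags print just the columns you need (guarded here with a `command -v` check so it's a no-op on machines without doctl installed):

```shell
# Print only the columns we need: ID, Name, and public IPv4 address.
if command -v doctl >/dev/null 2>&1; then
  doctl compute droplet list --format ID,Name,PublicIPv4 --no-header
fi
```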

Start Your Cluster

At this point, we can start our cluster, which will contain all of our databases.

  1. SSH into node1:

    $ doctl compute ssh <node1 ID>
  2. Enter yes to proceed connecting to the Droplet.

  3. Start your cluster:

    $ cockroach start --insecure --background --advertise-host=<node1 IP address>

    You'll receive a message to stdout with your new cluster's details, including its ID.

  4. Terminate the SSH connection to your first node (CTRL+D).

Join Additional Nodes to the Cluster

With the cluster up and running, you can now add nodes to the cluster. This is almost as simple as starting the cluster itself (just one more flag) and demonstrates how easy it is to horizontally scale CockroachDB.

  1. SSH to the second node:

    $ doctl compute ssh <node2 ID>
  2. Enter yes to proceed connecting to the Droplet.

  3. Start a new node that joins the existing cluster using node1's IP address on port 26257 (CockroachDB's default port):

    $ cockroach start --insecure --background \
    --advertise-host=<node2 IP address> \
    --join=<node1 IP address>:26257
  4. Close the second node's SSH connection and complete the same steps for your third Droplet, using its IP address instead.

  5. Make sure all 3 nodes are connected:

    $ cockroach node status

    The response to stdout should list all 3 nodes.
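If you want a scriptable version of that check, here's a hypothetical helper (count_nodes is our own name, and it assumes the first line of cockroach node status output is a column header):

```shell
# Hypothetical helper: counts the nodes reported by `cockroach node status`,
# assuming the first output line is a column header.
count_nodes() {
  cockroach node status | tail -n +2 | wc -l
}
# Usage: [ $(count_nodes) -eq 3 ] && echo "all 3 nodes connected"
```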

Features in Action

At this point, your cluster's up and running, but we're going to take you through some demos of CockroachDB's features that work particularly well on a cloud provider like Digital Ocean.

We've already seen how easy it is to scale a deployment by simply adding servers, but we'll also cover:

  • Data Distribution
  • Survivability

Data Distribution

In traditional RDBMSes, you achieve scale by sharding your deployment, which splits up a table into contiguous sets of rows and then stores those rows on separate servers. However, keeping data evenly distributed across a sharded deployment is an enormous burden for your application/organization/overworked DBA.

Because we want to make operating a database as simple as possible, CockroachDB handles that kind of distribution for you without any kind of complicated configuration or additional settings. When you write data to any node in CockroachDB, it's simply available to the rest of the nodes.

Let's demonstrate how that works by generating CockroachDB's example data on one node and then viewing it from another node.

  1. From your first node, generate the example data (a database called startrek with two tables, episodes and quotes):

    $ doctl compute ssh <node1 ID>
    $ cockroach gen example-data | cockroach sql
    
  2. Find out how much data was written to your node:

    $ cockroach sql --execute="SELECT COUNT(*) FROM startrek.episodes; SELECT COUNT(*) FROM startrek.quotes;"
    

    You'll see that episodes contains 79 rows and quotes contains 200.

  3. Terminate the connection with the first node (CTRL + D).

  4. SSH to your second node:

    $ doctl compute ssh <node2 ID>
    
  5. Run the same SQL query to see how much data is stored in the two example tables:

    $ cockroach sql --execute="SELECT COUNT(*) FROM startrek.episodes; SELECT COUNT(*) FROM startrek.quotes;"
    

    Again, 79 rows in episodes and 200 in quotes.

You'll see that even though you generated the example data on another node, it's been distributed and is accessible from all of your other servers. And, best of all, you didn't have to configure the sharding patterns or do much of anything for it to work.

Survivability

Once you have a cluster up and running with data being distributed between all of your nodes, what happens when one of the nodes dies? Well, besides not being able to connect to the server that died, not a whole lot! Part of how CockroachDB distributes data to nodes also includes replication, which means that even if one node goes down, your cluster still has two other copies of it.

To demonstrate this, we'll remove a node from the cluster and show that all of the cluster's data is still available. We'll then rejoin the node to the cluster and see that it receives all updates that happened while it was offline.

  1. Assuming you're still connected to node2, quit running CockroachDB:

    $ cockroach quit
    
  2. Now close this session (CTRL+D) and move to node3:

    $ doctl compute ssh <node3 ID>
    
  3. Delete all of the quotes where the episode is greater than 50:

    $ cockroach sql --execute="DELETE FROM startrek.quotes WHERE episode > 50; SELECT COUNT(*) FROM startrek.quotes;"
    

    You'll see there are now 131 rows of data.

  4. Close the connection with node3, and then move back to node2:

    $ doctl compute ssh <node2 ID>
    
  5. Restart the node:

    $ cockroach start --insecure --background \
    --advertise-host=<node2 IP address> \
    --join=<node1 IP address>:26257
    
  6. Count the number of rows available on the cluster:

    $ cockroach sql --execute="SELECT COUNT(*) FROM startrek.quotes;"
    

131! So, despite being offline when the update happened, the node is updated as soon as it rejoins the cluster.

If you'd like, you can now remove the example data:

$ cockroach sql --execute="DROP TABLE startrek.quotes; DROP TABLE startrek.episodes; DROP DATABASE startrek;"

What's next?

Now that you've seen how easy it is to get CockroachDB up and running on Digital Ocean (and demoed the core features), you can kick it up a notch and try your hand at deploying CockroachDB on Digital Ocean with SSL encryption.
