An Operator's Look at RethinkDB

Matt Blair - @mattyblair - Flipboard

The Getting Started Experience


It's in Homebrew and just runs (relative to, say, HBase): awesome!
Having packages for popular Linux distros (I use Ubuntu personally and at work): awesome!
Serving the GPG key over HTTP: not so awesome :-/
apt-get -y install rethinkdb
It didn't start by default: great! I hate when services start themselves on install. I then looked at the start script to see how it knew (usually daemons use /etc/default/blah, but it has custom logic to see if there's anything in /etc/rethinkdb/instances.d, which is cool).
I then wondered if it would restart upon upgrade, so...
I looked at the default conf; it's just comments. Sweet that it starts without having to configure it!
Also, it's SHORT. I'm sure it'll grow over time, but compared to, say, Cassandra (~900 lines, including comments and blanks), this is awesome!
It's ini-style; does the last value win for duplicate keys? It's often handy to just append and keep it moving, instead of having to write sed invocations.
cd /etc/rethinkdb && cp default.conf.sample instances.d/default.conf
sudo service rethinkdb start
Oh cool, it mentioned where it's listening.
echo "bind=all" > /etc/rethinkdb/instances.d/default.conf
apt-get -y install rethinkdb=2.0.1
Couldn't be found :-(
apt-cache show rethinkdb | head -n25
Ah, the version string needs the ~0trusty suffix.
apt-get -y --force-yes install rethinkdb=2.0.1~0trusty
Oh, it does restart upon upgrades :-( - less than ideal because I often stage upgrades by installing binaries everywhere, then doing rolling restarts.
The source compilation steps say to run ./configure --allow-fetch; what is it fetching? Some shops don't allow machines with compilers to access the internet, and vice versa. It'd be good to enumerate which dependencies can be staged ahead of time.
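On the append-versus-sed question above: until the docs say whether the last value wins for duplicate keys, one way to sidestep the question is a small helper that rewrites a key in place if it exists and appends it otherwise. This is only a sketch. `set_conf_key` is a hypothetical helper, not something RethinkDB ships, and it assumes plain `key=value` lines with `#` comments, which is what the sample conf looks like.

```shell
# set_conf_key FILE KEY VALUE
# Hypothetical helper: set KEY=VALUE in an ini-style file without relying on
# how the daemon resolves duplicate keys. The first uncommented KEY line is
# rewritten, later duplicates are dropped, and the key is appended if absent.
set_conf_key() {
    local file="$1" key="$2" value="$3" tmp status
    tmp=$(mktemp) || return 1
    if awk -v k="$key" -v v="$value" '
        BEGIN { FS = "="; done = 0 }
        {
            name = $1
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim spaces around the key
            if (index($0, "=") > 0 && name == k) {
                if (!done) { print k "=" v; done = 1 }
                next                            # skip duplicate definitions
            }
            print                               # comments and other keys pass through
        }
        END { if (!done) print k "=" v }
    ' "$file" > "$tmp"; then
        cat "$tmp" > "$file"                    # keeps the original owner and mode
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Example against a scratch file (not a real instance config):
conf=$(mktemp)
printf '# bind=127.0.0.1\nbind=all\n' > "$conf"
set_conf_key "$conf" bind 127.0.0.1
cat "$conf"
rm -f "$conf"
```

Because the helper never leaves two definitions of the same key behind, it behaves the same whether the parser takes the first value, the last value, or rejects duplicates outright.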

Ops Documentation Questions


The memory usage page talks about expecting "each query and background process to use 1-20MB of memory"; what about connections? MySQL has memory overhead for each open connection, which can become a bottleneck. Is this an issue for RethinkDB?
Hot backup- "it will use some cluster resources, but it will not lock out any of the clients, so you can safely run it on a live cluster." A little more detail would be great here; HBase 1.1 has added request throttling, so folks doing scripted backups or analytical queries (for example) can throttle by table or user. I don't think request throttling is needed, but some more information about how the hot backup job affects interactive queries would be great.
Monitoring- system info is just a RethinkDB table, no reporters :-(

Important for integration into existing tools (Nagios, OpenTSDB, Graphite, Riemann...); to write an integration, you need a RethinkDB client.
JSON over HTTP would be cool!
What to monitor? If I wanted to throw together a RethinkDB dashboard, what metrics would I choose? Riak has a 'Riak Metrics to Graph' section of their docs, for example.
Latency numbers, not just throughput! (Riak has median/95/99/max)


Version migration- can it be done in a rolling fashion? The docs don't say either way.
systemd support appears to be in progress- Ubuntu 16.04 will (probably) use it by default, so having an answer here would be cool.
Log syntax docs?
Log rotation- built in? No? (no is just fine, ops people know how to use logrotate)
Log levels? Are they configurable? Without restarting?
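For the latency numbers asked for under Monitoring (median/95/99/max), here is a rough sketch of the summary an integration might emit. It assumes you already have raw latency samples, one number per line, collected however you like; RethinkDB is not involved in this snippet at all. Percentiles use the nearest-rank method, and the output is formatted as Graphite plaintext lines (`path value timestamp`). The function names and the metric prefix are made up for illustration.

```shell
# latency_summary FILE
# Reads one latency sample per line (e.g. milliseconds) and prints
# "median", "p95", "p99" and "max" using the nearest-rank method.
latency_summary() {
    sort -n "$1" | awk '
        function rank(p,   r) {
            r = int((p * NR + 99) / 100)    # integer ceiling of p/100 * NR
            if (r < 1) r = 1
            return v[r]
        }
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1             # no samples, nothing to report
            printf "median %s\np95 %s\np99 %s\nmax %s\n",
                   rank(50), rank(95), rank(99), v[NR]
        }
    '
}

# to_graphite PREFIX TIMESTAMP
# Turns "name value" lines on stdin into Graphite plaintext lines.
to_graphite() {
    awk -v prefix="$1" -v ts="$2" '{ print prefix "." $1, $2, ts }'
}

# Example with a scratch file of samples; the prefix is hypothetical.
samples=$(mktemp)
printf '%s\n' 30 10 20 50 40 > "$samples"
latency_summary "$samples" | to_graphite rethinkdb.example.read_latency_ms "$(date +%s)"
rm -f "$samples"
```

Piping the result to a Graphite listener is left out on purpose; the point is only what a reporter's output could look like if the server exposed these numbers directly.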

Clustering


I haven't tested it because I'm waiting for the Raft stuff to land in 2.1 to really give it a workout.

Other random thoughts


'Writing RethinkDB drivers'- very well written...now I might have to write one :-)
Mike mentioned deletes during his talk- are they soft deletes, or real deletes? Since it's log-structured, I'm guessing they're soft and compaction needs to occur for the space to be freed?
Is there a preferred filesystem? EXT4/XFS/ZFS?
What do latency and throughput look like when the data size is larger than the amount of memory you've allocated to RethinkDB? I get that it's not a memory-first DB like MongoDB, but a quick answer here would alleviate the inevitable "is this thing web scale?" questions.
Official benchmarks, for both bare metal and AWS. I know that Scientific Benchmarking™ is hard, so even just code users can run themselves to do their own stress tests would be great.
AWS instance recommendations! Surely it'll run well on i2s, but what about r3s? c3s? EBS with PIOPS?
Spreading over multiple disk volumes? "Note: it is possible to attach more specialized EBS volumes and have RethinkDB store your data on them, but this option is not yet available out of the box." (http://rethinkdb.com/docs/paas/) Cassandra has a JBOD mode where they'll spread the data across volumes; striping (or RAID10) is fine for now.
For backup/restore, if the restored node has a different FQDN, do you need to mark the old node as "down" before bringing the new one up? Does DNS matter at all (please say no)?
What happens when disks die, or fill up? Can reads still occur when disks are full? Cassandra has a blog post about this (http://www.datastax.com/dev/blog/handling-disk-failures-in-cassandra-1-2); some answers here would be great.
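Until the full-disk behavior is documented, the usual stopgap is to alert well before the data volume fills. A minimal sketch of such a check, using only `df -P` and `awk`; the function names and thresholds are hypothetical, and the path you pass should be wherever your RethinkDB data directory actually lives.

```shell
# disk_used_pct PATH
# Prints the used percentage (integer, no % sign) of the filesystem holding PATH.
disk_used_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# disk_check PATH THRESHOLD
# Prints a status line; returns 1 if usage is at or above THRESHOLD percent,
# 2 if usage could not be determined, 0 otherwise.
disk_check() {
    local pct
    pct=$(disk_used_pct "$1")
    case "$pct" in
        ''|*[!0-9]*) echo "UNKNOWN: could not read usage for $1"; return 2 ;;
    esac
    if [ "$pct" -ge "$2" ]; then
        echo "WARN: $1 is ${pct}% full (threshold $2%)"
        return 1
    fi
    echo "OK: $1 is ${pct}% full"
}

# Example: check the root filesystem against an 85% threshold.
disk_check / 85
```

The nonzero return codes are there so the check can be dropped into cron or wrapped by an existing monitoring tool with little effort.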