@jakcharlton
Last active December 31, 2015 14:29
Rethink concepts

A Datacentre is a group of Servers

Servers can be grouped in a Datacentre

A Server (Instance) is a single Rethink process

A Database is a logical grouping for Tables - Tables may sit on different Servers

A Shard is a partition of a Table

A Shard can be allocated to a specific Server

Each Shard has a Master and one or more Replicas

The Master is the one to send the ACK of a write

You can set the number of replicas (= copies) of a Table per datacenter

The few constraints that I am aware of are:

  • The masters of one table have to be in the same datacenter
  • You cannot ask for more replicas in a datacenter than you have servers in that datacenter
  • Oh, and there is also the "ack" parameter
  • the ack number is set per table and per datacenter
  • It's the number of replicas that have to flush a write to disk before the write is acknowledged
  • So number of acks <= number of replicas <= number of servers in a datacenter
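The ordering in the last bullet can be checked mechanically. The following is a minimal sketch in plain Python, not RethinkDB code: the function name `check_table_config` and the dictionary layout (datacenter name mapped to a count) are assumptions made for illustration only.

```python
def check_table_config(servers, replicas, acks):
    """Return a list of violations of acks <= replicas <= servers.

    Each argument maps a datacenter name to a count. This layout is a
    hypothetical one chosen for the sketch, not a RethinkDB format.
    """
    problems = []
    # Look at every datacenter mentioned in any of the three settings.
    for dc in sorted(set(servers) | set(replicas) | set(acks)):
        n_servers = servers.get(dc, 0)
        n_replicas = replicas.get(dc, 0)
        n_acks = acks.get(dc, 0)
        if n_replicas > n_servers:
            problems.append(
                f"{dc}: {n_replicas} replicas but only {n_servers} servers"
            )
        if n_acks > n_replicas:
            problems.append(
                f"{dc}: {n_acks} acks but only {n_replicas} replicas"
            )
    return problems


if __name__ == "__main__":
    # Three servers, two replicas, one ack: satisfies the ordering.
    print(check_table_config({"eu": 3}, {"eu": 2}, {"eu": 1}))
    # Three replicas requested on two servers: reported as a problem.
    print(check_table_config({"eu": 2}, {"eu": 3}, {"eu": 1}))
```

An empty list means the settings respect the ordering for every datacenter.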

You create one database per project, then you create your tables inside this database. If you have too much data and want to spread the load, you shard your table. If you want a backup, you set more than 1 replica.

If at some point, you want to scale (because you have too much data - or because the load is too much), you shard your table

By default the master can be anywhere, though you can force it to be in a specific datacenter. If you don't assign a primary datacenter, you can end up with masters in multiple datacenters.

You can Create a Database and a Table from ReQL

You cannot Shard a Table from ReQL (you must use the Web interface or CLI to do this at present). You cannot automatically create balanced shards with the CLI for the moment. With the CLI you have to define the split points yourself, whereas the web interface infers where good split points are.
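To make the idea of split points concrete, here is a small sketch in plain Python. It is not RethinkDB's implementation: `shard_for_key`, `shard_sizes`, and `balanced_split_points` are hypothetical helpers, and picking split points at even positions in the sorted keys is only an assumption about what "balanced" could mean.

```python
from bisect import bisect_right


def shard_for_key(key, split_points):
    """Return the index of the shard that holds `key`.

    `split_points` must be sorted; N split points define N + 1 shards.
    A key equal to a split point lands in the shard to its right
    (a convention assumed for this sketch).
    """
    return bisect_right(split_points, key)


def shard_sizes(keys, split_points):
    """Count how many keys fall into each shard."""
    sizes = [0] * (len(split_points) + 1)
    for key in keys:
        sizes[shard_for_key(key, split_points)] += 1
    return sizes


def balanced_split_points(keys, n_shards):
    """Pick split points that divide the sorted keys into even chunks."""
    ordered = sorted(keys)
    return [ordered[len(ordered) * i // n_shards] for i in range(1, n_shards)]


if __name__ == "__main__":
    keys = list("abcdefgh")
    # A hand-picked split point can leave the shards lopsided.
    print(shard_sizes(keys, ["b"]))
    # Split points derived from the data give even shards.
    points = balanced_split_points(keys, 4)
    print(points, shard_sizes(keys, points))
```

The first call shows why hand-picked split points are easy to get wrong; the second shows how points derived from the data itself even the shards out.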

An API is planned to allow sharding from ReQL, but hasn't yet been designed.

Rethink queries are automatically routed to the appropriate server. A write goes to the master of the relevant shard. A map reduce is reduced on each shard first, and then the per-shard results are joined. This is an automatic process.
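The "reduce on each shard, then join" behaviour can be pictured with a short sketch in plain Python. This only models the idea and runs in a single process; `sharded_map_reduce` is a hypothetical name, and each shard is represented as an ordinary list of documents.

```python
from functools import reduce


def sharded_map_reduce(shards, map_fn, reduce_fn):
    """Map and reduce each shard on its own, then combine the partials.

    `reduce_fn` should be associative, otherwise the result depends on
    how the rows happen to be distributed across shards. Empty shards
    are skipped; at least one shard must contain a row.
    """
    # One partial result per non-empty shard.
    partials = [
        reduce(reduce_fn, map(map_fn, shard)) for shard in shards if shard
    ]
    # Join the per-shard results with the same reduction function.
    return reduce(reduce_fn, partials)


if __name__ == "__main__":
    shards = [
        [{"n": 1}, {"n": 2}],  # shard 0
        [{"n": 3}],            # shard 1
        [],                    # shard 2 holds no rows
    ]
    total = sharded_map_reduce(shards, lambda doc: doc["n"], lambda a, b: a + b)
    print(total)
```

Because the same reduction function is applied within shards and across them, the caller never needs to know how many shards exist.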

@jakcharlton
Author

Am I right in remembering that Rethink splits queries in parallel? Is that across servers or shards?
[5:42pm] neumino: The query is automatically routed to the appropriate server
[5:42pm] neumino: If you write it goes to the master
[5:43pm] neumino: If you do a map reduce it will reduce on each shard then join the result
[5:43pm] neumino: It basically do it for you, you don't have to worry about it
