A Datacentre is a group of Servers
Servers can be grouped in a Datacentre
A Server (Instance) is a single Rethink process
A Database is a logical grouping for Tables - Tables may sit on different Servers
A Shard is a partition of a Table
A Shard can be allocated to a specific Server
Each Shard has a Master and one or more Replicas
The Master is the one to send the ACK of a write
You can set the number of replicas (=copy) of a Table per datacenter
The few constraints that I am aware of are:
- The masters of one table has to be in the same datacenter
- You cannot have ask for more replicas in a datacenter than you have servers in this datacenter
- Oh, and there is also the "ack" parameter
- the ack number is set per table and per datacenter
- It's the number of writes that have to be flushed to disk before a write is acknowledge
- So number of acks <= number of replicas <= number of servers in a datacenter
You create one database per project, then you create your table inside this database. If you have too much data, and want to spread the load, you shard your table. If you want to have a back up, you set more than 1 replica
If at some point, you want to scale (because you have too much data - or because the load is too much), you shard your table
By default the master can be anywhere - you can force it to be in a specific datacenter though. You can end up with masters on multiple datacenters (when you don't assign a primary datacenter)
You can Create a Database and a Table from ReQL
You cannot Shard a Table from ReQL (you must use the Web interface or CLI to do this at present). You cannot automatically create balanced shards with the CLI for the moment. With the CLI you have to define the split points, the web interface infer where the good split points are.
An API is planned to allow sharding from ReQL, but hasn't yet been designed.
Rethink queries are automatically routed to the appropriate server. If you write it goes to the master, if you do a map reduce it will reduce on each shard then join the result. This is an automatic process.
So, you can shard Tables, but not Databases
[5:26pm] jakcharlton_: If we are creating Databases with ReQL as a customer creates a project, can we shard from ReQL too?
[5:26pm] neumino: jakcharlton_, you cannot shard from ReQL for the moment
[5:26pm] jakcharlton_: (as we would have something like 15-20 tables per Database)
[5:26pm] neumino: We are thinking about it, but we haven't settle an API yet
[5:26pm] jakcharlton_: Can sharing be set to automatic?
[5:26pm] jakcharlton_: sharding*
[5:26pm] neumino: You can write a script to shard with the CLI
[5:26pm] jakcharlton_: I swear OSX autocorrect hates me
[5:27pm] neumino: You can shard with the web interface
[5:27pm] neumino: But you cannot automatically create balanced shards with the CLI for the moment
[5:27pm] jakcharlton_: CLI / Web are both manual processes - think Basecamp - when a customer sets a new project up, we were intending to create a new database
[5:27pm] neumino: It's range shards, so with the CLI you have to define the split points
[5:27pm] neumino: The web interface infer where the good split points are
[5:28pm] neumino: jakcharlton_, you can write a bash script to use the CLI
[5:28pm] jakcharlton_: Ugggg ....
[5:28pm] neumino: I have personnally not done it, but some people have
[5:28pm] jakcharlton_: Thats pretty much my nightmare scenario with a database
[5:28pm] neumino: We are going to have hash shards
[5:28pm] neumino: (soon I think)
[5:28pm] jakcharlton_: hash shards?
[5:28pm] neumino: And in this case, you will just have to set the number of shards, not the split points
[5:29pm] neumino: range shard = a shard where the primary key is between two keys
[5:29pm] neumino: for hash shard, you hash the primary key, and define in which shard it falls