A Datacenter is a group of Servers
A Server (Instance) is a single RethinkDB process
A Database is a logical grouping of Tables - its Tables may sit on different Servers
A Shard is a partition of a Table
A Shard can be allocated to a specific Server
Each Shard has a Master and one or more Replicas
The Master is the one that sends the ACK for a write
You can set the number of replicas (= copies) of a Table per datacenter
The few constraints that I am aware of are:
- The masters of one table have to be in the same datacenter
- You cannot ask for more replicas in a datacenter than you have servers in that datacenter
- Oh, and there is also the "ack" parameter
- The ack number is set per table and per datacenter
- It's the number of copies of a write that have to be flushed to disk before the write is acknowledged
- So number of acks <= number of replicas <= number of servers in a datacenter
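
The last constraint can be sketched as a small check. This is only an illustration of the inequality above, not RethinkDB code: the function name `check_datacenter_config` and its parameters are made up for these notes.

```python
def check_datacenter_config(servers, replicas, acks):
    """Return the list of violated constraints for one table in one datacenter.

    Hypothetical helper: it only encodes
    acks <= replicas <= servers, as stated in the notes.
    """
    problems = []
    if replicas > servers:
        problems.append(f"replicas ({replicas}) exceeds servers ({servers})")
    if acks > replicas:
        problems.append(f"acks ({acks}) exceeds replicas ({replicas})")
    return problems


# 3 servers, 2 replicas, 1 ack satisfies both constraints.
print(check_datacenter_config(servers=3, replicas=2, acks=1))
# 2 servers, 3 replicas, 4 acks violates both.
print(check_datacenter_config(servers=2, replicas=3, acks=4))
```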
You create one database per project, then you create your tables inside this database. If at some point you want to scale (because you have too much data, or because the load is too much), you shard your table. If you want to have a backup, you set more than 1 replica
By default the master can be anywhere - you can force it to be in a specific datacenter though. If you don't assign a primary datacenter, you can end up with masters in multiple datacenters
You can create a Database and a Table from ReQL
You cannot shard a Table from ReQL (you must use the Web interface or the CLI to do this at present). The CLI cannot automatically create balanced shards for the moment: with the CLI you have to define the split points yourself, whereas the web interface infers where the good split points are.
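
To make "split points" concrete, here is a rough sketch of how a list of split points carves the primary-key space into shards, and of one simple way to pick balanced ones from a sample of keys. Both functions are hypothetical and written for these notes; they are not how the web interface actually infers split points, and the choice that a key equal to a split point belongs to the shard on its right is an assumption.

```python
from bisect import bisect_right


def shard_for_key(key, split_points):
    """Index of the shard that owns `key`.

    `split_points` must be sorted. With n split points there are n + 1
    shards; a key equal to a split point is placed in the shard to its
    right (an assumption made for this sketch).
    """
    return bisect_right(split_points, key)


def balanced_split_points(keys, shard_count):
    """Pick shard_count - 1 split points so that each shard receives
    roughly the same number of the sampled keys."""
    ordered = sorted(keys)
    return [ordered[len(ordered) * i // shard_count]
            for i in range(1, shard_count)]


keys = ["a", "b", "c", "d", "e", "f"]
points = balanced_split_points(keys, 3)
print(points)                                  # ['c', 'e']
print([shard_for_key(k, points) for k in keys])  # [0, 0, 1, 1, 2, 2]
```

With the CLI you would supply something like `points` by hand; the sketch only shows why badly chosen split points lead to unbalanced shards.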
An API is planned to allow sharding from ReQL, but hasn't yet been designed.
RethinkDB queries are automatically routed to the appropriate server. A write goes to the master; a map reduce is reduced on each shard first, then the per-shard results are joined. This is an automatic process.
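
The "reduce on each shard, then join" behaviour can be pictured with a small sketch. It is plain Python with shards modelled as lists, not RethinkDB internals; `sharded_map_reduce` is a made-up name, and the approach only gives the same answer as a single global reduce when the reduce function is associative.

```python
from functools import reduce


def sharded_map_reduce(shards, map_fn, reduce_fn):
    """Map and reduce each shard separately, then reduce the partial results.

    `shards` is a list of lists standing in for the table's shards.
    Empty shards are skipped; if every shard is empty, reduce() raises
    TypeError because there is nothing to combine.
    """
    partials = [reduce(reduce_fn, map(map_fn, shard))
                for shard in shards if shard]
    return reduce(reduce_fn, partials)


# Sum of squares over a table split into three shards (one of them empty).
shards = [[1, 2], [3], []]
print(sharded_map_reduce(shards, lambda x: x * x, lambda a, b: a + b))  # 14
```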
Am I right in remembering that Rethink splits queries in parallel? Is that across servers or shards?
[5:42pm] neumino: The query is automatically routed to the appropriate server
[5:42pm] neumino: If you write it goes to the master
[5:43pm] neumino: If you do a map reduce it will reduce on each shard the join the result
[5:43pm] neumino: It basically do it for you, you don't have to worry about it