# A Changes Feed Example
## Create a new database
curl -XPUT https://samsmith.cloudant.com/new_database
By default, this creates an n=3, q=4 database.
This means the database is split into 4 shard ranges, with each shard range stored three times.
You can see how any database is sharded via the API:
curl -XGET https://samsmith.cloudant.com/new_database/_shards
{
"shards": {
"00000000-3fffffff": [
"dbcore@db1.mead.cloudant.net",
"dbcore@db2.mead.cloudant.net",
"dbcore@db3.mead.cloudant.net"
],
"40000000-7fffffff": [
"dbcore@db1.mead.cloudant.net",
"dbcore@db2.mead.cloudant.net",
"dbcore@db3.mead.cloudant.net"
],
"80000000-bfffffff": [
"dbcore@db1.mead.cloudant.net",
"dbcore@db2.mead.cloudant.net",
"dbcore@db3.mead.cloudant.net"
],
"c0000000-ffffffff": [
"dbcore@db1.mead.cloudant.net",
"dbcore@db2.mead.cloudant.net",
"dbcore@db3.mead.cloudant.net"
]
}
}
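As a rough illustration of where those four range names come from, the sketch below splits the 32-bit hash space into q equal slices. The function name `shard_ranges` is made up for this example, and it assumes q divides 2^32 evenly; it is a model of the naming, not Cloudant's actual code.

```python
def shard_ranges(q):
    """Split the 32-bit hash space into q equal, contiguous ranges.

    Hypothetical helper for illustration only.
    """
    space = 2 ** 32
    if q <= 0 or space % q != 0:
        raise ValueError("this sketch assumes q divides 2**32 evenly")
    size = space // q
    # Each range is written as <first hash>-<last hash> in 8-digit hex.
    return [f"{i * size:08x}-{(i + 1) * size - 1:08x}" for i in range(q)]


for name in shard_ranges(4):
    print(name)
```

With q=4 this prints the same four range names that appear in the `_shards` response.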
## Add some docs
We now add 10 documents to this database: doc1, doc2, …, doc10.
You can see which shard range is holding a particular document via the API.
Let's see which range is holding doc1:
curl -XGET https://samsmith.cloudant.com/new_database/_shards/doc1
{
"range": "c0000000-ffffffff",
"nodes": [
"dbcore@db1.mead.cloudant.net",
"dbcore@db2.mead.cloudant.net",
"dbcore@db3.mead.cloudant.net"
]
}
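To make the lookup concrete, here is a minimal sketch of mapping a document ID to a shard range: hash the ID to a 32-bit number, then find the range that contains it. The use of CRC32 is an assumption made purely for illustration (this text does not say which hash function is used), and both function names are hypothetical, so the range computed for doc1 need not match the API response above.

```python
import zlib


def range_for_hash(h, q=4):
    """Return the name of the shard range containing the 32-bit hash h."""
    size = 2 ** 32 // q
    i = h // size
    return f"{i * size:08x}-{(i + 1) * size - 1:08x}"


def range_for_doc(doc_id, q=4):
    """Map a document ID to a shard range.

    ASSUMPTION: CRC32 stands in for whatever hash the database really uses.
    """
    h = zlib.crc32(doc_id.encode("utf-8")) & 0xFFFFFFFF
    return range_for_hash(h, q)


print(range_for_hash(0xC0000000))  # first hash value in the last range
print(range_for_doc("doc1"))       # depends on the assumed hash function
```

The point of the sketch is only that placement is a deterministic function of the document ID, which is why every copy of a range holds the same documents.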
## Querying _changes
curl -XGET https://samsmith.cloudant.com/new_database/_changes
Our first query returns this sequence of updates:
seq: 1-XXXX id: doc3
seq: 2-XXXX id: doc4
seq: 3-XXXX id: doc2
seq: 4-XXXX id: doc7
seq: 5-XXXX id: doc8
seq: 6-XXXX id: doc6
seq: 7-XXXX id: doc10
seq: 8-XXXX id: doc1
seq: 9-XXXX id: doc5
seq: 10-XXXX id: doc9
However, our second query returns this sequence of updates:
seq: 1-XXXX id: doc3
seq: 2-XXXX id: doc1
seq: 3-XXXX id: doc4
seq: 4-XXXX id: doc2
seq: 5-XXXX id: doc7
seq: 6-XXXX id: doc8
seq: 7-XXXX id: doc5
seq: 8-XXXX id: doc6
seq: 9-XXXX id: doc9
seq: 10-XXXX id: doc10
You’ll notice that the two orderings are not the same.
To see why, we need to know which documents are held by each of the 4 shard ranges:
shard range: 00000000-3fffffff - holds docs: doc3, doc7
shard range: 40000000-7fffffff - holds docs: doc2, doc6, doc10
shard range: 80000000-bfffffff - holds docs: doc4, doc8
shard range: c0000000-ffffffff - holds docs: doc1, doc5, doc9
With this in mind, let's look back at the two differing _changes results from earlier.
Although the overall ordering differs, you’ll notice that doc3 always had a lower update seq than doc7. Similarly, doc6 always had a lower update seq than doc10. This is because these docs were from the same shard range and the update histories of the 3 shard copies were identical.
If the shard copies of a particular range had different update histories, even this partial ordering would not hold.
This fact isn’t overly useful, but it is important for understanding how the changes result is generated.
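The partial-ordering claim can be checked directly against the two result lists and the shard table given earlier. The sketch below projects each feed onto its shard ranges and compares the per-range orders; the helper name `per_shard_order` is invented for this example.

```python
# Document placement, copied from the shard table above.
SHARD_DOCS = {
    "00000000-3fffffff": ["doc3", "doc7"],
    "40000000-7fffffff": ["doc2", "doc6", "doc10"],
    "80000000-bfffffff": ["doc4", "doc8"],
    "c0000000-ffffffff": ["doc1", "doc5", "doc9"],
}

# The two _changes orderings shown above.
FIRST = ["doc3", "doc4", "doc2", "doc7", "doc8",
         "doc6", "doc10", "doc1", "doc5", "doc9"]
SECOND = ["doc3", "doc1", "doc4", "doc2", "doc7",
          "doc8", "doc5", "doc6", "doc9", "doc10"]


def per_shard_order(feed):
    """Group a feed's doc IDs by shard range, keeping their feed order."""
    doc_to_range = {doc: rng for rng, docs in SHARD_DOCS.items() for doc in docs}
    order = {rng: [] for rng in SHARD_DOCS}
    for doc in feed:
        order[doc_to_range[doc]].append(doc)
    return order


print(FIRST == SECOND)                                    # False
print(per_shard_order(FIRST) == per_shard_order(SECOND))  # True
```

The overall lists differ, but within every shard range the documents appear in the same relative order in both feeds.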
Our full changes history so far:
curl -XGET https://samsmith.cloudant.com/new_database/_changes
{
"results": [
{
"seq":"1-g1AAAAEEeJzLYWBgYMlgTmGQSUlKzi9KdUhJMtTLTU1M0UvOyS9NScwr0ctLLckBqmJKZEiy____f1YGcyJjLlCAPSXZNDXN3JiAXuIMT3IAkkn1IPMTGYjTkscCJBkagBRQ134StR2AaAPZlgUA91NUQw",
"id":"doc3",
"changes":[
{
"rev":"1-967a00dff5e02add41819138abb3284d"
}
]
},
…
{
"seq":"9-g1AAAAFseJzLYWBgYMlgTmGQSUlKzi9KdUhJMtTLTU1M0UvOyS9NScwr0ctLLckBqmJKZEiy____f1YGUyJTLlCAPSXZNDXN3Jg43UkOQDKpHmwAM9SARHPL5GRTAwLaiTM_jwVIMjQAKaAV-5HsMDQyMbBIoqIdByB2gP3BDLYj2TTRLMkslYARWQBehXKh",
"id":"doc9",
"changes":[
{
"rev":"1-967a00dff5e02add41819138abb3284d"
}
]
},
{
"seq": "10-g1AAAAEueJzLYWBgYMlgTmGQSUlKzi9KdUhJMtbLTU1M0UvOyS9NScwr0ctLLckBqmJKZEiy____f1YGUyJTLlCAPcXA1NAizRxVtxEO3UkOQDKpHmwAcyIz2IBkSwPTJHNDAtqJc10eC5BkaABSQCv2IxyZZmGQYppqSoohByCGgH0Kcai5cYqloUlyFgCF0Vwn",
"id": "doc10",
"changes": [
{
"rev": "1-967a00dff5e02add41819138abb3284d"
}
]
}
],
"last_seq": "10-g1AAAAEPeJzLYWBgYMlgTmGQSUlKzi9KdUhJMtbLTU1M0UvOyS9NScwr0ctLLckBqmJKZEiy____f1YGUyJTLlCAPcXA1NAizRxVtxEO3UkOQDKpHmoAM9iAZEsD0yRzQ-Ksz2MBkgwNQApoxn6EK9IsDFJMU01JMeQAxBAkl5gbp1gamiRnAQAublE9",
"pending": 0
}
Each update seq, once decoded, tells us two things:
- Which shard copies were used to create the changes results.
- The highest seq the client has seen from each of the shard ranges.
For example, the update seq of the final row above (doc10) decodes to the following:
10-g1AAAAEueJzLYWBgYMlgTmGQSUlKzi9KdUhJMtbLTU1M0UvOyS9NScwr0ctLLckBqmJKZEiy____f1YGUyJTLlCAPcXA1NAizRxVtxEO3UkOQDKpHmwAcyIz2IBkSwPTJHNDAtqJc10eC5BkaABSQCv2IxyZZmGQYppqSoohByCGgH0Kcai5cYqloUlyFgCF0Vwn
[
{'dbcore@db3.mead.cloudant.net', '00000000-3fffffff', 2},
{'dbcore@db2.mead.cloudant.net', '40000000-7fffffff', 3},
{'dbcore@db2.mead.cloudant.net', '80000000-bfffffff', 3},
{'dbcore@db3.mead.cloudant.net', 'c0000000-ffffffff', 3}
]
The above tells us that for the shard range 00000000-3fffffff, we’ve chosen to stream changes from the copy on node db3.mead. It also says the client has seen all changes up to seq 2 from this shard.
Similarly, we’ve chosen the copy of the 40000000-7fffffff shard on db2.mead. The client has seen up to seq 3.
And so on for the remaining ranges.
We can pass any update seq into a new _changes query as a since parameter. This ensures that the changes are gathered from the same set of internal shards (if available). The result will show all changes that we have not already seen (we'll see later how we might also get back changes we have already seen).
Let's add doc11 and doc12 and query _changes using our last_seq as a since parameter:
curl https://samsmith.cloudant.com/new_database/_changes?since="10-g1AAAAEPeJzLYWBgYMlgTmGQSUlKzi9KdUhJMtbLTU1M0UvOyS9NScwr0ctLLckBqmJKZEiy____f1YGUyJTLlCAPcXA1NAizRxVtxEO3UkOQDKpHmoAM9iAZEsD0yRzQ-Ksz2MBkgwNQApoxn6EK9IsDFJMU01JMeQAxBAkl5gbp1gamiRnAQAublE9"
And we get…
seq: 11-XXXX id: doc11
seq: 12-XXXX id: doc12
However, doc11 and doc12 haven’t landed in the same shard range, so when we query again the ordering flips:
seq: 11-XXXX id: doc12
seq: 12-XXXX id: doc11
The point I wish to stress here is that you will always see every change.
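A toy model may help show why every change still arrives even though the order can flip. The sketch below treats a decoded since value as a map from shard range to the highest seq already seen, and returns everything newer from each range. The function name, the log layout, and the placement of doc11 and doc12 are all hypothetical; this is not how the server is implemented.

```python
def changes_since(shard_logs, since, visit_order=None):
    """Return (range, seq, doc_id) rows the client has not seen yet.

    shard_logs:  range -> list of (seq, doc_id) in that copy's update order
    since:       range -> highest seq already seen (missing means 0)
    visit_order: order in which ranges are drained; the order in which
                 shards respond is not fixed, which is why ordering can flip.
    """
    rows = []
    for rng in (visit_order or list(shard_logs)):
        seen = since.get(rng, 0)
        rows.extend((rng, seq, doc) for seq, doc in shard_logs[rng] if seq > seen)
    return rows


# Hypothetical placement: doc11 and doc12 land in different ranges.
logs = {
    "00000000-3fffffff": [(1, "doc3"), (2, "doc7"), (3, "doc11")],
    "40000000-7fffffff": [(1, "doc2"), (2, "doc6"), (3, "doc10"), (4, "doc12")],
}
since = {"00000000-3fffffff": 2, "40000000-7fffffff": 3}

one = [doc for _, _, doc in changes_since(logs, since)]
two = [doc for _, _, doc in changes_since(logs, since, visit_order=list(reversed(list(logs))))]
print(one)
print(two)
```

Whichever range is drained first, both new documents are returned; only their relative order changes.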
I mentioned that using the since parameter in your queries gathers changes from the same set of internal shards. However, what if a node is down and the shard we want is not available?
I don't want to get into the weeds here, but suffice it to say, we choose a substitute shard and stream changes from that. The tricky bit is when the substitute shard has a different update history. We then have to find a suitable seq from which to begin streaming so that no changes are missed. This often introduces changes that the client might have already seen. Again, the single guarantee that we make here is that the client will see every change at least once. I can go into further detail here if required.
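Because the guarantee is at least once rather than exactly once, a consumer generally needs to tolerate repeats. One possible approach, sketched below, is to remember which (id, rev) pairs have already been handled and skip them on replay. The function name and the in-memory set are assumptions for illustration; the row shape follows the results shown earlier.

```python
def apply_once(rows, seen, handler):
    """Call handler(doc_id, rev) for each change not handled before.

    rows:    change rows shaped like the "results" entries shown earlier
    seen:    a set of (doc_id, rev) pairs, updated in place
    Returns the number of changes actually handled.
    """
    handled = 0
    for row in rows:
        for change in row["changes"]:
            key = (row["id"], change["rev"])
            if key in seen:
                continue  # a replayed change, for example after a shard substitution
            seen.add(key)
            handler(row["id"], change["rev"])
            handled += 1
    return handled


REV = "1-967a00dff5e02add41819138abb3284d"  # rev value taken from the output above
first_batch = [{"id": "doc9", "changes": [{"rev": REV}]},
               {"id": "doc10", "changes": [{"rev": REV}]}]
# Hypothetical replay: doc10 is delivered again alongside a new doc11.
second_batch = [{"id": "doc10", "changes": [{"rev": REV}]},
                {"id": "doc11", "changes": [{"rev": REV}]}]

seen = set()
log = []
n1 = apply_once(first_batch, seen, lambda doc_id, rev: log.append(doc_id))
n2 = apply_once(second_batch, seen, lambda doc_id, rev: log.append(doc_id))
print(n1, n2, log)
```

A real consumer would likely persist this state, or make the handler itself idempotent, rather than keep a set in memory.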