TLDR: The datastore emulator has a consistency setting. When developing, always set this to a value of 0.05 or lower to simulate realistic eventual consistency.
The Google Datastore is only partly consistent: only lookups and ancestor queries are strongly consistent. All other types of queries rely on indexes, which take a while to update in a distributed system like Google Datastore. When you develop an application that uses the Datastore, it helps to use the emulator in a way that simulates this eventual consistency. This way you can test whether your application still works when the indexes are slow to update, as they are in production. However, to what value should we set the consistency? It accepts a range from 0 to 1, for 0% to 100% consistency. But what does this mean?
See benchmark.go for a reference implementation of how this was measured. For each measurement, the datastore emulator was started and stopped like this:
docker run --rm -d --name datastore -p 18081:18081 \
google/cloud-sdk:emulators \
gcloud beta emulators datastore start \
--project=dummy --host-port=0.0.0.0:18081 \
--consistency=CONSISTENCY_SETTING
export DATASTORE_EMULATOR_HOST=localhost:18081
go run benchmark.go
docker kill datastore
The results are summarized with a boxplot of latency per --consistency
setting. The x-axis is logarithmic. Only the last 3 consistency decrease steps
of the test were logarithmic, but those give the strong impression that this
setting is interpreted logarithmically.
Consider how these latencies affect your application. For example, a
client-server application where the client makes two calls, first a PUT to
insert a new entity and then a GET to list all existing entities, will
have issues with displaying the newly inserted entity consistently, depending on
network conditions and how fast the remaining parts of the client/server are.
In your localhost network the client/server communication might be so fast (1ms)
that it misses the indexing even for very high consistency settings (10ms spent
indexing), always appearing inconsistent. However, if the time between PUT and
GET is always more than 20ms then the index will always appear consistent,
and you might have surprises when deploying to production, where the index update
might take 1s or more.
It is important to tweak your consistency setting to take into account the
properties of your development and test environments. A safe bet is to set
consistency so low that you essentially always miss the index update for
anything that does not require user interaction. So use --consistency=0.05
for a latency of approximately 350ms.
Good practice would be for the aforementioned application to remain on the detail screen of the newly inserted entity. If the user navigates back to the list, substantial time will have passed and the list would refresh using the updated index. The list UI should also include a refresh button or automatically refresh periodically.
If eventual consistency is not acceptable, consider storing references in an entity that you can look up by a predictable key. For example, if a user has todo list items, store a reference to the todo list items in the user, or save the todo list items with the user as an ancestor.