@hermanbanken
Last active January 9, 2023 23:00
Datastore Emulator consistency

TLDR: The Datastore emulator has a consistency setting. When developing, always set it to 0.05 or lower to simulate realistic eventual consistency.

Google Datastore is only partly consistent: only lookups and ancestor queries are strongly consistent. All other types of queries use the indexes, which take a while to update in a distributed system like Google Datastore. When you develop an application that uses the Datastore, it helps to run the emulator in a way that simulates this eventual consistency. This way you can test whether your application still works when the indexes are as slow to update as they are in production. But to what value should we set the consistency? The flag accepts a range from 0 to 1, for 0% to 100% consistency. What does that mean in practice?

Benchmark

See benchmark.go for a reference implementation of how this was measured. For each setting, the Datastore emulator was started and stopped like this:

docker run --rm -d --name datastore -p 18081:18081 \
  google/cloud-sdk:emulators \
  gcloud beta emulators datastore start \
    --project=dummy --host-port=0.0.0.0:18081 \
    --consistency=CONSISTENCY_SETTING
export DATASTORE_EMULATOR_HOST=localhost:18081
go run benchmark.go
docker kill datastore

Results

The results are summarized as a boxplot of latency per --consistency setting. The x-axis is logarithmic. Only the last 3 consistency decrease steps of the test were logarithmic, but those give the strong impression that this setting is interpreted logarithmically.

[Figure: consistency latency visualized per setting]
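To put a number on a single box of the plot, the median of one column of the measurements can be computed. The sketch below does this for the 0.9 and 0.05 columns, with the values copied from the data file in this gist; the helper name `median` is just an illustrative choice.

```go
package main

import (
	"fmt"
	"sort"
)

// median returns the middle value of xs (the mean of the two middle values
// for an even count, 0 for an empty slice). It sorts a copy so the input
// is left untouched.
func median(xs []float64) float64 {
	s := append([]float64(nil), xs...)
	sort.Float64s(s)
	n := len(s)
	if n == 0 {
		return 0
	}
	if n%2 == 1 {
		return s[n/2]
	}
	return (s[n/2-1] + s[n/2]) / 2
}

func main() {
	// Milliseconds until consistent, copied from the measurement columns
	// for --consistency=0.9 and --consistency=0.05.
	c090 := []float64{10.594, 8.4148, 8.8305, 9.7403, 8.9296, 10.207, 8.7228, 9.0775, 9.2669}
	c005 := []float64{244.766, 107.438, 210.633, 223.020, 133.155, 24.2010, 827.833, 84.5500, 346.044}
	fmt.Printf("0.9:  median %.1f ms\n", median(c090))
	fmt.Printf("0.05: median %.1f ms\n", median(c005))
}
```

Keep in mind that the benchmark polls with a 20ms sleep between attempts and that each measurement includes the round trip of the queries themselves, so small differences between settings should not be over-interpreted.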

Impact

Consider how these latencies affect your application. For example, take a client-server application where the client makes two calls: first a PUT to insert a new entity, then a GET to list all existing entities. Whether the newly inserted entity shows up in the list depends on network conditions and on how fast the remaining parts of the client and server are.

On a localhost network the client/server communication might be so fast (1ms) that it misses the index update even at very high consistency settings (10ms spent indexing), so the data always appears inconsistent. However, if the time between the PUT and the GET is always more than 20ms, the index will always appear consistent, and you might get surprises when deploying to production, where the index update might take 1s or more.

Conclusion

It is important to tune your consistency setting to the properties of your development and test environments. A safe bet is to set consistency so low that you essentially always miss the index update for anything that does not require user interaction. So use --consistency=0.05 for a latency of approximately 350ms.

Good practice would be for the aforementioned application to remain on the detail screen of the newly inserted entity. If the user navigates back to the list, substantial time will have passed and the list would refresh using the updated index. The list UI should also include a refresh button or automatically refresh periodically.

If eventual consistency is not acceptable, consider storing references in a predictable entity. For example, if a user has todo list items, store a reference to the todo list items in the user entity, or save the todo list items with the user as an ancestor.

package main

import (
	"context"
	"fmt"
	"log"
	"math/rand"
	"os"
	"strings"
	"time"

	"cloud.google.com/go/datastore"
)

var dsClient *datastore.Client
var gctx = context.Background()

func init() {
	var err error
	proj := os.Getenv("DATASTORE_PROJECT_ID")
	if proj == "" {
		proj = "dummy"
	}
	// With DATASTORE_EMULATOR_HOST set, the client talks to the emulator
	dsClient, err = datastore.NewClient(gctx, proj)
	orPanic(err)
}

func main() {
	// use randomness to allow testing independent updates
	random := rand.New(rand.NewSource(1337))
	times := []string{}
	for i := 0; i < 10; i++ {
		// Insert an entity with an indexed top-level property (test1) and
		// an indexed property inside an embedded entity (test3.nested1)
		parent := datastore.IDKey("Foobar", random.Int63(), nil)
		child := datastore.IDKey("Nested", random.Int63(), parent)
		_, err := dsClient.Put(gctx, parent, &datastore.PropertyList{
			{Name: "test1", Value: parent.ID, NoIndex: false},
			{Name: "test2", Value: []interface{}{"foo", "bar"}, NoIndex: false},
			{Name: "test3", Value: []interface{}{&datastore.Entity{Key: child, Properties: []datastore.Property{
				{Name: "nested1", Value: child.ID, NoIndex: false},
			}}}, NoIndex: false},
		})
		orPanic(err)
		start := time.Now()

		// Poll (20ms pause between attempts, at most 10s) to see how soon
		// both queries return the new entity
		var keys1 []*datastore.Key
		var keys2 []*datastore.Key
		for ; time.Since(start) < 10*time.Second; time.Sleep(20 * time.Millisecond) {
			keys1, err = dsClient.GetAll(gctx, datastore.NewQuery("Foobar").Filter("test1 =", parent.ID).KeysOnly().Limit(10), nil)
			orPanic(err)
			keys2, err = dsClient.GetAll(gctx, datastore.NewQuery("Foobar").Filter("test3.nested1 =", child.ID).KeysOnly().Limit(10), nil)
			orPanic(err)
			if len(keys1) > 0 && len(keys2) == 0 {
				// only the top-level property index became consistent
				log.Printf("Partial consistency (parent) after: %s", time.Since(start))
			} else if len(keys1) == 0 && len(keys2) > 0 {
				// only the nested property index became consistent
				log.Printf("Partial consistency (child) after: %s", time.Since(start))
			}
			if len(keys1) > 0 && len(keys2) > 0 {
				break
			}
		}

		// Check that it resolved; a measurement of about 10000 ms means
		// the poll gave up before both queries returned the entity
		if len(keys1) == 0 || len(keys2) == 0 {
			log.Printf("Not consistent within 10s")
		}
		d := time.Since(start)
		ms := float64(d.Microseconds()) / 1000.0
		times = append(times, fmt.Sprintf("%f", ms))
		log.Printf("Consistency after: %f ms", ms)
	}
	log.Printf("Iteration done")
	log.Printf("csv column data:\n%s", strings.Join(times, "\n"))
}

func orPanic(err error) {
	if err != nil {
		panic(err)
	}
}
0.9,0.8,0.7,0.6,0.5,0.4,0.3,0.2,0.1,0.05,0.025,0.01,0.005
10.594,11.1785,34.581,102.38,25.425,215.40,177.203,132.801,178.653,244.766,1381.85,814.799,1782.795
8.4148,12.4489,30.332,26.586,68.954,74.960,114.166,115.937,73.5970,107.438,1550.39,2029.71,2100.881
8.8305,11.6125,42.469,26.733,77.942,35.541,30.2070,103.335,114.278,210.633,1763.28,2272.00,10015.13
9.7403,14.0977,35.685,27.681,26.211,24.040,185.357,31.6750,165.389,223.020,1251.28,2443.81,5688.346
8.9296,14.1373,24.701,66.154,25.830,66.900,26.2640,226.067,24.6680,133.155,579.646,421.656,262.1830
10.207,14.8648,24.503,28.606,67.522,142.81,139.360,366.390,250.028,24.2010,357.668,142.232,3085.924
8.7228,10.8802,22.001,20.844,72.989,98.466,61.8230,93.3690,228.328,827.833,468.765,3283.17,6148.725
9.0775,11.9103,18.634,58.288,57.798,60.240,66.0590,60.9010,282.761,84.5500,278.942,1376.76,1026.152
9.2669,12.8044,17.822,18.551,18.872,23.932,22.8550,22.0390,136.164,346.044,215.371,1240.96,5588.169
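# One column per --consistency setting; values are milliseconds until consistent
df <- read.csv("consistency_data.csv")
# Convert to seconds and draw one horizontal box per setting on a log scale
boxplot(df / 1000,
  main = "Effect of --consistency",
  las = 2,
  horizontal = TRUE,
  names = c("0.9", "0.8", "0.7", "0.6", "0.5", "0.4", "0.3", "0.2", "0.1", "0.05", "0.025", "0.01", "0.005"),
  xlab = "time until consistent (s)",
  ylab = "consistency setting",
  log = "x"
)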

timonvw commented May 10, 2021

Super nice and interesting!! This is something I, for one, never think about while I'm working. But from now on I will! Really cool and educational. Nicely done! 💯

@OscarVanL

Datastore has now been migrated to "Firestore in Datastore mode", which removes the eventual-consistency limitation: https://cloud.google.com/datastore/docs/firestore-or-datastore#in_datastore_mode

Eventual consistency: Datastore queries become strongly consistent unless you explicitly request [eventual consistency](https://cloud.google.com/datastore/docs/reference/data/rpc/google.datastore.v1#google.datastore.v1.ReadOptions).

Do I need to explicitly set --consistency to 0, or is this set by default?

@hermanbanken
Author

I'm not sure @OscarVanL. Unfortunately the emulators are closed source.

@OscarVanL

@hermanbanken Thanks. As it turns out, there is a flag --use-firestore-in-datastore-mode which makes reads always strongly consistent. That is the most suitable option for projects that have been migrated to Firestore in Datastore mode.

@hermanbanken
Author

That is very good information to document here as well. Thanks for the link 🙏