Benchmark recovery from remote
We will use the existing workflow in: https://github.com/dliappis/stack-terraform/tree/ccr/projects/ccr
- ES: 3 node clusters, security enabled
- GCP: ES: custom-16-32768 16cpu / 32GB ram / min Skylake processor, Loaddriver: n1-standard-16 (16vcpu 60GB ram)
- AWS: ES: c5d.4xlarge 16vcpu / 32GB ram, Loaddriver: m5d.4xlarge (16vcpu 64GB ram)
- Locations:
- AWS: Leader: eu-central-1 (Frankfurt) Follower: us-east-2 (Ohio)
- GCP: Leader: eu-west-2 (Netherlands) Follower: us-central-1 (Iowa)
- Index settings: 3P/1R
- Rally tracks: geopoint / http_logs / pmc. Index settings (unless specifically changed): 3P/1R
- OS: Ubuntu 16.04
4.4.0-1061-aws
- Java version (unless specifically changed):
OpenJDK 8 1.8.0_191
- metricbeat from each node (metricbeat enabled everywhere)
- recovery-stats (from
_recovery
) every 1s - from some experiments: node-stats only for jvm/mem to provide heap usage and gc insights
- median indexing throughput and the usual results Rally provides in the summary
Criteria to compare runs:
- time taken to follower to complete recovery
- indexing throughput
- overall system usage (CPU, IO, Network)
- remote tcp compression off vs on
- Telemetry device collection frequency
- "recovery-stats-sample-interval": 1
Tracks: geopoint, http_logs, pmc Challenge: new challenge (to be created) executing in the following order: 1. Delete and create indices on leader 2. Index entire corpora 8 indexing clients using max performance (no target-throoughput set) and then stop (don't index any more) 3. Join follower, start recovery 4. Consider benchmark over when follower has recovered completely
Tracks: geopoint, http_logs, pmc Challenge: new challenge (to be created) executing in the following order: 1. Delete and create indices on leader 2. Index entire corpora 8 indexing clients using max performance (no target-throoughput set) and then stop (don't index any more) 3. Join follower, start recovery 4. Consider benchmark over when follower has recovered completely
Experiment 3: AWS, fully index corpora, initiate recovery, keep indexing at lower performance during recovery, remote tcp compression off
Tracks: geopoint, http_logs, pmc Challenge: new challenge (to be created) executing in the following order: 1. Delete and create indices on leader 2. Index some % of corpora, 8 indexing clients max performance (no target-throughput set) 3. Join follower, start recovery and keep indexing on leader at a lower throughput 4. Consider benchmark over when follower has recovered completely
Tracks: geopoint, http_logs, pmc Challenge: new challenge (to be created) executing in the following order: 1. Delete and create indices on leader 2. Index some % of corpora, 8 indexing clients max performance (no target-throughput set) 3. Join follower, start recovery and keep indexing on leader at a lower throughput 4. Consider benchmark over when follower has recovered completely
Experiment 3 results (index up to % -> recovery and index slower)
All experiments included:
38416aa
http_logs
700s
with no replication720s
after start of benchmark80000doc/s
until the end of the benchmark.Performance Charts