@codefromthecrypt
Created January 27, 2019 12:38
troubleshooting cassandra

Load testing example of cassandra3 storage

Hypothesis: a recent change to the code caused a problem which would be visible under load as dropped spans.

Validation approach: create a lot of load and check whether any spans are dropped.

Conclusion: The hypothesis isn't supported. There could be a different explanation for dropped spans, possibly related to the data itself.

Steps

Changed brave-webmvc-example to send many small, separate requests instead of buffering spans into larger messages, by capping the reporter's message size at 512 bytes.

diff --git a/webmvc4-boot/src/main/java/brave/webmvc/TracingConfiguration.java b/webmvc4-boot/src/main/java/brave/webmvc/TracingConfiguration.java
index a3e914c..6c4a979 100644
--- a/webmvc4-boot/src/main/java/brave/webmvc/TracingConfiguration.java
+++ b/webmvc4-boot/src/main/java/brave/webmvc/TracingConfiguration.java
@@ -43,7 +43,9 @@ public class TracingConfiguration extends WebMvcConfigurerAdapter {
 
   /** Configuration for how to buffer spans into messages for Zipkin */
   @Bean AsyncReporter<Span> spanReporter() {
-    return AsyncReporter.create(sender());
+    return AsyncReporter.builder(sender())
+        .messageMaxBytes(512)
+        .build();
   }
 
   /** Controls aspects of tracing such as the service name that shows up in the UI */
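To illustrate why a small `messageMaxBytes` produces many more requests, here is a rough sketch of size-limited batching. It is not Brave's actual implementation: `pack_messages` is a hypothetical helper, the 250-byte span size and the 500,000-byte comparison limit are assumptions for illustration, and encoding overhead is ignored.

```python
def pack_messages(span_sizes, message_max_bytes):
    """Greedily group encoded span sizes into messages of at most message_max_bytes.

    Simplified model: ignores list/envelope overhead of the real encoding.
    """
    messages, current, current_bytes = [], [], 0
    for size in span_sizes:
        # Start a new message when adding this span would exceed the limit.
        if current and current_bytes + size > message_max_bytes:
            messages.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        messages.append(current)
    return messages


# Hypothetical workload: 1000 spans of roughly 250 bytes each.
span_sizes = [250] * 1000

small = pack_messages(span_sizes, 512)      # limit used in the diff above
large = pack_messages(span_sizes, 500_000)  # hypothetical larger limit for comparison

print(len(small), "messages with a 512-byte limit")
print(len(large), "messages with a 500,000-byte limit")
```

With a limit this small, only one or two spans fit per message, so the collector sees a request for nearly every span, which is the kind of load this test is after.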

Run the Zipkin server with cassandra3 storage and autocomplete indexing enabled

$ AUTOCOMPLETE_KEYS=http.method,environment STORAGE_TYPE=cassandra3 java -jar zipkin.jar

Once the services are running, use a high load to help flush out any concurrency problems.

$ wrk -t4 -c128 -d1m http://localhost:8081 --latency
Running 1m test @ http://localhost:8081
  4 threads and 128 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    68.50ms   77.77ms   1.38s    88.76%
    Req/Sec   622.08    243.02     1.27k    63.06%
  Latency Distribution
     50%   46.64ms
     75%   90.21ms
     90%  155.38ms
     99%  369.99ms
  147480 requests in 1.00m, 20.00MB read
  Socket errors: connect 0, read 78, write 2, timeout 0
Requests/sec:   2456.91
Transfer/sec:    341.14KB

$ wrk -t4 -c128 -d1m http://localhost:8081 --latency
Running 1m test @ http://localhost:8081
  4 threads and 128 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    36.35ms   33.26ms 432.87ms   80.94%
    Req/Sec     1.04k   166.28     1.59k    71.44%
  Latency Distribution
     50%   28.25ms
     75%   50.29ms
     90%   78.84ms
     99%  154.21ms
  248995 requests in 1.00m, 33.76MB read
Requests/sec:   4149.26
Transfer/sec:    576.14KB

Verify that the collector metrics report no dropped spans or messages

$ curl -s localhost:9411/metrics|jq .
{
  "counter.zipkin_collector.messages.http": 102682,
  "counter.zipkin_collector.spans_dropped.http": 0,
  "gauge.zipkin_collector.message_bytes.http": 253,
  "counter.zipkin_collector.bytes.http": 38760897,
  "gauge.zipkin_collector.message_spans.http": 1,
  "counter.zipkin_collector.spans.http": 117899,
  "counter.zipkin_collector.messages_dropped.http": 0
}
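The same check can be scripted rather than read by eye. The following is a minimal sketch, assuming the flat JSON shape shown above; `fetch_metrics`, `dropped_counters`, and `per_message_averages` are hypothetical helper names, and the sample values are copied from the output above.

```python
import json
from urllib.request import urlopen

# Sample copied from the metrics output above.
SAMPLE = """{
  "counter.zipkin_collector.messages.http": 102682,
  "counter.zipkin_collector.spans_dropped.http": 0,
  "gauge.zipkin_collector.message_bytes.http": 253,
  "counter.zipkin_collector.bytes.http": 38760897,
  "gauge.zipkin_collector.message_spans.http": 1,
  "counter.zipkin_collector.spans.http": 117899,
  "counter.zipkin_collector.messages_dropped.http": 0
}"""


def fetch_metrics(url="http://localhost:9411/metrics"):
    """Fetch metrics from a running server (same endpoint as the curl command)."""
    with urlopen(url) as response:
        return json.load(response)


def dropped_counters(metrics):
    """Return only the '*_dropped' counters that are non-zero."""
    return {k: v for k, v in metrics.items() if "_dropped" in k and v != 0}


def per_message_averages(metrics, transport="http"):
    """Average bytes and spans per message, derived from the cumulative counters."""
    prefix = "counter.zipkin_collector."
    messages = metrics[f"{prefix}messages.{transport}"]
    avg_bytes = metrics[f"{prefix}bytes.{transport}"] / messages
    avg_spans = metrics[f"{prefix}spans.{transport}"] / messages
    return avg_bytes, avg_spans


metrics = json.loads(SAMPLE)  # or: metrics = fetch_metrics()
dropped = dropped_counters(metrics)
avg_bytes, avg_spans = per_message_averages(metrics)

print("dropped:", dropped or "none")
print(f"avg bytes/message: {avg_bytes:.1f}, avg spans/message: {avg_spans:.2f}")
```

For the sample above, the averages come out to a little over one span per message and well under 512 bytes per message, which appears consistent with the reduced `messageMaxBytes` setting.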