codefromthecrypt/troubleshooting_cassandra.md

## troubleshooting_cassandra.md

      
Load testing example of cassandra3 storage

Hypothesis: A recent code change introduced a problem that would show up under load as dropped spans.
Validation approach: Generate a lot of load and check whether any spans are dropped.
Conclusion: The hypothesis isn't supported. There could be a different explanation for the dropped spans, possibly one related to the data itself.
Steps

Changed brave-webmvc-example to send many separate requests instead of buffering, by limiting each reported message to 512 bytes.
```diff
diff --git a/webmvc4-boot/src/main/java/brave/webmvc/TracingConfiguration.java b/webmvc4-boot/src/main/java/brave/webmvc/TracingConfiguration.java
index a3e914c..6c4a979 100644
--- a/webmvc4-boot/src/main/java/brave/webmvc/TracingConfiguration.java
+++ b/webmvc4-boot/src/main/java/brave/webmvc/TracingConfiguration.java
@@ -43,7 +43,9 @@ public class TracingConfiguration extends WebMvcConfigurerAdapter {
 
   /** Configuration for how to buffer spans into messages for Zipkin */
   @Bean AsyncReporter<Span> spanReporter() {
-    return AsyncReporter.create(sender());
+    return AsyncReporter.builder(sender())
+        .messageMaxBytes(512)
+        .build();
   }
 
   /** Controls aspects of tracing such as the service name that shows up in the UI */
```
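To see why a 512-byte cap turns buffering into many separate requests, here is a rough Python model of size-limited batching. It is only a sketch, not the reporter's actual algorithm: `pack_messages` is a hypothetical helper, the 300-byte span size and the 4096-byte comparison limit are made-up illustration values, and list framing overhead is ignored.

```python
def pack_messages(span_sizes, message_max_bytes):
    """Greedily group encoded span sizes into messages under a byte limit.

    Simplified model (an assumption, not the library's real behaviour):
    framing overhead is ignored, and a span larger than the limit is
    placed in a message of its own instead of being dropped.
    """
    messages, current, current_bytes = [], [], 0
    for size in span_sizes:
        # Start a new message when adding this span would exceed the limit.
        if current and current_bytes + size > message_max_bytes:
            messages.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        messages.append(current)
    return messages


# Hypothetical workload: ten spans of roughly 300 encoded bytes each.
spans = [300] * 10
small_limit = pack_messages(spans, 512)   # two spans never fit together
large_limit = pack_messages(spans, 4096)  # all ten fit in one message
print(len(small_limit), len(large_limit))
```

Under these assumptions, the small limit produces one message per span, which matches the intent of the change: each span triggers its own request to the collector.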
Run the zipkin server with autocomplete indexing
```
$ AUTOCOMPLETE_KEYS=http.method,environment STORAGE_TYPE=cassandra3 java -jar zipkin.jar
```
Once the services are running, use a high load to help flush out any concurrency problems.
```
$ wrk -t4 -c128 -d1m http://localhost:8081 --latency
Running 1m test @ http://localhost:8081
  4 threads and 128 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    68.50ms   77.77ms   1.38s    88.76%
    Req/Sec   622.08    243.02     1.27k    63.06%
  Latency Distribution
     50%   46.64ms
     75%   90.21ms
     90%  155.38ms
     99%  369.99ms
  147480 requests in 1.00m, 20.00MB read
  Socket errors: connect 0, read 78, write 2, timeout 0
Requests/sec:   2456.91
Transfer/sec:    341.14KB

$ wrk -t4 -c128 -d1m http://localhost:8081 --latency
Running 1m test @ http://localhost:8081
  4 threads and 128 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    36.35ms   33.26ms 432.87ms   80.94%
    Req/Sec     1.04k   166.28     1.59k    71.44%
  Latency Distribution
     50%   28.25ms
     75%   50.29ms
     90%   78.84ms
     99%  154.21ms
  248995 requests in 1.00m, 33.76MB read
Requests/sec:   4149.26
Transfer/sec:    576.14KB
```
Verify that the collector statistics report no dropped spans or messages.
```
$ curl -s localhost:9411/metrics|jq .
{
  "counter.zipkin_collector.messages.http": 102682,
  "counter.zipkin_collector.spans_dropped.http": 0,
  "gauge.zipkin_collector.message_bytes.http": 253,
  "counter.zipkin_collector.bytes.http": 38760897,
  "gauge.zipkin_collector.message_spans.http": 1,
  "counter.zipkin_collector.spans.http": 117899,
  "counter.zipkin_collector.messages_dropped.http": 0
}
```
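The same check can be scripted so it is repeatable after each load run. The following is a minimal sketch: it assumes the metrics endpoint returns the flat JSON object shown above, and the helper names (`dropped_counters`, `spans_per_message`, `fetch_metrics`) are hypothetical, not part of Zipkin.

```python
import json
from urllib.request import urlopen

# URL taken from the curl command above; adjust host and port as needed.
METRICS_URL = "http://localhost:9411/metrics"


def dropped_counters(metrics):
    """Return every non-zero counter whose name mentions 'dropped'."""
    return {
        name: value
        for name, value in metrics.items()
        if name.startswith("counter.") and "dropped" in name and value != 0
    }


def spans_per_message(metrics, transport="http"):
    """Average number of spans per collected message for one transport."""
    spans = metrics[f"counter.zipkin_collector.spans.{transport}"]
    messages = metrics[f"counter.zipkin_collector.messages.{transport}"]
    return spans / messages if messages else 0.0


def fetch_metrics(url=METRICS_URL):
    """Fetch and decode the JSON metrics document."""
    with urlopen(url) as response:
        return json.load(response)


if __name__ == "__main__":
    metrics = fetch_metrics()
    dropped = dropped_counters(metrics)
    print("spans per message: %.2f" % spans_per_message(metrics))
    if dropped:
        raise SystemExit("dropped data detected: %s" % dropped)
    print("no dropped spans or messages")
```

Applied to the numbers above, both dropped counters are zero and the average comes out a little above one span per message, which is consistent with the 512-byte message limit forcing nearly every span into its own request.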