@gingerlime
Created November 1, 2012 11:40
Redis Request iRregularities with Ruby

... I couldn't resist the temptation for some pseudo-alliteration.

Need for speed

When it comes to driving, I'm a rather slow driver. I can parallel park with my eyes closed and merge in traffic easily, but I really don't enjoy driving fast (mind you, I take public transport these days). But when it comes to squeezing performance out of a system, I'm much more attracted to the idea of things running fast. The attraction of redis is therefore quite obvious. In-memory database. It doesn't get much faster than that.

Bumps on the road

It was rather surprising to come across speed issues with redis, and fairly early on in our project. At kenHub, we are building an anatomy learning platform. The platform must adapt to each and every student and present them with the best question for their level. To do this, we store statistics for every anatomy term, for each user. Redis was a natural candidate: quick access to read and write individual records lets us update the user's progress on the fly, and retrieval is also fast when we want to show users their overall performance or build a training sequence adapted to each person (we use a minimalist form of the Leitner system for spaced repetition). We started seeing some strange behaviour on certain pages -- specifically when the user starts training, and on the personalised dashboard.
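
To give a sense of the access pattern, here's a minimal sketch of how per-term statistics might be kept in plain redis with redis-rb. The key scheme and field names are made up for illustration; our real models sit behind ohm, as described below.

require "redis"

redis = Redis.new

# Hypothetical key scheme: one hash per user and anatomy term,
# tracking how many answers were given and how many were correct.
def record_answer(redis, user_id, term_id, correct)
  key = "stats:#{user_id}:#{term_id}"
  redis.hincrby(key, "attempts", 1)
  redis.hincrby(key, "correct", 1) if correct
end

record_answer(redis, 42, "femur", true)
puts redis.hgetall("stats:42:femur").inspect  # e.g. {"attempts"=>"1", "correct"=>"1"}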

We fired up Miniprofiler to see where the bottleneck was. Some requests to redis were running great (completing within 2-4ms), but every now and then we'd hit a redis query taking around 100ms. We were using ohm as a wrapper around redis, so the natural first port of call was to ask the ohm guys for assistance. We weren't lucky enough to get any feedback. I'm guessing my question-asking style is to blame. Or maybe it was our own fault, and the problem was with our deployment configuration, software stack, or something else. Whatever it was, we had to investigate it.
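
For context, an ohm-backed model looks roughly like this (class and attribute names are illustrative, not our actual schema); every create and find translates into one or more redis round-trips:

require "ohm"

# Rough sketch of an ohm model (illustrative names only)
class TermStat < Ohm::Model
  attribute :user_id
  attribute :term_id
  attribute :attempts
  attribute :correct
  index :user_id
end

TermStat.create(user_id: "42", term_id: "femur", attempts: "3", correct: "2")
TermStat.find(user_id: "42").each { |stat| puts stat.term_id }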

[image: redis speed bump]

Route-planning

The first thing to do was to see whether we could reduce the number of calls we make to redis. We noticed that we make quite a lot of requests, and that with some tweaks we could cut them down. My colleague Johannes made a pull request to ohm, which makes it much easier to retrieve several records with one find command. This reduces the overall number of requests our app makes to redis, and with it the overall latency. But even with fewer requests, some were just taking too long.
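
The general idea is simply to cut round-trips. As a plain redis-rb illustration (hypothetical keys, and not the ohm change itself):

require "redis"

redis = Redis.new
keys = ["stats:42:femur", "stats:42:tibia", "stats:42:patella"]

# N round-trips: one GET per key
values = keys.map { |k| redis.get(k) }

# 1 round-trip: a single MGET for all keys
values = redis.mget(*keys)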

Test-drives

We had to isolate the problem, and also reproduce it more or less reliably. So we took a few logical steps to investigate:

  • Trying different platforms - we saw the same behaviour everywhere, so we could rule out hardware and even OS-specific issues.
  • Using different versions of redis - again, no noticeable difference when upgrading to the latest version, and we also sanity-checked our redis configs.
  • Checking whether the problem was specific to our data or independent of it - this is where we finally started making progress.

We checked redis using the slowlog command. It showed us that redis was serving all requests pretty fast -- more precisely, around 20 microseconds (that's 0.02ms, quite a margin from 100ms). Using trial and error, we also noticed that it doesn't matter which redis command we issue: if we issue enough commands, some of them will suddenly hiccup. We were running a simple test using redis ping, and still observing these issues. More specifically:

(1..10000).each { x = (Benchmark.measure { redis.ping }.real * 1000); puts x if x>1 }

This prints out a fair number of outliers taking around 10-15ms. This is with pure ruby, without all the Rails layers.
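
For reference, the slowlog check looks roughly like this from redis-rb. Lowering the threshold is needed because, by default, redis only logs commands slower than 10ms; the entry count here is arbitrary.

require "redis"

redis = Redis.new

# Log every command, not just those slower than the default 10ms threshold
redis.config("set", "slowlog-log-slower-than", 0)

# Each slowlog entry is [id, unix timestamp, duration in microseconds, command]
redis.slowlog("get", 10).each do |_id, timestamp, micros, command|
  puts "#{Time.at(timestamp)} #{micros}us #{command.inspect}"
end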

We then had to try the same thing over unix sockets, to exclude tcp-specific behaviour. Whilst unix sockets were generally faster, we were still getting some outliers.
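
Switching between the two is just a different connection option in redis-rb; the socket path must match whatever unixsocket is set to in redis.conf (/tmp/redis.sock below is only an example):

require "redis"

# TCP (the default)
redis_tcp = Redis.new(host: "127.0.0.1", port: 6379)

# Unix domain socket
redis_socket = Redis.new(path: "/tmp/redis.sock")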

So we then reported this to redis-rb, and this time we even got some response!

Redis in 5th gear

Pieter Noordhuis, the author of the Redis Ruby library, pushed us in the direction of the hiredis ruby wrapper -- which he also wrote. It was a pretty good suggestion. If you're getting tired of reading, then take a look at hiredis, and you're 90% of the way to boosting redis access from ruby. However, I still wasn't quite satisfied (am I ever?). Why does this happen? How does ruby compare to, say, python for accessing redis?
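
With the redis-rb version we were using, wiring in hiredis is a small change: add the gem and ask for the hiredis driver.

# In the Gemfile:
gem "redis"
gem "hiredis"

# When creating the connection, ask redis-rb for the C-based hiredis reply parser:
require "redis"
redis = Redis.new(driver: :hiredis)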

Snake run

I had to try the same thing with python, to see if anything similar happened. Running a very similar test with python showed nothing like the same behaviour: the outliers were far fewer, and the requests themselves were much faster. None of the requests took longer than 2ms with python.

So let's switch from rails to django. If only it were that easy...

Django?

No. We didn't switch to django or python. However, there's a strange connection with our investigation. A few days later, I was getting frustrated with how slow rails was to load. Every time I had to launch the console, run the server, run tests or do a migration, it had to load the entire stack, and it was just taking too long. Compared to my experience with django, this was a huge difference (a quick cursory test showed the django shell loading in under 1 second while the rails console took over 30 seconds). In my frustration I decided to ask for ways to load rails faster. Luckily, there was a solution. In fact, two solutions. One was to try the Falcon patches for ruby 1.9.3, and the other was to tune our GC settings. Both were great for our rails loading time, cutting it by a factor of more than four. But how is this related to our redis?
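
For anyone curious, the 1.9.3-era GC tuning boils down to a handful of environment variables. The values below are illustrative examples of the kind floating around the web, not our production settings:

# illustrative 1.9.3 GC tuning -- adjust to your own app and measure
export RUBY_HEAP_MIN_SLOTS=800000
export RUBY_FREE_MIN=100000
export RUBY_GC_MALLOC_LIMIT=79000000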

I was curious about those mysterious GC settings and a quick search led me to this blog post. The author was experiencing something strangely familiar. Occasional slow rendering... no redis involved, but the similarity was obvious.

Getting stuck behind the garbage truck

There are very few things more frustrating than driving on a narrow street just behind a garbage truck. It's slow, and it stinks. It looks like this is what happened to us with redis. I haven't been able to validate this completely, but I noticed that tweaking the GC (Garbage Collection) settings dropped the number of outliers significantly. The count of garbage collections also loosely correlated with the number of slow responses from redis! I believe the reason hiredis worked better is that it depends less on ruby, so fewer garbage collections were necessary (however, I am not entirely sure). This might also explain why, when testing redis with pure ruby (without the rails stack), the outliers were hitting around 20ms, but with the full rails stack on top they reached 100ms. I imagine the larger stack means more work for the garbage collector, and so even longer pauses.
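
A crude way to check the correlation is to watch GC.count around the same ping loop from earlier (again a sketch, assuming redis-rb and a local redis on the default port):

require "benchmark"
require "redis"

redis = Redis.new

# GC.count goes up by one every time a garbage collection runs,
# so we can see whether the slow pings line up with GC activity.
(1..10_000).each do
  gc_before = GC.count
  ms = Benchmark.measure { redis.ping }.real * 1000
  puts "#{ms.round(2)}ms (GC ran: #{GC.count > gc_before})" if ms > 1
end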

The finish line

We are now looking at implementing both of those solutions to avoid the speed bumps: tuning our GC settings and switching ohm to use hiredis. This is still in our little lab, but it's looking promising.
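
The hiredis side of that should be a small change, assuming ohm forwards its connection options straight through to the underlying redis-rb client:

require "ohm"
require "hiredis"

# Assumption: ohm passes these options on to redis-rb when it connects
Ohm.connect(driver: :hiredis)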

discussion on HN

[image: redis fast]
