Best Practices for Azure Redis
Below is a set of best practices that I recommend for most customers. This information is based on my experience helping hundreds of Azure Redis customers investigate various issues.
Configuration and Concepts
- Use Standard or Premium Tier for production systems. The Basic Tier is a single-node system with no data replication and no SLA. Also, use at least a C1 cache. C0 caches are really meant for simple dev/test scenarios since they have a shared CPU core, very little memory, are prone to "noisy neighbor" issues, etc.
- Remember that Redis is an In-Memory data store. Read this article so that you are aware of scenarios where data loss can occur.
- Configure your client library to use a "connect timeout" of at least 10 to 15 seconds, giving the system time to connect even under higher CPU conditions. If your client or server tends to be under high load, use an even larger value. If you use a large number of connections in a single application, consider adding some type of staggered reconnect logic to prevent a flood of connections hitting the server at the same time.
- Develop your system such that it can handle connection blips due to patching and failover.
- Configure your maxmemory-reserved setting to improve system responsiveness under memory pressure conditions, especially for write-heavy workloads or if you are storing larger values (100KB or more) in Redis. I would recommend starting with 10% of the size of your cache, then increase if you have write-heavy loads. See some considerations when selecting a value.
- Redis works best with smaller values, so consider chopping up bigger data into multiple keys. In this Redis discussion, 100 KB is considered "large". Read this article for an example problem that can be caused by large values.
- Locate your cache instance and your application in the same region. Connecting to a cache in a different region can significantly increase latency and reduce reliability. Connecting from outside of Azure is supported but not recommended, especially when using Redis as a cache (as opposed to a key/value store where latency may not be the primary concern).
- Reuse connections - Creating new connections is expensive and increases latency, so reuse connections as much as possible. If you choose to create new connections, make sure to explicitly close the old connections before you drop your references to them (even in managed memory languages like .NET or Java).
- Avoid Expensive Commands - Some Redis operations, like the "KEYS" command, are VERY expensive and should be avoided. Read more here.
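As a rough illustration of the staggered reconnect logic mentioned above, here is a minimal sketch of exponential backoff with random jitter. The function name `reconnect_delays` and the `base` and `cap` values are my own assumptions for the example, not settings from any particular client library. Your client may already provide something similar, so check its documentation first.

```python
import random


def reconnect_delays(attempts, base=1.0, cap=30.0, rng=None):
    """Return one randomized wait (in seconds) per reconnect attempt.

    Uses "full jitter": each wait is drawn uniformly from
    [0, min(cap, base * 2**attempt)], so clients that lose their
    connections at the same moment do not all reconnect together.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0.0, ceiling))
    return delays


if __name__ == "__main__":
    # Two simulated clients get different schedules for the same outage.
    for client_id in range(2):
        waits = reconnect_delays(5, rng=random.Random(client_id))
        print(client_id, [round(w, 2) for w in waits])
```

Each connection would sleep for its next delay before trying to reconnect. The cap keeps the worst-case wait bounded, and the randomness spreads the reconnects out over time.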
There are several things related to memory usage within your Redis server instance that you may want to consider. Here are a few:
- Choose an eviction policy that works for your application. The default policy for Azure Redis is volatile-lru, which means that only keys that have an expiration value configured will be considered for eviction. If no keys have an expiration value, then the system won't evict any keys and clients will get out-of-memory errors when trying to write to Redis. If you want the system to allow any key to be evicted under memory pressure, then you may want to consider the allkeys-lru policy.
- Set an expiration value on your keys. This will help expire keys proactively instead of waiting until there is memory pressure. Evictions due to memory pressure can cause additional load on your server, so it is always best to stay ahead of the curve whenever possible. See the Expire and ExpireAt commands for more details.
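To make the difference between the two policies concrete, here is a toy model of which keys each policy is allowed to evict. This is only a sketch: the function `eviction_candidates` and the sample keys are hypothetical, and it does not model how Redis actually picks a victim among the candidates.

```python
def eviction_candidates(keys_with_ttl, policy):
    """Return the keys a policy may evict in this simplified model.

    keys_with_ttl maps key name -> TTL in seconds, or None if the key
    has no expiration. Only the two policies discussed above are modeled.
    """
    if policy == "volatile-lru":
        return [k for k, ttl in keys_with_ttl.items() if ttl is not None]
    if policy == "allkeys-lru":
        return list(keys_with_ttl)
    raise ValueError(f"policy not modeled in this sketch: {policy}")


# Hypothetical cache contents: only the session key has an expiration.
cache = {"session:1": 1800, "config:site": None, "profile:42": None}

print(eviction_candidates(cache, "volatile-lru"))  # ['session:1']
print(eviction_candidates(cache, "allkeys-lru"))   # ['session:1', 'config:site', 'profile:42']

# With no expirations at all, volatile-lru has nothing it can evict,
# which is the situation where writes start failing under memory pressure.
print(eviction_candidates({"a": None, "b": None}, "volatile-lru"))  # []
```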
Client Library Specific Guidance
- StackExchange.Redis (.NET)
- Java - Which client should I use?
- Lettuce (Java)
- Jedis (Java)
- ASP.NET Session State Provider
When is it safe to retry?
Unfortunately, there is no easy answer. Each application needs to decide what operations can be retried and which cannot because each has different requirements and inter-key dependencies. Things you should consider:
- You can get client-side errors even though Redis successfully ran the command you asked it to run. For example:
- Timeouts are a client-side concept. If the operation reached the server, the server will run the command even if the client gives up waiting.
- When an error occurs on the socket connection, it is indeterminate whether or not the operation ran on the server. For example, the connection error could happen after the request was processed by the server but before the response was received by the client.
- How does my application react if I accidentally run the same operation twice? For instance, what if I increment an integer twice instead of just once? Is my application writing to the same key from multiple places? What if my retry logic overwrites a value set by some other part of my app?
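The sketch below shows why the answer depends on the operation. It uses a hypothetical in-memory `FakeServer` (not a real Redis client) that runs a command and then loses the response, so the client sees a timeout even though the command succeeded. The helper `call_with_retry` is likewise just an illustration of a naive retry loop.

```python
class FakeServer:
    """In-memory stand-in for a Redis server (hypothetical, for illustration)."""

    def __init__(self):
        self.data = {}
        self.drop_next_response = False

    def _reply(self, value):
        # Simulate the command running on the server but the reply being lost.
        if self.drop_next_response:
            self.drop_next_response = False
            raise TimeoutError("client gave up waiting for the response")
        return value

    def set(self, key, value):
        self.data[key] = value
        return self._reply("OK")

    def incr(self, key):
        self.data[key] = self.data.get(key, 0) + 1
        return self._reply(self.data[key])


def call_with_retry(operation, retries=1):
    """Retry an operation blindly after a timeout; safe only if idempotent."""
    for attempt in range(retries + 1):
        try:
            return operation()
        except TimeoutError:
            if attempt == retries:
                raise


server = FakeServer()

server.drop_next_response = True
call_with_retry(lambda: server.set("greeting", "hello"))
print(server.data["greeting"])  # hello: running SET twice is harmless

server.drop_next_response = True
call_with_retry(lambda: server.incr("counter"))
print(server.data["counter"])  # 2: the increment was applied twice
```

Setting a key to a fixed value can be repeated without harm, while the increment is silently double-counted. The SET case is only safe here because nothing else writes to that key between the two attempts.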
If you would like to test how your code works under error conditions, one option would be to use the Reboot Feature as a way to trigger such connection blips, then see how your application reacts.
- Start by using redis-benchmark.exe to get a feel for possible throughput/latency before writing your own perf tests. Redis-benchmark documentation can be found here: http://redis.io/topics/benchmarks. Note that redis-benchmark does not support SSL, so you will have to enable the Non-SSL port through the Azure Portal before you run the test. A Windows-compatible version of redis-benchmark.exe can be found here.
- The client VM used for testing should be in the same region as your Redis cache instance.
- We recommend using Dv2 VM Series for your client as they have better hardware and will give the best results.
- Make sure the client VM you choose has at least as much computing and bandwidth capability as the cache you are testing.
- Enable VRSS on the client machine if you are on Windows. See here for details. Example PowerShell script:
PowerShell -ExecutionPolicy Unrestricted Enable-NetAdapterRSS -Name (Get-NetAdapter).Name
- Premium tier Redis instances will have better network latency and throughput because they are running on better hardware for both CPU and Network.
Note: Our observed performance results are published here for your reference. Also, be aware that SSL/TLS adds some overhead, so you may get different latencies and/or throughput if you are using transport encryption.
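When checking whether a client VM has enough bandwidth for a test, a quick back-of-the-envelope calculation can help. The sketch below is just arithmetic; the function name and the 100,000 requests per second target are hypothetical numbers I picked for illustration, not measured results.

```python
def required_bandwidth_mbps(requests_per_second, payload_bytes):
    """Approximate payload bandwidth in megabits per second.

    Counts only the value bytes; protocol framing, TCP/IP headers, and
    TLS overhead are ignored, so real traffic will be somewhat higher.
    """
    bits_per_second = requests_per_second * payload_bytes * 8
    return bits_per_second / 1_000_000


# Hypothetical target: 100,000 GETs per second with 1 KB (1024-byte) values.
print(required_bandwidth_mbps(100_000, 1024))  # 819.2
```

If the client VM's network limit is below the number you get, the test will measure the client's bottleneck rather than the cache.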
Set up the cache:
redis-benchmark.exe -h yourcache.redis.cache.windows.net -a yourAccesskey -t SET -n 10 -d 1024
Test latency for GET requests using a 1k payload:
redis-benchmark.exe -h yourcache.redis.cache.windows.net -a yourAccesskey -t GET -d 1024 -P 50 -c 4
Test the throughput you are able to get using pipelined GET requests with a 1k payload:
redis-benchmark.exe -h yourcache.redis.cache.windows.net -a yourAccesskey -t GET -n 1000000 -d 1024 -P 50 -c 50