@hrishi18pathak
Created June 23, 2016 16:55

REDIS CLUSTER ERRORS

INTRODUCTION:

Redis Cluster shards data automatically: every key maps to exactly one of 16384 hash slots. These hash slots are distributed across the shards of the cluster, and each shard serves queries for its own subset of the slots. For details of Redis Cluster, refer to the Redis Cluster tutorial.
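
As an illustration of the key-to-slot mapping, here is a minimal Python sketch of the slot computation that Redis Cluster documents (CRC16 of the key, modulo 16384, with the hash tag rule for keys containing a non-empty {...} section). The function name key_hash_slot is hypothetical, and the sketch is independent of StackExchange.Redis, which performs this computation internally.

```python
import binascii

SLOT_COUNT = 16384  # number of hash slots in Redis Cluster


def key_hash_slot(key: bytes) -> int:
    """Return the hash slot for a key: CRC16(key) mod 16384."""
    # Hash tag rule: if the key contains a non-empty {...} section,
    # only the text between the first '{' and the next '}' is hashed.
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    # binascii.crc_hqx with initial value 0 is the CRC16-CCITT (XMODEM)
    # variant, which is the CRC16 that Redis Cluster uses.
    return binascii.crc_hqx(key, 0) % SLOT_COUNT


if __name__ == "__main__":
    for name in (b"foo", b"{user1000}.following", b"{user1000}.followers"):
        print(name.decode(), key_hash_slot(name))
```

The result for any key can be compared against the output of the CLUSTER KEYSLOT command on a real cluster node.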


MOVED ERRORS:

A node in a Redis Cluster returns a MOVED response when it receives a query for a slot that it does not serve. The response has the following format:

MOVED SLOTNUMBER IPADDRESS:PORTNUMBER

StackExchange.Redis tries to send each query to the node that serves the slot corresponding to the key. However, if StackExchange.Redis cannot connect to that node for any reason (intermittent network connectivity issues, a failover between the master and slave of a shard, etc.), it sends the query to a random node that it can connect to. That node returns a MOVED response, which StackExchange.Redis treats as an exception and reports back to the user. In other words, a MOVED exception occurs because StackExchange.Redis lost connectivity to the node that owns the slot, for one of the reasons above, and has not yet been able to reconnect to it. For details of the MOVED response returned by a node in a Redis Cluster, refer to the Redis Cluster documentation.
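
To make the response format concrete, the following Python sketch splits a MOVED message into its slot, host, and port. The names Redirect and parse_moved are hypothetical, and the sample message is made up for illustration; StackExchange.Redis does not expose such a helper, so this only shows what the three fields mean.

```python
from typing import NamedTuple


class Redirect(NamedTuple):
    slot: int   # hash slot the query was for
    host: str   # address of the node that serves the slot
    port: int   # port of that node


def parse_moved(message: str) -> Redirect:
    """Parse 'MOVED <slot> <host>:<port>' into its parts."""
    parts = message.split()
    if len(parts) != 3 or parts[0] != "MOVED":
        raise ValueError(f"not a MOVED response: {message!r}")
    # Split on the last ':' so that a host containing ':' keeps its colons.
    host, _, port = parts[2].rpartition(":")
    return Redirect(slot=int(parts[1]), host=host, port=int(port))


if __name__ == "__main__":
    print(parse_moved("MOVED 3999 127.0.0.1:6381"))
```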

Handling MOVED exceptions in your client application (which uses StackExchange.Redis):

  • Upgrade to the latest version of StackExchange.Redis (1.1.603 at the time of writing). This version has several important bug fixes in the client code that interacts with a Redis Cluster.
  • Use the default connectTimeout of 5 seconds (5000 ms) or more in the StackExchange.Redis configuration. This gives StackExchange.Redis enough time to re-establish the connection after a network blip.
  • In general, the client application should tolerate some number of MOVED errors, since they can occur during a failover or a network blip. The application can retry the operation before failing or falling back to some other data source.
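
The retry-then-fall-back advice in the last bullet could look roughly like the Python sketch below. It is a generic illustration, not StackExchange.Redis code: call_with_retry, its parameters, and the assumption that a MOVED failure surfaces as an exception whose message starts with "MOVED" are all hypothetical and should be adapted to how your client reports the error.

```python
import time


def call_with_retry(operation, fallback=None, attempts=3, delay_seconds=0.5):
    """Run operation(); retry on MOVED errors, then use fallback() if given."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:
            # Assumption: a MOVED failure has a message starting with "MOVED".
            if not str(exc).startswith("MOVED"):
                raise  # unrelated errors are not retried
            if attempt == attempts:
                if fallback is not None:
                    return fallback()  # e.g. read from another location
                raise
            # Give the client time to reconnect to the slot owner.
            time.sleep(delay_seconds)
```

A fixed, short delay keeps the sketch simple; whether a few retries are enough depends on how long failovers typically take in your deployment.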

CLUSTER DOWN ERRORS:

A node in a Redis Cluster returns the following error if the cluster (from the perspective of that node) is in a FAILED state:

"CLUSTERDOWN The cluster is down"

If a node that receives a query cannot reach the majority (more than half) of the masters in the Redis Cluster, it considers the cluster to be in a FAILED state and returns a CLUSTERDOWN response to the client that issued the query.

Handling CLUSTERDOWN:

  • Unfortunately, there is no client-side mitigation for this error other than retrying the failed operation until the cluster recovers. Note that a single node or a subset of nodes in a Redis Cluster can enter this state because of an ongoing network blip or connectivity issue, and they recover once the issue is resolved. Until then, queries that land on the affected nodes fail with a CLUSTERDOWN error.
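
Retrying until the cluster recovers is usually bounded by a deadline so that the caller does not wait forever. The Python sketch below shows one way to do that with exponential backoff. It is a generic illustration under stated assumptions: retry_until_recovered and its parameters are hypothetical, and it assumes the client surfaces the error as an exception whose message contains "CLUSTERDOWN".

```python
import time


def retry_until_recovered(operation, timeout_seconds=30.0,
                          first_delay=0.1, max_delay=5.0):
    """Retry operation() on CLUSTERDOWN errors until it succeeds or times out."""
    deadline = time.monotonic() + timeout_seconds
    delay = first_delay
    while True:
        try:
            return operation()
        except Exception as exc:
            # Assumption: the error message contains "CLUSTERDOWN".
            is_cluster_down = "CLUSTERDOWN" in str(exc)
            # Give up on unrelated errors, or when the next wait would
            # run past the deadline.
            if not is_cluster_down or time.monotonic() + delay > deadline:
                raise
        time.sleep(delay)
        # Back off exponentially, capped at max_delay between attempts.
        delay = min(delay * 2, max_delay)
```

The timeout and delay values here are placeholders; suitable values depend on how long connectivity issues typically last in your environment.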