Fast way to remove a large number of Redis keys by pattern
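# Fast way to remove a large number of Redis keys matching a pattern.
#
# The commonly recommended way is:
#   redis-cli --scan --pattern 'abc:*' | xargs redis-cli del
# but it can be very slow if you have lots of data (e.g. an 8 GB Redis cluster).
# The script below removes the keys considerably faster, mainly because it
# scans with a large COUNT and deletes keys in large batches.
import logging
import time

# redis-py-cluster 1.x API (the class is named RedisCluster in redis-py-cluster 2.x).
from rediscluster import StrictRedisCluster

# Without this, logger.info() messages are not printed (default level is WARNING).
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Placeholder connection settings: replace with your own cluster nodes and password.
hosts = [{"host": "127.0.0.1", "port": "7000"}]
password = None

client = StrictRedisCluster(startup_nodes=hosts, password=password,
                            skip_full_coverage_check=True)

pattern = "abc:*"
batch_size = 100000  # used both as the SCAN COUNT hint and as the DEL batch size

start_time = time.time()
item_count = 0
keys = []
logger.info("Start scanning keys...")
for k in client.scan_iter(match=pattern, count=batch_size):
    keys.append(k)
    if len(keys) >= batch_size:
        item_count += len(keys)
        logger.info("batch delete up to {} ...".format(item_count))
        client.delete(*keys)
        keys = []

# Delete the keys left over from the last, partial batch.
if keys:
    item_count += len(keys)
    logger.info("batch delete up to {}".format(item_count))
    client.delete(*keys)

end_time = time.time()
# time.time() returns seconds, so multiply by 1000 to report milliseconds.
logger.info("deleted {0} keys in {1:0.3f} ms.".format(item_count, (end_time - start_time) * 1000.0))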
adamochayon commented Aug 10, 2019

Have you tried pipelining the deletion?
You could use one pipeline to scan and find all the relevant keys, and another pipeline to delete them all in one round trip.
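A minimal sketch of the deletion half of this suggestion, assuming a standalone (non-cluster) redis-py client; redis-py-cluster's pipeline may reject multi-key DEL, so the cluster case is untested. The scan half stays a plain scan_iter loop, because each SCAN call needs the cursor returned by the previous one. The helper pipelined_delete and its batches_per_flush parameter are hypothetical: DEL commands are queued on a non-transactional pipeline and flushed several at a time rather than all at once, which keeps local memory bounded.

```python
from itertools import islice


def pipelined_delete(client, pattern, batch_size=1000, batches_per_flush=10):
    """Delete keys matching `pattern`, sending several DELs per round trip.

    Hypothetical helper: assumes a redis-py style client that provides
    scan_iter(match=..., count=...) and pipeline(transaction=False).
    """
    keys = client.scan_iter(match=pattern, count=batch_size)
    pipe = client.pipeline(transaction=False)  # plain batching, no MULTI/EXEC
    queued = 0
    deleted = 0
    while True:
        batch = list(islice(keys, batch_size))
        if not batch:
            break
        pipe.delete(*batch)  # buffered locally, not sent yet
        queued += 1
        if queued == batches_per_flush:
            # One round trip for all queued DELs; each result is a deleted count.
            deleted += sum(pipe.execute())
            queued = 0
    if queued:
        deleted += sum(pipe.execute())  # flush the final, partial group
    return deleted
```

Setting batches_per_flush very high approaches the "one round trip" version, at the cost of holding every queued key in local memory until the flush.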

dingmaotu (gist owner and author) commented Aug 10, 2019

The key difference is using a larger SCAN count, which here is the batch size. Scanning keys with a small count is extremely slow, because Redis then examines only a few keys per SCAN call and the client needs a huge number of round trips to cover the keyspace (and redis-cli --scan does not provide an argument to set the count; I suppose it uses a small default value). Since I am already deleting 100,000 keys in a single DEL command, I think pipelining would not make much difference here.
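The argument above can be checked with back-of-the-envelope arithmetic. A rough sketch, assuming each SCAN call examines about COUNT keys of the whole keyspace (COUNT is only a hint, so real numbers vary), ignoring that a cluster client scans each node separately, and using a hypothetical keyspace of 10,000,000 keys; the function name approx_round_trips is made up for illustration.

```python
import math


def approx_round_trips(keyspace_size, matching_keys, scan_count, keys_per_del):
    """Rough number of network round trips for a SCAN + batched DEL job.

    SCAN walks the whole keyspace regardless of MATCH, so its cost depends on
    keyspace_size; DEL calls depend only on how many keys actually match.
    """
    scan_calls = math.ceil(keyspace_size / scan_count)
    del_calls = math.ceil(matching_keys / keys_per_del)
    return scan_calls + del_calls


if __name__ == "__main__":
    total = 10_000_000  # hypothetical keyspace in which every key matches
    print(approx_round_trips(total, total, scan_count=10, keys_per_del=100_000))       # 1000100
    print(approx_round_trips(total, total, scan_count=100_000, keys_per_del=100_000))  # 200
```

Under these assumptions a small COUNT costs about a million SCAN round trips, while COUNT 100,000 needs about 100, and the 100,000-key DEL batches add only another 100, so pipelining the deletes has little left to save.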

To do this in one round trip, you have to download all the matching keys to your local machine and then send them back to the Redis server. If there are many keys, this can cause problems: what if your local machine runs out of memory, or the download triggers swapping? So I prefer to use batches.
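The memory argument for batches can be made concrete with a lazy batching generator: only one batch of keys is held locally at any time, no matter how many keys match. A minimal sketch, assuming a client with scan_iter(match=..., count=...) and a delete(*keys) that returns the number of keys removed, as redis-py's does; batched_keys and delete_in_batches are hypothetical names.

```python
from itertools import islice


def batched_keys(key_iter, batch_size):
    """Yield lists of at most `batch_size` keys from a lazy key iterator.

    Only the current batch lives in memory, so the total number of matching
    keys may exceed local RAM. (Python 3.12+ also has itertools.batched,
    which yields tuples instead of lists.)
    """
    it = iter(key_iter)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def delete_in_batches(client, pattern, batch_size=100_000):
    """Hypothetical helper: scan lazily and issue one DEL per batch."""
    deleted = 0
    keys = client.scan_iter(match=pattern, count=batch_size)
    for batch in batched_keys(keys, batch_size):
        deleted += client.delete(*batch)
    return deleted
```

Because batched_keys never materializes the full key list, it even works on an unbounded iterator, which is the property the one-round-trip approach lacks.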

andsens commented May 28, 2021

Or you could just change xargs to xargs -n500 -P10, which passes at most 500 keys to each redis-cli del invocation and runs up to 10 of those clients in parallel.
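For comparison with the Python script, here is a rough analog of xargs -n500 -P10 written with a thread pool. It is a sketch under stated assumptions, not the commenter's method: it uses threads instead of separate processes (the work is network-bound, so this is usually adequate), assumes the client can be shared between threads as redis-py clients can, and parallel_delete is a hypothetical name. The number of in-flight batches is capped so keys do not pile up in memory.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def parallel_delete(client, pattern, keys_per_del=500, workers=10):
    """Rough analog of: redis-cli --scan ... | xargs -n500 -P10 redis-cli del"""
    keys = client.scan_iter(match=pattern, count=keys_per_del)
    pending = deque()
    deleted = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            batch = list(islice(keys, keys_per_del))  # like xargs -n500
            if not batch:
                break
            pending.append(pool.submit(client.delete, *batch))
            # Cap queued batches (like xargs -P) so memory stays bounded.
            if len(pending) >= 2 * workers:
                deleted += pending.popleft().result()
        while pending:
            deleted += pending.popleft().result()
    return deleted
```

Scanning still happens on one thread, so this parallelizes only the DEL side, just as the xargs pipeline does.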

jdhao commented Jul 20, 2021

Thanks for this gist. It is much faster than scanning without count=batch_size and saved me a lot of time.
