Skip to content

Instantly share code, notes, and snippets.

@observerss
Last active June 4, 2019 01:45
Show Gist options
  • Star 4 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save observerss/7129377 to your computer and use it in GitHub Desktop.
Save observerss/7129377 to your computer and use it in GitHub Desktop.
Redis as database: dealing with indexes and composite indexes

Using Reids As Database

Redis is very performant, due to performance consideration, sometimes we need to use Redis as a database, when querying a redis database, we need to consider indexing and filtering.

Single Index

We can use SortedSet as Index.

For example, for a table of schema (id int, name text, age int, ...), we could save each row in a Hash, keyed by id, and save (age, id) pairs in SortedSet.

For a query like "select * from table order by age limit OFFSET, LIMIT", first we use zrange to get ids from SortedSet, then do some hmgets to fetch the whole row.

Compound Index

But if we need to do where with order, things got complicated.

An easy way to do it is to use "compound indexes", we save ((field1, field2), id) instead of (field, id), make sure that (field1, field2) is fully ordered in the common sense.

This is fine when you only do it a few times for a few queries, but it's not ok for queries that may filter on several fields and order by several fields, too. In this case, number of compound indexes will soon be too large to manage.

Combine Indexes

If compound index is not available, what do database do to a query like "select * from table where score>=90 order by age limit OFFSET, LIMIT"?

Well, a straight forward method would be like below:

  1. load ranking 1-100 from "index" age
  2. see if these rows meet the where criteria
    1. if so, append it to a temporary index
    2. else, skip it
  3. if temporary index contains more than OFFSET+LIMIT elements,
    1. we then slice the temporary index and return
    2. otherwise, go to step 1, increase the ranking numbers we load.

Network Communication Problem

The only problem for this algorithm is we need to do client-server communication hundreds or thousands of times to do a simple query.

Luckily, redis 2.6+ has lua built in, so it's possible to write lua scripting calculating in the server side(redis side), and return results like a real db.

This gist is an attempt to illustrate the possible combound index implementation in lua scripting.

#!/usr/bin/env python
import time
import redis
import random
import hashlib
r = redis.Redis()
def gen_zsets():
""" generate indexes for testing """
r.delete('index1')
r.delete('index2')
print('generating test values')
p = r.pipeline()
for id in range(100001, 500000):
p.zadd('index1', 'shop{}'.format(id), random.randint(1, 10))
p.zadd('index2', 'shop{}'.format(id), random.randint(1, 10))
p.execute()
print('generated')
def find(between, sort):
""" try to execute a query like below
select id from table
where betweens[0] between (between[1], between[2])
order by sort[0]
limit sort[1], sort[2]
"""
script = """
local idx1, lowerbound, upperbound = KEYS[1], ARGV[1]+0, ARGV[2]+0
local idx2, offset, limit = KEYS[2], ARGV[3]+0, ARGV[4]+0
local result = {}
local batchoffset = 0
local batchsize = limit * 5
local found = 0
local i = 0
while #result < limit do
local tempshopids = redis.call("zrevrange", idx2, batchoffset, batchoffset+batchsize)
for _, shopid in pairs(tempshopids) do
local score = redis.call('zscore', idx1, shopid) + 0
if score >= lowerbound and score <= upperbound then
found = found + 1
if found > offset then
result[#result+1] = shopid
if #result == limit then
break
end
end
end
end
batchoffset = batchoffset + batchsize
end
return result
"""
sha = hashlib.sha1(script).hexdigest()
try:
return r.evalsha(sha+'1', 2, between[0], sort[0], between[1], between[2], sort[1], sort[2])
except:
return r.eval(script, 2, between[0], sort[0], between[1], between[2], sort[1], sort[2])
def benchmark():
limit = 20
times = 100
t1 = time.time()
for _ in range(times):
offset = random.randint(0, 1000)
find(between=('index1', 1, 6), sort=('index2', offset, limit))
t2 = time.time()
print('query {} times offset with 1000 cost {}s'.format(times, t2-t1))
if __name__ == '__main__':
if not r.exists('index1') or not r.exists('index2'):
gen_zsets()
#print find(between=('index1', 1, 6), sort=('index2', 20, 20))
benchmark()
@williamyeny
Copy link

Just want to mention that you misspelled "Redis" in your title :^)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment