observerss/RedisAsDB.rst

## RedisAsDB.rst

      
    Raw
  

              RedisAsDB.rst
            
          
    Using Reids As Database

Redis is very performant, due to performance consideration, sometimes we need to use Redis as a database, when querying a redis database, we need to consider indexing and filtering.

Single Index

We can use SortedSet as Index.
For example, for a table of schema (id int, name text, age int, ...), we could save each row in a Hash, keyed by id, and save (age, id) pairs in SortedSet.
For a query like "select * from table order by age limit OFFSET, LIMIT", first we use zrange to get ids from SortedSet, then do some *hmget*s to fetch the whole row.

Compound Index

But if we need to do where with order, things got complicated.
An easy way to do it is to use "compound indexes", we save ((field1, field2), id) instead of (field, id), make sure that (field1, field2) is fully ordered in the common sense.
This is fine when you only do it a few times for a few queries, but it's not ok for queries that may filter on several fields and order by several fields, too. In this case, number of compound indexes will soon be too large to manage.

Combine Indexes

If compound index is not available, what do database do to a query like "select * from table where score>=90 order by age limit OFFSET, LIMIT"?
Well, a straight forward method would be like below:


load ranking 1-100  from "index" age

see if these rows meet the where criteria

if so, append it to a temporary index
else, skip it


if temporary index contains more than OFFSET+LIMIT elements,

we then slice the temporary index and return
otherwise, go to step 1, increase the ranking numbers we load.


Network Communication Problem

The only problem for this algorithm is we need to do client-server communication hundreds or thousands of times to do a simple query.
Luckily, redis 2.6+ has lua built in, so it's possible to write lua scripting calculating in the server side(redis side), and return results like a real db.
This gist is an attempt to illustrate the possible combound index implementation in lua scripting.


## redisindex.py
#!/usr/bin/env python
import time
import redis
import random
import hashlib

r = redis.Redis()

def gen_zsets():
    """ generate indexes for testing """
    r.delete('index1')
    r.delete('index2')
    print('generating test values')
    p = r.pipeline()
    for id in range(100001, 500000):
        p.zadd('index1', 'shop{}'.format(id), random.randint(1, 10))
        p.zadd('index2', 'shop{}'.format(id), random.randint(1, 10))
    p.execute()
    print('generated')

def find(between, sort):
    """ try to execute a query like below

    select id from table
    where betweens[0] between (between[1], between[2])
    order by sort[0]
    limit sort[1], sort[2]
    """
    script = """
local idx1, lowerbound, upperbound = KEYS[1], ARGV[1]+0, ARGV[2]+0
local idx2, offset, limit = KEYS[2], ARGV[3]+0, ARGV[4]+0

local result = {}
local batchoffset = 0
local batchsize = limit * 5
local found = 0
local i = 0
while #result < limit do
    local tempshopids = redis.call("zrevrange", idx2, batchoffset, batchoffset+batchsize)
    for _, shopid in pairs(tempshopids) do
        local score = redis.call('zscore', idx1, shopid) + 0
        if score >= lowerbound and score <= upperbound then
            found = found + 1
            if found > offset then
                result[#result+1] = shopid
                if #result == limit then
                    break
                end
            end
        end
    end
    batchoffset = batchoffset + batchsize
end

return result
"""
    sha = hashlib.sha1(script).hexdigest()
    try:
        return r.evalsha(sha+'1', 2, between[0], sort[0], between[1], between[2], sort[1], sort[2])
    except:
        return r.eval(script, 2, between[0], sort[0], between[1], between[2], sort[1], sort[2])

def benchmark():
    limit = 20
    times = 100
    t1 = time.time()
    for _ in range(times):
        offset = random.randint(0, 1000)
        find(between=('index1', 1, 6), sort=('index2', offset, limit))
    t2 = time.time()
    print('query {} times offset with 1000 cost {}s'.format(times, t2-t1))


if __name__ == '__main__':
    if not r.exists('index1') or not r.exists('index2'):
        gen_zsets()
    #print find(between=('index1', 1, 6), sort=('index2', 20, 20))
    benchmark()
	#!/usr/bin/env python
	import time
	import redis
	import random
	import hashlib

	r = redis.Redis()

	def gen_zsets():
	""" generate indexes for testing """
	r.delete('index1')
	r.delete('index2')
	print('generating test values')
	p = r.pipeline()
	for id in range(100001, 500000):
	p.zadd('index1', 'shop{}'.format(id), random.randint(1, 10))
	p.zadd('index2', 'shop{}'.format(id), random.randint(1, 10))
	p.execute()
	print('generated')

	def find(between, sort):
	""" try to execute a query like below

	select id from table
	where betweens[0] between (between[1], between[2])
	order by sort[0]
	limit sort[1], sort[2]
	"""
	script = """
	local idx1, lowerbound, upperbound = KEYS[1], ARGV[1]+0, ARGV[2]+0
	local idx2, offset, limit = KEYS[2], ARGV[3]+0, ARGV[4]+0

	local result = {}
	local batchoffset = 0
	local batchsize = limit * 5
	local found = 0
	local i = 0
	while #result < limit do
	local tempshopids = redis.call("zrevrange", idx2, batchoffset, batchoffset+batchsize)
	for _, shopid in pairs(tempshopids) do
	local score = redis.call('zscore', idx1, shopid) + 0
	if score >= lowerbound and score <= upperbound then
	found = found + 1
	if found > offset then
	result[#result+1] = shopid
	if #result == limit then
	break
	end
	end
	end
	end
	batchoffset = batchoffset + batchsize
	end

	return result
	"""
	sha = hashlib.sha1(script).hexdigest()
	try:
	return r.evalsha(sha+'1', 2, between[0], sort[0], between[1], between[2], sort[1], sort[2])
	except:
	return r.eval(script, 2, between[0], sort[0], between[1], between[2], sort[1], sort[2])

	def benchmark():
	limit = 20
	times = 100
	t1 = time.time()
	for _ in range(times):
	offset = random.randint(0, 1000)
	find(between=('index1', 1, 6), sort=('index2', offset, limit))
	t2 = time.time()
	print('query {} times offset with 1000 cost {}s'.format(times, t2-t1))


	if __name__ == '__main__':
	if not r.exists('index1') or not r.exists('index2'):
	gen_zsets()
	#print find(between=('index1', 1, 6), sort=('index2', 20, 20))
	benchmark()