@vsraptor
Created June 25, 2021 01:06
Keyvi Index KV store
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## keyvi index\n",
"\n",
"Keyvi is a key-value (KV) store.\n",
"\n",
"Unlike other KV stores such as Redis, its underlying technology is not a hash function but Finite State Transducers (FST).\n",
"\n",
"There are two structures: Index and Dictionary.\n",
"Both share the same data format as stored on disk.\n",
"\n",
"The difference is that the Dictionary is static and the Index is dynamic, i.e. UPDATABLE.\n",
"\n",
"Another quirk is that there can be only ONE writer but MANY readers.\n",
"\n",
"Below is an example of how the Index works.\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import keyvi.index"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, let's create a read-write object."
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [],
"source": [
"kv = keyvi.index.Index('test')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"... then use .Set() to create a KV pair"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"kv.Set('abc', 'abc')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
".. then some more"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"kv.Set('abcd', str(555))"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"kv.Set('abxy', '23.67')"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [],
"source": [
"kv.Set('brum','789')"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"kv.Set('abxyz', f'{int(55.12345678*100)}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We use .Get(), which returns a match object"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [],
"source": [
"match = kv.Get('abc')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"From this object we can get back the value, if one exists"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'abc'"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"match.GetValue()"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"555"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kv.Get('abcd').GetValue()"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"55.120"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kv.Get('abxyz').GetValue()/100"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also search for keys that are not an exact match by specifying a Levenshtein distance.\n",
"\n",
"**.GetFuzzy(key, distance, len-of-exact-prefix)**"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [],
"source": [
"match2 = kv.GetFuzzy('abcd',1,1)"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['abc', 555]"
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"[ m.GetValue() for m in match2]"
]
},
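{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of what the distance argument means, here is a minimal pure-Python sketch of Levenshtein distance. The helper `levenshtein` is hypothetical and not part of keyvi; it only shows how far each stored key is from the query `'abcd'`, which is why `'abc'` (distance 1) matches and `'abxy'` (distance 2) does not."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# hypothetical helper, not a keyvi API: classic dynamic-programming edit distance\n",
"def levenshtein(a, b):\n",
"    prev = list(range(len(b) + 1))  # distances from '' to each prefix of b\n",
"    for i, ca in enumerate(a, 1):\n",
"        cur = [i]\n",
"        for j, cb in enumerate(b, 1):\n",
"            cur.append(min(prev[j] + 1,                # delete ca\n",
"                           cur[j - 1] + 1,             # insert cb\n",
"                           prev[j - 1] + (ca != cb)))  # substitute or match\n",
"        prev = cur\n",
"    return prev[-1]\n",
"\n",
"# distance from the query 'abcd' to each key stored above\n",
"[(k, levenshtein('abcd', k)) for k in ['abc', 'abcd', 'abxy', 'abxyz', 'brum']]"
]
},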
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also open the Index as read-only, as I mentioned in the beginning"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [],
"source": [
"ro = keyvi.index.ReadOnlyIndex('test')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We pass a string as the value to .Set(), but get back the correct type .."
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"23.670"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ro.Get('abxy').GetValue()"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"float"
]
},
"execution_count": 56,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(ro.Get('abxy').GetValue())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
".. the same for integer"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"789"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ro.Get('brum').GetValue()"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"int"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(ro.Get('brum').GetValue())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because the KV file is memory-mapped, reading is Faaast... and as a bonus, because of the underlying FST technology, everything is compressed on the fly ... win/win "
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"591 ns ± 3.12 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n"
]
}
],
"source": [
"%timeit ro.Get('brum').GetValue()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Storing sequences\n",
"\n",
"Now I will show you how to save sequences, i.e. lists of integers, as key, value, or both. \n",
"\n",
"By default Keyvi does not support that, but we can convert the list to a packed byte string beforehand and back to a list on the way out."
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {},
"outputs": [],
"source": [
"import struct\n",
"\n",
"# list of numbers => packed big-endian byte string\n",
"def nums2str(nums, itype='H'):  # B:uint8, H:uint16, I:uint32, Q:uint64\n",
"    return struct.pack(f\">{len(nums)}{itype}\", *nums)\n",
"\n",
"# packed byte string => tuple of numbers (wrap in list() if a list is needed)\n",
"def str2nums(b, itype='H'):\n",
"    size = struct.calcsize(f\">{itype}\")  # bytes per element: 1, 2, 4 or 8\n",
"    return struct.unpack(f\">{len(b)//size}{itype}\", b)"
]
},
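{
"cell_type": "markdown",
"metadata": {},
"source": [
"The number of bytes per element is what `str2nums` has to divide the buffer length by. As a quick sanity check, the standard library's `struct.calcsize` reports that width for each format code used here; this is only a sketch for verifying the sizes, not part of the helpers."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import struct\n",
"\n",
"# bytes per element for each unsigned big-endian format code\n",
"{t: struct.calcsize('>' + t) for t in 'BHIQ'}  # {'B': 1, 'H': 2, 'I': 4, 'Q': 8}"
]
},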
{
"cell_type": "code",
"execution_count": 71,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"b'\\x00\\x01\\x00\\x02\\x00\\x03\\x00\\x04'"
]
},
"execution_count": 71,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nums2str([1,2,3,4])"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(1, 2, 3, 4)"
]
},
"execution_count": 72,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"str2nums(nums2str([1,2,3,4]))"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {},
"outputs": [],
"source": [
"kv.Set(nums2str([1,2,3,4]), 'a list key')"
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'a list key'"
]
},
"execution_count": 74,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kv.Get(nums2str([1,2,3,4])).GetValue()"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {},
"outputs": [],
"source": [
"kv.Set(nums2str([5,6,7]), nums2str([7,8,9]))"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'\\x00\\x07\\x00\\x08\\x00\\t'"
]
},
"execution_count": 77,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"val = kv.Get(nums2str([5,6,7])).GetValue()\n",
"val"
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(7, 8, 9)"
]
},
"execution_count": 79,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"str2nums(bytes(val,'utf-8'))"
]
},
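{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `bytes(val, 'utf-8')` round trip above works because every packed byte is below 128. Here is a sketch in plain Python (no keyvi involved) of what happens to a `str` that carries a larger byte: re-encoding as UTF-8 changes the length, so `str2nums` would no longer see the original bytes. How keyvi itself stores and returns such values is not tested here."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import struct\n",
"\n",
"raw = struct.pack('>2H', 7, 200)  # 200 needs the byte 0xc8, outside ASCII\n",
"as_str = raw.decode('latin-1')    # a str with exactly one character per byte\n",
"\n",
"# original length, length after UTF-8 re-encoding, length after latin-1 re-encoding\n",
"len(raw), len(bytes(as_str, 'utf-8')), len(as_str.encode('latin-1'))  # (4, 5, 4)"
]
},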
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> I just decided to try it, and it seems keyvi supports lists: you just need to pass them as a string ;)"
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {},
"outputs": [],
"source": [
"kv.Set('str-list','[1,2,3]')"
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[1, 2, 3]"
]
},
"execution_count": 83,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kv.Get('str-list').GetValue()"
]
},
{
"cell_type": "code",
"execution_count": 84,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"list"
]
},
"execution_count": 84,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(kv.Get('str-list').GetValue())"
]
},
{
"cell_type": "code",
"execution_count": 85,
"metadata": {},
"outputs": [],
"source": [
"kv.Set('[1,2]','kwy-str-list')"
]
},
{
"cell_type": "code",
"execution_count": 87,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'kwy-str-list'"
]
},
"execution_count": 87,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kv.Get('[1,2]').GetValue()"
]
},
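{
"cell_type": "markdown",
"metadata": {},
"source": [
"Rather than typing the list literal by hand, the string can be produced with the standard `json` module. This is a minimal sketch, assuming (as the results above suggest, but not confirmed here) that keyvi decodes JSON-looking value strings on the way out. The `kv.Set` call is left commented out so the cell runs without an index."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"nums = [1, 2, 3]\n",
"as_text = json.dumps(nums)  # '[1, 2, 3]'\n",
"# kv.Set('str-list', as_text)  # same call as above, with a generated string\n",
"\n",
"json.loads(as_text) == nums  # True: the text form round-trips in plain Python"
]
},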
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}