Turso/libSQL vector search support is work-in-progress with the following features:
- Vector column type for storing vectors in tables.
- Vector index creation that is automatically updated on table updates.
- Exact vector search with metadata filtering using plain SQL.
- Approximate vector search using the new
vector_top_k()
function that is backed by DiskANN-based vector index.
Limitations:
The libSQL vector search has the following limitations:
CREATE INDEX
on non-empty database results in a crash when you query the index.- Indexing key is always
rowid
instead ofPRIMARY KEY
Next steps are:
- Vector binary quantization to optimize for small disk and memory footprint.
- Integration with SQLite query planner.
Follow these steps to get started with libSQL vector search.
- MacOS or Linux
- GCC or clang compiler
Fetch the source code using git
from the work-in-progress vector
branch:
git clone --branch vector --depth 1 https://github.com/tursodatabase/libsql.git
cd libsql/libsql-sqlite3 && ./configure && make
You now have the libSQL library and shell built with vector search support:
$ ./sqlite3
libSQL version 0.2.3 (based on SQLite version 3.44.0) 2023-11-01 11:23:50
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
libsql> SELECT vector_distance_cos('[1.0, 2.0, 3.0]', '[4.0, 5.0, 6.0]');
0.0253681540489197
CREATE TABLE movies (
title TEXT,
year INT,
embedding FLOAT32(3)
);
INSERT INTO movies (title, year, embedding)
VALUES
(
'Napoleon',
2023,
vector('[1,2,3]')
),
(
'Black Hawk Down',
2001,
vector('[10,11,12]')
),
(
'Gladiator',
2000,
vector('[7,8,9]')
),
(
'Blade Runner',
1982,
vector('[4,5,6]')
);
SELECT title, vector_extract(embedding), vector_distance_cos(embedding, '[5,6,7]') FROM movies;
CREATE INDEX movies_idx USING diskann_cosine_ops ON movies (embedding);
Full table scan:
SELECT title, year FROM movies ORDER BY vector_distance_cos(embedding, '[3,1,2]') LIMIT 3;
Indexed lookup:
SELECT
title,
year
FROM
vector_top_k('movies_idx', '[4,5,6]', 3)
JOIN
movies
ON
movies.rowid = id;
Full table scan:
SELECT * FROM movies WHERE year >= 2020 ORDER BY vector_distance_cos(embedding, '[3,1,2]') LIMIT 3;
Indexed lookup:
SELECT
title,
year
FROM
vector_top_k('movies_idx', '[4,5,6]', 3)
JOIN
movies
ON
movies.rowid = id
WHERE
year >= 2020;
The vector(A)
function takes a text format vector A
and converts it to a binary representation of the vector using 32-bit floating point precision. Use that function with SQL INSERT
statement to store vector data in tables.
The vector_extract(X)
function does the reverse: it translates a binary vector data X
into text format. Use the function with the SELECT
statement to inspect vector data.
The vector_distance_cos(X, Y)
function calculates cosine similarity between vectors X
and Y
.