Skip to content

Instantly share code, notes, and snippets.

@penberg
Last active June 17, 2024 07:33
Show Gist options
  • Save penberg/b3aa1ac40d60118843ea989ca1277acc to your computer and use it in GitHub Desktop.
Save penberg/b3aa1ac40d60118843ea989ca1277acc to your computer and use it in GitHub Desktop.

Turso/libSQL vector search

🚧 Project Status

Turso/libSQL vector search support is work-in-progress with the following features:

  • Vector column type for storing vectors in tables.
  • Vector index creation that is automatically updated on table updates.
  • Exact vector search with metadata filtering using plain SQL.
  • Approximate vector search using the new vector_top_k() function that is backed by DiskANN-based vector index.

Limitations:

The libSQL vector search has the following limitations:

  • CREATE INDEX on non-empty database results in a crash when you query the index.
  • Indexing key is always rowid instead of PRIMARY KEY

Next steps are:

  • Vector binary quantization to optimize for small disk and memory footprint.
  • Integration with SQLite query planner.

πŸš€ Getting Started

Follow these steps to get started with libSQL vector search.

1. Requirements

  • MacOS or Linux
  • GCC or clang compiler

2. Fetching sources

Fetch the source code using git from the work-in-progress vector branch:

git clone --branch vector --depth 1 https://github.com/tursodatabase/libsql.git

3. Building

cd libsql/libsql-sqlite3 && ./configure && make

4. Starting the SQL shell

You now have the libSQL library and shell built with vector search support:

$ ./sqlite3
libSQL version 0.2.3 (based on SQLite version 3.44.0) 2023-11-01 11:23:50
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
libsql> SELECT vector_distance_cos('[1.0, 2.0, 3.0]', '[4.0, 5.0, 6.0]');
0.0253681540489197

πŸ€– Usage

Creating a table with a vector column

CREATE TABLE movies (
  title TEXT, 
  year INT, 
  embedding FLOAT32(3)
);

Inserting vector data

INSERT INTO movies (title, year, embedding) 
VALUES 
  (
    'Napoleon', 
    2023, 
    vector('[1,2,3]')
  ), 
  (
    'Black Hawk Down', 
    2001, 
    vector('[10,11,12]')
  ), 
  (
    'Gladiator', 
    2000, 
    vector('[7,8,9]')
  ), 
  (
    'Blade Runner', 
    1982, 
    vector('[4,5,6]')
  );

Querying vector columns

SELECT title, vector_extract(embedding), vector_distance_cos(embedding, '[5,6,7]') FROM movies;

Creating an index on vector column

CREATE INDEX movies_idx USING diskann_cosine_ops ON movies (embedding);

Finding top-k similar rows

Full table scan:

SELECT title, year FROM movies ORDER BY vector_distance_cos(embedding, '[3,1,2]') LIMIT 3;

Indexed lookup:

SELECT 
  title, 
  year 
FROM 
  vector_top_k('movies_idx', '[4,5,6]', 3) 
JOIN
  movies 
ON 
  movies.rowid = id;

Filtering by metadata

Full table scan:

SELECT * FROM movies WHERE year >= 2020 ORDER BY vector_distance_cos(embedding, '[3,1,2]') LIMIT 3;

Indexed lookup:

SELECT 
  title, 
  year 
FROM 
  vector_top_k('movies_idx', '[4,5,6]', 3) 
JOIN
  movies 
ON 
  movies.rowid = id
WHERE
  year >= 2020;

πŸ“š API reference

The vector(A) function takes a text format vector A and converts it to a binary representation of the vector using 32-bit floating point precision. Use that function with SQL INSERT statement to store vector data in tables.

The vector_extract(X) function does the reverse: it translates a binary vector data X into text format. Use the function with the SELECT statement to inspect vector data.

The vector_distance_cos(X, Y) function calculates cosine similarity between vectors X and Y.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment