Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save sageTimNewton/f20583ff1f0cf0af648bc0a0a91c0083 to your computer and use it in GitHub Desktop.
Save sageTimNewton/f20583ff1f0cf0af648bc0a0a91c0083 to your computer and use it in GitHub Desktop.

Summary

This gist is meant to explain how to create both AWS RDS postgres database compatible with pgvector and also microsoft Azure DB :

AWS

The straightforward steps below should create a database compatible with pgvector.

  1. Create an RDS instance similar to the one detailled below : image
  2. Make sure you choose the postgres Engine and at least version 15 (I've only tried 15.3)
  3. To create an RDS instance you will need a VPC with multiple subnets across different availability zones.

Once your database has been created you can connect with a tool like :- https://sqlectron.github.io/ This lightweight tool worked well for connecting to postgreSQL but use your favourite client to connect to your database then run the following commands :-

CREATE EXTENSION vector;
SELECT typname FROM pg_type WHERE typname = 'vector';

The select statement should yield the following :-

image

If you see the typname of vector you have successfully enabled the pgvector extension. On AWS no more was required. I'd read about having to modify the "shared_preload_libraries" but this was not needed. Version 15 of postgreSQL seemed to have this extension already available.

Azure

Azure was a similar process to AWS. Please note this only works for postgreSQL flexible Server however the documentation I have seen suggested that version 14 of postgreSQL would be sufficient. This was NOT the case. Version 14 did not work for me. I hit the exact issue described in this webpage :- https://learn.microsoft.com/en-us/answers/questions/1292116/pgvector-extension-in-postgres-flexible-server-is?source=docs As soon as I introduced syntax related to the vector extension, e.g. :

 <-> 

My queries began to crash and my SQL sessions were terminated. However as soon as I upgraded my postgreSQL version to 15 (preview version) everything began to work. To enable the pgvector extension you can do the following :

image

i.e. Click on the Server Parameters menu then search for azure.extensions. Under this select the Vector extension. Then restart your instance, then run the commands :-

CREATE EXTENSION vector;
SELECT typname FROM pg_type WHERE typname = 'vector';

The select statement should yield the following :-

image

Matrix Queries :-

Getting Started (taken from https://github.com/pgvector/pgvector)

Enable the extension (do this once in each database where you want to use it)

CREATE EXTENSION vector;

Create a vector column with 3 dimensions

CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));

Insert vectors

INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');

Get the nearest neighbors by L2 distance

SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;

Also supports inner product (<#>) and cosine distance (<=>)

Note: <#> returns the negative inner product since Postgres only supports ASC order index scans on operators

Storing

Create a new table with a vector column

CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));

Or add a vector column to an existing table

ALTER TABLE items ADD COLUMN embedding vector(3);

Insert vectors

INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');

Upsert vectors

INSERT INTO items (id, embedding) VALUES (1, '[1,2,3]'), (2, '[4,5,6]')
    ON CONFLICT (id) DO UPDATE SET embedding = EXCLUDED.embedding;

Update vectors

UPDATE items SET embedding = '[1,2,3]' WHERE id = 1;

Delete vectors

DELETE FROM items WHERE id = 1;

Querying

Get the nearest neighbors to a vector

SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;

Get the nearest neighbors to a row

SELECT * FROM items WHERE id != 1 ORDER BY embedding <-> (SELECT embedding FROM items WHERE id = 1) LIMIT 5;

Get rows within a certain distance

SELECT * FROM items WHERE embedding <-> '[3,1,2]' < 5;

Note: Combine with ORDER BY and LIMIT to use an index

Distances

Get the distance

SELECT embedding <-> '[3,1,2]' AS distance FROM items;

For inner product, multiply by -1 (since <#> returns the negative inner product)

SELECT (embedding <#> '[3,1,2]') * -1 AS inner_product FROM items;

For cosine similarity, use 1 - cosine distance

SELECT 1 - (embedding <=> '[3,1,2]') AS cosine_similarity FROM items;

Aggregates

Average vectors

SELECT AVG(embedding) FROM items;

Average groups of vectors

SELECT category_id, AVG(embedding) FROM items GROUP BY category_id;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment