Summary
This gist is meant to explain how to create both AWS RDS postgres database compatible with pgvector and also microsoft Azure DB :
AWS
The straightforward steps below should create a database compatible with pgvector.
- Create an RDS instance similar to the one detailled below :
- Make sure you choose the postgres Engine and at least version 15 (I've only tried 15.3)
- To create an RDS instance you will need a VPC with multiple subnets across different availability zones.
Once your database has been created you can connect with a tool like :- https://sqlectron.github.io/ This lightweight tool worked well for connecting to postgreSQL but use your favourite client to connect to your database then run the following commands :-
CREATE EXTENSION vector;
SELECT typname FROM pg_type WHERE typname = 'vector';
The select statement should yield the following :-
If you see the typname of vector you have successfully enabled the pgvector extension. On AWS no more was required. I'd read about having to modify the "shared_preload_libraries" but this was not needed. Version 15 of postgreSQL seemed to have this extension already available.
Azure
Azure was a similar process to AWS. Please note this only works for postgreSQL flexible Server however the documentation I have seen suggested that version 14 of postgreSQL would be sufficient. This was NOT the case. Version 14 did not work for me. I hit the exact issue described in this webpage :- https://learn.microsoft.com/en-us/answers/questions/1292116/pgvector-extension-in-postgres-flexible-server-is?source=docs As soon as I introduced syntax related to the vector extension, e.g. :
<->
My queries began to crash and my SQL sessions were terminated. However as soon as I upgraded my postgreSQL version to 15 (preview version) everything began to work. To enable the pgvector extension you can do the following :
i.e. Click on the Server Parameters menu then search for azure.extensions. Under this select the Vector extension. Then restart your instance, then run the commands :-
CREATE EXTENSION vector;
SELECT typname FROM pg_type WHERE typname = 'vector';
The select statement should yield the following :-
Matrix Queries :-
Getting Started (taken from https://github.com/pgvector/pgvector)
Enable the extension (do this once in each database where you want to use it)
CREATE EXTENSION vector;
Create a vector column with 3 dimensions
CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
Insert vectors
INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');
Get the nearest neighbors by L2 distance
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
Also supports inner product (<#>
) and cosine distance (<=>
)
Note: <#>
returns the negative inner product since Postgres only supports ASC
order index scans on operators
Storing
Create a new table with a vector column
CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
Or add a vector column to an existing table
ALTER TABLE items ADD COLUMN embedding vector(3);
Insert vectors
INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');
Upsert vectors
INSERT INTO items (id, embedding) VALUES (1, '[1,2,3]'), (2, '[4,5,6]')
ON CONFLICT (id) DO UPDATE SET embedding = EXCLUDED.embedding;
Update vectors
UPDATE items SET embedding = '[1,2,3]' WHERE id = 1;
Delete vectors
DELETE FROM items WHERE id = 1;
Querying
Get the nearest neighbors to a vector
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
Get the nearest neighbors to a row
SELECT * FROM items WHERE id != 1 ORDER BY embedding <-> (SELECT embedding FROM items WHERE id = 1) LIMIT 5;
Get rows within a certain distance
SELECT * FROM items WHERE embedding <-> '[3,1,2]' < 5;
Note: Combine with ORDER BY
and LIMIT
to use an index
Distances
Get the distance
SELECT embedding <-> '[3,1,2]' AS distance FROM items;
For inner product, multiply by -1 (since <#>
returns the negative inner product)
SELECT (embedding <#> '[3,1,2]') * -1 AS inner_product FROM items;
For cosine similarity, use 1 - cosine distance
SELECT 1 - (embedding <=> '[3,1,2]') AS cosine_similarity FROM items;
Aggregates
Average vectors
SELECT AVG(embedding) FROM items;
Average groups of vectors
SELECT category_id, AVG(embedding) FROM items GROUP BY category_id;