sageTimNewton/Using pgvector in Postgres Cloud deployments.md

## Using pgvector in Postgres Cloud deployments.md

      
    Raw
  

              Using pgvector in Postgres Cloud deployments.md
            
          
    Summary

This gist is meant to explain how to create both AWS RDS postgres database compatible with pgvector and also microsoft Azure DB :
AWS

The straightforward steps below should create a database compatible with pgvector.

Create an RDS instance similar to the one detailled below :

Make sure you choose the postgres Engine and at least version 15 (I've only tried 15.3)
To create an RDS instance you will need a VPC with multiple subnets across different availability zones.

Once your database has been created you can connect with a tool like :-
https://sqlectron.github.io/
This lightweight tool worked well for connecting to postgreSQL but use your favourite client to connect to your database then run the following commands :-
CREATE EXTENSION vector;
SELECT typname FROM pg_type WHERE typname = 'vector';
The select statement should yield the following :-

If you see the typname of vector you have successfully enabled the pgvector extension. On AWS no more was required. I'd read about having to modify the "shared_preload_libraries" but this was not needed. Version 15 of postgreSQL seemed to have this extension already available.
Azure

Azure was a similar process to AWS. Please note this only works for postgreSQL flexible Server however the documentation I have seen suggested that version 14 of postgreSQL would be sufficient. This was NOT the case. Version 14 did not work for me. I hit the exact issue described in this webpage :-
https://learn.microsoft.com/en-us/answers/questions/1292116/pgvector-extension-in-postgres-flexible-server-is?source=docs
As soon as I introduced syntax related to the vector extension, e.g. :
 <-> 
My queries began to crash and my SQL sessions were terminated.
However as soon as I upgraded my postgreSQL version to 15 (preview version) everything began to work. To enable the pgvector extension you can do the following :

i.e. Click on the Server Parameters menu then search for azure.extensions. Under this select the Vector extension. Then restart your instance, then run the commands :-
CREATE EXTENSION vector;
SELECT typname FROM pg_type WHERE typname = 'vector';
The select statement should yield the following :-

Matrix Queries :-
Getting Started (taken from https://github.com/pgvector/pgvector)

Enable the extension (do this once in each database where you want to use it)
CREATE EXTENSION vector;
Create a vector column with 3 dimensions
CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
Insert vectors
INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');
Get the nearest neighbors by L2 distance
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
Also supports inner product (<#>) and cosine distance (<=>)
Note: <#> returns the negative inner product since Postgres only supports ASC order index scans on operators
Storing

Create a new table with a vector column
CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
Or add a vector column to an existing table
ALTER TABLE items ADD COLUMN embedding vector(3);
Insert vectors
INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');
Upsert vectors
INSERT INTO items (id, embedding) VALUES (1, '[1,2,3]'), (2, '[4,5,6]')
    ON CONFLICT (id) DO UPDATE SET embedding = EXCLUDED.embedding;
Update vectors
UPDATE items SET embedding = '[1,2,3]' WHERE id = 1;
Delete vectors
DELETE FROM items WHERE id = 1;
Querying

Get the nearest neighbors to a vector
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
Get the nearest neighbors to a row
SELECT * FROM items WHERE id != 1 ORDER BY embedding <-> (SELECT embedding FROM items WHERE id = 1) LIMIT 5;
Get rows within a certain distance
SELECT * FROM items WHERE embedding <-> '[3,1,2]' < 5;
Note: Combine with ORDER BY and LIMIT to use an index
Distances

Get the distance
SELECT embedding <-> '[3,1,2]' AS distance FROM items;
For inner product, multiply by -1 (since <#> returns the negative inner product)
SELECT (embedding <#> '[3,1,2]') * -1 AS inner_product FROM items;
For cosine similarity, use 1 - cosine distance
SELECT 1 - (embedding <=> '[3,1,2]') AS cosine_similarity FROM items;
Aggregates

Average vectors
SELECT AVG(embedding) FROM items;
Average groups of vectors
SELECT category_id, AVG(embedding) FROM items GROUP BY category_id;