Skip to content

Instantly share code, notes, and snippets.

@ano
Last active April 8, 2023 23:00
Show Gist options
  • Save ano/c4dd33e8e538afa170e2a0c3eb7c842c to your computer and use it in GitHub Desktop.
Save ano/c4dd33e8e538afa170e2a0c3eb7c842c to your computer and use it in GitHub Desktop.
MySQL/ MariaDB Cosign Similarity User Defined Function
To write a user-defined function in MariaDB that calculates cosine similarity between a given vector and a vector stored in a BLOB field, you can use the following code:
``` sql
DELIMITER $$
CREATE FUNCTION cosine_similarity(vector VARCHAR(255), blob_field BLOB)
RETURNS FLOAT
DETERMINISTIC
BEGIN
-- Deserialize the BLOB field into a vector
SET @serialized_vector = blob_field;
SET @vector_type = 'float'; -- Change this to 'double' if the vector contains 64-bit values
SET @vector_size = LENGTH(@serialized_vector) / (@vector_type_size := IF(@vector_type = 'float', 4, 8));
SET @values = '';
FOR i IN 0..(@vector_size - 1) DO
SET @value = SUBSTRING(@serialized_vector, i * @vector_type_size + 1, @vector_type_size);
IF @vector_type = 'float' THEN
SET @value = CONVERT(@value, FLOAT);
ELSE
SET @value = CONVERT(@value, DOUBLE);
END IF;
SET @values = CONCAT(@values, ',', @value);
END FOR;
SET @vector = CONCAT('[', SUBSTRING(@values, 2), ']');
-- Calculate the cosine similarity
SET @dot_product = 0;
SET @norm_a = 0;
SET @norm_b = 0;
FOR i IN 1..JSON_LENGTH(vector) DO
SET @a_i = JSON_EXTRACT(vector, CONCAT('$[', i - 1, ']'));
SET @b_i = JSON_EXTRACT(@vector, CONCAT('$[', i - 1, ']'));
SET @dot_product = @dot_product + @a_i * @b_i;
SET @norm_a = @norm_a + @a_i * @a_i;
SET @norm_b = @norm_b + @b_i * @b_i;
END FOR;
SET @similarity = NULLIF(@dot_product / SQRT(@norm_a * @norm_b), 0);
RETURN @similarity;
END$$
DELIMITER ;
```
This function takes two parameters: a JSON-formatted vector and a BLOB field containing a serialized vector. It deserializes the BLOB field into a vector, and then calculates the cosine similarity between the two vectors using the dot product and unit vectors formula. The function returns the cosine similarity as a floating-point value.
To use this function, you can call it like any other SQL function:
``` sql
SELECT cosine_similarity('[1, 2, 3]', blob_field) FROM my_table;
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment