Skip to content

Instantly share code, notes, and snippets.

@simonw
Created March 14, 2024 22:10
Show Gist options
  • Save simonw/9ff9a0ab8ab64e8aa8d160c4294c062a to your computer and use it in GitHub Desktop.
Save simonw/9ff9a0ab8ab64e8aa8d160c4294c062a to your computer and use it in GitHub Desktop.

HNSW Rust Library

This is a Rust library that implements the Hierarchical Navigable Small World (HNSW) algorithm for approximate nearest neighbor search. HNSW is an efficient algorithm for finding the nearest neighbors of high-dimensional vectors in a large dataset.

Usage

To use the HNSW library in your Rust project, add the following to your Cargo.toml file:

[dependencies]
hnsw = "0.1.0"

Then, in your Rust code, you can import and use the library as follows:

use hnsw::{HnswIndex, VectorItem, EuclideanDistance};

fn main() {
    // Create a new HNSW index
    let hnsw = HnswIndex::new(10000, 1.0 / 3.0, 16, Box::new(EuclideanDistance));

    // Add items to the index
    for i in 0..1000 {
        let vector: Vec<f64> = (0..10).map(|_| rand::random::<f64>() * 10.0).collect();
        let item = VectorItem { id: i, vector };
        hnsw.add(item).unwrap();
    }

    // Perform a search query
    let query_vec = vec![1.0, 2.0];
    let query = VectorItem { id: 9999, vector: query_vec };
    match hnsw.search(&query, 5) {
        Ok(results) => {
            println!("Search results for query {:?}:", query.vector);
            for result in results {
                println!("Found item with id: {}", result.id);
            }
        },
        Err(e) => println!("Search error: {}", e),
    }
}

API

HnswIndex

The HnswIndex struct represents an HNSW index. It provides methods for adding items to the index and performing nearest neighbor searches.

new(max_elements: usize, level_lambda: f64, max_level: usize, distance_calculator: Box<dyn DistanceCalculator>) -> Self

Creates a new HnswIndex instance.

  • max_elements: The maximum number of elements the index can hold.
  • level_lambda: The level generation probability.
  • max_level: The maximum number of levels in the index.
  • distance_calculator: A boxed trait object implementing the DistanceCalculator trait for calculating distances between vectors.

add(&self, item: VectorItem) -> Result<(), String>

Adds a new item to the index.

  • item: The VectorItem to be added.

Returns Ok(()) if the item is successfully added, or an error message if the operation fails.

search(&self, query: &VectorItem, k: usize) -> Result<Vec<VectorItem>, String>

Performs a nearest neighbor search for the given query vector.

  • query: The query VectorItem.
  • k: The number of nearest neighbors to return.

Returns a Result containing a vector of the top k nearest neighbors, or an error message if the search fails.

VectorItem

The VectorItem struct represents an item in the index.

  • id: The unique identifier of the item.
  • vector: The feature vector of the item.

DistanceCalculator

The DistanceCalculator trait defines the interface for calculating distances between vectors.

calculate(&self, item1: &VectorItem, item2: &VectorItem) -> f64

Calculates the distance between two VectorItems.

  • item1: The first VectorItem.
  • item2: The second VectorItem.

Returns the calculated distance as a f64 value.

Example

The main.rs file provides an example of how to use the HNSW library. It demonstrates creating an index, adding items to it, and performing nearest neighbor searches.

To run the example, execute the following command in the project directory:

cargo run

This will build and run the example code, displaying the search results for the test queries.

@simonw
Copy link
Author

simonw commented Mar 14, 2024

cat /tmp/hnsw.txt | llm -m claude-3-opus -s 'Write detailed usage documentation for this Rust library'

@simonw
Copy link
Author

simonw commented Mar 14, 2024

I created /tmp/hnsw.txt like this:

for file in *.rs; do                                                                                  
    if [ -f "$file" ]; then
        echo "------"
        echo "$file"
        echo "------"
        cat "$file"
    fi
done

I wrote the first version of that code with Claude 3 Haiku:

llm -m claude-3-haiku 'write Bash code that loops through every *.rs file in the current directory and for each one outputs its name, then the content of the file, then ----'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment