@mooreniemi
Created February 13, 2022 20:20
use std::time::Instant;
use faiss::{error::Error, index_factory, Index, MetricType};
use itertools_num::linspace;
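// Build note (an assumption, not stated in the original gist): this expects the `faiss`
// and `itertools-num` crates in Cargo.toml, and the `faiss` crate in turn needs the
// Faiss C API shared library (faiss_c) installed on the system.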
fn main() -> Result<(), Error> {
    let d: u32 = 64;
    // let my_data: Vec<f32> = (1u16..(d * 10) as u16).map(f32::from).collect();
    // The commented-out approach above is limited in how large d can get because of the
    // default u16 -> f32 conversion traits:
    // https://users.rust-lang.org/t/collect-f32-range-into-a-vector/15936/3
    // We multiply by d because train and add take the data as a single contiguous array
    // of n * d floats, so this produces 9985 vectors of dimension 64.
    let my_data: Vec<f32> = linspace::<f32>(0., 1., (d * 9985) as usize).collect();
    // L2 is the default metric, but passing it explicitly makes the parameter visible
    // https://www.pinecone.io/learn/composite-indexes/
    let mut index = index_factory(d, "IVF8,PQ32x8", MetricType::L2)?;
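    // In Faiss factory syntax, "IVF8" is an inverted file with 8 coarse clusters (so a
    // search only scans a subset of lists), and "PQ32x8" is product quantization with
    // 32 sub-quantizers of 8 bits each, i.e. 32 bytes per stored vector; d = 64 must be
    // divisible by 32 for this description to be valid.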
    // print the training iterations as they happen
    index.set_verbose(true);
    println!("Training...");
    let start = Instant::now();
    // train is not documented the same way, but it makes the same layout assumption as add (see below)
    index.train(&my_data)?;
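    // Training here learns both the 8 coarse centroids for the IVF layer and the
    // product-quantizer codebooks; nothing is stored in the index until add() below.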
println!("Training took: {:?}", start.elapsed());
let start = Instant::now();
// "This assumes a C-contiguous memory slice of vectors, where the total number of vectors is my_data.len() / d."
index.add(&my_data)?;
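    // Optional sanity check (a sketch; it assumes the ntotal() accessor on the faiss
    // crate's Index trait):
    // assert_eq!(index.ntotal() as usize, my_data.len() / d as usize);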
println!("Adding data took: {:?}", start.elapsed());
let k = 5;
let start = Instant::now();
let result = index.search(&my_data, k)?;
println!("Searching data took: {:?}", start.elapsed());
    for (i, (l, d)) in result
        .labels
        .iter()
        .take(k)
        .zip(result.distances.iter())
        .enumerate()
    {
        println!("#{}: {} (D={})", i + 1, *l, *d);
    }
    Ok(())
}