This document describes how to run Elastiknn on the big-ann-benchmarks challenge. It's admittedly a little late in the game for this benchmarking challenge. IIRC the deadline is October 22, 2021, and I'm writing this on October 15. But hey, the neighbors aren't gonna find themselves. We can still use this as an opportunity to improve Elastiknn.
The setup is currently pretty experimental, so bring your elbow grease.
Part 1: Set up the Elastiknn project
- Clone the alexklibisz/elastiknn repo and check out the `elastiknn-278-lucene-benchmarks` branch. That's where I've been working on the big-ann-benchmarks integration and improvements.
$ git clone git@github.com:alexklibisz/elastiknn.git
$ cd elastiknn
$ git fetch --all
$ git checkout elastiknn-278-lucene-benchmarks
- Make sure you can produce a JAR from the project. It might help to refer to the developer guide.
$ ./gradlew shadowJar
...
BUILD SUCCESSFUL ...
$ find . -name 'ann-benchmarks-*.jar'
./elastiknn-ann-benchmarks/build/libs/ann-benchmarks-7.14.1.1-all.jar
- To be extra sure things work, you can try running the test suite:
$ task cluster:run
... docker containers booting up ...
$ task jvm:test
Part 2: Set up the big-ann-benchmarks project
- Clone the harsha-simhadri/big-ann-benchmarks repo and check out my elastiknn branch. Make sure this is in a directory adjacent to the elastiknn project, e.g., `~/elastiknn` and `~/big-ann-benchmarks`.
$ git clone git@github.com:harsha-simhadri/big-ann-benchmarks.git
$ cd big-ann-benchmarks
$ git remote add alexklibisz git@github.com:alexklibisz/big-ann-benchmarks.git
$ git fetch --all
$ git checkout alexklibisz/elastiknn
- Set up your Python environment according to the READMEs in the repo. I just used virtualenv.
- Make sure you can run the unit tests. Here they are as standalone commands, copied from the big-ann-benchmarks CI workflow:
$ export LIBRARY=httpann_example
$ export ALGORITHM=httpann_example
$ export DATASET=random-xs
$ python install.py
$ python create_dataset.py --dataset $DATASET
$ python run.py --algorithm $ALGORITHM --max-n-algorithms 2 --dataset $DATASET --timeout 600
$ sudo chmod -R 777 results/
$ python plot.py --dataset $DATASET --output plot.png
The last command should produce output like this:
Computing knn metrics
0: http-ann-example-euclidean-1.0 1.000 3790.862
Found cached result
1: http-ann-example-euclidean-0.2 0.390 3144.295
Computing knn metrics
2: http-ann-example-euclidean-0.8 0.925 3792.054
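If you want to pull those recall/QPS numbers into a script, here's a minimal sketch of a parser for the summary lines, assuming the `index: name recall qps` format shown above (the sample output is copied from the run above):

```python
import re

def parse_summary(lines):
    """Parse lines like '0: http-ann-example-euclidean-1.0 1.000 3790.862'
    into (name, recall, qps) tuples, skipping any other log lines."""
    results = []
    for line in lines:
        m = re.match(r"\s*\d+:\s+(\S+)\s+([\d.]+)\s+([\d.]+)\s*$", line)
        if m:
            results.append((m.group(1), float(m.group(2)), float(m.group(3))))
    return results

output = """Computing knn metrics
0: http-ann-example-euclidean-1.0 1.000 3790.862
Found cached result
1: http-ann-example-euclidean-0.2 0.390 3144.295
Computing knn metrics
2: http-ann-example-euclidean-0.8 0.925 3792.054"""

for name, recall, qps in parse_summary(output.splitlines()):
    print(f"{name}: recall={recall}, qps={qps}")
```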
- If that didn't work, don't proceed. Nothing else will work.
- Run the `test.sh` script. This will go over to the elastiknn directory, build the JAR, come back to the big-ann-benchmarks directory, build the Docker container, download a dataset, and start running ANN on that dataset. I recommend setting the DATASET variable in test.sh to `DATASET=msturing-1M`. That dataset has only 1M vectors, so it's fast enough to see whether things work. The problem with this dataset is that it has no ground truth, so you can't actually compute recall.
- Then set `DATASET=deep-10M` to run on a 10x larger dataset which does have ground truth.
Part 3: Solve ANN
Some tips:
- All of the elastiknn code for big-ann-benchmarks is in `elastiknn-ann-benchmarks/src/main/scala/com/elastiknn/annb`. The main entrypoint is Server.scala, an akka-http server that accepts requests from the Python Elastiknn model.
- The elastiknn model in big-ann-benchmarks lives in `benchmarks/algorithms/elastiknn.py`. It's based on the HttpANN algorithm. Read this PR to understand how that works.
- The runner.py is modified to expose JVM/JMX metrics on port 9091, so we can connect VisualVM to this port and profile the JVM.
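For reference, exposing JMX on a port like this is typically done with the standard `com.sun.management.jmxremote` JVM properties. This is just a sketch of what such a flag set commonly looks like for an unauthenticated local setup; the exact flags used here live in the modified runner.py:

```
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=9091
-Dcom.sun.management.jmxremote.rmi.port=9091
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Djava.rmi.server.hostname=localhost
```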
- Here are some hyperparameter settings and results I've been able to get on the deep-10M dataset. These are running on my Dell XPS i7 w/ 6 cores, 12 threads, and 12 Lucene segments. They are abysmally slow. We definitely need some algorithmic improvements to get to billion scale:
| L | k | w | candidates | probes | recall | qps |
|---|---|---|---|---|---|---|
| 100 | 3 | 1 | 100 | 3 | 0.832 | 2.641 |
| 100 | 3 | 1 | 1000 | 1 | 0.896 | 1.441 |
| 100 | 3 | 1 | 100 | 0 | 0.523 | 7.001 |
| 100 | 3 | 1 | 100 | 1 | 0.700 | 4.348 |
| 100 | 3 | 1 | 1000 | 0 | 0.756 | 1.840 |
| 100 | 3 | 1 | 100 | 6 | 0.893 | 1.756 |
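When comparing settings like these, it helps to know which ones are actually on the recall/QPS Pareto frontier, i.e. not beaten on both axes by some other setting. A quick check in plain Python (the rows are copied from the table above; since L=100, k=3, w=1 throughout, only candidates and probes are kept):

```python
# (candidates, probes, recall, qps) rows from the deep-10M table above.
rows = [
    (100, 3, 0.832, 2.641),
    (1000, 1, 0.896, 1.441),
    (100, 0, 0.523, 7.001),
    (100, 1, 0.700, 4.348),
    (1000, 0, 0.756, 1.840),
    (100, 6, 0.893, 1.756),
]

def pareto(rows):
    """Keep rows not dominated by another row with >= recall and >= qps."""
    return [
        r for r in rows
        if not any(o is not r and o[2] >= r[2] and o[3] >= r[3] for o in rows)
    ]

# Print the frontier, fastest first.
for candidates, probes, recall, qps in sorted(pareto(rows), key=lambda r: -r[3]):
    print(f"candidates={candidates} probes={probes}: recall={recall}, qps={qps}")
```

In this data, `candidates=1000, probes=0` is dominated by `candidates=100, probes=3`, which is both more accurate and faster; every other setting is a genuine trade-off point.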