@tboeghk
Created May 26, 2021 13:21
SOLR-15437 demonstration

💡 SOLR-15437 demonstration

I set up this small Gist to demonstrate a possible bug in Solr's Re-ranking / LTR / QueryComponent.

When a query combines re-ranking (rq) with an explicit sort in a SolrCloud environment, on a collection with multiple shards, the results come back in what looks like random order.

Reproducing the error

1. Launch a small Solr ensemble

This launches one ZooKeeper node and two Solr nodes:

$ docker-compose up -d
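
Solr can take a few seconds to answer after the containers have started. Below is a minimal sketch of a readiness check, assuming each node responds on the /solr/admin/info/system endpoint. The helper name wait_for_solr and its retry defaults are made up for this example and are not part of the original setup.

```shell
# wait_for_solr URL [ATTEMPTS] [DELAY_SECONDS]
# Polls URL until curl gets a successful HTTP response.
# Returns 0 as soon as the URL answers, 1 if every attempt failed.
wait_for_solr() {
    url=$1
    attempts=${2:-30}
    delay=${3:-2}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -f: HTTP errors count as failure, -s: silent, --max-time: never hang
        if curl -fs --max-time 5 -o /dev/null "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example (needs the running ensemble):
#   wait_for_solr "http://localhost:8983/solr/admin/info/system" && echo "solr_1 is up"
#   wait_for_solr "http://localhost:8984/solr/admin/info/system" && echo "solr_2 is up"
```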

2. Create collection & data set

This uses the films example data set that ships with Solr. Create the films collection with two shards, add the schema fields, and load the example data:

$ curl --user solr:solr \
    "http://localhost:8983/solr/admin/collections?action=CREATE&name=films&numShards=2&collection.configName=_default"
$ curl -X POST -H 'Content-type:application/json' \
    --data-binary '{"add-field": {"name":"name", "type":"text_general", "multiValued":false, "stored":true}}' \
    "http://localhost:8983/solr/films/schema"
$ curl -X POST -H 'Content-type:application/json' \
    --data-binary '{"add-copy-field" : {"source":"*","dest":"_text_"}}' \
    "http://localhost:8983/solr/films/schema"
$ docker exec -it solr_1 sh -c \
    "java -Dc=films -Dauto -jar /opt/solr/example/exampledocs/post.jar /opt/solr/example/films/*.json"

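To confirm that the load worked and that each shard holds documents, the document count can be read from a rows=0 query. This is a small sketch that assumes Solr's default JSON response format. num_found is a hypothetical helper that extracts the numFound value with sed instead of a real JSON parser.

```shell
# num_found: read a Solr JSON response on stdin and print its numFound value.
num_found() {
    sed -n 's/.*"numFound":[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Example (needs the running ensemble):
#   whole collection:
#     curl -s "http://localhost:8983/solr/films/select?q=*:*&rows=0" | num_found
#   shard1 only:
#     curl -s "http://localhost:8983/solr/films/select?q=*:*&rows=0&shards=shard1" | num_found
```
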
🔥 Steps to reproduce

The request paths below are relative to the films collection on the running Docker ensemble, i.e. http://localhost:8983/solr/films.

  1. Query animation films: /select?q=animation

  2. Query animation films and boost fantasy films: /select?q=animation&rq={!rerank reRankQuery=$rqq reRankDocs=10 reRankWeight=1000}&rqq=fantasy

👍 This works like a charm: fantasy films are boosted with a re-rank weight of 1000 (re-ranked docs have a score > 1000).

  3. Sort films by id ascending, but retrieve results from shard1 only: /select?q=animation&rq={!rerank reRankQuery=$rqq reRankDocs=10 reRankWeight=1000}&rqq=fantasy&sort=id asc&shards=shard1

👍 The first pass sorts the result set by id. The first 10 documents are then re-ranked according to how well they match fantasy.

  4. Now run the same query against both shards by dropping the shards parameter: /select?q=animation&rq={!rerank reRankQuery=$rqq reRankDocs=10 reRankWeight=1000}&rqq=fantasy&sort=id asc

🔥 The result set appears to be in random order.
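
The request paths above contain spaces, braces and a $, so they need URL-encoding when sent from a shell. The sketch below lets curl do the encoding and then checks the order of the returned ids. It assumes the CSV response writer (wt=csv, csv.header=false) and assumes that documents beyond the re-ranked window should still come back in ascending id order. fetch_ids and tail_is_sorted are hypothetical helper names. The check compares in byte order, which should match Solr's string sort only for plain ASCII ids.

```shell
# fetch_ids [SHARDS]
# Runs the re-rank + sort query and prints one id per line.
# An optional first argument is sent as the shards parameter, e.g. shard1.
fetch_ids() {
    base=${SOLR_URL:-http://localhost:8983/solr/films}
    if [ -n "${1:-}" ]; then
        set -- --data-urlencode "shards=$1"
    fi
    # Single quotes keep the shell from expanding $rqq; curl encodes the values.
    curl -s -G "$base/select" \
        --data-urlencode 'q=animation' \
        --data-urlencode 'rq={!rerank reRankQuery=$rqq reRankDocs=10 reRankWeight=1000}' \
        --data-urlencode 'rqq=fantasy' \
        --data-urlencode 'sort=id asc' \
        --data-urlencode 'fl=id' \
        --data-urlencode 'rows=50' \
        --data-urlencode 'wt=csv' \
        --data-urlencode 'csv.header=false' \
        "$@"
}

# tail_is_sorted SKIP
# Reads ids on stdin, ignores the first SKIP lines (the re-ranked window)
# and succeeds only if the remaining lines are in ascending byte order.
tail_is_sorted() {
    tail -n "+$(( $1 + 1 ))" | LC_ALL=C sort -C
}

# Example (needs the running ensemble):
#   fetch_ids shard1 | tail_is_sorted 10 && echo "shard1: tail is ordered"
#   fetch_ids        | tail_is_sorted 20 || echo "both shards: tail is NOT ordered"
```

The skip value of 20 for the two-shard query is a guess: each shard may re-rank its own top 10, so up to 20 documents could carry a re-ranked score after merging.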

version: "2.4"
services:
  zookeeper:
    image: zookeeper:3.6
    container_name: zookeeper
    ports:
      - "2181:2181"
  solr_1:
    image: solr:8.8.2-slim
    container_name: solr_1
    depends_on:
      - zookeeper
    ports:
      - 8983:8983
    environment:
      - ZK_HOST=zookeeper:2181
      - SOLR_HEAP=1g
  solr_2:
    image: solr:8.8.2-slim
    container_name: solr_2
    depends_on:
      - zookeeper
    ports:
      - 8984:8984
    environment:
      - ZK_HOST=zookeeper:2181
      - SOLR_HEAP=1g
      - SOLR_PORT=8984