Skip to content

Instantly share code, notes, and snippets.

@jobergum
Created May 28, 2022 11:06
Show Gist options
  • Save jobergum/124f487a2afa9ee709fd3ea2922ee2d1 to your computer and use it in GitHub Desktop.
Save jobergum/124f487a2afa9ee709fd3ea2922ee2d1 to your computer and use it in GitHub Desktop.
schema passage {
document passage {
field doc_id type long {} #duplicated for every passage extracted from doc
field title type string {} #duplicated for every passage from doc
field passage type string {}
field embedding type tensor<float>(x[768]) {}
}
rank-profile hybrid {
inputs {
query(q): tensor<float>(x[768])
}
first-phase {
expression: 0.12*bm25(title) + 0.5*bm25(passage) + 11.3*closeness(field, embedding)
}
}
vespa query \
'yql=select title, doc_id from passage where {targetHits:100}nearestNeighbor(embedding,q) or userQuery() limit 0 | all(group(doc_id) max(10) each(output(count()) max(2) each(output(summary()))))' \
'query=hybrid is the way' \
'type=weakAnd' \
'ranking=hybrid' \
'input.query(q)=[2...........]' \
#Search for passages using hybrid retriveal, group by document id and list at max 10 unique doc_ids ordered by max relevance (as assigned by hybrid),
display 2 best scoring passages from doc per unique doc_id.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment