Adding linear retriever to support weighted sums of sub-retrievers #120222
Hi @pmpailis, I've created a changelog YAML for you.
…rch into add_linear_retriever
Looking much better. I have a concern around testing:
Do we have a test that specifically exercises the path when the different retrievers return different doc IDs? (e.g. they match non-overlapping doc sets).
...lugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/linear/LinearRetrieverBuilder.java
Added a test to account for this in ea1787f
...gin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/linear/LinearRetrieverComponent.java
...plugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/linear/MinMaxScoreNormalizer.java
💔 Backport failed
You can use sqren/backport to manually backport by running |
This PR adds a new `linear` retriever to facilitate hybrid search. It linearly combines the results of other sub-retrievers and computes the final score of a document as the weighted sum of the scores from each sub-component.

Each sub-component can specify the following elements:

- `retriever` -> specifies how we will compute the top documents
- `normalizer` -> specifies how we want to normalize the top documents for this retriever (so that we can ensure that all scores fall within the same range)
- `weight` -> the weight for the normalized score in the final weighted sum computation

Pagination is similar to that of the `rrf` retriever, i.e. we compute the global `rank_window_size` docs and pagination is only available within these bounds.

So, working through an example, let's say that we perform a hybrid search query where:

- we use a `standard` retriever, and normalize the scores to a `[0, 1]` range
- we use a `knn` retriever as well, without normalizing its scores
- the final score is computed as `score = 1.5 * standard + 2.5 * knn`
Sample syntax: