[FEATURE] Binary vector support with Lucene #1857
I will pick this one up, thanks!
Issue
The main issue here is that the Lucene … It looks like this issue has been discussed in a few places in Lucene:
And ultimately what Lucene has done, at least for now, is rather than introduce …
Changes Needed
Mapper-related changes:
@heemin32 @navneet1v @jmazanec15, do these changes sound reasonable to you? Let me know if you think I missed any big parts.
I think there is potentially more discussion to be had with Lucene as well. It seems that there are now two ways to provide distance calculations:
This seems confusing because it means the format can completely ignore what is encoded in its own file, so at the very least I think we can start a discussion on how the future of …
@jed326 Thanks for all the details.
If a codec is not present in the backward-compatible codecs, that doesn't mean it shouldn't be used; actually, it's the other way round. Nevertheless, as per this comment: apache/lucene#13288 (comment), Lucene doesn't officially support BitVectors, so we have to create our own KNNVectorsFormat and Scorer, though we can take some reference from Lucene's BitVectorsFormat. Another reason for our own implementation is that even if we make changes in Lucene, Lucene has already moved to version 10.x, and OpenSearch currently has no plans to upgrade to Lucene 10. So, in light of that, I think we should implement our own format.
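To make the "our own Scorer" point a bit more concrete, here is a minimal, hypothetical sketch of the core computation such a scorer would perform on bit-packed binary vectors: Hamming distance via XOR and popcount, turned into a higher-is-better score with 1/(1 + d). The class and method names are illustrative only (not OpenSearch or Lucene APIs), and the 1/(1 + d) mapping is an assumption; a real implementation would wire this into Lucene's vector scorer interfaces rather than stand alone.

```java
// Hypothetical sketch, not an OpenSearch/Lucene API: Hamming scoring for bit-packed vectors.
public final class HammingScorerSketch {
    private HammingScorerSketch() {}

    /** Counts differing bits between two bit-packed vectors of equal byte length. */
    public static int hammingDistance(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same byte length");
        }
        int distance = 0;
        for (int i = 0; i < a.length; i++) {
            // XOR leaves a 1 wherever bits differ; mask to 0xFF so sign extension
            // of Java's signed bytes does not add spurious high bits.
            distance += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
        }
        return distance;
    }

    /** Assumed mapping: converts distance to a similarity where higher is better. */
    public static float score(byte[] a, byte[] b) {
        return 1.0f / (1.0f + hammingDistance(a, b));
    }

    public static void main(String[] args) {
        byte[] query = {(byte) 0b1010_1010, 0x0F};
        byte[] doc = {(byte) 0b1010_1000, 0x0E};
        System.out.println(hammingDistance(query, doc)); // 2
        System.out.println(score(query, doc));
    }
}
```

Packing one dimension per bit is what lets a single XOR plus `Integer.bitCount` compare eight dimensions at once, which is why this kind of scorer can stay fast even though it bypasses Lucene's float/byte similarity functions.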
On this, I completely agree that we should start a discussion with Lucene and understand what the long-term plan/direction is.
Thanks @navneet1v. I opened a discussion on Lucene for this: apache/lucene#14025, please take a look when you get a chance!
Benchmarking Results:
The benchmarking was conducted on two clusters that were identical aside from their data nodes. The FP32 vectors used 8 … In line with the Faiss results, the Lucene binary vector implementation delivered comparable query latency while reducing storage by 97%.
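As a rough sanity check on the 97% figure, assume that storage is dominated by the raw vector data and that each FP32 dimension is replaced by a single bit. For a $d$-dimensional vector, the ideal size ratio and reduction are then (this ignores graph and metadata overhead, so it is an approximation, not a description of the benchmark setup):

```latex
\frac{\text{binary bytes}}{\text{FP32 bytes}}
  = \frac{d/8}{4d}
  = \frac{1}{32}
  = 0.03125,
\qquad
1 - \frac{1}{32} = 0.96875 \approx 97\%.
```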
Thanks @owenhalpert for gathering the benchmarking results!
Similar to #1764, I would like to see binary vector support with the Lucene engine.