-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IndexedSeq instead of Iterator in NearestNeighborIterator [Priority Queue Serialization Error] #7
base: master
Are you sure you want to change the base?
Conversation
LGTM but does this work? I uncommented the test in
|
Hi, |
Hi @uzadude Very glad to hear that this library helped. I am no longer at LinkedIn though so I don't have permissions to change collaborators. Also given it is under |
Well, in my company (PayPal), we work with our private accounts in the public GitHub space, so we would have permissions also in the future. Maybe you still know someone there that can add you back as a collaborator with your current user account? |
There seemed to have been an issue with concurrency when returning the Iterator in the NearestNeighborIterator class inside of LSHNearestNeighborSearchModel.scala.
Iterator[(ItemId, Iterator[ItemIdDistancePair])] was changed to Iterator[(ItemId, IndexedSeq[ItemIdDistancePair])]. The iterator within the iterator is not serialized and causing a problem with the groupByKey in the getAllNearestNeighbors function. What I think was happening is that during the groupByKey the iterator within the iterator was pointing to a location in memory on a particular node, but when that iterator is copied to another node during the groupByKey it is then pointing to a random position in memory not where one expects.
As a bonus I also rewrote the groupByKey as aggregateByKey, as an aggregate would be more efficient in this case than a groupByKey. I have not done any benchmarking, but from my experience have found aggregateByKey to be more efficient.
Code compiled with ./gradlew build and passed all tests.