Consider adding a distinct stage? #22
Comments
Use statefulMapConcat?
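A minimal sketch of what that suggestion could look like, assuming Akka Streams' Scala DSL. `DistinctSketch` and `distinct` are hypothetical names chosen for illustration, not an existing Akka operator. The set of seen elements is unbounded, so this only suits streams that are known to terminate.

```scala
import akka.NotUsed
import akka.stream.scaladsl.Flow
import scala.collection.mutable

object DistinctSketch {
  // Hypothetical helper, not part of Akka: emits each element only the
  // first time it is seen within one materialization of the flow.
  def distinct[T]: Flow[T, T, NotUsed] =
    Flow[T].statefulMapConcat { () =>
      // State is created once per materialization. It grows with the number
      // of distinct elements, so it is unsafe for never-ending streams.
      val seen = mutable.HashSet.empty[T]
      // HashSet.add returns true only if the element was not already present.
      (elem: T) => if (seen.add(elem)) List(elem) else Nil
    }
}
```

Usage would be along the lines of `Source(List(1, 2, 1, 3, 2)).via(DistinctSketch.distinct[Int])`, which passes each value downstream once, in first-seen order.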
For what it's worth, for streaming "uniqueness detection in the face of a crap-ton of elements", Bloom filters can be used. I'm not sure how to handle the dependencies for that here, though.
There are many ways to do that; Bloom filters are only one. An alternative is to keep a bounded-size buffer of seen elements and remove the oldest entry once the buffer gets full. I think we can have multiple styles of dedupe operators.
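The bounded-buffer alternative could be sketched roughly as follows, using only the Scala standard library. `BoundedSeen` and `firstSighting` are hypothetical names for illustration. The trade-off is that an element evicted from the buffer is treated as new if it shows up again.

```scala
import scala.collection.mutable

// Hypothetical helper: remembers at most `capacity` distinct elements.
final class BoundedSeen[T](capacity: Int) {
  require(capacity > 0, "capacity must be positive")

  // LinkedHashSet iterates in insertion order, so `head` is the oldest entry.
  private val seen = mutable.LinkedHashSet.empty[T]

  /** True if `elem` is not currently buffered. Records it, evicting the oldest entry when full. */
  def firstSighting(elem: T): Boolean =
    if (seen.contains(elem)) false
    else {
      if (seen.size >= capacity) seen.remove(seen.head)
      seen.add(elem)
      true
    }
}
```

For example, with `val buf = new BoundedSeen[Int](2)`, the expression `List(1, 2, 1, 3, 1).filter(buf.firstSighting)` yields `List(1, 2, 3, 1)`: the second `1` is dropped, but the third gets through because `1` was evicted when `3` arrived. The same predicate could back a stream stage built with statefulMapConcat.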
I consider a Bloom filter a higher-order choice. I use Akka Streams + Cassandra to provide a reactive query API. In my scenario all the streams are short-lived, and an in-memory buffer does not really hurt. As for dedupe, I think distinct is a bit different from it and would be more intuitive.
Guava has a BloomFilter, though it is still in beta.
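A rough sketch of the Bloom filter variant, assuming Guava is on the classpath (the dependency question raised in this thread is left open). `BloomDistinctSketch` and `probablyFirstSighting` are hypothetical names. Memory stays fixed, but a false positive means a never-seen element can occasionally be dropped as a duplicate.

```scala
import com.google.common.hash.{BloomFilter, Funnels}
import java.nio.charset.StandardCharsets

object BloomDistinctSketch {
  // Hypothetical helper: returns a stateful predicate that is true when
  // `elem` has definitely not been added before.
  def probablyFirstSighting(expectedInsertions: Long,
                            falsePositiveRate: Double): String => Boolean = {
    val filter = BloomFilter.create[CharSequence](
      Funnels.stringFunnel(StandardCharsets.UTF_8),
      expectedInsertions,
      falsePositiveRate)
    // put returns true if the filter's bits changed, which can only happen
    // the first time an element is added. It may return false for a new
    // element (a false positive), so some unique elements can be lost.
    (elem: String) => filter.put(elem)
  }
}
```

The returned predicate can be used wherever an exact set lookup would go, for instance inside a statefulMapConcat stage, when trading a small loss of unique elements for bounded memory is acceptable.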
Recently I needed a distinct stage. At first sight, I think it should keep a buffer of the distinct elements seen so far and prevent duplicates from being pushed downstream.
Since the stream may never stop, a buffer that keeps growing could be dangerous. In my scenario, however, the stream will definitely stop and the buffer size is predictable.
Any suggestions?
BTW, I've seen akka/akka#19395, which proposes adding a dedupe stage.