Extremely slow filtering #132
Hey @xstephen95x, apologies for the slow reply; I've been out on travel. I haven't seen such poor performance from the filtering logic, even in topics with hundreds of millions of records. The filtering is done via a Kafka interceptor here. How many partitions does your topic have? Do you have similar performance issues using the websocket streams vs. just paging through records in Kafka?
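For anyone curious, the general shape of filtering inside a consumer interceptor is roughly the following. This is a simplified sketch rather than Kafka-WebView's actual class; `matches()` is a hypothetical stand-in for the configured filter plugin.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerInterceptor;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

/**
 * Sketch of filtering inside a consumer interceptor: every record in each
 * poll() batch is passed through a predicate, and only matches survive.
 */
public class FilteringInterceptor implements ConsumerInterceptor<byte[], byte[]> {

    @Override
    public ConsumerRecords<byte[], byte[]> onConsume(final ConsumerRecords<byte[], byte[]> records) {
        final Map<TopicPartition, List<ConsumerRecord<byte[], byte[]>>> filtered = new HashMap<>();
        for (final TopicPartition partition : records.partitions()) {
            final List<ConsumerRecord<byte[], byte[]>> kept = new ArrayList<>();
            for (final ConsumerRecord<byte[], byte[]> record : records.records(partition)) {
                if (matches(record)) {
                    kept.add(record);
                }
            }
            filtered.put(partition, kept);
        }
        return new ConsumerRecords<>(filtered);
    }

    // Hypothetical stand-in for the real filter plugin; this per-record
    // call is where any filtering cost accrues.
    private boolean matches(final ConsumerRecord<byte[], byte[]> record) {
        return true;
    }

    @Override
    public void onCommit(final Map<TopicPartition, OffsetAndMetadata> offsets) {
        // no-op
    }

    @Override
    public void close() {
        // no-op
    }

    @Override
    public void configure(final Map<String, ?> configs) {
        // no-op
    }
}
```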
I have the same problem. My records are in Avro format and the topic contains over two million records. The filtering also takes extremely long: a couple of hours. My topic has 2 partitions, and by comparison, paging through records in Kafka is very fast.
I have a Python script for reading from Kafka topics (protobufs), and it reads a couple thousand records per second, so there's a considerable slowdown somewhere in the stack. The topic only has 1 or 2 partitions, but I don't think that's related. I just timed it: the filter processes about 300 records/minute, so with millions of records it will take days.
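For reference, my baseline check is essentially the equivalent of the following, sketched here in Java rather than Python; the broker address and topic name are placeholders.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

/**
 * Minimal baseline: measure raw consume throughput with no filtering,
 * to compare against the filtered rate seen in the UI.
 */
public class ConsumeBaseline {
    public static void main(final String[] args) {
        final Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder: your cluster
        props.put("group.id", "baseline-check");
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder: your topic
            long count = 0;
            final long start = System.nanoTime();
            while (count < 100_000) {
                final ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(1));
                if (records.isEmpty()) {
                    break; // reached the end of the topic
                }
                count += records.count();
            }
            final double seconds = (System.nanoTime() - start) / 1_000_000_000.0;
            System.out.printf("Consumed %d records in %.1fs (%.0f rec/s)%n",
                count, seconds, count / seconds);
        }
    }
}
```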
I've tried using https://github.com/jvm-profiling-tools/perf-map-agent to run perf top, but I haven't been able to get anything useful out of it. In terms of streaming vs. paging, I have not been able to stream successfully; I see a NullPointerException each time I try.

And yes, I have anonymous access set up.
I have also found that during the filtering of each record, a warning about an unrecognized configuration property gets logged to stdout.
Hey @xstephen95x, thanks for the detailed responses! I'll try to hit all of them here, but let me know if I missed something.

Which version of Kafka-WebView are you running? That sounds a lot like a bug fixed in 2.1.3 (Issue-127). If you're running version 2.1.2 or older, upgrading should resolve this issue; if you're already on 2.1.3 or newer, let me know and I'll need to revisit it.

I think that warning is considered "normal". Basically, if you define any non-standard configuration property that the Kafka client library isn't explicitly aware of, it will emit that warning; in this case, I set a custom property to configure Kafka-WebView's record filter. Regarding the performance issue: the fact that you have a small number of partitions, and that paging through the topic without filtering enabled is fast, definitely makes me believe something is up with the filtering logic. I must be doing something silly; I just can't seem to spot it by eye. I believe you're right that performance profiling is going to be the best way to determine the cause. Short of doing that, I may be able to put together a custom build for you that adds debug timing log statements to help track down the source. Is this something you would be interested in trying if I put one together?
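To give you an idea, the instrumentation I have in mind is along these lines. This is a sketch only; `includeRecord()` here stands in for the real filter invocation and does not match the actual plugin API.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Sketch of debug timing wrapped around the per-record filter call,
 * to locate where the time is going.
 */
public class TimedFilter {

    private final AtomicLong totalNanos = new AtomicLong();
    private final AtomicLong invocations = new AtomicLong();

    public boolean timedIncludeRecord(final Object key, final Object value) {
        final long start = System.nanoTime();
        try {
            return includeRecord(key, value);
        } finally {
            final long total = totalNanos.addAndGet(System.nanoTime() - start);
            final long calls = invocations.incrementAndGet();
            // Log a running average every 1000 records.
            if (calls % 1000 == 0) {
                System.out.printf("filter: %d calls, avg %d us/call%n",
                    calls, TimeUnit.NANOSECONDS.toMicros(total / calls));
            }
        }
    }

    // Hypothetical: the actual filter logic being measured.
    private boolean includeRecord(final Object key, final Object value) {
        return true;
    }
}
```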
Thank you for all of your responses. I upgraded to 2.1.4 and ran from the compiled jar instead. I've been working on getting a perf analysis, but I'm having a hard time getting perf to work with the JVM. If perf isn't going to cooperate, then yes, perhaps the best option is to start logging timestamps.
Not sure if this is known or not, but I've got kafka-webview up and running with my data system (just having the ability to view topics has been awesome!). Our Kafka topics typically contain millions of records serialized with protocol buffers. When I attempted to write my own filter, as well as use the example string filter, I'm filtering approximately 5 records per second. This makes filters virtually unusable for any topic with more than a handful of records.

Is this known? Are there any plans to improve this, or a known reason why? Or perhaps it's something on my end...

For what it's worth, I'm running the backend on a bare-metal server with 12 physical cores and 65 GB of RAM, and it doesn't seem to be using anywhere close to 10% of system resources.
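For context, the example string filter amounts to something like the following. This is a sketch of the idea only; the method shape here is illustrative and may not match Kafka-WebView's actual RecordFilter plugin interface.

```java
import java.util.Map;

/**
 * Minimal string-matching filter in the spirit of the bundled example.
 * Sketch only: not the actual Kafka-WebView plugin interface.
 */
public class StringSearchFilter {

    private String searchTerm = "";

    // Hypothetical config hook; real plugins receive options from the UI.
    public void configure(final Map<String, String> options) {
        searchTerm = options.getOrDefault("searchTerm", "");
    }

    // Keep a record only if its deserialized value contains the term.
    // Note that String.valueOf() on every record is itself a potential
    // hot spot for large protobuf/avro values.
    public boolean includeRecord(final String topic, final int partition, final long offset,
                                 final Object key, final Object value) {
        return value != null && String.valueOf(value).contains(searchTerm);
    }
}
```

Even a trivial predicate like this shouldn't account for a rate of 5 records per second, which is why I suspect the slowdown lies elsewhere in the stack.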