Extremely slow filtering #132

Open
xstephen95x opened this issue Feb 7, 2019 · 7 comments

Comments

@xstephen95x

xstephen95x commented Feb 7, 2019

Not sure if this is known or not, but I've got kafka-webview up and running with my data system. (Just having the ability to view topics has been awesome!)

Our kafka topics typically contain millions of records serialized with protocol buffers.
When I attempted to write my own filter, as well as use the example string filter, I'm filtering approximately 5 records per second.

This makes filters virtually unusable for any topic with more than a handful of records.

Is this known? Are there any plans to improve this, or a known reason why? Also, perhaps it's something on my end...

Also, I'm running the backend off a bare-metal server with 12 physical cores and 65GB RAM. It doesn't seem to be using anywhere close to 10% of system resources.

@Crim
Collaborator

Crim commented Feb 14, 2019

Hey @xstephen95x apologies for the slow reply, I've been out on travel.

I haven't seen such poor performance from the filtering logic even in topics with hundreds of millions of records. The filtering is done via a Kafka interceptor here.

How many partitions does your topic have? Do you have similar performance issues using the websocket streams vs just paging through records in kafka?

@tomas12

tomas12 commented Feb 27, 2019

I have the same problem. My records are in Avro format and the topic consists of over two million records. The filtering also takes extremely long, a couple of hours. My topic has 2 partitions, and in comparison paging through records in Kafka is very fast.

@xstephen95x
Author

xstephen95x commented Feb 27, 2019

I have a Python script for reading from Kafka topics (protobufs), and it reads a couple thousand records per second. So there's a considerable slowdown somewhere in the stack.
Perhaps it's the deserializer? Perhaps it's the interceptor you linked here? No way to know without perf profiling. Do you have a recommended way to do perf analysis on this? I've never profiled Java, just C/C++.

The topic only has 1 or 2 partitions, but I don't think that's related.

Just timed it: it filters about 300 records/minute, so with millions of records it will take days.
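
To put the rates quoted above in perspective, here is a rough sketch of the arithmetic. The 2,000,000-record topic size and the 2,000 records/second figure for the Python reader are assumptions picked for illustration (the thread only says "millions" and "a couple thousand"), and `FilterEta` is a hypothetical helper, not part of kafka-webview.

```java
import java.time.Duration;

/** Back-of-the-envelope throughput math for the rates reported in this thread. */
class FilterEta {
    /** Wall-clock time needed to process `records` at `recordsPerMinute`. */
    static Duration timeToFilter(long records, long recordsPerMinute) {
        // Work in milliseconds so integer division loses little precision.
        long millis = records * 60_000L / recordsPerMinute;
        return Duration.ofMillis(millis);
    }

    /** Average cost of a single record, in milliseconds. */
    static double millisPerRecord(long recordsPerMinute) {
        return 60_000.0 / recordsPerMinute;
    }

    public static void main(String[] args) {
        long records = 2_000_000L;           // assumed topic size
        long filterRate = 300;               // records/minute, as timed above
        long pythonRate = 2_000L * 60;       // assumed 2,000 records/second

        for (long rate : new long[] {filterRate, pythonRate}) {
            Duration d = timeToFilter(records, rate);
            System.out.printf("%d rec/min: %.1f ms/record, %.1f hours for %d records%n",
                    rate, millisPerRecord(rate), d.toMinutes() / 60.0, records);
        }
    }
}
```

Under these assumptions, filtering two million records at 300 records/minute takes roughly 4.6 days, versus about 17 minutes at the assumed Python read rate.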

@xstephen95x
Author

xstephen95x commented Feb 27, 2019

I've tried using https://github.com/jvm-profiling-tools/perf-map-agent to run perf-top, and I've not been able to get anything useful.

In terms of streaming vs paging, I have not been able to successfully stream. I see a null pointer exception each time I try to stream.

java.lang.NullPointerException: null
        at org.sourcelab.kafka.webview.ui.controller.stream.StreamController.getLoggedInUser(StreamController.java:189) ~[classes/:na]

And yes, I have anonymous access set up.

@xstephen95x
Author

I have found that during the filtering of each record:

2019-02-27 17:29:02.961 WARN 25762 --- [p-nio-80-exec-3] o.a.k.clients.consumer.ConsumerConfig : The configuration 'RecordFilterInterceptor.recordFilterDefinitions' was supplied but isn't a known config.

gets logged to stdout.

@Crim
Collaborator

Crim commented Feb 28, 2019

Hey @xstephen95x thanks for the detailed responses! I'll try to hit all of them here, but let me know if I missed something.

> In terms of streaming vs paging, I have not been able to successfully stream. I see a null pointer exception each time I try to stream.

Which version of Kafka-WebView are you running? That sounds a lot like a bug fixed in 2.1.3 (Issue-127). Let me know if you're running version 2.1.3 or newer, and I may need to revisit this. If you're running version 2.1.2 or older, upgrading should resolve this issue.

> I have found that during the filtering of each record:
>
> 2019-02-27 17:29:02.961 WARN 25762 --- [p-nio-80-exec-3] o.a.k.clients.consumer.ConsumerConfig : The configuration 'RecordFilterInterceptor.recordFilterDefinitions' was supplied but isn't a known config.
>
> gets logged to stdout.

I think that is considered "normal". Basically, if you define any non-standard configuration property that the Kafka library isn't explicitly aware of, it will log that warning. In this case, I set a custom property to configure kafka-webview's record filter.

RE: the performance issue. The fact that you have a small number of partitions, and that paging through the topic without filtering enabled sounds fast, definitely makes me believe something is up with the filtering logic. I must be doing something silly; I just can't seem to spot it with my eyes. I believe you're right that performance profiling is going to be the best way to determine the cause here. Short of doing that, I may be able to put together a custom build for you that adds debug timing log statements to help track down the source. Is this something you would be interested in trying if I put it together?
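
As an illustration of what debug timing statements could look like, here is a minimal sketch using only the JDK. `StageTimer` and the stage names `deserialize` and `filter` are hypothetical, not existing kafka-webview classes; the lambdas stand in for the real deserializer and record filter, and the class is not thread-safe as written.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Accumulates wall-clock time per named stage so slow steps stand out. */
class StageTimer {
    // stage name -> {total nanoseconds, number of calls}
    private final Map<String, long[]> stats = new LinkedHashMap<>();

    /** Runs `work`, records how long it took under `stage`, and returns its result. */
    <T> T time(String stage, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            long[] s = stats.computeIfAbsent(stage, k -> new long[2]);
            s[0] += System.nanoTime() - start;
            s[1]++;
        }
    }

    /** Number of timed calls recorded for `stage` (0 if never seen). */
    long calls(String stage) {
        long[] s = stats.get(stage);
        return s == null ? 0 : s[1];
    }

    /** One line per stage: call count and average milliseconds per call. */
    String report() {
        StringBuilder sb = new StringBuilder();
        stats.forEach((stage, s) -> sb.append(
                String.format("%s: %d calls, %.3f ms avg%n", stage, s[1], s[0] / 1e6 / s[1])));
        return sb.toString();
    }

    public static void main(String[] args) {
        StageTimer timer = new StageTimer();
        for (int i = 0; i < 1000; i++) {
            final int n = i;
            // Stand-ins for the real deserializer and record filter.
            String value = timer.time("deserialize", () -> "record-" + n);
            timer.time("filter", () -> value.contains("7"));
        }
        System.out.print(timer.report());
    }
}
```

Wrapping each step this way and logging `report()` periodically would show whether the per-record cost sits in deserialization, in the filter itself, or somewhere outside both.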

@xstephen95x
Author

Thank you for all of your responses.

So, I upgraded to 2.1.4 and ran from the compiled jar instead of ./buildAndRun.sh, and I am now getting about 800 records filtered per minute. Good speedup, but it's still gonna take days to filter millions of records. I believe that's around 75 ms per record, which isn't that great.

I've been working on getting a perf analysis, but I'm having a hard time getting it to work with the JVM.
I've attached a flamegraph from my last attempt.

If perf isn't going to cooperate, then yes, perhaps the best option is to start logging timestamps.
Although I would also need to add them in my deserializer and filter, so I'm not sure of the best way to go about that.

flamegraph-40079.svg.zip
