[Question] <title>Optimization suggestion: Implementation of refined JMX monitoring metrics #2908

Closed
doveLin0818 opened this issue Dec 27, 2024 · 12 comments
Assignees
Labels: good first issue (Good for newcomers), question (Further information is requested)

Comments

@doveLin0818
Contributor

Question

Current Situation Analysis (Using JMX Monitoring of Kafka as an Example):

Currently, many components' monitoring information is obtained through the JMX protocol, and the backend uses a generic method to retrieve monitoring metrics for all JMX-based monitoring logic. This leads to poor scalability of monitoring metrics and a suboptimal user experience. For example, consider the following scenarios:

First: some metrics on the page could be merged for display. Shown separately, they look cluttered and are hard to read, as in the diagram below. Users could achieve this through a custom protocol, but requiring them to define these metrics introduces a learning curve and hurts the user experience. Handling it in the backend would be easier and produce a cleaner display.
image

Second: The generic JMX protocol-based method for retrieving metrics does not fully leverage JMX's capabilities. Since the backend must consider universality, it inevitably struggles to accommodate custom needs. For instance, the following diagram shows aggregated information for all topics in a Kafka broker. However, in daily use or within organizations, there is often more focus on the consumption status of individual topics. Let’s assume I am a HertzBeat user. To view these metrics, I first need to learn the JMX protocol for monitoring Kafka's metrics and then modify the template, as shown in the diagram below.
image
image

However, HertzBeat users can only see the metrics for each topic, without knowing which specific topic the data corresponds to, as the JMX protocol does not provide a topicName metric. Displaying this requires backend implementation.
image

Third: Companies and enterprises are also concerned about Kafka rebalance issues. This is difficult to achieve with the generic JMX monitoring method or user-customized monitoring templates.

In addition, many other metrics cannot be implemented through the generic protocol...

In short: the generic JMX method has scalability limits, so component-specific development (for Kafka, for example) is needed. On the one hand, requiring users to learn JMX protocol metrics is a cost that may erode HertzBeat's competitive edge. On the other hand, the generic JMX monitoring method cannot fully harness JMX's capabilities.

If the community deems it necessary, I can attempt to customize JMX-based monitoring metrics for Kafka as a preliminary exploration. I will try to make Kafka's metrics more comprehensive and universal while maintaining HertzBeat's design principles.

@doveLin0818 added the "question" label (Further information is requested) on Dec 27, 2024
@zhangshenghang
Member

Thanks, @doveLin0818.

I think this is a good idea. Can you consider optimizing both the Kafka Client and Kafka JMX monitoring?
Please draft the design first, for example which indicators you will monitor, and then everyone can discuss it. One thing to note: the newly added code should be as universal as possible.

@tomsun28
Contributor

+1, I think it's a good idea 👍. Please describe your custom design here before implementing it.

@doveLin0818
Contributor Author

@tomsun28 @zhangshenghang OK, I will try to balance practicality with HertzBeat's design principles.

@doveLin0818
Contributor Author

doveLin0818 commented Dec 28, 2024

Background: Due to the limited scalability of generic JMX collection methods, it is necessary to develop a customized JMX collection solution.

Modification Process: In order to retain HertzBeat's design philosophy while supporting JMX customization, we need to first check if the current collection metrics are already registered for customization before collection. Taking Kafka's objectName=kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,topic=* as an example, we need to customize this monitoring. The process involves registering Kafka as an app in the CustomizedJmxFactory (JMX Monitoring Customization Factory) and then creating a KafkaJmxValidator to specifically handle Kafka's customization. The details are as follows:
image

image

In summary, these components are used to determine whether the current monitoring needs to be customized. The logic for determining this is as follows:
image

If the current monitoring scenario is not registered, the normal (generic) process is followed.
If it is registered, the customization logic is triggered, as shown in the diagram below (using Kafka as an example):
image

image image image image
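
The registration check described above could be sketched roughly as follows. CustomizedJmxFactory and KafkaJmxValidator are the names used in this proposal, but the JmxValidator interface, the supports, register, and isCustomized methods, and the choice to match concrete MBean names against a pattern are assumptions made for illustration, not the actual implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical contract: decides whether an MBean name needs customized handling.
interface JmxValidator {
    boolean supports(ObjectName objectName);
}

// Illustrative validator: only per-topic BytesInPerSec MBeans are customized.
final class KafkaJmxValidator implements JmxValidator {
    private static final String PATTERN =
            "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,topic=*";

    @Override
    public boolean supports(ObjectName objectName) {
        try {
            // ObjectName.apply matches a (possibly wildcard) pattern against a concrete name.
            return new ObjectName(PATTERN).apply(objectName);
        } catch (MalformedObjectNameException e) {
            throw new IllegalStateException("Invalid pattern: " + PATTERN, e);
        }
    }
}

// Hypothetical registry keyed by app name (for example "kafka").
final class CustomizedJmxFactory {
    private static final Map<String, JmxValidator> VALIDATORS = new ConcurrentHashMap<>();

    private CustomizedJmxFactory() {
    }

    static void register(String app, JmxValidator validator) {
        VALIDATORS.put(app, validator);
    }

    // True when the app is registered and its validator accepts the MBean name;
    // false means the caller should follow the generic JMX collection path.
    static boolean isCustomized(String app, String objectName) {
        JmxValidator validator = VALIDATORS.get(app);
        if (validator == null) {
            return false;
        }
        try {
            return validator.supports(new ObjectName(objectName));
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Invalid ObjectName: " + objectName, e);
        }
    }
}
```

In this sketch an unregistered app, or an MBean the validator does not recognize, simply returns false, so the generic path remains the default and user-defined templates keep working unchanged.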

Here is the preliminary demo:
image

This modification process effectively addresses the scalability issue of the generic JMX protocol, while also preserving the design principle that allows users to customize their protocols.

The newly added functionality has been abstracted into a factory, making it more generic. If the community agrees with this approach, I will continue to improve it. @tomsun28 @zhangshenghang

@tomsun28
Contributor

👍👍👍 Hi, thanks! Could you share the implementation of request.getObjectInstanceSet()? It seems to be a key part.
image

@zhangshenghang
Member

@doveLin0818 Thanks. Does Kafka's JMX monitoring support all versions? The differences among different versions also need to be considered.

@doveLin0818
Contributor Author

doveLin0818 commented Dec 29, 2024

> 👍👍👍 Hi, thanks! Could you share the implementation of request.getObjectInstanceSet()? It seems to be a key part.
> image

@tomsun28 Hi Tom, this is a parameter passed in from upstream; see the general JMX code:
image
image
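
For readers without access to the screenshots: in the standard JMX API, a Set<ObjectInstance> is what MBeanServerConnection.queryMBeans returns for a name or pattern. The sketch below assumes the upstream code builds its set in that way, which the thread does not confirm. It runs against the local platform MBeanServer instead of a remote Kafka broker, and the ObjectInstanceQuery class and its method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServerConnection;
import javax.management.ObjectInstance;
import javax.management.ObjectName;

// Hypothetical helper showing how a set of ObjectInstance values can be obtained.
final class ObjectInstanceQuery {

    private ObjectInstanceQuery() {
    }

    // Returns the canonical names of all MBeans matching the given name or pattern.
    static Set<String> matchingNames(MBeanServerConnection server, String nameOrPattern) {
        try {
            // A null QueryExp means "no additional filter beyond the name pattern".
            Set<ObjectInstance> instances = server.queryMBeans(new ObjectName(nameOrPattern), null);
            Set<String> names = new TreeSet<>();
            for (ObjectInstance instance : instances) {
                names.add(instance.getObjectName().getCanonicalName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("JMX query failed for " + nameOrPattern, e);
        }
    }

    // Convenience overload that queries the MBeanServer of the current JVM.
    static Set<String> matchingLocalNames(String nameOrPattern) {
        return matchingNames(ManagementFactory.getPlatformMBeanServer(), nameOrPattern);
    }
}
```

With a remote connection, the same matchingNames call would take the MBeanServerConnection obtained from a JMXConnector, and a pattern such as kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,topic=* would yield one entry per topic.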

@doveLin0818
Contributor Author

> @doveLin0818 Thanks. Does Kafka's JMX monitoring support all versions? The differences among different versions also need to be considered.

That is an important point. In theory JMX is supported by all Kafka versions, but I will take this into account, for example by using the general JMX code as the fallback logic.

@zhangshenghang
Member

> > @doveLin0818 Thanks. Does Kafka's JMX monitoring support all versions? The differences among different versions also need to be considered.
>
> That is an important point. In theory JMX is supported by all Kafka versions, but I will take this into account, for example by using the general JMX code as the fallback logic.

Yes, please verify this so that mismatched JMX keys are avoided. I have seen this problem on many services: the JMX key formats differ between versions.
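
One possible way to cope with key formats that differ between versions is to try several candidate ObjectName patterns in order and use the first one that matches an MBean actually present. This is only a sketch: the ObjectNameFallback class is hypothetical, and the example.app names in the usage note are made up for illustration rather than taken from any Kafka release.

```java
import java.util.Collection;
import java.util.List;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper: picks the first name pattern that matches an available MBean.
final class ObjectNameFallback {

    private ObjectNameFallback() {
    }

    // Returns the first candidate pattern matching at least one available name,
    // or null when none match (the caller can then use the generic path).
    static String firstMatching(List<String> candidatePatterns, Collection<String> availableNames) {
        try {
            for (String candidate : candidatePatterns) {
                ObjectName pattern = new ObjectName(candidate);
                for (String available : availableNames) {
                    if (pattern.apply(new ObjectName(available))) {
                        return candidate;
                    }
                }
            }
            return null;
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

For example, given the candidates example.app:type=Stats,name=* and example.app:type=Stats,metric=*, a server that only exposes example.app:type=Stats,metric=Requests would select the second pattern.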

@tomsun28
Contributor

Hi, I found that the difference between the old and new code is the call to currentObject.getKeyProperty("topic"). Could we implement this customization by extending the JMX template protocol instead of hard-coding it?
image

image

See below. As with the metric name, can we design a mechanism or config option that uses the value of a key property as a metric value, so that users do not need to hard-code it?

image
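
A template-driven version of this idea might look like the sketch below. It assumes a hypothetical convention in which a field alias prefixed with keyProperty: is resolved from the MBean's ObjectName, while any other alias is read from the MBean's attributes. Neither the prefix nor the KeyPropertyFieldResolver class exists in HertzBeat; only ObjectName.getKeyProperty is a real JMX call.

```java
import java.util.Map;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical resolver for a template field alias.
final class KeyPropertyFieldResolver {
    // Assumed template convention, for example "keyProperty:topic".
    static final String PREFIX = "keyProperty:";

    private KeyPropertyFieldResolver() {
    }

    // Resolves an alias either from the ObjectName key properties or from the
    // already-fetched attribute values. Returns null when nothing is found.
    static String resolve(String objectName, Map<String, ?> attributes, String alias) {
        if (alias.startsWith(PREFIX)) {
            String key = alias.substring(PREFIX.length());
            try {
                return new ObjectName(objectName).getKeyProperty(key);
            } catch (MalformedObjectNameException e) {
                throw new IllegalArgumentException("Invalid ObjectName: " + objectName, e);
            }
        }
        Object value = attributes.get(alias);
        return value == null ? null : String.valueOf(value);
    }
}
```

With this convention, a per-topic template could list keyProperty:topic next to an attribute such as OneMinuteRate, and each row would carry its topic name without any Kafka-specific Java code.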

@tomsun28
Contributor

Of course, your design is also universal and covers other custom metrics that cannot be configured in the protocol. We recommend supporting custom configuration in the template first, and resorting to hard-coding only after that.

@doveLin0818
Contributor Author

> Hi, I found that the difference between the old and new code is the call to currentObject.getKeyProperty("topic"). Could we implement this customization by extending the JMX template protocol instead of hard-coding it?
> image
>
> image
>
> See below. As with the metric name, can we design a mechanism or config option that uses the value of a key property as a metric value, so that users do not need to hard-code it?
>
> image

Hi Tom, @tomsun28
Currently I cannot obtain a key property through aliasFields; getting it still requires hard-coded support. However, this is only a small feature for Kafka. Other monitored apps may not need key properties, so it does not seem appropriate to build it into the general template.
image

When I tried to merge the ReplicaManager monitoring information through aliasFields today, I also ran into incompatibility issues. While designing the template for the MBean kafka.server:type=ReplicaManager,name=*, I found that aliasFields cannot merge these metrics because every one of them exposes its value under the same attribute name, "Value". ReplicaManager actually has 12 indicators, and HertzBeat currently shows only two. If users want to see all 12, they need to write 12 nearly identical metric definitions.
image
image
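
To make the merging problem concrete: because every matched MBean reports its number under the attribute "Value", the only distinguishing label is the name key property in the ObjectName. A backend-side merge could therefore key each value by that property, as in this sketch. The ValueAttributeMerger class is hypothetical, and the input map stands in for values already read from each MBean.

```java
import java.util.Map;
import java.util.TreeMap;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical merger: turns many single-"Value" MBeans into one row of metrics.
final class ValueAttributeMerger {

    private ValueAttributeMerger() {
    }

    // Input: ObjectName string -> the "Value" attribute read from that MBean.
    // Output: one row keyed by the "name" key property (sorted for stable display).
    static Map<String, Object> mergeByName(Map<String, ?> valueByObjectName) {
        Map<String, Object> row = new TreeMap<>();
        for (Map.Entry<String, ?> entry : valueByObjectName.entrySet()) {
            try {
                String metric = new ObjectName(entry.getKey()).getKeyProperty("name");
                if (metric != null) {
                    row.put(metric, entry.getValue());
                }
            } catch (MalformedObjectNameException e) {
                throw new IllegalArgumentException("Invalid ObjectName: " + entry.getKey(), e);
            }
        }
        return row;
    }
}
```

Fed with the values of kafka.server:type=ReplicaManager,name=LeaderCount and kafka.server:type=ReplicaManager,name=PartitionCount, the result is a single row with the columns LeaderCount and PartitionCount, which is the merged view that would otherwise need one template entry per indicator.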

So, to fully utilize the capabilities of JMX, I currently see no good way to do it through templates, and I prefer customized development. Customization does not prevent users from customizing monitoring templates themselves; it only increases the amount of code to write.

Projects
Development

No branches or pull requests

3 participants