[Question] Optimization suggestion: Implementation of refined JMX monitoring metrics #2908

doveLin0818 · 2024-12-27T10:56:32Z

Question

Current Situation Analysis (Using JMX Monitoring of Kafka as an Example):

Currently, monitoring data for many components is collected over the JMX protocol, and the backend uses one generic method to retrieve metrics for all JMX-based monitoring. This limits how far the metrics can be extended and makes for a suboptimal user experience. Consider the following scenarios:

First: some metrics on the page could be merged into a single display. Shown separately, they look cluttered and unattractive, as the diagram below illustrates. Users could of course achieve this with a custom protocol, but asking them to define these metrics adds a learning curve and hurts the user experience. If the backend handled it, the result would be simpler and more visually appealing.

Second: the generic JMX method of retrieving metrics does not fully leverage JMX's capabilities. Because the backend must stay generic, it inevitably struggles to accommodate custom needs. For instance, the diagram below shows aggregated information for all topics on a Kafka broker. In daily use, however, organizations are often more interested in the consumption status of individual topics. Suppose I am a HertzBeat user: to view these per-topic metrics, I first need to learn Kafka's JMX metric names and then modify the template, as shown in the diagram below.

Even then, users only see one set of metrics per topic without knowing which topic each set belongs to. The topic name is carried only as the topic key property of the MBean's ObjectName, not as an attribute, so the generic collector never outputs it. Displaying it requires backend support.
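For context, the topic name is still reachable: when a template uses a wildcard such as topic=*, each matching MBean's ObjectName carries the topic as a key property, which the standard javax.management method ObjectName.getKeyProperty can read. A minimal sketch, assuming a hypothetical helper class (not HertzBeat code) and a made-up topic called orders:

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper (not HertzBeat code): reads the "topic" key property
// from a Kafka BrokerTopicMetrics ObjectName.
final class TopicNameExtractor {

    private TopicNameExtractor() {
    }

    // Returns the topic name, or null when the ObjectName has no "topic" key
    // (for example, the broker-wide aggregate MBean).
    static String topicOf(String objectName) {
        try {
            return ObjectName.getInstance(objectName).getKeyProperty("topic");
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Invalid ObjectName: " + objectName, e);
        }
    }

    public static void main(String[] args) {
        // Per-topic MBean name: prints orders
        System.out.println(topicOf("kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,topic=orders"));
        // Aggregate MBean name (no topic key): prints null
        System.out.println(topicOf("kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec"));
    }
}
```

Exposing this value as an extra column next to the attribute values is the kind of backend change this point calls for.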

Third: companies are also concerned about Kafka rebalance issues, which are difficult to monitor with either the generic JMX method or user-customized monitoring templates.

Many other metrics cannot be implemented through the generic protocol either...

In short: the generic JMX method limits scalability, so components such as Kafka need custom development. On the one hand, requiring users to learn JMX metric names is a cost that may erode HertzBeat's competitive edge. On the other hand, the generic JMX method cannot fully harness JMX's capabilities.

If the community deems it necessary, I can try customizing JMX-based monitoring metrics for Kafka as a preliminary exploration. I will aim to make Kafka's metrics more comprehensive and universal while maintaining HertzBeat's design principles.

zhangshenghang · 2024-12-27T15:34:30Z

Thanks, @doveLin0818

I think this is a good idea. Could you consider optimizing both the Kafka Client and Kafka JMX monitoring?
Please start with a design so that everyone can discuss it, for example, which metrics you plan to monitor. One thing to note: while optimizing, keep the newly added code as universal as possible.

tomsun28 · 2024-12-28T03:42:32Z

+1, I think it's a good idea 👍. I recommend describing your custom design here before implementing it.

doveLin0818 · 2024-12-28T06:00:23Z

@tomsun28 @zhangshenghang OK, I will try to balance practicality with HertzBeat's design principles.

doveLin0818 · 2024-12-28T12:13:11Z

Background: because generic JMX collection has limited scalability, a customized JMX collection solution is needed.

Modification process: to preserve HertzBeat's design philosophy while supporting JMX customization, the collector first checks, before collecting, whether the current metrics are registered for customization. Take Kafka's objectName=kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,topic=* as an example. To customize this monitoring, Kafka is registered as an app in the CustomizedJmxFactory (the JMX monitoring customization factory), and a KafkaJmxValidator is created to handle Kafka's customization specifically. The details are as follows:

In summary, these components determine whether the current monitoring needs to be customized, using the following logic:

If the current monitoring scenario is not registered for customization, the normal (generic) process is followed.
If it is registered, the customization logic is triggered, as shown in the diagram below (using Kafka as an example):

Here is the preliminary demo:

This approach addresses the scalability issue of the generic JMX protocol while preserving the design principle that lets users customize their own protocols.

The new functionality has been abstracted into a factory, which keeps it generic. If the community agrees with this approach, I will continue to improve it. @tomsun28 @zhangshenghang
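The registration check with its generic fallback could look roughly like the following. Everything here is hypothetical: JmxCustomizer, CustomizedJmxRegistry, KafkaTopicCustomizer and JmxDispatch are illustrative stand-ins for the proposed CustomizedJmxFactory and KafkaJmxValidator, and the collect methods return placeholder rows instead of querying a broker.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical contract for an app-specific JMX customization.
interface JmxCustomizer {
    // True if this customizer wants to handle the given ObjectName pattern.
    boolean supports(String objectName);

    // Produces rows of metric values for the given ObjectName pattern.
    List<Map<String, String>> collect(String objectName);
}

// Hypothetical Kafka customizer: claims only per-topic BrokerTopicMetrics patterns.
final class KafkaTopicCustomizer implements JmxCustomizer {
    @Override
    public boolean supports(String objectName) {
        return objectName.startsWith("kafka.server:type=BrokerTopicMetrics") && objectName.contains("topic=");
    }

    @Override
    public List<Map<String, String>> collect(String objectName) {
        // Placeholder row; a real implementation would query the broker.
        return List.of(Map.of("source", "customized"));
    }
}

// Hypothetical registry keyed by app name, e.g. "kafka".
final class CustomizedJmxRegistry {
    private final Map<String, JmxCustomizer> customizers = new ConcurrentHashMap<>();

    void register(String app, JmxCustomizer customizer) {
        customizers.put(app, customizer);
    }

    // Present only if the app is registered AND its customizer claims this ObjectName.
    Optional<JmxCustomizer> lookup(String app, String objectName) {
        return Optional.ofNullable(customizers.get(app)).filter(c -> c.supports(objectName));
    }
}

final class JmxDispatch {
    private JmxDispatch() {
    }

    // Stand-in for the existing generic JMX collector.
    static List<Map<String, String>> genericCollect(String objectName) {
        return List.of(Map.of("source", "generic"));
    }

    // Customized path when registered, otherwise the unchanged generic path.
    static List<Map<String, String>> collect(CustomizedJmxRegistry registry, String app, String objectName) {
        return registry.lookup(app, objectName)
                .map(c -> c.collect(objectName))
                .orElseGet(() -> genericCollect(objectName));
    }
}
```

Making the generic collector the orElseGet fallback means that unregistered apps, and ObjectNames a customizer does not claim, behave exactly as they do today, so user-defined templates keep working.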

tomsun28 · 2024-12-29T01:22:33Z

👍👍👍 Hi, thanks! Can you share the implementation of request.getObjectInstanceSet()? It seems to be the key piece.

zhangshenghang · 2024-12-29T01:30:47Z

@doveLin0818 Thanks. Does Kafka's JMX monitoring support all Kafka versions? Differences between versions also need to be considered.

doveLin0818 · 2024-12-29T01:37:09Z

> 👍👍👍 Hi, thanks! Can you share the implementation of request.getObjectInstanceSet()? It seems to be the key piece.

@tomsun28 Hi Tom, this is a parameter passed in from upstream; see the generic JMX code:
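The generic code referred to here is not included in this thread. As background only, and as an assumption about what such a set usually holds: the standard MBeanServerConnection.queryMBeans(ObjectName, QueryExp) call returns a Set of ObjectInstance for every MBean matching a possibly wildcard pattern. The sketch below queries the local platform MBean server so it runs without Kafka; in practice the connection would come from a remote JMXConnector, and the helper name is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectInstance;
import javax.management.ObjectName;

// Hypothetical helper: resolves an ObjectName pattern to the matching MBean instances.
final class ObjectInstanceQuery {

    private ObjectInstanceQuery() {
    }

    static Set<ObjectInstance> query(MBeanServerConnection connection, String pattern) {
        try {
            // A null QueryExp applies no filtering beyond the pattern itself.
            return connection.queryMBeans(new ObjectName(pattern), null);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Invalid ObjectName pattern: " + pattern, e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The platform MBeanServer is also an MBeanServerConnection, so no broker is needed here.
        MBeanServerConnection local = ManagementFactory.getPlatformMBeanServer();
        for (ObjectInstance instance : query(local, "java.lang:type=GarbageCollector,*")) {
            System.out.println(instance.getObjectName().getKeyProperty("name"));
        }
    }
}
```

Each ObjectInstance exposes its ObjectName, which is where per-instance details such as the Kafka topic key property come from.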

doveLin0818 · 2024-12-29T01:50:13Z

> @doveLin0818 Thanks. Does Kafka's JMX monitoring support all Kafka versions? Differences between versions also need to be considered.

This is a very important point. In theory, JMX supports all Kafka versions, but I will take this issue into account, for example by using the generic JMX code as fallback logic.

zhangshenghang · 2024-12-29T01:55:42Z

> > @doveLin0818 Thanks. Does Kafka's JMX monitoring support all Kafka versions? Differences between versions also need to be considered.

> This is a very important point. In theory, JMX supports all Kafka versions, but I will take this issue into account, for example by using the generic JMX code as fallback logic.

Yes, please confirm this to avoid mismatched JMX keys. I have run into this problem with many services: different versions use different JMX key formats.
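One way to act on this, sketched under assumptions: keep an ordered list of candidate ObjectName patterns per metric and use the first one the connected JVM actually exposes, falling back to the generic logic when none match. The class name is hypothetical, and no real Kafka naming differences are encoded here; those would have to be confirmed against each Kafka version.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper: tries ObjectName patterns in priority order (for example,
// the newest naming first) and returns the first pattern that matches any MBean.
final class VersionAwareResolver {

    private VersionAwareResolver() {
    }

    static Optional<String> firstMatching(MBeanServerConnection connection, List<String> candidates) {
        for (String candidate : candidates) {
            try {
                Set<ObjectName> names = connection.queryNames(new ObjectName(candidate), null);
                if (!names.isEmpty()) {
                    return Optional.of(candidate);
                }
            } catch (MalformedObjectNameException e) {
                throw new IllegalArgumentException("Invalid ObjectName pattern: " + candidate, e);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        // No candidate matched: the caller can fall back to the generic template.
        return Optional.empty();
    }
}
```

Because the lookup returns an Optional, an empty result maps naturally onto the "use the generic JMX code as fallback" idea from the previous comment.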

tomsun28 · 2024-12-29T03:09:08Z

Hi, I found that the difference between the old and new code is currentObject.getKeyProperty("topic"). Could we support this by designing the JMX template protocol instead of hard-coding it?

As you can see below, for the metric name, could we design a mechanism or config option that uses a key property's value as a metric value? Then users would not need hard-coded logic.
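To make the suggestion concrete, here is a sketch of how a template could mark a field as coming from an ObjectName key property instead of an MBean attribute, so that topic becomes an ordinary column without Kafka-specific code. The keyProperty: prefix and TemplateRowBuilder are invented for illustration; they are not an existing HertzBeat template option, and the real syntax (for example alongside aliasFields) may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical template-driven row builder. A field whose source starts with
// "keyProperty:" takes its value from the ObjectName key; any other source is
// looked up as an MBean attribute name.
final class TemplateRowBuilder {

    static final String KEY_PROPERTY_PREFIX = "keyProperty:";

    private TemplateRowBuilder() {
    }

    // fields: ordered (field name -> source) pairs taken from the template
    static Map<String, String> buildRow(String objectName, Map<String, ?> attributes, Map<String, String> fields) {
        ObjectName name;
        try {
            name = ObjectName.getInstance(objectName);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Invalid ObjectName: " + objectName, e);
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            String source = field.getValue();
            Object value = source.startsWith(KEY_PROPERTY_PREFIX)
                    ? name.getKeyProperty(source.substring(KEY_PROPERTY_PREFIX.length()))
                    : attributes.get(source);
            // Missing keys or attributes become null cells rather than errors.
            row.put(field.getKey(), value == null ? null : String.valueOf(value));
        }
        return row;
    }
}
```

With a field list such as topic -> keyProperty:topic and count -> Count, an MBean named ...,topic=orders with Count = 42 would yield the row {topic=orders, count=42}.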

tomsun28 · 2024-12-29T03:13:05Z

Of course, your design is also universal and covers custom metrics that cannot be configured in the protocol. We recommend supporting custom configuration in the template first, and falling back to hard-coding only when that is not possible.

doveLin0818 · 2024-12-29T13:46:21Z

> Hi, I found that the difference between the old and new code is currentObject.getKeyProperty("topic"). Could we support this by designing the JMX template protocol instead of hard-coding it?

> As you can see below, for the metric name, could we design a mechanism or config option that uses a key property's value as a metric value? Then users would not need hard-coded logic.

Hi Tom, @tomsun28
I currently cannot obtain the key property through aliasFields; supporting it would still require hard-coding. However, this is a small feature needed only for Kafka. Other monitored apps may not need key properties, so it is not appropriate to put it in the general template.

Today, when I tried to merge ReplicaManager monitoring information through aliasFields, I also ran into an incompatibility. When designing the template for the MBean kafka.server:type=ReplicaManager,name=*, I found that aliasFields could not merge the metrics, because every one of them exposes the same attribute, "Value". ReplicaManager actually has 12 metrics, but HertzBeat currently shows only two. If users want to see all 12, they have to write 12 redundant copies.
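As a sketch of the merge described above (one possible approach, not a proposed HertzBeat API): query the wildcard pattern once, read the shared attribute from each matching MBean, and key the result by the name key property, so a dozen single-value MBeans collapse into one row. The class name is hypothetical. The main method runs against the local JVM's java.lang:type=MemoryPool,name=* MBeans only because no broker is available here; against Kafka the call would be merge(connection, "kafka.server:type=ReplicaManager,name=*", "Value").

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.TreeMap;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

// Hypothetical merger: reads the same attribute from every MBean matching a
// pattern that contains name=*, keyed by each MBean's "name" key property.
final class SameAttributeMerger {

    private SameAttributeMerger() {
    }

    static Map<String, Object> merge(MBeanServerConnection connection, String pattern, String attribute) {
        Map<String, Object> row = new TreeMap<>();
        try {
            for (ObjectName name : connection.queryNames(new ObjectName(pattern), null)) {
                // The pattern must include name=*, otherwise the key would be null.
                row.put(name.getKeyProperty("name"), connection.getAttribute(name, attribute));
            }
        } catch (JMException e) {
            throw new IllegalStateException("JMX query failed for " + pattern, e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return row;
    }

    public static void main(String[] args) {
        // Local demo: one merged row of memory pool name -> pool type.
        System.out.println(merge(ManagementFactory.getPlatformMBeanServer(), "java.lang:type=MemoryPool,name=*", "Type"));
    }
}
```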

So, to fully utilize JMX's capabilities, I currently see no good way to do it through templates, and I prefer customized development: it does not prevent users from customizing monitoring templates themselves; it only adds to the code-writing workload.

doveLin0818 added the question Further information is requested label Dec 27, 2024

tomsun28 assigned doveLin0818 Dec 28, 2024

tomsun28 added the good first issue Good for newcomers label Dec 28, 2024

github-project-automation bot added this to Apache HertzBeat (Incubating) and hertzbeat-v1.0 Dec 28, 2024

github-project-automation bot moved this to To do in Apache HertzBeat (Incubating) Dec 28, 2024

doveLin0818 closed this as completed Jan 5, 2025

github-project-automation bot moved this from To do to Done in Apache HertzBeat (Incubating) Jan 5, 2025

github-project-automation bot moved this to Done in hertzbeat-v1.0 Jan 5, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Question] <title>Optimization suggestion: Implementation of refined JMX monitoring metrics #2908

[Question] <title>Optimization suggestion: Implementation of refined JMX monitoring metrics #2908

doveLin0818 commented Dec 27, 2024

zhangshenghang commented Dec 27, 2024

tomsun28 commented Dec 28, 2024

doveLin0818 commented Dec 28, 2024

doveLin0818 commented Dec 28, 2024 •

edited

Loading

tomsun28 commented Dec 29, 2024

zhangshenghang commented Dec 29, 2024

doveLin0818 commented Dec 29, 2024 •

edited

Loading

doveLin0818 commented Dec 29, 2024

zhangshenghang commented Dec 29, 2024

tomsun28 commented Dec 29, 2024

tomsun28 commented Dec 29, 2024

doveLin0818 commented Dec 29, 2024

[Question] <title>Optimization suggestion: Implementation of refined JMX monitoring metrics #2908

[Question] <title>Optimization suggestion: Implementation of refined JMX monitoring metrics #2908

Comments

doveLin0818 commented Dec 27, 2024

Question

zhangshenghang commented Dec 27, 2024

tomsun28 commented Dec 28, 2024

doveLin0818 commented Dec 28, 2024

doveLin0818 commented Dec 28, 2024 • edited Loading

tomsun28 commented Dec 29, 2024

zhangshenghang commented Dec 29, 2024

doveLin0818 commented Dec 29, 2024 • edited Loading

doveLin0818 commented Dec 29, 2024

zhangshenghang commented Dec 29, 2024

tomsun28 commented Dec 29, 2024

tomsun28 commented Dec 29, 2024

doveLin0818 commented Dec 29, 2024

doveLin0818 commented Dec 28, 2024 •

edited

Loading

doveLin0818 commented Dec 29, 2024 •

edited

Loading