Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ISSUE #4701]Fix use tcp protocol client send message, it throw a DecoderException #4702

Merged
merged 7 commits into from
Jan 10, 2024

Conversation

mxsm
Copy link
Member

@mxsm mxsm commented Jan 1, 2024

Fixes #4701

Motivation

Explain the content here.
Explain why you want to make the changes and what problem you're trying to solve.

  • use tcp protocol client send message, it throw a DecoderException

Modifications

Describe the modifications you've done.

  • Refactored the Codec's code and fixed the current issue

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
  • If a feature is not applicable for documentation, explain why?
  • If a feature is not documented yet in this PR, please create a followup issue for adding the documentation

Copy link

codecov bot commented Jan 1, 2024

Codecov Report

Attention: 9 lines in your changes are missing coverage. Please review.

Comparison is base (bd2aeb9) 17.39% compared to head (32656bc) 17.46%.
Report is 12 commits behind head on master.

Files Patch % Lines
...che/eventmesh/common/protocol/tcp/codec/Codec.java 65.21% 6 Missing and 2 partials ⚠️
.../apache/eventmesh/client/tcp/common/TcpClient.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #4702      +/-   ##
============================================
+ Coverage     17.39%   17.46%   +0.07%     
- Complexity     1759     1770      +11     
============================================
  Files           797      797              
  Lines         29850    29850              
  Branches       2579     2565      -14     
============================================
+ Hits           5192     5213      +21     
+ Misses        24177    24156      -21     
  Partials        481      481              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@pandaapo
Copy link
Member

pandaapo commented Jan 1, 2024

May I ask if this bug is caused by this PR #4678? And what is the specific cause of this bug?

@mxsm
Copy link
Member Author

mxsm commented Jan 1, 2024

May I ask if this bug is caused by this PR #4678? And what is the specific cause of this bug?

Yes

@pandaapo
Copy link
Member

pandaapo commented Jan 1, 2024

May I ask if this bug is caused by this PR #4678? And what is the specific cause of this bug?

Yes

And what is the specific reason of this bug?

@mxsm
Copy link
Member Author

mxsm commented Jan 1, 2024

May I ask if this bug is caused by this PR #4678? And what is the specific cause of this bug?

Yes

And what is the specific reason of this bug?

When the TCP-sent data packet is not sufficient to construct a Package, it will result in an error. If using a custom approach, you need to handle the issues of packet fragmentation and incomplete packets. The previously merged TCP enhancement pull request did not address this, and there would be no issue if packet fragmentation and incomplete packets did not occur.

@pandaapo
Copy link
Member

pandaapo commented Jan 1, 2024

When the TCP-sent data packet is not sufficient to construct a Package, it will result in an error. If using a custom approach, you need to handle the issues of packet fragmentation and incomplete packets. The previously merged TCP enhancement pull request did not address this, and there would be no issue if packet fragmentation and incomplete packets did not occur.

Thanks for your explanation. So your PR essentially moves the logic of Codec.EnCoder and Codec.Decoder into Codec. This raises a new question for me: in the data packet processing for message stickiness and message truncation in the encoding and decoding logic of Codec, what is the difference between calling the internal class object and handling it directly?

谢谢您的说明。所以您的PR相当于将Codec.EnCoderCodec.Decoder的逻辑移到了Codec中。这让我产生了新的疑问,在Codec编解码逻辑中,调用内部类对象处理数据粘包、半包问题的方式和直接处理的方式,有什么不同?

@mxsm
Copy link
Member Author

mxsm commented Jan 1, 2024

When the TCP-sent data packet is not sufficient to construct a Package, it will result in an error. If using a custom approach, you need to handle the issues of packet fragmentation and incomplete packets. The previously merged TCP enhancement pull request did not address this, and there would be no issue if packet fragmentation and incomplete packets did not occur.

Thanks for your explanation. So your PR essentially moves the logic of Codec.EnCoder and Codec.Decoder into Codec. This raises a new question for me: in the data packet processing for message stickiness and message truncation in the encoding and decoding logic of Codec, what is the difference between calling the internal class object and handling it directly?

谢谢您的说明。所以您的PR相当于将Codec.EnCoderCodec.Decoder的逻辑移到了Codec中。这让我产生了新的疑问,在Codec编解码逻辑中,调用内部类对象处理数据粘包、半包问题的方式和直接处理的方式,有什么不同?

The functional code of encoding has not changed; only the positions were replaced. The decoding code has been repaired to address the issues of packet fragmentation and incomplete packets. It's not just a simple code move. You can compare it with the previous code.

@pandaapo
Copy link
Member

pandaapo commented Jan 1, 2024

The functional code of encoding has not changed; only the positions were replaced. The decoding code has been repaired to address the issues of packet fragmentation and incomplete packets. It's not just a simple code move. You can compare it with the previous code.

So this bug is not actually introduced by PR #4678, but rather a bug that existed before, right?

if (in == null) {
return;
}
if (in.readableBytes() < CONSTANT_MAGIC_FLAG.length + VERSION.length + 4 + 4) {
Copy link
Member

@pandaapo pandaapo Jan 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could these 2 magic numbers be changed to constants to indicate that they are the length of the package length value and the length of the header length value?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could these magic numbers be changed to constants to indicate that they are the length of the package length value and the length of the header length value?

These magic numbers are actually part of the protocol. I recommend keeping them. You can think of them as a special marker for EventMesh's custom protocol, similar to the magic number in Java class files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can think of them as a special marker for EventMesh's custom protocol

Actually, I think CONSTANT_MAGIC_FLAG(= serializeBytes("EventMesh")) is also a special marker for EventMesh's custom protocol.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could these 2 magic numbers be changed to constants to indicate that they are the length of the package length value and the length of the header length value?

Using constants can improve readability and maintainability. But in this case, the numbers are integral to the protocol's structure, not arbitrary choices. It is better to stick with protocol-defined markers, otherwise it could lead to ambiguity or unexpected behavior.

validateFlagAndVersion(flagBytes, versionBytes, ctx);
final int packageLength = in.readInt();
final int headerLength = in.readInt();
if (in.readableBytes() < packageLength - 13) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you change 13 to a constant to indicate that it is the length of CONSTANT_MAGIC_FLAG plus the length of VERSION?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you change 13 to a constant to indicate that it is the length of CONSTANT_MAGIC_FLAG plus the length of VERSION?

Of course

@mxsm
Copy link
Member Author

mxsm commented Jan 1, 2024

The functional code of encoding has not changed; only the positions were replaced. The decoding code has been repaired to address the issues of packet fragmentation and incomplete packets. It's not just a simple code move. You can compare it with the previous code.

So this bug is not actually introduced by PR #4678, but rather a bug that existed before, right?

Yes, Reverting to the previous version is without issues.

@pandaapo
Copy link
Member

pandaapo commented Jan 1, 2024

The functional code of encoding has not changed; only the positions were replaced. The decoding code has been repaired to address the issues of packet fragmentation and incomplete packets. It's not just a simple code move. You can compare it with the previous code.

So this bug is not actually introduced by PR #4678, but rather a bug that existed before, right?

Yes, Reverting to the previous version is without issues.

I'm confused. #4678 did not change the decoding logic, only changed the direct decoding logic to call the decoding logic with an internal class object. Why does #4678 cause this bug?

@mxsm
Copy link
Member Author

mxsm commented Jan 1, 2024

image
The ReplayingDecoder you marked in the screenshot contains processing logic. After processing with ReplayingDecoder, it then calls the decode method as shown in the previous image. You might want to focus on the parent class of ReplayingDecoder, which is ByteToMessageDecoder, and take a look at the channelRead method. If you call it directly, you skip the steps of handling partial packets and packet concatenation. However, if your data length is just right, there's no issue. Problems may arise when the data length is not sufficient to construct a custom Package, or it can be parsed into multiple packages, or it is greater than one but cannot be built into an integer.
In summary, direct calling skips handling packet fragmentation and incomplete packets.

@pandaapo
Copy link
Member

pandaapo commented Jan 2, 2024

Thank you for your explanation. Since the version before PR #4678 not only ensure accuracy, but also make decoding logic clearer and simpler, would it be better to revert to the previous version without adding extra decoding complexity?

@mxsm
Copy link
Member Author

mxsm commented Jan 2, 2024

Thank you for your explanation. Since the version before PR #4678 not only ensure accuracy, but also make decoding logic clearer and simpler, would it be better to revert to the previous version without adding extra decoding complexity?

It is also possible to roll back to the previous version. This is based on the current transformation. We can see the opinions of others in the community. The previous implementation was to implement decoding and encoding separately using two decoders and two encoders. Now, decoding and encoding are implemented through Codec, which is essentially no different from the previous implementation. It is just the coding method and implementation. The previous implementation was processed through Netty's own internal generic implementation. The current PR is handled by the developer's customization.

@pandaapo
Copy link
Member

pandaapo commented Jan 2, 2024

Thank you for your explanation. Since the version before PR #4678 not only ensure accuracy, but also make decoding logic clearer and simpler, would it be better to revert to the previous version without adding extra decoding complexity?

It is also possible to roll back to the previous version. This is based on the current transformation. We can see the opinions of others in the community. The previous implementation was to implement decoding and encoding separately using two decoders and two encoders. Now, decoding and encoding are implemented through Codec, which is essentially no different from the previous implementation. It is just the coding method and implementation. The previous implementation was processed through Netty's own internal generic implementation. The current PR is handled by the developer's customization.

Compared to the previous version before PR #4678, the logic in this PR still adds some complexity, such as embedding the following decoding logic on the basis of previous code, while the previous version was not affected by these details of EventMesh custom protocol.

...
if (in.readableBytes() < prefixLength + 4 + 4) {
  // Not enough data to read the package length and header length
  return;
}
...
if (in.readableBytes() < packageLength - prefixLength) {
  // Not enough data yet, reset the reader index and wait for more data
  in.resetReaderIndex();
  return;
}
...

I completely agree with what you said: we can see the opinions (modify like this or revert) of others in the community.

@karsonto
Copy link
Contributor

karsonto commented Jan 5, 2024

Can we use Netty's built-in LengthFieldBasedFrameDecoder to solve the problem of sticky or half packet?

@pandaapo
Copy link
Member

pandaapo commented Jan 6, 2024

I think LengthFieldBasedFrameDecoder can not solve this problem. length = CONSTANT_MAGIC_FLAG.length + VERSION.length + headerLength + bodyLength; image The actual size of the entire package is four bytes larger than the length. It does not conform to the decoding scenario of the LengthFieldBasedFrameDecoder.

@mxsm

It seems not to be the case. For example, one of the constructors is

LengthFieldBasedFrameDecoder(
  int maxFrameLength,
  int lengthFieldOffset,
  int lengthFieldLength,
  int lengthAdjustment,
  int initialBytesToStrip)

.
If you define it like this:

new LengthFieldBasedFrameDecoder(
  /* maximum packet length */
  max frame length, 
  /* the offset of "package length" field, i,e CONSTANT_MAGIC_FLAG.length + VERSION.length */
  13,
  /** 
   * the length of "package length" field, which should be 4
   */
  4,
  /** 
   * adjustment to the length field value, i.e. adding "'package length' field's length + 'Header length' field's length - 
   * CONSTANT_MAGIC_FLAG.length - VERSION.length - 'package length' field's length" to "package length" field value.
   * It is equivalent to padding the length field value to the actual length 
   * of the entire data packet and then subtracting the first 17 bytes from it.
   * Ultimately, this gives the length of the 'Header length' field, the 'Header' field, and the 'body' field
   */
  -9,
  /**
   * strip off the first 21 bytes of the decoded frame, i.e. "CONSTANT_MAGIC_FLAG.length + VERSION.length +
   *  'package length' field's length + 'Header length' field's length"
   */
  21)

, the resulting decoded ByteBuf should be the concatenation of Header and body.


好像并非如此。比如构造函数之一是

LengthFieldBasedFrameDecoder(
  int maxFrameLength,
  int lengthFieldOffset,
  int lengthFieldLength,
  int lengthAdjustment,
  int initialBytesToStrip)

,如果您这样定义

new LengthFieldBasedFrameDecoder(
  /* 数据包长度上限 */
  max frame length,
  /* 长度字段即“package length”的偏移量:CONSTANT_MAGIC_FLAG.length + VERSION.length */
  13,
  /**
   * 长度字段即“package length”的长度:"package length" field's length
   */
  4, 
  /**
   * 对长度字段值的补长,即给长度字段值加上“'package length' field's length + 'Header length' field's length - 
   * CONSTANT_MAGIC_FLAG.length - VERSION.length - 'package length' field's length”,
   * 相当于将长度字段值补成了整个数据包的真正长度,然后减去前17个字节,
   * 最终得到的是“Header length”字段、“Header”字段和“body”字段的长度
   */
  -9, 
  /**
   * 剥离掉解码帧的最前面21个字节,即“CONSTANT_MAGIC_FLAG.length + VERSION.length + 
   * 'package length' field's length + 'Header length' field's length”
   */
  21)

,最终的解码出来的ByteBuf应该就是Header + body。

@mxsm
Copy link
Member Author

mxsm commented Jan 6, 2024

Can we use Netty's built-in LengthFieldBasedFrameDecoder to solve the problem of sticky or half packet?

@karsonto I has resovled. please review the code

@@ -161,7 +160,9 @@ private Header parseHeader(ByteBuf in, int headerLength) throws JsonProcessingEx
}
final byte[] headerData = new byte[headerLength];
in.readBytes(headerData);
LogUtils.debug(log, "Decode headerJson={}", deserializeBytes(headerData));
if (log.isDebugEnabled()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding these modifications to the log printing, you don't have to address them. PR #4698 is currently handling these issues uniformly.

关于日志打印的这些修改,您可以不用处理,PR #4698正在统一处理这些问题。

public static class Decoder extends LengthFieldBasedFrameDecoder {

public Decoder() {
super(FRAME_MAX_LENGTH, 13, 4, -9, 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion still pertains to the hardcoding of these numbers: 13, 4, -9.
To enhance code maintainability, I believe either defining two more constants like PACKAGE_LENGTH_FIELD_LENGTH (= 4) and HEADER_LENGTH_FIELD_LENGTH (= 4), and then entirely replacing these numbers with constants or variables, or using comments to explain these numbers would be beneficial. I prefer the first approach.

我的建议依然是关于数字的这种硬编码:13、4、-9。
为了让代码易于维护,我觉得要么再定义两个常量比如PACKAGE_LENGTH_FIELD_LENGTH (= 4)、HEADER_LENGTH_FIELD_LENGTH (= 4),然后完全用常量或变量来代替这些数字,要么用注释来说明这些数字。我倾向于第一种方式。

@mxsm mxsm requested a review from pandaapo January 7, 2024 16:09
@karsonto
Copy link
Contributor

karsonto commented Jan 8, 2024

Can we use Netty's built-in LengthFieldBasedFrameDecoder to solve the problem of sticky or half packet?

@karsonto I has resovled. please review the code

LGTM

Copy link
Member

@harshithasudhakar harshithasudhakar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -171,7 +171,7 @@ private Object parseBody(ByteBuf in, Header header, int bodyLength) throws JsonP
}
final byte[] bodyData = new byte[bodyLength];
in.readBytes(bodyData);
LogUtils.debug(log, "Decode bodyJson={}", deserializeBytes(bodyData));
LogUtils.debug(log, "Decode headerJson={}", deserializeBytes(bodyData));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is bodyJson here instead of headerJson.

@@ -223,7 +223,9 @@ private static Object deserializeBody(String bodyJsonString, Header header) thro
case REDIRECT_TO_CLIENT:
return JsonUtils.parseObject(bodyJsonString, RedirectInfo.class);
default:
LogUtils.warn(log, "Invalidate TCP command: {}", command);
if (log.isWarnEnabled()) {
log.warn("Invalidate TCP command: {}", command);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same situation as mentioned earlier: #4702 (comment)

和前面同样的情况:#4702 (comment)

public static class Decoder extends LengthFieldBasedFrameDecoder {

public Decoder() {
super(FRAME_MAX_LENGTH, PREFIX_LENGTH, Integer.BYTES, -9, 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe Integer.BYTES and -9 are still not clear enough for developers. Integer.BYTES: The lengths of version field, package length field, and header length field are all Integer.BYTES. -9: Hard-coded.

If you disagree with replacing them with expressions composed of more descriptive constant or variable names, as mentioned earlier, you can provide detailed comments to explain the usage of Integer.BYTES and -9 here.

我认为Integer.BYTES-9对开发者来说依旧不够清晰。Integer.BYTES:Version字段, package length字段, header length字段的长度都是Integer.BYTES-9:硬编码。
如果您不赞同将它们换成有助理解的常量名或变量名组成的表达式,正如我之前所说,您可以用详细的注释来说明这里的Integer.BYTES-9

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mxsm I'm curious to understand the thought processes behind choosing a hard-coded value over a dynamic value. I'm wondering about potential future flexibility. Have we considered scenarios where the header structure might evolve, and how a dynamic approach might handle those cases?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mxsm I'm curious to understand the thought processes behind choosing a hard-coded value over a dynamic value. I'm wondering about potential future flexibility. Have we considered scenarios where the header structure might evolve, and how a dynamic approach might handle those cases?

@harshithasudhakar For a custom protocol, once it is determined, it will not undergo significant changes for a considerable period of time. This is because any changes to the protocol structure may introduce compatibility issues, potentially requiring existing users to perform a comprehensive upgrade to resolve them. In such cases, a major version upgrade, such as upgrading to version 2.x, may be necessary to undertake protocol modifications. Once the protocol is modified, it will be necessary to reprocess the encoding and decoding of the protocol. Therefore, I believe there is limited need for dynamic handling of these issues. Dynamically setting values increases the complexity of the code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks for clarifying!

@mxsm mxsm merged commit 1f0bf0c into apache:master Jan 10, 2024
13 checks passed
@mxsm mxsm deleted the bug-mxsm-4701 branch January 16, 2024 15:12
@xwm1992 xwm1992 added this to the 1.11.0 milestone Dec 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Use tcp protocol client send message, it throw a DecoderException
5 participants