Binary corruption and control characters added in printed binaries #8685
Comments
Hi! Could you print the size of the binary? It will help us narrow it down to the printing routine.
Adding
Completely by chance, today I found and fixed an error (#8686) in the alias analysis that could lead to corruption of binaries if you append to a binary coming from a literal map. Does the corruption go away if you compile with
The corruption remains. Note that this binary is not appended to. It is parsed from data from the socket and kept from that point in a record, never to be modified. It may or may not be a sub-binary. Anyway, it should be fairly trivial to reproduce with the steps I provided. It happens 100% of the time for us and multiple people have confirmed it.
Right, then it's not #8686. I must confess that I didn't look at the reproducer. I saw this when I was about to leave for the day, and wanted to stop anyone wasting time trying to triage a bug which could be my fault.
Please note that we've now merged a fix to the issue that was causing the process crash, which led to the corrupted line being logged. To reproduce now, you need to check out a commit from before the fix:
Note that the
Thank you for all the additional info. We don't have a machine that could build RabbitMQ unfortunately. (For security reasons we're advised against cloning a repo without Ericsson's approval.) We would really appreciate it if you could isolate this issue more. This could very well be a bug in our code, and it's probably more severe than a printing problem. It's just difficult to track it down and fix it with the info we have now.
This constraint didn't exist before. We've submitted bugs in the past that needed RabbitMQ to reproduce, and I've personally submitted Cowboy issues that needed cloning Cowboy or Cowlib to reproduce. These are some of the biggest Erlang projects, so perhaps you could request a permanent exception from Ericsson if that's really an issue. We can try to simplify but it will take time, and we don't know if we will succeed, as this seems to be the only binary that's misbehaving over the entire RabbitMQ codebase.
I was curious about the root cause and I wanted to take a look, but debugging the whole rabbitmq project is a bit daunting if you have never done it before. Do you folks have any pointers about where the root cause could be? Also, how are you printing it using the
Hello Jose, I will try your suggestion but I expect more of the same: the erroneous binary has been printed through logger[1] and also seen in crash logs, written to disk via
Note that while at the moment we can only reproduce using RabbitMQ, there actually isn't that much of RabbitMQ in use to reproduce: sending one message to one priority queue. Because of how priority queues are implemented as multiple queues internally, all in a single process, and each internal queue has this binary in its state (again, all part of a bigger state for this one process), we think the problem is related to that.
Thanks, @lhoguin. And you are right, if
@lhoguin: Have you verified (using either tracing or a plain
As I feel responsible for the compiler's destructive update optimizations (and am not running on Ericsson hardware), I have tried to reproduce this using both OTP-26.1.2 (OTP-26 was the first version supporting destructive update of binaries) and OTP-27.0.1, but unfortunately I'm unable to build rabbitmq-server successfully. What would really help is if you could do a bisect and find out which OTP commit breaks your system.
@frej I will see if I can get to it before the end of the week. I am off for two weeks after that. Otherwise maybe @mkuratczyk will have time to try to bisect when he's back from vacation on Monday. Alternatively, you may want to try one of the pre-built RabbitMQs at https://github.com/rabbitmq/rabbitmq-server/releases/tag/v4.0.0-beta.3
@lhoguin before spending the time for a full bisect, just figuring out when the binary is corrupted (as I referred to earlier) is probably worthwhile.
Unfortunately pre-built blobs aren't very helpful when hunting down potential miscompilation :(
Good news: it turns out this is totally our "fault". It's not a bug at all; it's a 10+ year old intentional behaviour in a rarely used RabbitMQ feature. No one currently working on RabbitMQ was aware of it, and it didn't even occur to us that this could be deliberate. :) Feel free to close this issue. I can't do it, since it was created by @lhoguin.
For those interested: when a classic queue is declared with
That's why we were seeing both the correct resource value (
Sigh. Sorry for the noise.
Describe the bug
There seems to be some corruption related to binaries. When printing the binary or writing it to a file, we get either the correct <<"cq">> or the incorrect <<"cq", 0, 0>>, <<"cq", 0, 1>> or <<"cq", 0, 2>>.
To Reproduce
On one terminal:
On another:
Then open the log file (path provided at the end of startup in the first terminal) and search for {resource, to find the bad binaries.
Sometimes you will get the correct {resource,<<"/">>,queue,<<"cq">>}, and sometimes something like {resource,<<"/">>,queue,<<99,113,0,2>>}, even (and especially) in the same crash report!
This can also happen with another message, which you can find by searching for "after unclean shutdown"; in this case the bad binary is formatted and we see control characters (assuming your editor prints them).
Expected behavior
Just get <<"cq">> and not extra bytes. When printed, no control characters after the expected string.
Affected versions
Tested on OTP-26.1.2 and OTP-27.0.1.
Additional context
Is it harmless?
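For anyone comparing the log output, note that <<"cq", 0, 2>> and <<99,113,0,2>> are the same four bytes written two ways: a binary is usually pretty-printed as a string only when every byte is printable. Below is a minimal sketch of how the size and raw bytes could be checked independently of the printing routine. The module name bin_inspect and both functions are hypothetical, introduced only for illustration; they are not part of OTP or RabbitMQ.

```erlang
-module(bin_inspect).
-export([describe/1, compare/2]).

%% Hypothetical helper (not part of OTP or RabbitMQ).
%% Returns the byte size and the raw byte list of a binary, so that
%% trailing non-printable bytes stay visible however the binary
%% would otherwise be pretty-printed.
describe(Bin) when is_binary(Bin) ->
    {byte_size(Bin), binary_to_list(Bin)}.

%% Hypothetical helper: compares Bin against the Expected binary.
%% Returns 'exact' when they are identical, {extra, Bytes} when Bin
%% starts with Expected but carries additional trailing bytes, and
%% 'different' otherwise.
compare(Expected, Bin) when is_binary(Expected), is_binary(Bin) ->
    Size = byte_size(Expected),
    case Bin of
        Expected ->
            exact;
        <<Expected:Size/binary, Rest/binary>> ->
            {extra, binary_to_list(Rest)};
        _ ->
            different
    end.
```

With this sketch, bin_inspect:describe(<<"cq", 0, 2>>) returns {4,[99,113,0,2]}, and bin_inspect:compare(<<"cq">>, <<99,113,0,2>>) returns {extra,[0,2]}, which makes the two extra bytes explicit.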