SquadRcon.onData Crashing Recently #200
Comments
Is this still an issue? |
Yes, though it's rare. |
Resolved on #258 |
I've root-caused this, along with the issue mentioned in #249. The essentials are this: we use the "end packets" to fire the callback pending for each execute request: Line 80 in 164bad3
These end packets are generated here: Line 323 in 164bad3
This follows the approach described in: https://developer.valvesoftware.com/wiki/Source_RCON_Protocol#Multiple-packet_Responses Part of the issue lies here: Line 119 in 164bad3
We expect to receive a series of two packets: one of length 10 with no extra data, and one with extra data. We attempt to discard this empty data here: Line 135 in 164bad3
However, reaching that check requires passing several conditions, and we can sometimes fail them. We do NOT receive packets from Node.js; we receive chunks of a stream. This means a packet can at times span two or more onData events. https://nodejs.org/api/net.html#class-netsocket Given a sequence such as the following:
We will split out two "end packets" and leave the junk data. In fact, this can happen in several cases where we do not actually receive the rest of the second packet in the chunk we are currently processing. In the above case, we split off packet one:
and fulfill its callback, then
and attempt to fulfill a second callback, followed by prepending the leftover data onto the next packet. This leads to the issue in
This was obvious when all of our junk packets had a 'size' of 256, or
As for the issue posted here: if the callback queue size is 1 and we attempt to fulfill a second callback, we get the error raised here.
We shift an empty array and get back undefined. If the callback queue size is 2, we instead send an empty response to the waiting callback.
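To illustrate the crash path described above, here is a minimal sketch of a guarded dequeue. The helper name `fulfillNext` and the demo queue are hypothetical and not part of SquadJS; the only fact relied on is that `Array.prototype.shift()` returns `undefined` on an empty array. Note that a guard like this only avoids the TypeError; it does nothing about the desynchronisation that occurs when a stray end packet consumes a callback that belongs to a later request.

```javascript
// Hypothetical helper, not SquadJS code: fulfil the next pending callback
// only if one actually exists.
function fulfillNext(queue, response) {
  const callback = queue.shift(); // undefined when the queue is empty
  if (typeof callback !== 'function') {
    // An end packet arrived with nothing waiting for it. Calling the
    // result of shift() here is what raises
    // "shift(...) is not a function".
    return false;
  }
  callback(response);
  return true;
}

// One pending request, but two end packets are split out of the stream.
const demoResponses = [];
const demoQueue = [(response) => demoResponses.push(response)];

const firstHandled = fulfillNext(demoQueue, 'first response'); // true
const secondHandled = fulfillNext(demoQueue, ''); // false, and no throw
```

Returning a boolean lets the caller log or count unexpected end packets instead of silently ignoring them, which may help when diagnosing how often the split occurs.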
From this point forward we are out of sync: we will fail to match data on all incoming requests when we process the bodies back into results. We can also deadlock if Squad restarts while we have pending callbacks, since we do not clear these and are blindly assuming the I've checked multiple logs since finding a way to reproduce this issue and have never seen anything related to Line 285 in 164bad3
As for the window for these errors, and why they are rare: we need to fail the check here: Line 129 in 164bad3
If we have, say, 15 or 16 bytes, we pass through to Line 149 in 164bad3
where we cut the packet and leave the extra 1-2 bytes in the buffer. We then fail the "while" check here: Line 119 in 164bad3
|
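The chunk-versus-packet problem described in the comment above can be sketched as a small framing buffer that only emits a packet once all of its bytes have arrived. This is an illustration under stated assumptions, not the SquadJS implementation: `PacketFramer` and `encodePacket` are hypothetical names, the layout follows the Valve Source RCON page linked above (little-endian size, id, type, then a null-terminated body and a trailing null), and bodies are decoded as UTF-8 for simplicity. It deliberately does not handle the non-standard extra data that Squad sends after the empty packet.

```javascript
// Hypothetical sketch: accumulate stream chunks and emit only complete packets.
class PacketFramer {
  constructor() {
    this.buffer = Buffer.alloc(0);
  }

  // Append a raw chunk from the socket; return every packet completed so far.
  push(chunk) {
    this.buffer = Buffer.concat([this.buffer, chunk]);
    const packets = [];

    while (this.buffer.length >= 4) {
      // The size field counts the bytes that follow it (minimum 10).
      const size = this.buffer.readInt32LE(0);
      if (size < 10) throw new Error(`Invalid RCON packet size: ${size}`);

      const total = size + 4;
      if (this.buffer.length < total) break; // incomplete, wait for the next chunk

      packets.push({
        id: this.buffer.readInt32LE(4),
        type: this.buffer.readInt32LE(8),
        body: this.buffer.toString('utf8', 12, total - 2) // strip the two nulls
      });
      this.buffer = this.buffer.subarray(total);
    }
    return packets;
  }
}

// Hypothetical helper used only to build demo input.
function encodePacket(id, type, body) {
  const bodyBytes = Buffer.from(body, 'utf8');
  const size = bodyBytes.length + 10;
  const packet = Buffer.alloc(size + 4); // zero-filled, so the nulls are in place
  packet.writeInt32LE(size, 0);
  packet.writeInt32LE(id, 4);
  packet.writeInt32LE(type, 8);
  bodyBytes.copy(packet, 12);
  return packet;
}

// Two packets delivered as two chunks, with the split inside the second packet.
const demoStream = Buffer.concat([encodePacket(1, 0, 'hello'), encodePacket(2, 0, '')]);
const demoFramer = new PacketFramer();
const demoFirst = demoFramer.push(demoStream.subarray(0, 24)); // packet 1 only
const demoSecond = demoFramer.push(demoStream.subarray(24)); // packet 2 completes
```

The key design point is that the partial second packet stays in the buffer untouched until its remaining bytes arrive, so no callback is fulfilled on the strength of a header alone.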
I have a fix in progress for this issue. However, in the wider case it still won't prevent all deadlocks, and may actually cause more when Squad restarts while we have pending requests, i.e. we have sent some requests and Squad crashes or gets restarted in the seconds before we have received a full response. |
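One possible way to handle the restart case mentioned above is to fail every pending request when the connection drops, rather than leaving callbacks waiting for responses that will never arrive. The sketch below is an assumption about how that could look, not the fix referred to in this thread: `PendingCallbacks`, `add` and `failAll` are hypothetical names, and callbacks are assumed to use the Node.js `(error, response)` convention.

```javascript
// Hypothetical sketch: track pending callbacks so they can all be failed
// when the connection is lost.
class PendingCallbacks {
  constructor() {
    this.queue = [];
  }

  // Register a callback of the form (error, response) for a sent request.
  add(callback) {
    this.queue.push(callback);
  }

  // Fail every pending callback and reset the queue; returns how many failed.
  failAll(error) {
    const pending = this.queue;
    this.queue = []; // reset first so callbacks can safely enqueue new requests
    for (const callback of pending) {
      callback(error, null);
    }
    return pending.length;
  }
}

// Two requests are in flight when the server goes away.
const demoPending = new PendingCallbacks();
const demoErrors = [];
demoPending.add((error) => demoErrors.push(error.message));
demoPending.add((error) => demoErrors.push(error.message));
const demoFailed = demoPending.failAll(new Error('RCON connection closed'));
```

With a `net.Socket`, `failAll` could be called from a `'close'` event listener, so callers see an error and can retry after reconnecting instead of hanging indefinitely.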
Fixed |
Description of Issue
Crashed twice in the past two days with the following call stack. One crash came immediately after a server crash, so I assumed it was related to a missing packet from the crash.
However, this morning we crashed shortly after hitting live during Fallujah Skirmish v2.
Errors or Screenshots of Issue
file:///opt/SquadJS/core/rcon.js:80
this.responseCallbackQueue.shift()(
^
TypeError: this.responseCallbackQueue.shift(...) is not a function
at SquadRcon.onData (file:///opt/SquadJS/core/rcon.js:80:49)
at Socket.emit (events.js:315:20)
at addChunk (internal/streams/readable.js:309:12)
at readableAddChunk (internal/streams/readable.js:284:9)
at Socket.Readable.push (internal/streams/readable.js:223:10)
at TCP.onStreamRead (internal/stream_base_commons.js:188:23)
Squad Information
If potentially relevant, please provide regarding the state of the Squad server at the time of error, e.g. the current layer.
System Information