Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SquadRcon.onData Crashing Recently #200

Closed
ect0s opened this issue Jul 4, 2021 · 7 comments
Closed

SquadRcon.onData Crashing Recently #200

ect0s opened this issue Jul 4, 2021 · 7 comments
Labels
core bug Bug related to the core SquadJS API

Comments

@ect0s
Copy link
Contributor

ect0s commented Jul 4, 2021

Description of Issue

Crashed twice in the past two days on the following callstack; One immediately after a server crash, so I assumed it was related to a missing packet from the crash.

However, this morning we crashed shortly after hitting live during Fallujah Skimish v2

Errors or Screenshots of Issue

file:///opt/SquadJS/core/rcon.js:80
this.responseCallbackQueue.shift()(
^

TypeError: this.responseCallbackQueue.shift(...) is not a function
at SquadRcon.onData (file:///opt/SquadJS/core/rcon.js:80:49)
at Socket.emit (events.js:315:20)
at addChunk (internal/streams/readable.js:309:12)
at readableAddChunk (internal/streams/readable.js:284:9)
at Socket.Readable.push (internal/streams/readable.js:223:10)
at TCP.onStreamRead (internal/stream_base_commons.js:188:23)

Squad Information

  • SquadJS Version: v3
  • Squad Version: 2.9

If potentially relevant, please provide regarding the state of the Squad server at the time of error, e.g. the current layer.

System Information

  • Operating System: Linux
  • NodeJS Version: Node 14
  • Yarn Version:
@ect0s ect0s added the core bug Bug related to the core SquadJS API label Jul 4, 2021
@enderneuer
Copy link

I'm having this same issue, however off of Windows.
Did you ever solve this yourself?

image

@werewolfboy13
Copy link
Collaborator

Is this still an issue?

@ect0s
Copy link
Contributor Author

ect0s commented Jan 7, 2023

Is this still an issue?

Yes, though its rare.

@werewolfboy13 werewolfboy13 linked a pull request Jan 13, 2023 that will close this issue
@werewolfboy13
Copy link
Collaborator

Resolved on #258

@ect0s
Copy link
Contributor Author

ect0s commented Jan 13, 2023

I've root caused this, and the issue mentioned in

#249.

The essentials are this:

We are using the 'endpackets' to fire off the callbacks pending for each execute request,

this.responseCallbackQueue.shift()(

These endpackets are generated here:

if (type !== SERVERDATA_AUTH) {

As described in:

https://developer.valvesoftware.com/wiki/Source_RCON_Protocol#Multiple-packet_Responses

The issue lies in part in here:

while (this.incomingData.byteLength >= 4) {

We expect to receive a series of two packets,

One length 10 with no extra data, one with extra data (0x0000 0001 0000 0000).

We attempt to discard this empty data here:

if (decodedProbePacket.body === '\x00\x00\x00\x01\x00\x00\x00') {

However, getting to that check requires that we pass multiple conditions we can sometimes fail

We DO not receive packets from nodejs, we receive chunks of a stream. This at times means that the packets can span two or more onData events.

https://nodejs.org/api/net.html#class-netsocket

Given a sequence such as the following:

0a 00 00 00 02 00 00 00 00 00 00 00 00 00 0a 00 00 00 02 00 00 00 00 00 00 00 00 00 | 00 01

We will split out two "endpackets" and leave the junk data 00 01 in the buffer.

In fact, this can happen in several cases where we do not actually receive the rest of the 2nd packet in the current chunk we are processing.

In the above case, we split off packet one:

0a 00 00 00 02 00 00 00 00 00 00 00 00 00 0a 00 00 00 02 00 00 00 00 00 00 00 00 00 | 00 01

And fulfill its callback,

then

0a 00 00 00 02 00 00 00 00 00 00 00 00 00 | 00 01

And attempt to fulfill a second callback

Followed by prepending the left over data onto the next packet 00 01

This leads to the issue in
#249, where we have a junk packet. And from this point on, all our packets are likely to be junk because we are now splitting the data incorrectly.

This was obvious when all of our junk packets had a 'size' of 256, or 00 01 00 00

As for the issue posted here,

If the callBackQueue size is 1, and we attempt to fulfill a second callback, we get the issues as raised here

TypeError: this.responseCallbackQueue.shift(...) is not a function.

We shift an empty array, get back undefined (not a function), and cannot invoke it.

If the callback queue is is 2, we instead send an empty response to the waiting callback

"[SquadServer][1] Failed to update layer information. TypeError: Cannot read property '1' of null
    at SquadRcon.getNextMap 

From this point forward, again, we are now out of sync.

We will fail to match data on all incoming requests when we process the bodies back into results.

We can also deadlock if the Squad restarts while we have pending callbacks, as again, we do not clear these and are blindly assuming the incomingData buffer and incomingResponse buffers are correct.

I've checked multiple logs since finding a way to reproduce this issue and have never seen anything related to

const onError = (err) => {

As for the window for these errors, and why they are rare:

We need to fail the check here:

if (size === 10 && this.incomingData.byteLength >= probeSize) {

If we have say, 15, or 16 bytes, we will pass through to

const packet = this.incomingData.slice(0, packetSize);

Where we cut the packet, and leave the extra 1-2 bytes in the buffer.

We then fail the "while" check here:

while (this.incomingData.byteLength >= 4) {

@ect0s
Copy link
Contributor Author

ect0s commented Jan 13, 2023

I have a fix in progress for this issue, however the wider case is it still won't prevent all deadlocks, and may actually cause more when squad restarts while we have pending requests.

IE, we have sent some requests, squad crashes or gets restarted in the seconds before we have gotten back a full response.

@werewolfboy13
Copy link
Collaborator

Fixed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core bug Bug related to the core SquadJS API
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants