REQMOD stuck when adapted request (body) is not forwarded #1966
Squid forwards request bodies using BodyPipe objects. A BodyPipe object has two associated agents: producer and consumer. Those agents are set independently, at different processing stages. If BodyPipe consumer is not set, the producer may get stuck waiting for BodyPipe buffer space. When producer creates a BodyPipe, it effectively relies on some code somewhere to register a consumer (or declare a registration failure), but automatically tracking that expectation fulfillment is impractical.

For REQMOD transactions involving adapted request bodies, including ICAP 204 transactions, Client::startRequestBodyFlow() sets body consumer. If that method is not called, there will be no consumer, and REQMOD may get stuck. Many `if` statements can block request forwarding altogether or block a being-forwarded request from reaching that method. For example, adapted_http_access and miss_access denials block request forwarding.

Without REQMOD, request processing can probably get stuck for similar lack-of-consumer reasons, but regular requests ought to be killed by various I/O or forwarding timeouts. There are no such timeouts for those REQMOD transactions that are only waiting for request body consumer to clear adapted BodyPipe space (e.g., after all ICAP 204 I/Os are over). Relying on timeouts is also very inefficient.

For a `mgr:mem` observer, stuck REQMOD transactions look like a ModXact memory leak. A `mgr:jobs` report shows ModXact jobs with RBS(1) status.

----

Cherry-picked SQUID-1030-ModXact-stuck-without-consumer-bag65 changes as of commit 37cde62014bcc985e05eb6e985365417838564ba (effectively).
The bug was discovered in v6-based code. It can be reproduced in a lab using a 204-returning ICAP service by filling request BodyPipe buffer while denying that request forwarding using adapted_http_access or miss_access.

The fix this PR is based on was tested using the affected v6-based deployment.

Implementing the added TODO requires significant and risky changes outside this bug fix scope.
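To illustrate the failure mode described above, here is a minimal sketch of a bounded pipe whose producer stalls when nobody registers a consumer. It is not Squid code: `ToyBodyPipe` and all of its members (`write`, `read`, `registerConsumer`, `declareNoConsumer`, `producerStuck`) are hypothetical names invented for this example, and the real BodyPipe is asynchronous and considerably more involved.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, heavily simplified stand-in for a body pipe.
// Every name here is an assumption made for illustration only.
class ToyBodyPipe {
public:
    explicit ToyBodyPipe(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0);
    }

    // Producer side: buffers as many bytes as fit and returns that count.
    // Once "no consumer" has been declared, data is accepted and discarded.
    std::size_t write(const std::string &data) {
        if (abandoned_)
            return data.size();
        const std::size_t space = capacity_ - buffer_.size();
        const std::size_t accepted = std::min(space, data.size());
        buffer_.append(data, 0, accepted);
        return accepted;
    }

    // Somebody promises to drain the buffer.
    void registerConsumer() { hasConsumer_ = true; }

    // Somebody declares that no consumer will ever be registered
    // (e.g., because forwarding was denied), freeing the buffer.
    void declareNoConsumer() {
        abandoned_ = true;
        buffer_.clear();
    }

    // Consumer side: drains up to max bytes; does nothing without a consumer.
    std::size_t read(std::size_t max) {
        if (!hasConsumer_)
            return 0;
        const std::size_t n = std::min(max, buffer_.size());
        buffer_.erase(0, n);
        return n;
    }

    // The buffer is full, and nothing will ever make room for more data.
    bool producerStuck() const {
        return buffer_.size() == capacity_ && !hasConsumer_ && !abandoned_;
    }

private:
    std::size_t capacity_;
    std::string buffer_;
    bool hasConsumer_ = false;
    bool abandoned_ = false;
};
```

In this toy model, a producer that fills the buffer stays stuck until either `registerConsumer()` or `declareNoConsumer()` is called. The denial paths described above correspond to the case where neither call ever happens, so no amount of waiting frees buffer space.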
@rousskov Anubis' reason for M-failed-staging-other is
Any idea what it could be? Nothing has changed from past practice here.
Trying a rebase to remove the requirement for the CodeQL check and clear the slate on pull request checks.
@eduard-bagdasaryan, please assist @kinkie in fixing CI. We have a bunch of PRs that are ready for merging, but Anubis has started reporting failures for their staged commits. I have not checked why. I do not know whether the current Anubis configuration and the currently running Anubis version are correct.
@kinkie, I agree that relevant past practices remain unchanged, but let's focus on something we can fix. I bet that recent CI problems were triggered by changes in GitHub repository configuration and/or Anubis environment or their side effects. According to your recent comments, GitHub configuration and Anubis configuration have been changed (at least). I do not know what the (side) effects of those changes were, but I hope that, with Eduard's help, CI functionality will be restored.
These errors are documented here; however, these statuses ARE set by GitHub Actions, which is a GitHub App.
Nice find, thank you!
I am not sure this assertion is true: IIRC, Anubis copies some of the PR status checks to the staged commit. While nothing has changed in Anubis recently, it is possible that GitHub started to validate status check sources, either because of GitHub internal improvements or because of our GitHub configuration changes (both are unknown to me). For copied statuses, the source app might be Anubis! I do not know whether Anubis copies the statuses in question (e.g., |
Here is the list of 387bcd6 statuses copied by Anubis:
The error message generated by Anubis happens during merge and lists the same statuses:
I have a feeling that, since we have changed nothing, something like that happened.
It is possible to instruct GitHub to disregard the origin of the checks; if the PR currently being assessed shows this behaviour, I'll try that.
Perhaps we need to adjust the protected branch (master) settings, e.g., add the "Anubis" source (if it is there among other checkboxes) or set it to "any source". I cannot do it myself since I do not have permissions.
PR #1972 landed nicely. I suppose there is some stale state somewhere. Let's try to force things through. Worst case, we can redo the PRs.