Frequent OOM kills - Isolated to `http2` - 50 GB of RAM usage on MacBook, causing system OOM #361

Comments
Thanks for your report! I did some investigation and found some weirdly huge memory consumption for `http2`. Could you tell me about your server's technology? (language, library, etc.)
But essentially, the problem appears to happen when speaking to an AWS ALB with HTTP2. The local problem I had where it used 50 GB of RAM was potentially pointed at the Node.js app using HTTP2 or at the dotnet 8 Kestrel server using HTTP1.1 (I don't have HTTP2 enabled for the Kestrel server, but I can). The details are fuzzy on this because it has only happened once so far. I could probably set up an AWS ALB route for you that just returns a constant response string, and I bet the issue will happen with that.
I've got a solid lead now. I was running the code under the debugger and looking at heap profiles using pprof. What I noticed from normal operation is that memory accumulates steadily. The primary usage, I think, is coming from a vector that holds all the results, so that makes sense. Maybe that can be reduced if the detailed stats are not needed until the end, but maybe it cannot.

Then I was thinking that Node.js, locally, never sends a GOAWAY message on an http2 socket, while the ALB likely does send one after a number of requests, say, 10,000 or 100,000 per socket. I realized that the problem was likely triggered when the sockets were gracefully closed by the server. To simulate that case I just ctrl-c'd my node.js process and, sure enough, memory usage exploded.

tl;dr - To Reproduce:
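A minimal sketch of a local setup along those lines, assuming Node's built-in `http2` module serving cleartext HTTP/2 (the port and response body here are illustrative, not the exact app from the report):

```typescript
// repro-server.ts - illustrative cleartext HTTP/2 (h2c) server
import * as http2 from "node:http2";

const server = http2.createServer(); // no TLS: cleartext HTTP/2

server.on("stream", (stream) => {
  // Answer every request with a small constant body, like the "constant response
  // string" ALB route suggested earlier in the thread.
  stream.respond({ ":status": 200, "content-type": "text/plain" });
  stream.end("ok\n");
});

server.listen(8080, () => {
  console.log("h2c server listening on http://localhost:8080");
});
```

Start that, point `oha` at `http://localhost:8080/` with `--http2` and a duration such as `-z 60s`, then Ctrl-C the Node process partway through the run and watch `oha`'s memory (this assumes `--http2` negotiates cleartext HTTP/2 against an `http://` URL; any other local HTTP/2-capable server should exercise the same graceful-close path).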
Thank you, that's very helpful! I've succeeded in reproducing it. I will work on this Saturday.
Partial fix for hatoo#361

- ONLY implemented for the `-z 10s` (work_until) case
- TODO:
  - [ ] The futures are not aborted when the timer is hit, which will cause long-running requests to delay the program exit - this is only due to a borrow/move problem that I cannot figure out
  - [ ] Implement for the non-`work_until` cases
  - [ ] Add a timeout to the TCP socket setup - this appears to be where some of the delay on shutdown is happening if the server closes after startup
  - [ ] Consider adding a delay to the reconnect loop so that it will not try to connect more than 1 time per second per concurrent connection (sketched below)
    - Without this the connect loop will spin at ~23k connect attempts/second for `-c 20`, for example
- Test cases:
  - Start with the server not running at all (never connects)
    - Currently this will exit on time
    - IMPROVED: Previously this would attempt to connect once for each `-c`, fail, and immediately exit
    - IMPROVED: Currently this will repeatedly try to connect until the specified timeout expires, then it will exit
  - Start with the server running and leave it running
    - This works fine as before
  - Start with the server running, exit the server, then restart the server before the test completes
    - This initially makes requests
    - IMPROVED: Previously this would OOM even if the server restarted
    - IMPROVED: Currently this will reconnect and continue making requests if the server restarts
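For the reconnect-throttling TODO above, a rough sketch of the idea (oha itself is written in Rust, so this TypeScript is only an illustration of the loop shape, not the actual change in the PR): each concurrent worker keeps retrying the connect until the `work_until` deadline, but pads each iteration out to at least one second so failures don't spin.

```typescript
// Illustrative only: the shape of a throttled reconnect loop, not oha's real code.
import { setTimeout as sleep } from "node:timers/promises";
import { connect, type Socket } from "node:net";

function tryConnect(host: string, port: number): Promise<Socket | null> {
  return new Promise((resolve) => {
    const socket = connect(port, host);
    socket.once("connect", () => resolve(socket));
    socket.once("error", () => {
      socket.destroy();
      resolve(null); // connection refused/reset: report failure instead of throwing
    });
  });
}

// One worker, analogous to one of the -c concurrent connections.
async function worker(host: string, port: number, deadlineMs: number): Promise<void> {
  while (Date.now() < deadlineMs) {
    const attemptStart = Date.now();
    const socket = await tryConnect(host, port);
    if (socket) {
      // ...drive requests on this connection until the server closes it...
      socket.destroy();
    }
    // Throttle: make each iteration take at least 1s, so a dead server does not
    // turn this into tens of thousands of connect attempts per second.
    const elapsed = Date.now() - attemptStart;
    if (elapsed < 1000) {
      await sleep(1000 - elapsed);
    }
  }
}

// e.g. the equivalent of -c 20 with -z 10s
const deadline = Date.now() + 10_000;
await Promise.all(Array.from({ length: 20 }, () => worker("127.0.0.1", 8080, deadline)));
```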
I have submitted a partial PR that handles this mostly the same way it is handled for HTTP1.1: #363. I have a couple of to-dos in the PR description.
I think we mostly fixed this.
Intro
First off - awesome program! This solves the problems I have with `hey`, where it just gets slower when the RTT times increase even though the remote service can support the throughput. The animated UI really helps understand what's happening without having to wait for the whole thing to finish, which I love. Thanks for this!

Pairing
I can pair on this with you if you want. Google Meet or similar is fine. My email is on my profile.
Problem
The problem seems to be isolated to `http2`, and/or may only happen with `http2`. `oha` was using 50 GB of RAM! AWS CloudShell runs for ~40 seconds each time before getting OOM killed. The problem does NOT happen when `--http2` is removed and `-c` is adjusted to have the same total as `-c * -p` with `--http2`.
`Killed` Terminations on CloudShell

Note: this endpoint is not open to random IPs, but if you want to test against it, it is located in `us-east-2` and I can add an IP to the security group for you to test with if you'd like.

Killed at 43 seconds
Happens with `--no-tui` Too

Does NOT Happen without `--http2` with Same Worker Count

Nearly Final Memory

Problem Starts Memory - ~30 seconds of operation

Initial Memory - 2,000 RPS and Stable (0.4% memory usage)