bootstrap: Build jemalloc with support for 64K pages #135081
@bors try @rust-timer queue
bootstrap: Build jemalloc with support for 64K pages

By default, jemalloc is built to only support the same page size as the host machine. Set an env variable so that jemalloc is built with support for page sizes up to 64K regardless of the host machine.

r? `@Kobzol`

Resolves rust-lang#134563
Potentially resolves rust-lang#133748 (needs verification)

----

Results from local rustc-perf testing below, within 0.5% on every metric except max-rss.

AArch64: ![Screenshot 2025-01-03 at 5 53 13 pm](https://github.com/user-attachments/assets/71705c59-7d7b-4753-a184-8c784233e603)

x86_64: ![Screenshot 2025-01-03 at 5 54 16 pm](https://github.com/user-attachments/assets/ea28aded-3b90-43f4-a965-b081b07b95ab)
Is it remotely possible to have some kind of regression test for this?
☀️ Try build successful - checks-actions
If we see a similar max-rss regression, I wonder if it would be worth comparing against a mimalloc compilation -- if that doesn't have the same 64k/4k page size issues, iirc the max-rss regression from switching to it was of similar magnitude?
I'm just running a mimalloc benchmark here :) If we see max RSS regressions here, I would just do the 64 KiB switch only on aarch64.
Doesn't that "unfairly" hurt non-64k page sizes on other platforms (i.e., those with a 4k page size but still aarch64)? Also, should we try 16k pages as the limit in jemalloc (I guess that's what we see in practice if I'm reading the issue reports right?)
If we do that we'll end up with similar reports for 64K page systems soon enough, they're just not very common at the moment. Ubuntu is already shipping 64K page AArch64 server images for instance.
It probably does, but if I understood the jemalloc issue, it's kind of how jemalloc is set up to work. So the decision is whether to support all page sizes or to have slightly better perf for 4k pages. For aarch64, I'd personally use the general solution, but for x64, I'd stick with 4k, since we haven't received complaints about that so far. But we'll see how perf. turns out.
Yeah that's the sensible approach I'd say. Iiuc x86 doesn't do 4K+ pages so this may not be an issue for x86 in the same way.
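To make the 16K-versus-64K trade-off above concrete, here is a minimal sketch of the arithmetic, assuming (as the PR description states) that a jemalloc build supports page sizes up to the one it was configured for. Both helper names are hypothetical and do not come from jemalloc or bootstrap.

```rust
/// Hypothetical helper: base-2 logarithm of a page size in bytes.
/// Returns `None` if the size is not a power of two.
fn lg_page(page_size: usize) -> Option<u32> {
    if page_size.is_power_of_two() {
        Some(page_size.trailing_zeros())
    } else {
        None
    }
}

/// Hypothetical helper: would an allocator configured with `build_lg_page`
/// cover a system whose runtime page size is `runtime_page_size` bytes?
/// Assumption: the configured size is an upper bound on supported sizes.
fn build_supports(build_lg_page: u32, runtime_page_size: usize) -> bool {
    runtime_page_size <= (1usize << build_lg_page)
}
```

Under that assumption, a build configured for 16K pages (lg 14) would cover 4K and 16K systems but not 64K ones, while a build configured for 64K pages (lg 16) would cover all three.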
Finished benchmarking commit (4402a4e): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

Max RSS (memory usage)

Results (primary 4.9%, secondary 4.8%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Cycles

Results (primary -1.0%, secondary 1.2%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 762.379s -> 759.643s (-0.36%)
The binary size wins feel... surprising? Why would a larger page size make the binary so much smaller? In any case, the regressions feel pretty reasonable -- from what I can tell they're mostly in smaller rss workloads, where the jump is likely more significant. I also wonder if we're seeing poor usage of the allocator's internal storage due to rustc_arena's preference for 4kb pages (rust/compiler/rustc_arena/src/lib.rs, line 111 in 8d2c06d).
In any case, I'd be happy to land this personally; mimalloc looks worse to me in the results just gathered. r=me but will leave this for a bit in case others have thoughts :)
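As a rough illustration of why a larger allocator page could hurt small-footprint workloads more, here is a deliberately simplified model that only rounds a request up to a whole number of pages. This is an assumption for illustration and not how jemalloc or rustc_arena actually account for memory; both function names are hypothetical.

```rust
/// Round `bytes` up to the next multiple of `page`.
/// `page` must be a power of two.
fn round_up_to_page(bytes: usize, page: usize) -> usize {
    assert!(page.is_power_of_two());
    (bytes + page - 1) & !(page - 1)
}

/// Fraction of extra memory introduced by the rounding, relative to the
/// requested size. `bytes` must be non-zero.
fn rounding_overhead(bytes: usize, page: usize) -> f64 {
    assert!(bytes > 0);
    let rounded = round_up_to_page(bytes, page);
    (rounded - bytes) as f64 / bytes as f64
}
```

In this model a 5000-byte request occupies 8192 bytes with 4 KiB pages and 65536 bytes with 64 KiB pages, whereas a request just over 10 MiB loses well under 1% either way. That matches the intuition that the relative jump is larger for workloads with a small RSS.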
Regarding the binary size: I did a few experiments with jemalloc and found out that some global variables (like […] Could you try to bump the page size of […] Other than that, the results aren't terrible. Messing with page size can have large effects on performance (https://kobzol.github.io/rust/rustc/2023/10/21/make-rust-compiler-5percent-faster.html), typically at the cost of max RSS. I would still only enable this for aarch64 to keep the status quo though.
I think we should not do this on x86-64 and other non-aarch64 targets. The Max RSS regressions are not nice, but on aarch64 we need to do this, so we need to accept them. But for other targets, I don't see a reason to.
c0a1eb0 to 888b3b7
Restricted to AArch64, and it now picks the value up from the environment if one is already set.
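For readers who don't want to open the diff, the behaviour described above can be sketched roughly as follows. This is not the bootstrap code itself: the function name is hypothetical, and the variable name `JEMALLOC_SYS_WITH_LG_PAGE` and the value `16` (2^16 = 64K) are assumptions based on how the jemalloc-sys build script is configured.

```rust
use std::ffi::OsString;

/// Assumed name of the env variable read by the jemalloc-sys build script.
const LG_PAGE_ENV: &str = "JEMALLOC_SYS_WITH_LG_PAGE";

/// Hypothetical helper: decide whether to set the lg-page override.
///
/// Returns the `(name, value)` pair to export only when jemalloc is enabled,
/// the target is AArch64, and the user has not already set a value
/// (`existing` would typically be `std::env::var_os(LG_PAGE_ENV)`).
fn jemalloc_lg_page_env(
    jemalloc_enabled: bool,
    target: &str,
    existing: Option<OsString>,
) -> Option<(&'static str, &'static str)> {
    if jemalloc_enabled && target.starts_with("aarch64") && existing.is_none() {
        // 16 is the base-2 log of 64K.
        Some((LG_PAGE_ENV, "16"))
    } else {
        None
    }
}
```

A caller would pass the returned pair to something like `std::process::Command::env` when spawning the build; a value already present in the environment is left untouched so that users can still choose their own page size.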
888b3b7 to 83ff26d
Thanks! Let's check that x64 is unaffected. I'll do the rustc-arena 64 KiB page test in a different PR.
@bors try @rust-timer queue
☀️ Try build successful - checks-actions
Finished benchmarking commit (5917c7b): comparison URL.

Overall result: ❌ regressions - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

Max RSS (memory usage)

Results (primary -0.7%, secondary -2.5%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Cycles

Results (primary 1.0%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 763.649s -> 764.164s (0.07%)
Good! Sorry, realized one more thing - could you please add a comment on top of the added if condition to explain why it is there (with a link to this PR or the original issue)? Bootstrap is full of one-off hacks like this, and it's always a good idea to at least document why some custom code is there.
By default, jemalloc is built to only support the same page size as the host machine. For AArch64 targets, set an env variable so that jemalloc is built with support for page sizes up to 64K regardless of the host machine.
83ff26d to 53a5857
No problem, done
Thanks! We can try to reapply the ARM dist builder PRs after this change has landed. @bors r+ rollup
Indeed :) should I repost it as a new PR or can bors just reapply it from the old one?
We'll need two new PRs (one for the dist-arm runner and the other for the arm64 optimized runner).
…llaumeGomez

Rollup of 9 pull requests

Successful merges:

- rust-lang#135081 (bootstrap: Build jemalloc with support for 64K pages)
- rust-lang#135174 ([AIX] Port test case run-make/reproducible-build)
- rust-lang#135177 (llvm: Ignore error value that is always false)
- rust-lang#135182 (Transmute from NonNull to pointer when elaborating a box deref (MCP807))
- rust-lang#135187 (apply a workaround fix for the release roadblock)
- rust-lang#135189 (Remove workaround from pull request template)
- rust-lang#135193 (don't bless `proc_macro_deps.rs` unless it's necessary)
- rust-lang#135198 (Avoid naming variables `str`)
- rust-lang#135199 (Eliminate an unnecessary `Symbol::to_string`; use `as_str`)

r? `@ghost`
`@rustbot` modify labels: rollup
Rollup merge of rust-lang#135081 - mrkajetanp:jemalloc-64k, r=Kobzol

bootstrap: Build jemalloc with support for 64K pages