enhance memcpy and remove redundant implementations #22513
lib/compiler_rt/memcpy.zig
Outdated
@@ -5,24 +5,141 @@ const builtin = @import("builtin");
comptime {
    if (builtin.object_format != .c) {
        @export(&memcpy, .{ .name = "memcpy", .linkage = common.linkage, .visibility = common.visibility });
        @export(&memcpy, .{ .name = "memmove", .linkage = common.linkage, .visibility = common.visibility });
Isn't this an alias, which is not portable?
Yeah, I left it so I could show you how it would fail and then add a commit to fix it. (Or it doesn't fail, and then I need to reexamine why I thought it wasn't portable.)
Edit: yeah, the failures are due to weak aliases not working. Perhaps this should be a semantic analysis error 🤔
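Exporting one function under both `memcpy` and `memmove` only makes sense if that function tolerates overlapping buffers, since `memmove` permits overlap and `memcpy` does not. A minimal sketch of an overlap-safe byte copy follows. `moveBytes` is a hypothetical name used for illustration; it is not the implementation from this PR, which is not shown in the conversation.

```zig
const std = @import("std");

/// Hypothetical helper (not the PR's code): copies `len` bytes and stays
/// correct when the two ranges overlap, by choosing the copy direction.
fn moveBytes(dest: [*]u8, src: [*]const u8, len: usize) [*]u8 {
    if (@intFromPtr(dest) <= @intFromPtr(src)) {
        // dest starts at or before src: copy front to back.
        var i: usize = 0;
        while (i < len) : (i += 1) {
            dest[i] = src[i];
        }
    } else {
        // dest starts after src: copy back to front so that bytes are
        // read before they are overwritten.
        var i: usize = len;
        while (i > 0) {
            i -= 1;
            dest[i] = src[i];
        }
    }
    return dest;
}
```

A function like this can serve both symbols; whether the alias itself links correctly on a given target is the separate portability question raised above.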
These are provided instead by compiler_rt. Part of #2879
why the hell are there asm files in wasi libc to begin with?
@alexrp I'm getting a bunch of arm and thumb failures like this:
Since these look like LLVM bugs and they're in more obscure targets, I'm inclined to disable these tests and make sure there is an issue tracking them. Unless you have an appetite for looking into this?
mm, on these targets a simpler memcpy should be used. I suspect the code is just too hard for llvm to lower.
Do you have a repro on hand? If so, I can try to
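For illustration, "a simpler memcpy" could be as plain as a byte loop with no vector loads or stores in the source. This is only a sketch under that assumption: `simpleCopy` is a hypothetical name, and the fallback actually chosen for these targets is not shown in the conversation.

```zig
const std = @import("std");

/// Hypothetical fallback (not the PR's code): copies `len` bytes one at a
/// time. The buffers are assumed not to overlap, as with C's memcpy.
fn simpleCopy(noalias dest: [*]u8, noalias src: [*]const u8, len: usize) [*]u8 {
    var i: usize = 0;
    while (i < len) : (i += 1) {
        dest[i] = src[i];
    }
    return dest;
}
```

The trade-off is throughput: a loop like this gives the backend very little to get wrong, but it leaves any widening of the copy to the optimizer.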
triggers an assertion in LegalizeDAG otherwise
Alright, so the fact that we can't use the good impl for arm because of llvm bugs makes deleting the musl arm assembly implementations of memcpy a little sus, but I'm still going to do it. The next person to work on memcpy (probably @dweiller) will need to ensure arm perf is measured as well. By deleting the musl implementation, the compiler_rt memcpy implementation actually starts to matter, and it also becomes more straightforward to take real-world perf data points. I hope this also helps people understand the implications of #2879.
strong symbols always take precedence over weak symbols.
target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
target triple = "armv7a-unknown-unknown-eabi"
define ptr @memmove2() #0 {
store <8 x i8> zeroinitializer, ptr null, align 1
ret ptr null
}
attributes #0 = { "target-features"="+aclass,+d32,+db,+dsp,+fp64,+fpregs,+fpregs64,+v4t,+v5t,+v5te,+v6,+v6k,+v6m,+v6t2,+v7,+v7clrex,+v8m,+neon,+perfmon,+thumb2,+armv7-a,+vfp2,+vfp2sp,+vfp3,+vfp3d16,+vfp3d16sp,+vfp3sp,-32bit,-8msecext,-aapcs-frame-chain,-acquire-release,-aes,-atomics-32,-avoid-movs-shop,-avoid-partial-cpsr,-bf16,-big-endian-instructions,-cde,-cdecp0,-cdecp1,-cdecp2,-cdecp3,-cdecp4,-cdecp5,-cdecp6,-cdecp7,-cheap-predicable-cpsr,-clrbhb,-crc,-crypto,-dfb,-disable-postra-scheduler,-dont-widen-vmovs,-dotprod,-execute-only,-expand-fp-mlx,-fix-cmse-cve-2021-35465,-fix-cortex-a57-aes-1742098,-fp16,-fp16fml,-fp-armv8,-fp-armv8d16,-fp-armv8d16sp,-fp-armv8sp,-fpao,-fpregs16,-fullfp16,-fuse-aes,-fuse-literals,-harden-sls-blr,-harden-sls-nocomdat,-harden-sls-retbr,-v8,-v8.1a,-v8.1m.main,-v8.2a,-v8.3a,-v8.4a,-v8.5a,-v8.6a,-v8.7a,-v8.8a,-v8.9a,-v8m.main,-v9.1a,-v9.2a,-v9.3a,-v9.4a,-v9.5a,-v9a,-hwdiv,-hwdiv-arm,-i8mm,-iwmmxt,-iwmmxt2,-lob,-long-calls,-loop-align,-mclass,-mp,-muxed-units,-mve,-mve1beat,-mve2beat,-mve4beat,-mve.fp,-nacl-trap,-neon-fpmovs,-neonfp,-no-branch-predictor,-no-bti-at-return-twice,-no-movt,-no-neg-immediates,-noarm,-nonpipelined-vfp,-pacbti,-prefer-ishst,-prefer-vmovsr,-prof-unpr,-ras,-rclass,-read-tp-tpidrprw,-read-tp-tpidruro,-read-tp-tpidrurw,-reserve-r9,-ret-addr-stack,-sb,-sha2,-slow-fp-brcc,-slow-load-D-subreg,-slow-odd-reg,-slow-vdup32,-slow-vgetlni32,-slowfpvfmx,-slowfpvmlx,-soft-float,-splat-vfp-neon,-strict-align,-thumb-mode,-trustzone,-use-mipipeliner,-use-misched,-armv4,-armv4t,-armv5t,-armv5te,-armv5tej,-armv6,-armv6j,-armv6k,-armv6kz,-armv6-m,-armv6s-m,-armv6t2,-armv7e-m,-armv7-m,-armv7-r,-armv7ve,-armv8.1-a,-armv8.1-m.main,-armv8.2-a,-armv8.3-a,-armv8.4-a,-armv8.5-a,-armv8.6-a,-armv8.7-a,-armv8.8-a,-armv8.9-a,-armv8-a,-armv8-m.base,-armv8-m.main,-armv8-r,-armv9.1-a,-armv9.2-a,-armv9.3-a,-armv9.4-a,-armv9.5-a,-armv9-a,-vfp4,-vfp4d16,-vfp4d16sp,-vfp4sp,-virtualization,-vldn-align,-vmlx-forwarding,-vmlx-hazards,-wide-stride-vfp,-xscale,-zcz" "use-soft-float"="true" }
Remove There is some silliness on the LLVM side where you have to massage the VFP/NEON target features when using If you change Lines 412 to 415 in f38d7a9
(And yes, I do have plans to make this less hacky eventually.) It looks like #22434 accidentally broke this hack by moving ABI detection after those CPU feature hacks. So it's on me for not noticing that during review. The good news is that those features aren't relevant for ABI detection on Arm and Hexagon, so we can simply move them after ABI detection. #22526 should fix this.
perf data point: building hello world with a compiler with different memcpy versions.
stage4/bin/zig build -p fast -Doptimize=ReleaseFast -Dno-lib -Dforce-link-libc -Dtarget=native-native-musl
The difference between these is that the second one has the different compiler_rt memcpy. Both have the musl libc memcpy deleted.