[pull] main from llvm:main #5546

pull · 2025-01-16T01:14:23Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.1)

Can you help keep this open source service alive? 💖 Please sponsor : )

…er (#121854)

Recently I had a scenario where I had:

1. A class C with many members m_1...m_n of the same type T
2. T's default constructor was deleted
3. I accidentally omitted an explicitly constructed member in the initializer list C() : m_1(foo), m_2(bar), ... { }

Clang told me that T's default constructor was deleted, and told me that the call to T() was in C() (which it implicitly was), but didn't tell me which member was being default constructed.

It was difficult to fix this problem because I had no easy way to list all the members of type T in C and C's superclasses, which would have let me find which member was missing.

clang/test/CXX/class/class.init/p1.cpp is a simplified version of this problem (a2 is missing from the initializer list of B).
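For readers who have not hit this, the scenario can be reproduced in miniature. This is a minimal sketch with hypothetical names (`T`, `C`, `m_1` to `m_3`, `sum_members`); it is not the test from clang/test/CXX/class/class.init/p1.cpp. It shows the compiling version, with a comment marking where the omission would occur.

```cpp
#include <cassert>

// Hypothetical type whose default constructor is deleted.
struct T {
  T() = delete;
  explicit T(int v) : value(v) {}
  int value;
};

// Hypothetical class with several members of the same type T.
struct C {
  // Every T member must appear in the initializer list. Dropping one of
  // them (say m_2(2)) makes the compiler default-construct that member
  // implicitly, which is an error because T() is deleted.
  C() : m_1(1), m_2(2), m_3(3) {}
  T m_1;
  T m_2;
  T m_3;
};

// Small helper so the sketch has something observable to check.
int sum_members() {
  C c;
  assert(c.m_2.value == 2);
  return c.m_1.value + c.m_2.value + c.m_3.value;
}
```

With three members the missing one is easy to spot; with many members spread across base classes, a diagnostic that names the member saves a manual search.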

…124278) Implement parsing and symbol resolution for directives that take arguments. There are a few, and most of them take objects. Special handling is needed for two that take more specialized arguments: DECLARE MAPPER and DECLARE REDUCTION. This only affects directives in METADIRECTIVE's WHEN and OTHERWISE clauses. Parsing and semantic checks of other cases are unaffected.

… chains until the end of `finishPendingActions`. (#121245)

The call to `hasBody` inside `finishPendingActions` bumps the `PendingIncompleteDeclChains` size from `0` to `1`, and also sets `LazyVal->LastGeneration` to `6`, which matches the `LazyVal->ExternalSource->getGeneration()` value of `6`. Later, the iteration over `redecls()` (which calls `getNextRedeclaration`) is expected to trigger the reload, but it **does not**, since the generation numbers match.

The proposed solution is to perform the marking of incomplete decl chains at the end of `finishPendingActions`. This way, **all** of the incomplete decls are marked incomplete as a post-condition of `finishPendingActions`. It's also safe to delay this operation, since any operation being done within `finishPendingActions` has `NumCurrentElementsDeserializing == 1`, which means that any calls to `CompleteDeclChain` would simply add to the `PendingIncompleteDeclChains` without doing anything anyway.

While attempting to teach ScalarEvolution about samesign in another effort, a complicated testcase with nested loops and zero-extends of the induction variable regresses, but only when the target datalayout is present. The regression was originally reported on IndVarSimplify, but an improvement of symbolic BTC was also observed on SCEV. Check the test into both IndVarSimplify and SCEV to ease investigation and further development.

… value in libcxx container summary (#125294)

This has two changes:

1. Set show value for libcxx and libstdcxx summary provider. This will print the pointer value for both pointer type and reference type.
2. Remove pointer value printing in libcxx container summary.

Discussion: https://discourse.llvm.org/t/lldb-hides-raw-pointer-value-for-libcxx-and-libstdcxx-pointer-types-in-summary-string/84226

…d0[ofs]) combine. (#124871)

Happened to notice some odd things related to chains in this code.

The code calls hasOneUse on LoadSDNode*, which will check users of the data and the chain. I think this was trying to check that the data had one use so one of the loads would definitely be removed by the transform. Load chains don't always have users, so our testing may not have noticed that the chains being used would block the transform.

The code makes all users of ld1's chain use the new load's chain, but we don't know that ld1 becomes dead. This can cause incorrect dependencies if ld1's chain is used and it isn't deleted. I think the better thing to do is use makeEquivalentMemoryOrdering to make all users of ld0 and ld1 depend on the new load and the original loads. If the old loads become dead, their chain will be cleaned up later.

I'm having trouble getting a test for any ordering issue with the current code. areNonVolatileConsecutiveLoads requires the two loads to have the same input chain. Given that, I don't know how to use one of the load chain results without also using the other. If they are both used, we don't do the transform because SDNode::hasOneUse will return false for both.
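The value-level identity behind the (fshl ld1, ld0, c) -> (ld0[ofs]) fold can be checked with a small model. This is only a sketch under assumptions not stated in the commit message: 32-bit loads, little-endian byte order, ld1 located 4 bytes after ld0, and a shift amount that is a multiple of 8. The helper names (`fshl32`, `load32le`, `fold_holds`) and the offset formula `(32 - c) / 8` are illustrative, not taken from the patch, and the sketch says nothing about the chain handling the message discusses.

```cpp
#include <cassert>
#include <cstdint>

// Funnel shift left: concatenate hi:lo, shift left by c, keep the high
// 32 bits. This helper is only valid for 0 < c < 32.
uint32_t fshl32(uint32_t hi, uint32_t lo, unsigned c) {
  assert(c > 0 && c < 32);
  return (hi << c) | (lo >> (32 - c));
}

// Little-endian 32-bit load from a byte buffer, assembled arithmetically
// so the sketch does not depend on the host's endianness.
uint32_t load32le(const uint8_t *p, unsigned ofs) {
  return uint32_t(p[ofs]) | (uint32_t(p[ofs + 1]) << 8) |
         (uint32_t(p[ofs + 2]) << 16) | (uint32_t(p[ofs + 3]) << 24);
}

// Returns true if fshl(ld1, ld0, c) equals one load at byte offset
// (32 - c) / 8 from ld0's address, for a byte-aligned shift amount c.
bool fold_holds(unsigned c) {
  const uint8_t mem[8] = {0x10, 0x21, 0x32, 0x43, 0x54, 0x65, 0x76, 0x87};
  uint32_t ld0 = load32le(mem, 0);  // bytes 0..3
  uint32_t ld1 = load32le(mem, 4);  // bytes 4..7, consecutive with ld0
  return fshl32(ld1, ld0, c) == load32le(mem, (32 - c) / 8);
}
```

For example, `fold_holds(8)` compares the funnel shift against a load of bytes 3..6.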

…5526) This is attempt 2 to merge this; the first one is #117622. This properly disables the tests when building for PlayStation, since the warning is disabled there.

When a hidden object is built into multiple shared libraries, each instance of the library will get its own copy. If the object was supposed to be globally unique (e.g. a global variable or static data member), this can cause very subtle bugs.

An object might be incorrectly duplicated if it:

* Is defined in a header (so it might appear in multiple TUs), and
* Has external linkage (otherwise it's supposed to be duplicated), and
* Has hidden visibility (or else the dynamic linker will handle it)

The duplication is only a problem semantically if one of the following is true:

1. The object is mutable (the copies won't be in sync), or
2. Its initialization has side effects (it may now run more than once), or
3. The value of its address is used (different copies have different addresses).

To detect this, we add a new -Wunique-object-duplication warning. It warns on cases (1) and (2) above. To be conservative, we only warn in case (2) if we are certain the initializer has side effects, and we don't warn on new because the only side effect is some extra memory usage.

We don't currently warn on case (3) because doing so is prone to false positives: there are many reasons for taking the address which aren't inherently problematic (e.g. passing to a function that expects a pointer). We only run into problems if the code inspects the value of the address.

The check is currently disabled for Windows, which uses its own analogue of visibility (dllimport/dllexport). The check is also disabled inside templates, since it can give false positives if a template is never instantiated.

Resolving the warning

The warning can be fixed in several ways:

* If the object in question doesn't need to be mutable, it should be made const. Note that the variable must be completely immutable, e.g. we'll warn on const int* p because the pointer itself is mutable. To silence the warning, it should instead be const int* const p.
* If the object must be mutable, it (or the enclosing function, in the case of static local variables) should be made visible using __attribute((visibility("default")))
* If the object is supposed to be duplicated, it should be given internal linkage.

Testing

I've tested the warning by running it on clang itself, as well as on chromium. Compiling clang resulted in [10 warnings across 6 files](https://github.com/user-attachments/files/17908069/clang-warnings.txt), while Chromium resulted in [160 warnings across 85 files](https://github.com/user-attachments/files/17908072/chromium-warnings.txt), mostly in third-party code. Almost all warnings were due to mutable variables.

I evaluated the warnings by manual inspection. I believe all the resulting warnings are true positives, i.e. they represent potentially-problematic code where duplication might cause a problem. For the clang warnings, I also validated them by either adding const or visibility annotations as appropriate.

Limitations

I am aware of four main limitations with the current warning:

* We do not warn when the address of a duplicated object is taken, since doing so is prone to false positives. I'm hopeful that we can create a refined version in the future, however.
* We only warn for side-effectful initialization if we are certain side effects exist. Warning on potential side effects produced a huge number of false positives; I don't expect there's much that can be done about this in modern C++ code bases, since proving a lack of side effects is difficult.
* Windows uses a different system (dllexport/dllimport) instead of visibility. From manual testing, it seems to behave analogously to the visibility system for the purposes of this warning, but to keep things simple the warning is disabled on Windows for now.
* We don't warn on code inside templates. This is unfortunate, since it masks many real issues, e.g. a templated variable which is implicitly instantiated the same way in multiple TUs should be globally unique, but may accidentally be duplicated. Unfortunately, we found some potential false positives during testing that caused us to disable the warning for now.
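The three ways of resolving the warning might look as follows in a header. This is a sketch under assumptions: all names (`kValue`, `kPtr`, `g_counter`, `g_per_tu_scratch`, `bump`) are hypothetical, the inline variables need C++17, and the warning itself would only come into play when the code is built with hidden visibility (for example with `-fvisibility=hidden`). The sketch uses the `__attribute__` spelling of the GNU attribute keyword.

```cpp
#include <cassert>

// Option 1: make the object completely immutable. Writing
// `inline const int* kPtr` would not be enough, because the pointer
// itself would still be mutable.
inline const int kValue = 42;
inline const int* const kPtr = &kValue;

// Option 2: the object must stay mutable, so give it default visibility
// and let the dynamic linker keep a single copy.
__attribute__((visibility("default"))) inline int g_counter = 0;

// Option 3: duplication is intended, so use internal linkage. Each
// translation unit deliberately gets its own copy.
static int g_per_tu_scratch = 0;

// Hypothetical helper that touches all three objects.
int bump() {
  ++g_counter;
  ++g_per_tu_scratch;
  return *kPtr + g_counter + g_per_tu_scratch;
}
```

Which option fits depends on whether the object is meant to be shared across shared libraries (options 1 and 2) or private to each translation unit (option 3).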

Note that PointerUnion::dyn_cast has been soft deprecated in PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  // isa<T>, cast<T> and the llvm::dyn_cast<T>

Literal migration would result in dyn_cast_if_present (see the definition of PointerUnion::dyn_cast), but this patch uses dyn_cast because we expect E to be nonnull.

Note that PointerUnion::dyn_cast has been soft deprecated in PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  // isa<T>, cast<T> and the llvm::dyn_cast<T>

Literal migration would result in dyn_cast_if_present (see the definition of PointerUnion::dyn_cast), but this patch uses dyn_cast because we expect typeDecl to be nonnull. Note that getObjCInterfaceType starts out dereferencing Decl.

Note that PointerUnion::dyn_cast has been soft deprecated in PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  // isa<T>, cast<T> and the llvm::dyn_cast<T>

Literal migration would result in dyn_cast_if_present (see the definition of PointerUnion::dyn_cast), but this patch uses dyn_cast because we expect referent to be nonnull.

This also changes the container version numbers in the tag from unix timestamps to the abbreviated commit hash for the workflow. This ensures that the amd64 and arm64 containers have the same tag.

For amd64 we now generate 4 tags:

* ghcr.io/llvm/ci-ubuntu-22.04:latest
* ghcr.io/llvm/ci-ubuntu-22.04:$GITHUB_SHA
* ghcr.io/llvm/amd64/ci-ubuntu-22.04:latest
* ghcr.io/llvm/amd64/ci-ubuntu-22.04:$GITHUB_SHA

For arm64 we generate 2 tags:

* ghcr.io/tstellar/arm64v8/ci-ubuntu-22.04:latest
* ghcr.io/tstellar/arm64v8/ci-ubuntu-22.04:$GITHUB_SHA

… in getSingleShuffleSrc. (#125455) I have been unsuccessful at further reducing the test. The failure requires a shuffle with 2 scalable->fixed extracts with the same source. 0 is the only valid index for a scalable->fixed extract so the 2 sources must be the same extract. Shuffles with the same source are aggressively canonicalized to a unary shuffle. So it requires the extracts to become identical through other optimizations without the shuffle being canonicalized before it is lowered. Fixes #125306.

…15099) This is similar in spirit to previous changes that made _mm_mfence a builtin, to avoid conflicts with winnt.h and other MSVC ecosystem headers that pre-declare compiler intrinsics as extern "C" symbols. Also update the feature flag for _mm_prefetch to sse, which is more accurate than mmx. This should fix issue #87515.

This is a test library which is not part of libMLIR, so it should use normal LINK_LIBS instead of mlir_target_link_libraries. This fixes an issue introduced in #123910 and follows up on the fix in #125004, which added the library to DEPENDS, which is not sufficient.

Changes:

1. Fix inconsistencies in register pressure set printing. "Max Pressure" printing is inconsistent with "Bottom Pressure" and "Top Pressure". For the former, register class begins on the same line vs newline for latter. Also for the former, the first register class is on the same line, but subsequent register classes are newline separated. That's removed so all are on the same line.

   Before:

   Max Pressure: FPR8=1
   GPR32=14
   Top Pressure:
   GPR32=2
   Bottom Pressure:
   FPR8=7
   GPR32=17

   After:

   Max Pressure: FPR8=1 GPR32=14
   Top Pressure: GPR32=2
   Bottom Pressure: FPR8=7 GPR32=17

2. After scheduling an instruction, don't print pressure diff if there isn't one. Also s/UpdateRegP/UpdateRegPressure. E.g.,

   Before:

   UpdateRegP: SU(3) %0:gpr64common = ADDXrr %58:gpr64common, gpr64 to
   UpdateRegP: SU(4) %393:gpr64sp = ADDXri %58:gpr64common, 390, 12 to GPR32 -1

   After:

   UpdateRegPressure: SU(4) %393:gpr64sp = ADDXri %58:gpr64common, 12 to GPR32 -1

3. Don't print excess pressure sets if there are none.

This commit moves the rotate builtin to the CLC library. It also optimizes rotate(x, n) to generate the @llvm.fshl(x, x, n) intrinsic, for both scalar and vector types. The previous implementation was too cautious in its handling of the shift amount; the OpenCL rules state that the shift amount is always treated as an unsigned value modulo the bitwidth.
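The shift-amount rule can be modelled in plain C++. This is a sketch of the semantics only, not the CLC source: the function name `rotate32` is hypothetical, and it covers just the 32-bit scalar case. The amount is reduced modulo the bit width as an unsigned value, which gives the same result as a funnel shift of `x` with itself.

```cpp
#include <cassert>
#include <cstdint>

// Rotate left by n, where n is treated as an unsigned value modulo the
// bit width (32 here). Equivalent in value to fshl(x, x, n).
uint32_t rotate32(uint32_t x, uint32_t n) {
  n &= 31u;  // n mod 32
  if (n == 0)
    return x;  // avoid a shift by 32, which is undefined in C++
  return (x << n) | (x >> (32u - n));
}
```

A signed amount of -1, reinterpreted as unsigned and reduced modulo 32, becomes 31, so `rotate32(1u, static_cast<uint32_t>(-1))` moves the low bit to the top bit.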

PR #124961 adds intrinsics for the tcgen05 alloc/dealloc PTX instructions. This patch adds NVVM Ops for the same.

Tests are added to verify the lowering to the corresponding intrinsics in the tcgen05-alloc.mlir file.

PTX ISA link: https://docs.nvidia.com/cuda/parallel-thread-execution/#tcgen05-memory-alloc-manage-instructions

Signed-off-by: Durgadoss R <[email protected]>

Removed the TOSA quantization attribute used in various MLIR TOSA dialect operations in favour of using builtin attributes. Update any lit tests, conversions and transformations appropriately.

Signed-off-by: Tai Ly <[email protected]>
Co-authored-by: Tai Ly <[email protected]>

This PR moves the limits on the maximum number of threads in a block and blocks in a grid to the nvgpu dialect to avoid duplicated code. The limits are defined here: https://docs.nvidia.com/cuda/cuda-c-programming-guide/#features-and-technical-specifications-technical-specifications-per-compute-capability

When building mlir with `-DMLIR_NVVM_EMBED_LIBDEVICE=ON`, there will be a warning

```
build/tools/mlir/lib/Target/LLVM/libdevice_embedded.c:1: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘143’ to ‘-113’ [-Woverflow]
```

which is followed by a large number of characters in stdout. Fix this so that stdout is no longer flooded with a large number of characters (3e5).
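The numbers in the warning follow from narrowing 143 into an 8-bit signed type. The sketch below only illustrates that arithmetic; the helper names are hypothetical. The commit message does not say how the patch resolves the warning, so the unsigned variant is shown as one possible option, not as the actual fix.

```cpp
#include <cassert>

// 143 does not fit in an 8-bit signed char; on common compilers it wraps
// to 143 - 256 = -113, which is the value the warning reports.
int as_signed_char(int v) { return static_cast<signed char>(v); }

// An unsigned 8-bit type can hold 143 unchanged (assumption: this is only
// one way such data could be stored without overflow).
int as_unsigned_char(int v) { return static_cast<unsigned char>(v); }
```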

The previous implementation had false positive/negative cases in the analysis of the loop carried dependency. A missed dependency case is caused by incorrect analysis of address increments. This is fixed by strict analysis of recursive definitions. See added test swp-carried-dep4.mir. Excessive dependency detection is fixed by improving the formula for determining the overlap of address ranges to be accessed. See added test swp-carried-dep5.mir.

If the input contains an odd number of shuffled vectors, the last 2 shuffles are shuffled with the same first vector. This situation needs to be processed correctly: when the first vector is requested for the first time, extract it from the source vector; when it is requested the second time, reuse the previous result. The second vector should be extracted in both cases.

Fixes #125269

Reviewers: topperc, preames

Reviewed By: preames

Pull Request: #125693

This patch updates the cost model for fmuladd on vector types to scale with LMUL. This was found when analyzing a hot loop in 519.lbm_r that was unprofitably vectorized, but doesn't directly impact that case and is split off so it doesn't get forgotten. Unlike other FP arithmetic ops, it's not scaled by 2 because the scalar cost isn't scaled by 2.

…act (#125560) This modifies the conversion patterns so that, in the case where the index is known statically to be poison, the insertion/extraction is replaced by an arbitrary junk constant value, and in the dynamic case, the index is sanitized at runtime. This avoids triggering UB in both cases. The dynamic case is definitely a pessimisation of the generated code, but the use of dynamic indexes is expected to be very rare and already slow on real-world GPU compilers ingesting SPIR-V, so the impact should be negligible. Resolves #124162.

HTML starting tags that span multiple lines were previously not allowed (or rather, only the starting line was lexed as HTML). Doxygen allows those tags.

This PR allows the starting tags to span multiple lines. They can't span multiple (C-)Comments, though (it's likely a user-error). Multiple BCPL comments are fine as those are single lines (shown below).

Example:

```c
/// <a
/// href="foo"
/// >Aaa</a>b
int Test;
```

Fixes #28321.

…25796) Summary: Some attributes have gnu extensions that share names with clang attributes. If these imply the same thing, we can specially declare this to be an alternate but equivalent spelling. This patch enables this for `no_sanitize` and provides the infrastructure for more to be added if needed. Discussions welcome on whether or not we want to bind ourselves to GNU behavior, since theoretically it's possible for GNU to silently change the semantics away from our implementation, but I'm not an expert. Fixes: #125760

pull bot added the ⤵️ pull label Jan 16, 2025

bpfoley and others added 29 commits February 3, 2025 19:57

[RISCV][GISel] Remove unused function leftover from a removed SDNodeX…

fc3ec13

…Form. NFC Fixes #125551

[LLDB][Documentation] Add a doc string to sbprocess to show MemoryReg…

a3321ea

…ions is iterable (#125557) My colleague, @lukejriddle, made the SBMemoryRegionList object iterable in #117358. This isn't documented anywhere, so I added a blurb about it to SBProcess.

IndVarSimplify: strip redundant getDataLayout (NFC) (#125546)

a29ed04

DataLayout is already available as a member variable.

[X86] Add test case for #124871. NFC

c3b7894

This shows missed opportunity to fold (fshl ld1, ld0, c) -> (ld0[ofs]) if the load chain results are used.

Fix "not all control paths return a value" warning; NFC

070c338

[Hexagon] Avoid repeated hash lookups (NFC) (#125459)

546d03c

[Analysis] Avoid repeated hash lookups (NFC) (#125462)

36fb886

[CodeGen] Avoid repeated hash lookups (NFC) (#125463)

22bc029

[ProfileData] Avoid repeated hash lookups (NFC) (#125464)

09d945d

workflows/release-tasks: Re-use release-binaries-all workflow (#125378)

d194c6b

This way we don't need to duplicate the list of supported targets in the release-tasks workflow.

[RISCV] Precommit test for #124932

d156b85

Signed-off-by: Mikhail R. Gadelha <[email protected]>

workflows/premerge: Cancel in progress jobs when a PR is merged (#125329

2deba08

)

[AArch64] Move arith-fp-sve.ll to sve-arith-fp.ll. NFC

a9b3e11

[CIR] Initial implementation of CIR-to-LLVM IR lowering pass (#125260)

622ee03

This change introduces lowering from CIR to LLVM IR of global integer and floating-point variables, using defaults for attributes that aren't yet implemented.

[lldb] Implement bidirectional access for backing<->backed thread rel…

90a51a4

…ationship (#125300) This enables finding the backed thread from the backing thread without going through the thread list, and it will be useful for subsequent commits.

[clang] Unbreak build

5dccfd9

From #115099

RKSimon and others added 30 commits February 5, 2025 08:54

[SLP][X86] Add test coverage for #124993

4fdd28b

[NFC][ValueTracking] Hoist the matching of RHS constant (#125818)

8bba8a5

[TableGen][Docs] Fix productionlists for SimpleValue (#123751)

439de72

Previously the grammar tokens SimpleValue2 through SimpleValue9 were unreferenced. This ties them together so that the grammar makes more sense.

[LLD][COFF] Emit locally imported EC symbols for ARM64X (#125527)

8cb3d7b

[IR][NFC] Remove obsolete comments in BinaryOperator::swapOperands (#…

6c84d64

…125819) Closes #125438

[flang][cmake] Fix bcc dependencies (#125822)

f9af5c1

The Fortran libraries are not part of MLIR, so they should use target_link_libraries() rather than mlir_target_link_libraries(). This fixes an issue introduced in #120966.

[TableGen][Docs] Fix productionlists for assert and dump (#123739)

b275309

These were referring to nonexistent grammar tokens instead of `Value`.

[LLD][COFF] Use EC symbol table for output DEF file on ARM64X (#125531)

e596387

For consistency with input def handling.

[clang][bytecode] Handle CXXPseudoDestructorExprs (#125835)

ee25a85

Make lifetime management more explicit. We're only using this for CXXPseudoDestructorExprs for now but we need this to handle std::construct_at/placement-new after destructor calls later anyway.

[bazel] Port for baf2786

7945a33

[CodeGen][NewPM] Port RenameIndependentSubregs to NPM (#125192)

f77f777

[CodeGen][NewPM] Port SIWholeQuadMode to NPM. (#125833)

b83c960

[CodeGen][NewPM] Port GCNPreRALongBranchReg to NPM. (#125844)

814db6c

[AArch64] Disallow vscale x 1 partial reductions (#125252)

c7995a6

We don't want to allow partial reductions resulting in a vscale x 1 type as we can't lower it in the backend.

[libc++] Fix stray usage of _LIBCPP_HAS_NO_WIDE_CHARACTERS on Windows

bcfd9f8

[OpenMP]Initial parsing/sema support for target_device selector set (#…

8c36665

…118471) This patch adds initial support for target_device selector set - Section 9.2 (Spec 6.0)

[libc++] Also provide an alignment assumption for vector in C++03 mode (

ccb08b9

#124839) There's no reason not to, and it's easy enough to do using enable_if. As a drive-by change, also add a missing _LIBCPP_NO_CFI attribute on __add_alignment_assumption.
