Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pull] main from llvm:main #5546

Open
wants to merge 2,383 commits into
base: main
Choose a base branch
from
Open

[pull] main from llvm:main #5546

wants to merge 2,383 commits into from

Conversation

pull[bot]
Copy link

@pull pull bot commented Jan 16, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.1)

Can you help keep this open source service alive? 💖 Please sponsor : )

@pull pull bot added the ⤵️ pull label Jan 16, 2025
bpfoley and others added 29 commits February 3, 2025 19:57
…er (#121854)

Recently I had a scenario where I had:
1. A class C with many members m_1...m_n of the same type T
2. T's default constructor was deleted
3. I accidentally omitted an explicitly constructed member in the
initializer list C() : m_1(foo), m_2(bar), ... { }

Clang told me that T's default constructor was deleted, and told me that
the call to T() was in C() (which it implicitly was), but didn't tell me
which member was being default constructed.

It was difficult to fix this problem because I had no easy way to list
all the members of type T in C and C's superclasses which would have let
me find which member was missing,

clang/test/CXX/class/class.init/p1.cpp is a simplified version of this
problem (a2 is missing from the initializer list of B)
…124278)

Implement parsing and symbol resolution for directives that take
arguments. There are a few, and most of them take objects. Special
handling is needed for two that take more specialized arguments: DECLARE
MAPPER and DECLARE REDUCTION.

This only affects directives in METADIRECTIVE's WHEN and OTHERWISE
clauses. Parsing and semantic checks of other cases is unaffected.
… chains until the end of `finishPendingActions`. (#121245)

The call to `hasBody` inside `finishPendingActions` that bumps the `PendingIncompleteDeclChains`
size from `0` to `1`, and also sets the `LazyVal->LastGeneration` to `6` which matches
the `LazyVal->ExternalSource->getGeneration()` value of `6`. Later, the iterations over `redecls()`
(which calls `getNextRedeclaration`) is expected to trigger the reload, but it **does not** since
the generation numbers match.

The proposed solution is to perform the marking of incomplete decl chains at the end of `finishPendingActions`.
This way, **all** of the incomplete decls are marked incomplete as a post-condition of `finishPendingActions`.
It's also safe to delay this operation since any operation being done within `finishPendingActions` has
`NumCurrentElementsDeserializing == 1`, which means that any calls to `CompleteDeclChain` would simply
add to the `PendingIncompleteDeclChains` without doing anything anyway.
…ions is iterable (#125557)

My colleague, @lukejriddle made the SBMemoryRegionList object iterable
in #117358. This isn't documented anywhere and so I added a blurb about
it to SBProcess.
While attempting to teach ScalarEvolution about samesign in another
effort, a complicated testcase with nested loops, and zero-extends of
the induction-variable regresses, but only when the target datalayout is
present. The regression was originally reported on IndVarSimplify, but
an improvement of symbolic BTC was also observed on SCEV. Check in the
test into both IndVarSimplify and SCEV, to ease investigation and
further development.
DataLayout is already available as a member variable.
… value in libcxx container summary (#125294)

This has two changes:
1. Set show value for libcxx and libstdcxx summary provider. This will
print the pointer value for both pointer type and reference type.
2. Remove pointer value printing in libcxx container summary.

Discussion:

https://discourse.llvm.org/t/lldb-hides-raw-pointer-value-for-libcxx-and-libstdcxx-pointer-types-in-summary-string/84226
This shows missed opportunity to fold (fshl ld1, ld0, c) -> (ld0[ofs])
if the load chain results are used.
…d0[ofs]) combine. (#124871)

Happened to notice some odd things related to chains in this code.

The code calls hasOneUse on LoadSDNode* which will check users
of the data and the chain. I think this was trying to check that
the data had one use so one of the loads would definitely be
removed by the transform. Load chains don't always have users so
our testing may not have noticed that the chains being used would
block the transform.

The code makes all users of ld1's chain use the new load's chain, but
we don't know that ld1 becomes dead. This can cause incorrect dependencies if
ld1's chain is used and it isn't deleted. I think the better thing to do
is use makeEquivalentMemoryOrdering to make all users of ld0 and ld1
depend on the new load and the original loads. If the olds loads become
dead, their chain will be cleaned up later.

I'm having trouble getting a test for any ordering issue with the current code.
areNonVolatileConsecutiveLoads requires the two loads to have the same
input chain. Given that, I don't know how to use one of the load chain
results without also using the other. If they are both used we don't
do the transform because SDNode::hasOneUse will return false for both.
…5526)

This is attempt 2 to merge this, the first one is #117622. This properly
disables the tests when building for playstation, since the warning is
disabled there.

When a hidden object is built into multiple shared libraries, each
instance of the library will get its own copy. If
the object was supposed to be globally unique (e.g. a global variable or
static data member), this can cause very subtle bugs.

An object might be incorrectly duplicated if it:

Is defined in a header (so it might appear in multiple TUs), and
Has external linkage (otherwise it's supposed to be duplicated), and
Has hidden visibility (or else the dynamic linker will handle it)
The duplication is only a problem semantically if one of the following
is true:

The object is mutable (the copies won't be in sync), or
Its initialization has side effects (it may now run more than once), or
The value of its address is used (different copies have different
addresses).
To detect this, we add a new -Wunique-object-duplication warning. It
warns on cases (1) and (2) above. To be conservative, we only warn in
case (2) if we are certain the initializer has side effects, and we
don't warn on new because the only side effect is some extra memory
usage.

We don't currently warn on case (3) because doing so is prone to false
positives: there are many reasons for taking the address which aren't
inherently problematic (e.g. passing to a function that expects a
pointer). We only run into problems if the code inspects the value of
the address.

The check is currently disabled for windows, which uses its own analogue
of visibility (declimport/declexport). The check is also disabled inside
templates, since it can give false positives if a template is never
instantiated.

Resolving the warning
The warning can be fixed in several ways:

If the object in question doesn't need to be mutable, it should be made
const. Note that the variable must be completely immutable, e.g. we'll
warn on const int* p because the pointer itself is mutable. To silence
the warning, it should instead be const int* const p.
If the object must be mutable, it (or the enclosing function, in the
case of static local variables) should be made visible using
__attribute((visibility("default")))
If the object is supposed to be duplicated, it should be be given
internal linkage.
Testing
I've tested the warning by running it on clang itself, as well as on
chromium. Compiling clang resulted in [10 warnings across 6
files](https://github.com/user-attachments/files/17908069/clang-warnings.txt),
while Chromium resulted in [160 warnings across 85
files](https://github.com/user-attachments/files/17908072/chromium-warnings.txt),
mostly in third-party code. Almost all warnings were due to mutable
variables.

I evaluated the warnings by manual inspection. I believe all the
resulting warnings are true positives, i.e. they represent
potentially-problematic code where duplication might cause a problem.
For the clang warnings, I also validated them by either adding const or
visibility annotations as appropriate.

Limitations
I am aware of four main limitations with the current warning:

We do not warn when the address of a duplicated object is taken, since
doing so is prone to false positives. I'm hopeful that we can create a
refined version in the future, however.
We only warn for side-effectful initialization if we are certain side
effects exist. Warning on potential side effects produced a huge number
of false positives; I don't expect there's much that can be done about
this in modern C++ code bases, since proving a lack of side effects is
difficult.
Windows uses a different system (declexport/import) instead of
visibility. From manual testing, it seems to behave analogously to the
visibility system for the purposes of this warning, but to keep things
simple the warning is disabled on windows for now.
We don't warn on code inside templates. This is unfortuate, since it
masks many real issues, e.g. a templated variable which is implicitly
instantiated the same way in multiple TUs should be globally unique, but
may accidentally be duplicated. Unfortunately, we found some potential
false positives during testing that caused us to disable the warning for
now.
Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

Literal migration would result in dyn_cast_if_present (see the
definition of PointerUnion::dyn_cast), but this patch uses dyn_cast
because we expect E to be nonnull.
Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

Literal migration would result in dyn_cast_if_present (see the
definition of PointerUnion::dyn_cast), but this patch uses dyn_cast
because we expect typeDecl to be nonnull.  Note that
getObjCInterfaceType starts out dereferencing Decl.
Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

Literal migration would result in dyn_cast_if_present (see the
definition of PointerUnion::dyn_cast), but this patch uses dyn_cast
because we expect referent to be nonnull.
This way we don't need to duplicate the list of supported targets in the
release-tasks workflow.
Signed-off-by: Mikhail R. Gadelha <[email protected]>
This also changes the container version numbers in the tag from unix
timestamps to the abbreviated commit hash for the workflow. This ensures
that the amd64 and arm64 containers have the same tag.

For amd64 we now generate 4 tags:

* ghcr.io/llvm/ci-ubuntu-22.04:latest
* ghcr.io/llvm/ci-ubuntu-22.04:$GITHUB_SHA
* ghcr.io/llvm/amd64/ci-ubuntu-22.04:latest
* ghcr.io/llvm/amd64/ci-ubuntu-22.04:$GITHUB_SHA

For arm64 we generate 2 tags:

* ghcr.io/tstellar/arm64v8/ci-ubuntu-22.04:latest
* ghcr.io/tstellar/arm64v8/ci-ubuntu-22.04:$GITHUB_SHA
This change introduces lowering from CIR to LLVM IR of global integer
and floating-point variables, using defaults for attributes that aren't yet
implemented.
…ationship (#125300)

This enables finding the backed thread from the backing thread without
going through the thread list, and it will be useful for subsequent
commits.
… in getSingleShuffleSrc. (#125455)

I have been unsuccessful at further reducing the test. The
failure requires a shuffle with 2 scalable->fixed extracts with
the same source. 0 is the only valid index for a scalable->fixed
extract so the 2 sources must be the same extract. Shuffles with
the same source are aggressively canonicalized to a unary shuffle.
So it requires the extracts to become identical through other
optimizations without the shuffle being canonicalized before it is
lowered.

Fixes #125306.
…15099)

This is similar in spirit to previous changes to make _mm_mfence
builtins to avoid conflicts with winnt.h and other MSVC ecosystem
headers that pre-declare compiler intrinsics as extern "C" symbols.

Also update the feature flag for _mm_prefetch to sse, which is more accurate than mmx.

This should fix issue #87515.
RKSimon and others added 30 commits February 5, 2025 08:54
This is a test library which is not part of libMLIR, so it should use
normal LINK_LIBS instead of mlir_target_link_libraries.

This fixes an issue introduced in #123910 and follows up on the fix in
#125004, which added the library to DEPENDS, which is not sufficient.
Changes:
1. Fix inconsistencies in register pressure set printing. "Max Pressure"
   printing is inconsistent with "Bottom Pressure" and "Top Pressure".
   For the former, register class begins on the same line vs newline for
   latter. Also for the former, the first register class is on the same
   line, but subsequent register classes are newline separated. That's
   removed so all are on the same line.

   Before:
     Max Pressure: FPR8=1
     GPR32=14
     Top Pressure:
     GPR32=2
     Bottom Pressure:
     FPR8=7
     GPR32=17

   After:
     Max Pressure: FPR8=1 GPR32=14
     Top Pressure: GPR32=2
     Bottom Pressure: FPR8=7 GPR32=17

2. After scheduling an instruction, don't print pressure diff if there
   isn't one. Also s/UpdateRegP/UpdateRegPressure. E.g.,

   Before:
     UpdateRegP: SU(3) %0:gpr64common = ADDXrr %58:gpr64common, gpr64
                 to
     UpdateRegP: SU(4) %393:gpr64sp = ADDXri %58:gpr64common, 390, 12
                 to GPR32 -1

   After:
     UpdateRegPressure: SU(4) %393:gpr64sp = ADDXri %58:gpr64common, 12
                        to GPR32 -1
3. Don't print excess pressure sets if there are none.
Previously the grammar tokens SimpleValue2 through SimpleValue9 were
unreferenced. This ties them together so that the grammar makes more
sense.
This commit moves the rotate builtin to the CLC library.

It also optimizes rotate(x, n) to generate the @llvm.fshl(x, x, n)
intrinsic, for both scalar and vector types. The previous implementation
was too cautious in its handling of the shift amount; the OpenCL rules
state that the shift amount is always treated as an unsigned value
modulo the bitwidth.
PR #124961 adds intrinsics for the tcgen05
alloc/dealloc PTX instructions. This patch
adds NVVM Ops for the same.

Tests are added to verify the lowering to
the corresponding intrinsics in tcgen05-alloc.mlir file.

PTX ISA link:
https://docs.nvidia.com/cuda/parallel-thread-execution/#tcgen05-memory-alloc-manage-instructions

Signed-off-by: Durgadoss R <[email protected]>
The Fortran libraries are not part of MLIR, so they should use
target_link_libraries() rather than mlir_target_link_libraries().

This fixes an issue introduced in
#120966.
These were referring to nonexistent grammar tokens instead of `Value`.
Removed the TOSA quantization attribute used in various MLIR TOSA
dialect operations in favour of using builtin attributes.

Update any lit tests, conversions and transformations appropriately.

Signed-off-by: Tai Ly <[email protected]>
Co-authored-by: Tai Ly <[email protected]>
This PR moves maximum number of threads in a block and block in a grid
to nvgpu dialect to avoid replicated code.

The limits are defined here:

https://docs.nvidia.com/cuda/cuda-c-programming-guide/#features-and-technical-specifications-technical-specifications-per-compute-capability
Make lifetime management more explicit. We're only using this for
CXXPseudoDestructorExprs for now but we need this to handle
std::construct_at/placement-new after destructor calls later anyway.
When building mlir with `-DMLIR_NVVM_EMBED_LIBDEVICE=ON`, there will be
a warning
```
build/tools/mlir/lib/Target/LLVM/libdevice_embedded.c:1: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘143’ to ‘-113’ [-Woverflow]
```
which is followed by a large number of characters in stdout.

Fix this to avoid stdout outputting a large number of characters (3e5).
The previous implementation had false positive/negative cases in the
analysis of the loop carried dependency.

A missed dependency case is caused by incorrect analysis of address
increments. This is fixed by strict analysis of recursive definitions.
See added test swp-carried-dep4.mir.

Excessive dependency detection is fixed by improving the formula
for determining the overlap of address ranges to be accessed. See added test
swp-carried-dep5.mir.
If the input contains odd number of shuffled vectors, the 2 last
shuffles are shuffled with the same first vector. Need to correctly
process such situation: when the first vector is requested for the first
time - extract it from the source vector, when it is requested the
second time - reuse previous result. The second vector should be
extracted in both cases.

Fixes #125269

Reviewers: topperc, preames

Reviewed By: preames

Pull Request: #125693
This patch updates the cost model for fmuladd on vector types to scale with LMUL. This was found when analyzing a hot loop in 519.lbm_r that was unprofitably vectorized, but doesn't directly impact that case and is split off so it doesn't get forgotten.

Unlike other FP arithmetic ops, it's not scaled by 2 because the scalar cost isn't scaled by 2.
…act (#125560)

This modifies the conversion patterns so that, in the case where the
index is known statically to be poison, the insertion/extraction is
replaced by an arbitrary junk constant value, and in the dynamic case,
the index is sanitized at runtime. This avoids triggering a UB in both
cases. The dynamic case is definitely a pessimisation of the generated
code, but the use of dynamic indexes is expected to be very rare and
already slow on real-world GPU compilers ingesting SPIR-V, so the impact
should be negligible.

Resolves #124162.
We don't want to allow partial reductions resulting in a vscale x 1 type
as we can't lower it in the backend.
HTML starting tags that span multiple lines were previously not allowed
(or rather, only the starting line was lexed as HTML). Doxygen allows
those tags.

This PR allows the starting tags to span multiple lines. They can't span
multiple (C-)Comments, though (it's likely a user-error). Multiple BCPL
comments are fine as those are single lines (shown below).

Example:

```c
/// <a
///     href="foo"
/// >Aaa</a>b
int Test;
```

Fixes #28321.
…118471)

This patch adds initial support for target_device selector set - Section
9.2 (Spec 6.0)
#124839)

There's no reason not to, and it's easy enough to do using enable_if. As
a drive-by change, also add a missing _LIBCPP_NO_CFI attribute on
__add_alignment_assumption.
…25796)

Summary:
Some attributes have gnu extensions that share names with clang
attributes. If these imply the same thing, we can specially declare this
to be an alternate but equivalent spelling. This patch enables this for
`no_sanitize` and provides the infrastructure for more to be added if
needed.

Discussions welcome on whether or not we want to bind ourselves to GNU
behavior, since theoretically it's possible for GNU to silently change
the semantics away from our implementation, but I'm not an expert.

Fixes: #125760
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment