568824 Commits

Author SHA1 Message Date
Yury Plyakhin
81f445b7df [clang-sycl-linker][offload] Set TheImageKind based on IsAOTCompileNeeded flag (#180269)
Previously, TheImageKind was set to IMG_None and relied on a runtime
heuristic to determine the correct image type. This commit sets it
explicitly to IMG_Object for AOT-compiled images and IMG_SPIRV for
SPIR-V images based on the IsAOTCompileNeeded flag.

It also adds a test for this change, which required minor changes in
OffloadBinary and OffloadDump.
2026-02-11 10:21:22 -08:00
Nikolas Klauser
8c93fb0c05 [libc++] Refactor benchmarking std::make_heap and std::sort_heap together (#180935)
We're trying to get the time it takes to run all the benchmarks down, so
that we can run them on a regular basis. This patch saves us ~18 minutes
per run.
2026-02-11 13:17:44 -05:00
Nikolas Klauser
bc1e852241 [libc++] Refactor std::sort_heap benchmark (#180941)
We're trying to get the time it takes to run all the benchmarks down, so
that we can run them on a regular basis. This patch saves us ~80 seconds
per run.
2026-02-11 13:17:28 -05:00
Nikolas Klauser
f32fd56f0e [libc++] Reduce the number of runs on the ranges::min{,max} benchmarks (#179912)
Testing a bunch of range sizes has relatively little value. This reduces
the number of benchmarks so we can run them on a regular basis. This
saves ~10 minutes when running benchmarks.

Fixes #179698
2026-02-11 13:17:03 -05:00
Xing Xue
f6bdc928a8 [libc++][AIX] Enable LIBCPP_SHARED_PTR_DEFINE_LEGACY_INLINE_FUNCTIONS on AIX (#179784)
This PR enables the `LIBCPP_SHARED_PTR_DEFINE_LEGACY_INLINE_FUNCTIONS`
macro on AIX because the functions guarded by this macro are required for
backward compatibility.
2026-02-11 13:03:15 -05:00
Justin Fargnoli
45412b6790 [LoopUnrollPass] Indent LLVM_DEBUG() messages based on our depth in the tryToUnrollLoop() call graph (#178945)
Unify the ad-hoc use of whitespace in `LLVM_DEBUG()` messages. 

This approach should also make it easier to see which loop each debug
message corresponds to and which part of the loop unrolling heuristics
it comes from.
2026-02-11 17:59:42 +00:00
Charles Zablit
8d34f2851f [lldb-dap][windows] add --check-python command (#180784)
Implement [`[RFC] Add a Python check command to lldb-dap and
lldb`](https://discourse.llvm.org/t/rfc-add-a-python-check-command-to-lldb-dap-and-lldb/88972).

This patch adds the `--check-python` flag to `lldb-dap` on Windows, to
allow dap clients to verify the correct Python version is available
before trying to start `lldb-dap`.

Depends on:
- https://github.com/llvm/llvm-project/pull/180786

rdar://165442474
2026-02-11 18:48:31 +01:00
Dave Lee
52bcdc6ad1 [lldb] Implement bytecode based SyntheticChildren (#179832)
Initial implementation of a [bytecode][1] synthetic provider. This is a follow up to
https://github.com/llvm/llvm-project/pull/114333 which implemented the bytecode
interpreter, support for summary formatters, and more.

rdar://169727764

[1]: https://lldb.llvm.org/resources/formatterbytecode.html
2026-02-11 09:43:59 -08:00
Demetrius Kanios
7323cb3d69 [WebAssembly] Add a WASM table to llvm/test/MC/WebAssembly/wasm64.s. NFC (#180861)
Adds an `externref` WASM table to the MC level testing of Wasm64 to make
sure it emits the table with the `IS_64` (64-bit/i64 indices) flag.
2026-02-11 09:39:06 -08:00
Hank
e1b9d033cd [MLIR][Math] Fix math.ceil expansion to avoid undefined behavior on Inf/NaN (#170028)
Fixes #151786

The original `ceilf` expansion lowers to `fptosi`, which produces poison
for Inf, and any subsequent use leads to undefined behavior. This patch
adds a safe path, similar to the existing `round` expansion, for large
or special inputs and avoids the UB.
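
The safe path described above can be modelled roughly as follows. This is only a Python sketch of the idea, not the MLIR expansion itself: `safe_ceil` and `NO_FRACTION_BOUND` are hypothetical names, and the bound is chosen for Python's double-precision floats rather than taken from the patch.

```python
import math

# Assumption for illustration: Python floats are doubles, and every double
# with magnitude >= 2**52 is already an integer, so no rounding is needed.
NO_FRACTION_BOUND = float(2 ** 52)


def safe_ceil(x: float) -> float:
    """Ceil that never converts Inf, NaN, or huge values to an integer."""
    # Safe path: special or large inputs are returned unchanged, so the
    # float-to-int conversion below is never reached for them.
    if math.isnan(x) or math.isinf(x) or abs(x) >= NO_FRACTION_BOUND:
        return x
    # Fast path: truncate toward zero (the analogue of fptosi), then
    # bump by one if truncation moved the value downward.
    truncated = float(int(x))
    return truncated + 1.0 if truncated < x else truncated
```

Without the guard, `int(x)` would raise for Inf or NaN in Python, which plays the role of the poison value in the original expansion.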
2026-02-11 12:36:32 -05:00
vangthao95
90a56a18ee AMDGPU/GlobalISel: RegBankLegalize for global atomic ordered add (#180829)
2026-02-11 09:30:11 -08:00
Simon Pilgrim
182eb9d21a [X86] Move getTargetVShift helpers earlier in the source file. NFC. (#180972)
Avoid having to add forward declarations for earlier functions to use them.
2026-02-11 17:19:32 +00:00
Harrison Hao
aad0ae4a39 [CMake] Only pass PYTHON_EXECUTABLE to native build if defined (#180964)
This addresses the following warning when PYTHON_EXECUTABLE is not set
in the host build:
```bash
  CMake Warning:
    Manually-specified variables were not used by the project:
      PYTHON_EXECUTABLE
```

Reference:
https://github.com/llvm/llvm-project/pull/163574
2026-02-12 01:18:44 +08:00
vangthao95
0286641f5c [AMDGPU] Add known bits for G_AMDGPU_COPY_SCC_VCC (#180560)
2026-02-11 09:06:28 -08:00
Michael Liao
cdc1f8afe8 [CIR] Match codegen change on fmin and fmax
- #113133 adds 'nsz' on the emitted 'llvm.minnum'/'llvm.maxnum' from
  fmin/fmax following the semantic clarification from #112852.
2026-02-11 12:03:40 -05:00
Matthias Springer
34eb59dd4b [mlir][IR][NFC] Simplify "splat" handling in DenseIntOrFPElementsAttr (#180965)
Since #180397, all elements of a `DenseIntOrFPElementsAttr` are padded
to full bytes. This enables additional simplifications: whether a
`DenseIntOrFPElementsAttr` is a splat or not can now be inferred from
the size of the buffer. This was not possible before because a single
byte sometimes contained multiple `i1` elements.

Discussion:
https://discourse.llvm.org/t/denseelementsattr-i1-element-type/62525
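
The size-based inference can be illustrated with a small model. This is a sketch under assumptions, not the MLIR code: all function names are hypothetical, and the model simply assumes that a splat stores a single element while a non-splat stores one byte-padded slot per element.

```python
def byte_width(bit_width: int) -> int:
    """Bytes used by one element when padded to full bytes."""
    return (bit_width + 7) // 8


def padded_buffer_size(num_elements: int, bit_width: int, splat: bool) -> int:
    """Model: a splat stores one element, otherwise one slot per element."""
    return byte_width(bit_width) * (1 if splat else num_elements)


def is_splat(buffer_size: int, bit_width: int) -> bool:
    """With byte padding, a buffer holding exactly one element is a splat."""
    return buffer_size == byte_width(bit_width)


def legacy_packed_i1_size(num_elements: int) -> int:
    """Old dense packing for i1: eight elements share one byte."""
    return (num_elements + 7) // 8
```

In this model, eight densely packed non-splat `i1` elements occupy one byte, the same size as a single-element buffer, which is why the buffer size alone was ambiguous before.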
2026-02-11 17:59:20 +01:00
David Green
14fe03f6c6 [AArch64] Add extra fcmp+select tests. NFC
Some of these currently get scalarized with nnan.
2026-02-11 16:49:33 +00:00
Justice Adams
91dae0efec [green dragon] fix typo in clang-stage1-RA
2026-02-11 08:33:14 -08:00
Folkert de Vries
6a81656f7d [RISCV] improve musttail support (#170547)
Basically https://github.com/llvm/llvm-project/pull/168506 but for
riscv, so to be clear the hard work here is @heiher's. I figured we may
as well get some extra eyeballs on this from riscv too.

Previously the riscv backend could not handle `musttail` calls with more
arguments than fit in registers, or any explicit `byval` or `sret`
parameters/return values. Those have now been implemented.

This is part of my push to get more LLVM backends to support `byval` and
`sret` parameters so that rust can stabilize guaranteed tail call
support. See also:

- https://github.com/llvm/llvm-project/pull/168956
- https://github.com/rust-lang/rust/issues/148748

---------

Co-authored-by: WANG Rui <wangrui@loongson.cn>
2026-02-11 17:27:51 +01:00
Justice Adams
2c1af70c06 [Green Dragon] add green dragon jenkinsfile definitions for multibranch pipelines (#180793)
Add CI job definitions using our new templated pipelines to
llvm-project so that we can enable multibranch pipelines, which
trigger for changes on a given branch.

By storing the Jenkinsfile definitions in llvm-project, we gain the
benefit of enabling Jenkins multi branch pipelines. This means in the
future, expanding a job configuration to build with a new branch is as
simple as updating a regular expression in Jenkins (the regular
expression represents which branches should be built). The work required
for enabling testing new branches becomes minimal, and furthermore we
would have a great deal of confidence that job configurations across
branches remain identical.


I will verify that these new Jenkinsfiles work before deprecating the old
definitions in zorg.
2026-02-11 08:19:25 -08:00
Delaram Talaashrafi
208b4dbf12 [flang][optimize] Use ArraySectionAnalyzer to better handle aliasing sections (#180595)
When alias analysis reports potential aliasing between LHS and RHS when
inlining `hlfir.assign`, use `ArraySectionAnalyzer` to determine if the
sections are disjoint or identical, which is safe for element-wise
assignment.

Co-authored-by: Delaram Talaashrafi <dtalaashrafi@rome5.pgi.net>
2026-02-11 07:59:38 -08:00
Matthias Springer
c6964b1b4d [mlir][IR] DenseElementsAttr: Remove i1 dense packing special case (#180397)
`DenseElementsAttr` stores elements in an `ArrayRef<char>` buffer, where
each element is padded to a full byte. Before this commit, there used to
be a special storage format for `i1` elements: they used to be densely
packed, i.e., 1 bit per element. This commit removes the dense packing
special case for `i1`.

This commit removes complexity from `DenseElementsAttr`. If dense
packing is needed in the future it could be implemented in a general way
that works for all element types (based on #179122).

Discussion:
https://discourse.llvm.org/t/denseelementsattr-i1-element-type/62525
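
The two storage layouts can be compared with a short sketch. It is illustrative only: the helper names are hypothetical, and the bit order within a packed byte (least significant bit first) is an assumption, not something this commit message states.

```python
def pack_i1_dense(bits):
    """Old special case: 1 bit per element, 8 elements per byte.

    Assumption: element i lands in bit (i % 8) of byte (i // 8).
    """
    out = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def pack_i1_padded(bits):
    """Layout after this commit: each element padded to a full byte."""
    return bytes(1 if bit else 0 for bit in bits)
```

For three elements `[1, 0, 1]`, the dense form needs one byte while the padded form needs three, which is the cost accepted in exchange for removing the special case.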
2026-02-11 15:56:08 +00:00
Karl Friebel
0776af16c5 [Clang] [Sema] Fix FixIt for implicit-int diagnostics. (#179356)
When encountering a declaration without a type specifier, in contexts
where they could reasonably be assumed to default to int, clang emits a
diagnostic with FixIt. This FixIt does not produce working code.

This patch updates `SemaType` to correctly insert a single int type
specifier per group of declarations, and adds coverage in the FixIt lit test suite.

Fixes #179354
2026-02-11 15:47:36 +00:00
David Spickett
a2149815cc [lldb] Move parts of OutputFormattedUsageText into utility function (#180947)
As seen in #177570, this code has a bunch of corner cases, does not
handle ANSI codes properly and does not handle Unicode at all. That's
enough to fix that we first need some tests to make it clear where we're
starting from.

The body of OutputFormattedUsageText is moved into a utility in the
AnsiTerminal.h header and tests added to the existing
AnsiTerminalTest.cpp.

Some results are known to be wrong. Some that cause crashes are
commented out, to be enabled once fixed.
2026-02-11 15:43:40 +00:00
Florian Hahn
54177e95d1 [Matrix] Use tiled loops automatically for large kernels. (#179325)
Update LowerMatrixIntrinsics to use tiled loops automatically for
larger matrices. The fully unrolled codegen creates a huge amount of
code, which performs noticeably worse than the tiled loop nest variant.

We now try to estimate the number of instructions needed for the
multiply, and if it is too large, tiled loops are used. The current
threshold is anything roughly larger than 6x6x6 double multiply.

Eventually I think we want to only generate tiled loops. This patch is a
first step, trying to opt in for cases where we know it is beneficial.
Checked on AArch64, but should help on other architectures similarly,
and also drastically reduce binary size + compile time.

PR: https://github.com/llvm/llvm-project/pull/179325
2026-02-11 15:36:34 +00:00
Frank Schlimbach
f5e5745e86 [mlir][shard, mpi] Allow more than one last axis to be "unsplit" (#180754)
A resharding pattern allowed only a single trailing axis to be
"unsplit".
This PR allows multiple trailing axes to be "unsplit".
2026-02-11 16:31:29 +01:00
Aiden Grossman
768cc03cee [ProfCheck] Add WinEH Tests to XFail List
This pass recently had NewPM coverage added which means we now can see
profcheck issues with the pass. Disable it for now until we can get it
fixed, although it's not crucial for anything given it is only run for
32-bit X86 Windows.
2026-02-11 14:31:45 +00:00
Eugene Epshteyn
a58268a77c [flang][NFC] Converted five tests from old lowering to new lowering (part 16) (#180866)
Tests converted from test/Lower: fail_image.f90,
test/Lower/forall: array-constructor.f90, array-pointer.f90,
array-subscripts.f90, character-1.f90
2026-02-11 09:28:59 -05:00
Charles Zablit
d7e5a7dacd [lldb-dap][windows] drain the ConPTY before attaching (#180578)
Add a step to drain the init sequences emitted by the ConPTY before
attaching it to the debuggee.

A ConPTY (PseudoConsole) emits init sequences which flush the screen and
contain the name of the program (ESC[2J for clear screen, ESC[H for
cursor home and more). It's not desirable to filter them out: if a
debuggee also emits them, lldb would filter that output as well. To work
around this, the ConPTY is drained by attaching a dummy process to it,
consuming the init sequences and then attaching the actual debuggee.

---------

Co-authored-by: Nerixyz <nero.9@hotmail.de>
2026-02-11 14:17:17 +00:00
Matt Arsenault
0b0dca5668 clang/AMDGPU: Do not look for rocm device libs if environment is llvm (#180922)
Introduce usage of the llvm environment type. This will be useful as
a switch to eventually stop depending on externally provided libraries,
and only take bitcode from the resource directory.

I wasn't sure how to handle the confusing mess of -no-* flags. Try
to handle them all. I'm not sure --no-offloadlib makes sense for OpenCL
since it's not really offload, but interpret it anyway.
2026-02-11 15:16:26 +01:00
Joseph Huber
6d6feb7655 [libc] Add RPC helpers for dispatching functions to the host (#179085)
Summary:
The RPC interface is useful for forwarding functions. This PR adds
helper functions for doing a completely bare forwarding of a function
from the client to the server. This is intended to facilitate
heterogeneous libraries that implement host functions on the GPU (like
MPI or Fortran).
2026-02-11 08:13:52 -06:00
Steven Perron
3f73f839e2 [HLSL] Implement Sample* methods for Texture2D (#179322)
This commit implements the methods:

- SampleBias
- SampleCmp
- SampleCmpLevelZero
- SampleGrad
- SampleLevel

They are added to the Texture2D resource type. All overloads are
implemented except those with the `status` argument.

Part of https://github.com/llvm/llvm-project/issues/175630

Assisted-by: Gemini

---------

Co-authored-by: Helena Kotas <hekotas@microsoft.com>
2026-02-11 09:12:22 -05:00
David Spickett
4677fc3d83 Revert "[lldb] Step over non-lldb breakpoints" (#180944)
Reverts llvm/llvm-project#174348 due to reported failures on macOS and
Arm 32-bit Linux.
2026-02-11 14:03:01 +00:00
Nikita Popov
a8f211904a [ExpandIRInsts] Support saturating fptoi (#179710)
Add support for expanding fptosi.sat and fptoui.sat via IR expansions.
Similar to fptosi/fptoui we would get legalization errors otherwise.

The previous expansion for fptosi/fptoui was already saturating -- but
those instructions do not actually require saturation, and the
implementation of the saturation was incorrect in lots of ways. What
this PR does is:

* For fptosi, remove the unnecessary saturation handling.
* For fptoui, remove the unnecessary saturation handling and sign
multiplication.
* For fptosi, use the previous saturation handling with fixes: We need
to map NaNs to 0 and the saturation condition on the exponent was
incorrect. (I'm performing the NaN check via fcmp -- there's no
requirement to do everything bitwise here.)
* For fptoui use a variation of the signed saturation handling: Negative
values need to go to zero and we saturate to unsigned max.

Proofs: https://alive2.llvm.org/ce/z/Xv9FNd
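
For reference, the value-level behaviour the saturating conversions are expected to have (NaN maps to 0, out-of-range values clamp, everything else truncates toward zero) can be written as a small Python model. This is a sketch of the semantics only, not of the bitwise IR expansion in this patch, and the function names are hypothetical.

```python
import math


def fptosi_sat(x: float, bits: int) -> int:
    """Model of a saturating float-to-signed-integer conversion."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    if math.isnan(x):
        return 0  # NaN maps to zero
    if x <= lo:
        return lo  # also covers -Inf
    if x >= hi:
        return hi  # also covers +Inf
    return int(x)  # in range: truncate toward zero


def fptoui_sat(x: float, bits: int) -> int:
    """Model of a saturating float-to-unsigned-integer conversion."""
    hi = (1 << bits) - 1
    if math.isnan(x) or x <= 0:
        return 0  # NaN and negative values go to zero
    if x >= hi:
        return hi  # saturate to unsigned max
    return int(x)
```

For example, `fptosi_sat(300.7, 8)` clamps to 127 and `fptoui_sat(-5.0, 8)` yields 0, matching the cases listed above.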
2026-02-11 14:51:30 +01:00
Eugene Epshteyn
98fcc11d1a [flang][NFC] Converted five tests from old lowering to new lowering (part 17) (#180869)
Tests converted from test/Lower: goto-do-body.f90, mixed_loops.f90,
while_loop.f90
From test/Lower/forall: degenerate.f90, forall-2.f90
2026-02-11 08:50:34 -05:00
Tomer Shafir
5456d6352a [AArch64] Lower factor-of-2 interleaved stores to STNP (#177938)
This patch prioritizes lowering to `stnp` over `st2` for store instructions
marked `!nontemporal`.

From a performance perspective, we should conservatively prioritize STNP
lowering for non-temporal stores, because NT stores currently require
explicit use of the `__builtin_nontemporal_store()` intrinsic, so I think
it's reasonable to assume the developer explicitly intends to optimize
D-cache usage of some hot non-temporal execution. They can roll back if it
doesn't help.

The cost here is that it adds a few instructions to code size (thus we
predicate on not optimizing for code size), a few extra fast
instructions to execute, a few extra short dep chains (which should commonly
be handled by OOO execution), I-cache alignment effects, and a few extra
registers. In the future we may be able to approximate a cost model
to select by.

The patch implements an AArch64 specific static function to model what
NT stores are directly legal on the backend currently, and
`AArch64TargetLowering::lowerInterleavedStore` to conditionally skip st2
lowering.
2026-02-11 15:37:13 +02:00
Matt Arsenault
6af11dba3c clang/AMDGPU: Remove dead code in RocmInstallationDetector (#180920)
The defaulted constructor argument isn't used anywhere, so
this path is unreachable.
2026-02-11 14:29:26 +01:00
Charles Zablit
0ec4aa51dd [lldb][windows] switch to using std::string instead of std::wstring in Python setup (#180786)
This patch changes the return type of methods returning `std::wstring` to
`std::string` in `PythonPathSetup.cpp`.

This follows lldb's style of converting to `std::wstring` at the last
moment.
2026-02-11 14:23:38 +01:00
Brian Cain
99c9e5ebd6 [Hexagon] Fix signed constant creation in EmitVAArgFromMemory (#180385)
Use ConstantInt::getSigned instead of ConstantInt::get when creating a
negative alignment mask in EmitVAArgFromMemory. This is the same fix as
commit 8546294db9 (PR #176115) which addressed the issue in
EmitVAArgForHexagonLinux.

Added a test case that exercises the EmitVAArgFromMemory alignment path
using a struct that is both >8 bytes (to trigger EmitVAArgFromMemory)
and has 8-byte alignment (to trigger the alignment masking code).
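
The alignment masking itself follows the usual round-up pattern, sketched below in Python. The sketch rests on assumptions: a 32-bit value width, hypothetical helper names, and my reading that the problem is a negative mask handed over as an unsigned value that does not fit the target width. None of these details are spelled out in this message.

```python
def align_up(addr: int, align: int, bits: int = 32) -> int:
    """Round addr up to a multiple of align (a power of two)."""
    # -align in two's complement clears the low log2(align) bits.
    mask = -align & ((1 << bits) - 1)
    return (addr + align - 1) & mask


def fits_unsigned(value: int, bits: int) -> bool:
    """True if value is representable as an unsigned integer of this width."""
    return 0 <= value < (1 << bits)


def fits_signed(value: int, bits: int) -> bool:
    """True if value is representable as a signed integer of this width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))
```

The mask -8 fits a signed 32-bit constant, but the same bit pattern widened to an unsigned 64-bit value (`0xFFFFFFFFFFFFFFF8`) does not fit an unsigned 32-bit one, which is the kind of mismatch a signed constructor avoids.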
2026-02-11 07:19:43 -06:00
Juan Manuel Martinez Caamaño
7f2b875c7d [SPIRV] Replace SPIRVType with SPIRVTypeInst as much as we can (#180721)
Second part of https://github.com/llvm/llvm-project/pull/179947 where we
use `SPIRVTypeInst` as much as we can.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-11 14:05:08 +01:00
Jordan Rupprecht
4a2a450cb7 [bazel][lldb] Port 304c680809 (#180931)
Co-authored-by: Pranav Kant <prka@google.com>
2026-02-11 07:02:17 -06:00
Alexey Bataev
54cdd903b8 [SLP]Skip operands comparing on non-matching (but compatible) instructions
If the instructions are compatible but non-matching (zext-select pair as
example), no need to perform operands analysis, just return that they
are matching.
2026-02-11 04:55:29 -08:00
Renato Golin
81e0de2ea5 [MLIR][Arith] FastMath extf conversion without NaN checks (#180926)
This PR allows the expand op converter to consider the NoNaN fastmath
attribute to disable the runtime checks for NaNs in E8M0 types. Default
behaviour is still the same.

The OCP document provides all-ones as NaN for E8M0, but for pre-MX I8
quantization, the checks for NaNs are prohibitively expensive,
especially if the hardware doesn't have native support for that type.
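
A value-level sketch of the conversion with and without the NaN check is shown below. It is an illustration, not the MLIR lowering: the function name and the `no_nans` parameter are hypothetical, and the exponent bias of 127 is taken from my understanding of the OCP MX format rather than from this message.

```python
import math

E8M0_BIAS = 127      # assumption: bias used by the OCP MX E8M0 scale format
E8M0_NAN_BITS = 0xFF  # all-ones encodes NaN, as noted above


def e8m0_to_float(byte: int, no_nans: bool = False) -> float:
    """Decode an E8M0 scale (exponent only, no sign, no mantissa)."""
    # Default behaviour: check for the NaN encoding at runtime.
    if not no_nans and byte == E8M0_NAN_BITS:
        return math.nan
    # With no_nans the check is skipped; the caller promises that the
    # all-ones pattern never occurs, so its result is not meaningful.
    return math.ldexp(1.0, byte - E8M0_BIAS)
```

Skipping the comparison removes one check per element, which is where the saving comes from on hardware without native support for the type.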
2026-02-11 12:46:07 +00:00
Younan Zhang
924f773f5e [Clang] Don't diagnose missing members when looking at the instantiating class template (#180725)
The perfect matching patch revealed another bug where recursive
instantiations could lead to the escape of SFINAE errors, as shown in
the issue.

Fixes https://github.com/llvm/llvm-project/issues/179118
2026-02-11 20:42:11 +08:00
Twice
972aa597de [MLIR][Python] Make traits declarative in python-defined operations (#180748)
This adds support for two syntaxes in Python-defined dialects.

First is that traits can now be declared in class parameters, e.g.
```python
class ParentIsIfTrait(DynamicOpTrait):  # define a Python-side trait
    @staticmethod
    def verify_invariants(op) -> bool:
        if not isinstance(op.parent.opview, IfOp):
            op.location.emit_error(
                f"{op.name} should be put inside {IfOp.OPERATION_NAME}"
            )
            return False
        return True

class YieldOp( # attach two traits: IsTerminatorTrait, ParentIsIfTrait
    TestRegion.Operation, name="yield", traits=[IsTerminatorTrait, ParentIsIfTrait]
):
    ...
```

Second is that users can directly define
`verify_invariants`/`verify_region_invariants` methods in the operation
to add additional custom verification logic. And this is implemented via
traits.
```python
class YieldOp(TestRegion.Operation, name="yield", ...):
    value: Operand[Any]

    def verify_invariants(self) -> bool: # define a method directly
        if self.parent.results[0].type != self.value.type:
            self.location.emit_error(
                "result type mismatch between YieldOp and its parent IfOp"
            )
            return False
        return True
```

Previously we used `verify`/`verify_region` as method names (in
yesterday's PR #179705), but in this PR they are renamed to
`verify_invariants`/`verify_region_invariants` because the
newly-added `verify` method conflicts with the `ir.OpView.verify`
method:
- `verify_invariants` only attaches **additional** verification
logic, whereas `OpView.verify` constructs an OperationVerifier and does
full verification of an operation, so the two differ semantically.
We should not shadow the `OpView.verify` method by defining a
new, semantically different `verify` method.
- it would confuse users, since the two `verify` methods
have different meanings.
- if users didn't define the `verify` method in their Python-defined
operation, `DynamicOpTraits.attach(opname, MyOpCls)` would still do the
attaching (because `hasattr("verify")` returns `True`) and segfault
(because we cannot attach `OpView.verify`).
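
The `hasattr` pitfall in the last point can be reproduced with plain Python classes. The classes below are stand-ins written for this illustration, not the real MLIR bindings, and `defines_hook` is a hypothetical helper rather than what the PR uses.

```python
class OpView:
    """Stand-in for a base class that already provides `verify`."""

    def verify(self) -> bool:
        return True


class MyOp(OpView):
    """User operation that defines no verification hook of its own."""


class MyCheckedOp(OpView):
    """User operation that opts in with the renamed hook."""

    def verify_invariants(self) -> bool:
        return True


def defines_hook(cls: type, name: str) -> bool:
    """True only if `cls` itself defines `name` (inherited names ignored)."""
    return name in vars(cls)
```

Here `hasattr(MyOp, "verify")` is `True` purely through inheritance, while `hasattr(MyOp, "verify_invariants")` is `False`, so a hook name that the base class does not already use makes the presence check unambiguous.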

---------

Co-authored-by: Rolf Morel <rolfmorel@gmail.com>
2026-02-11 20:39:58 +08:00
LLVM GN Syncbot
9a07b1c84e [gn build] Port 5e5b799853
2026-02-11 12:19:44 +00:00
Minsoo Choo
5e5b799853 [lldb][NativeRegisterContext] Rename to x86 for shared files (#180624)
2026-02-11 07:19:34 -05:00
Luke Hutton
3123d9cb3a [mlir][tosa] Fix validation of dim op when reliant on datatype extension (#180915)
For example:
```
error: 'tosa.dim' op illegal: requires [bf16, shape] but not included in the profile compliance [shape]

    %0 = tosa.dim %arg0 {axis = 4 : i32} : (tensor<4x5x8x8x6x4xbf16>) -> !tosa.shape<1>
```
Here dim requires support to be declared for the BF16 and SHAPE
extensions, but only SHAPE was specified in the op declaration.
2026-02-11 12:15:14 +00:00
Florian Hahn
2dcf858ba0 [LAA] Use SCEVPtrToAddr in tryToCreateDiffChecks. (#178861)
The checks created by LAA only compute a pointer difference and do not
need to capture provenance. Use SCEVPtrToAddr instead of SCEVPtrToInt
for computations.

To avoid regressions while parts of SCEV are migrated to use PtrToAddr,
this adds logic to rewrite all PtrToInt to PtrToAddr if possible in the
created expressions.

Similarly, if in the original IR we have a PtrToInt, SCEVExpander tries
to re-use it if possible when expanding PtrToAddr.

Depends on https://github.com/llvm/llvm-project/pull/178727.

Fixes https://github.com/llvm/llvm-project/issues/156978.

PR: https://github.com/llvm/llvm-project/pull/178861
2026-02-11 11:51:51 +00:00
Nikolas Klauser
064694160f [libc++] Rewrite the std::pop_heap benchmark (#179911)
Testing a bunch of random types has relatively little value. This
reduces the number of benchmarks so we can run them on a regular basis.
This saves ~90 seconds when running the benchmarks.
2026-02-11 12:47:33 +01:00