Previously, TheImageKind was set to IMG_None and relied on a runtime
heuristic to determine the correct image type. This commit sets it
explicitly to IMG_Object for AOT-compiled images and IMG_SPIRV for
SPIR-V images based on the IsAOTCompileNeeded flag.
It also adds a test for this change, which required minor changes in
OffloadBinary and OffloadDump.
We're trying to get the time it takes to run all the benchmarks down, so
that we can run them on a regular basis. This patch saves us ~18 minutes
per run.
We're trying to get the time it takes to run all the benchmarks down, so
that we can run them on a regular basis. This patch saves us ~80 seconds
per run.
Testing a bunch of range sizes has relatively little value. This reduces
the number of benchmarks so we can run them on a regular basis. This
saves ~10 minutes when running benchmarks.
Fixes #179698
This PR enables the `LIBCPP_SHARED_PTR_DEFINE_LEGACY_INLINE_FUNCTIONS`
macro on AIX because the functions guarded by this macro are required for
backward compatibility.
Unify the ad-hoc use of whitespace in `LLVM_DEBUG()` messages.
This should also make it easier to see which loop each debug message
corresponds to, and which part of the loop-unrolling heuristics produced
each message.
Fixes #151786
The original `ceilf` expansion lowers to `fptosi`, which produces poison
for Inf, and any subsequent use leads to undefined behavior. This patch
adds a safe path, similar to the existing `round` expansion, for large
or special inputs and avoids the UB.
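As a plain-C sketch of the hazard and the fix (the actual expansion works on LLVM IR, and the helper name here is hypothetical): a safe software ceil routes NaN, Inf, and large-magnitude inputs around the float-to-int conversion entirely.

```c
#include <math.h>

/* Sketch (not the actual IR expansion): for float, any |x| >= 2^23 is
 * already integral, and this test is also false for NaN and Inf, so all
 * of those pass through untouched and never reach the float-to-int
 * conversion that would otherwise produce poison for out-of-range inputs. */
float safe_ceilf(float x) {
    if (!(fabsf(x) < 8388608.0f))   /* NaN, Inf, or |x| >= 2^23 */
        return x;
    float t = (float)(long long)x;  /* truncate toward zero; in range, no UB */
    return (t < x) ? t + 1.0f : t;
}
```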
This addresses the following warning when PYTHON_EXECUTABLE is not set
in the host build:
```bash
CMake Warning:
Manually-specified variables were not used by the project:
PYTHON_EXECUTABLE
```
Reference:
https://github.com/llvm/llvm-project/pull/163574
Since #180397, all elements of a `DenseIntOrFPElementsAttr` are padded
to full bytes. This enables additional simplifications: whether a
`DenseIntOrFPElementsAttr` is a splat or not can now be inferred from
the size of the buffer. This was not possible before because a single
byte sometimes contained multiple `i1` elements.
Discussion:
https://discourse.llvm.org/t/denseelementsattr-i1-element-type/62525
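As an illustrative sketch (hypothetical helper, not the actual MLIR API), the splat inference that byte-padded elements enable boils down to a size comparison: a splat buffer stores exactly one padded element, a non-splat buffer stores one per value.

```c
#include <stdbool.h>
#include <stddef.h>

/* Sketch: with every element padded to whole bytes, a buffer holding
 * exactly one element's worth of bytes must be a splat. With packed i1,
 * a one-byte buffer was ambiguous (it could hold up to 8 elements). */
bool is_splat_buffer(size_t buffer_bytes, size_t elem_bytes, size_t num_elems) {
    return num_elems > 1 && buffer_bytes == elem_bytes;
}
```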
Basically https://github.com/llvm/llvm-project/pull/168506 but for
RISC-V, so to be clear, the hard work here is @heiher's. I figured we
may as well get some extra eyeballs on this from the RISC-V side too.
Previously the RISC-V backend could not handle `musttail` calls with more
arguments than fit in registers, or any explicit `byval` or `sret`
parameters/return values. Those have now been implemented.
This is part of my push to get more LLVM backends to support `byval` and
`sret` parameters so that rust can stabilize guaranteed tail call
support. See also:
- https://github.com/llvm/llvm-project/pull/168956
- https://github.com/rust-lang/rust/issues/148748
---------
Co-authored-by: WANG Rui <wangrui@loongson.cn>
Add CI job definitions using our new templated pipelines to
llvm-project; this way we can enable multi-branch pipelines that
trigger on changes to a given branch.
By storing the Jenkinsfile definitions in llvm-project, we gain the
benefit of enabling Jenkins multi-branch pipelines. This means that in
the future, expanding a job configuration to build a new branch is as
simple as updating a regular expression in Jenkins (the regular
expression specifies which branches should be built). The work required
to enable testing of new branches becomes minimal, and furthermore we
gain a great deal of confidence that job configurations across
branches remain identical.
I will verify these new Jenkinsfiles work before deprecating the old
definitions in zorg.
When alias analysis reports potential aliasing between the LHS and RHS
while inlining `hlfir.assign`, use `ArraySectionAnalyzer` to determine
whether the sections are disjoint or identical, either of which is safe
for element-wise assignment.
Co-authored-by: Delaram Talaashrafi <dtalaashrafi@rome5.pgi.net>
`DenseElementsAttr` stores elements in an `ArrayRef<char>` buffer, where
each element is padded to a full byte. Before this commit, there used to
be a special storage format for `i1` elements: they used to be densely
packed, i.e., 1 bit per element. This commit removes the dense packing
special case for `i1`.
This commit removes complexity from `DenseElementsAttr`. If dense
packing is needed in the future it could be implemented in a general way
that works for all element types (based on #179122).
Discussion:
https://discourse.llvm.org/t/denseelementsattr-i1-element-type/62525
When encountering a declaration without a type specifier, in contexts
where they could reasonably be assumed to default to int, clang emits a
diagnostic with FixIt. This FixIt does not produce working code.
This patch updates `SemaType` to correctly insert a single int type
specifier per group of declarations, and adds coverage in the FixIt lit test suite.
Fixes #179354
As seen in #177570, this code has a bunch of corner cases, does not
handle ANSI codes properly, and does not handle Unicode at all. There's
enough to fix that we first need some tests to make it clear where
we're starting from.
The body of OutputFormattedUsageText is moved into a utility in the
AnsiTerminal.h header, and tests are added to the existing
AnsiTerminalTest.cpp.
Some results are known to be wrong. Some that cause crashes are
commented out, to be enabled once fixed.
Update LowerMatrixIntrinsics to automatically use tiled loops for
larger matrices. The fully unrolled codegen creates a huge amount of
code, which performs noticeably worse than the tiled loop nest variant.
We now try to estimate the number of instructions needed for the
multiply, and if it is too large, tiled loops are used. The current
threshold is anything roughly larger than a 6x6x6 double multiply.
Eventually I think we want to only generate tiled loops. This patch is a
first step, trying to opt in for cases where we know it is beneficial.
Checked on AArch64, but should help on other architectures similarly,
and also drastically reduce binary size + compile time.
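As a rough sketch of the size heuristic described above (the 2-ops-per-product estimate and the exact cutoff are illustrative assumptions, not the pass's real cost model):

```c
#include <stdbool.h>

/* Illustrative sketch: estimate the fully-unrolled operation count of an
 * MxKxN multiply (one fmul + one fadd per product term) and switch to
 * tiled loops once it exceeds roughly a 6x6x6 double multiply. */
static bool should_use_tiled_loops(unsigned m, unsigned k, unsigned n) {
    unsigned est = 2u * m * k * n;
    return est > 2u * 6u * 6u * 6u;
}
```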
PR: https://github.com/llvm/llvm-project/pull/179325
This pass recently had NewPM coverage added, which means we can now see
profcheck issues with the pass. Disable it for now until we can get it
fixed, although it's not crucial for anything given that it is only run
for 32-bit X86 Windows.
Add a step to drain the init sequences emitted by the ConPTY before
attaching it to the debuggee.
A ConPTY (PseudoConsole) emits init sequences which flush the screen and
contain the name of the program (ESC[2J for clear screen, ESC[H for
cursor home and more). It's not desirable to filter them out: if a
debuggee also emits them, lldb would filter that output as well. To work
around this, the ConPTY is drained by attaching a dummy process to it,
consuming the init sequences and then attaching the actual debuggee.
---------
Co-authored-by: Nerixyz <nero.9@hotmail.de>
clang/AMDGPU: Do not look for rocm device libs if environment is llvm
Introduce usage of the llvm environment type. This will be useful as
a switch to eventually stop depending on externally provided libraries,
and only take bitcode from the resource directory.
I wasn't sure how to handle the confusing mess of -no-* flags. Try
to handle them all. I'm not sure --no-offloadlib makes sense for OpenCL
since it's not really offload, but interpret it anyway.
Summary:
The RPC interface is useful for forwarding functions. This PR adds
helper functions for doing a completely bare forwarding of a function
from the client to the server. This is intended to facilitate
heterogeneous libraries that implement host functions on the GPU (like
MPI or Fortran).
This commit implements the methods:
- SampleBias
- SampleCmp
- SampleCmpLevelZero
- SampleGrad
- SampleLevel
They are added to the Texture2D resource type, with all overloads
except those taking the `status` argument.
Part of https://github.com/llvm/llvm-project/issues/175630
Assisted-by: Gemini
---------
Co-authored-by: Helena Kotas <hekotas@microsoft.com>
Add support for expanding fptosi.sat and fptoui.sat via IR expansions.
Similar to fptosi/fptoui we would get legalization errors otherwise.
The previous expansion for fptosi/fptoui was already saturating -- but
those instructions do not actually require saturation, and the
implementation of the saturation was incorrect in lots of ways. What
this PR does is:
* For fptosi, remove the unnecessary saturation handling.
* For fptoui, remove the unnecessary saturation handling and sign
multiplication.
* For fptosi.sat, use the previous saturation handling with fixes: we
need to map NaNs to 0, and the saturation condition on the exponent was
incorrect. (I'm performing the NaN check via fcmp; there's no
requirement to do everything bitwise here.)
* For fptoui.sat, use a variation of the signed saturation handling:
negative values need to go to zero, and we saturate to unsigned max.
Proofs: https://alive2.llvm.org/ce/z/Xv9FNd
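A scalar C sketch of the saturating semantics described above for 32-bit results (not the IR expansion itself; the NaN handling mirrors the fcmp-based check):

```c
#include <limits.h>
#include <math.h>  /* NAN, for exercising the NaN paths */

/* fptoui.sat semantics: NaN and negative inputs go to 0, large inputs
 * saturate to UINT_MAX, everything else converts normally. */
unsigned sat_fptoui32(double x) {
    if (!(x > 0.0)) return 0;                /* NaN and x <= 0 */
    if (x >= 4294967296.0) return UINT_MAX;  /* >= 2^32 */
    return (unsigned)x;
}

/* fptosi.sat semantics: NaN goes to 0, out-of-range inputs clamp. */
int sat_fptosi32(double x) {
    if (x != x) return 0;                    /* NaN */
    if (x <= -2147483648.0) return INT_MIN;  /* <= -2^31 */
    if (x >= 2147483648.0) return INT_MAX;   /* >= 2^31 */
    return (int)x;
}
```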
This patch prioritizes lowering to `stnp` over `st2` for store
instructions marked `!nontemporal`.
From a performance perspective, we should conservatively prioritize
STNP lowering for non-temporal stores: currently, NT stores require
explicit use of the `__builtin_nontemporal_store()` intrinsic, so I
think it's reasonable to assume the developer explicitly intends to
optimize D-cache usage of some hot non-temporal execution. They can
roll back if it doesn't help.
The cost here is a few extra instructions of code size (thus we
predicate on not optimizing for code size), a few extra fast
instructions to execute, a few extra short dependency chains (which
should commonly be hidden by OOO execution), I-cache alignment effects,
and a few extra registers. In the future we may be able to approximate
a cost model to select by.
The patch implements an AArch64-specific static function to model which
NT stores are currently directly legal in the backend, and updates
`AArch64TargetLowering::lowerInterleavedStore` to conditionally skip
`st2` lowering.
This patch changes the return type of methods returning `std::wstring`
to `std::string` in `PythonPathSetup.cpp`.
This follows lldb's style of converting to `std::wstring` at the last
moment.
Use ConstantInt::getSigned instead of ConstantInt::get when creating a
negative alignment mask in EmitVAArgFromMemory. This is the same fix as
commit 8546294db9 (PR #176115) which addressed the issue in
EmitVAArgForHexagonLinux.
Added a test case that exercises the EmitVAArgFromMemory alignment path
using a struct that is both >8 bytes (to trigger EmitVAArgFromMemory)
and has 8-byte alignment (to trigger the alignment masking code).
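The masking in question aligns the va_arg pointer up to the argument's alignment; as a plain-C sketch (the commit's fix concerns materializing the `-align` mask, i.e. `~(align - 1)`, as a properly sign-extended constant):

```c
#include <stdint.h>

/* Align p up to a power-of-two alignment. The mask -align equals
 * ~(align - 1) and must be sign-extended when widened, which is why the
 * IR needs ConstantInt::getSigned rather than a zero-extending
 * ConstantInt::get. */
uintptr_t align_up(uintptr_t p, uintptr_t align) {
    return (p + align - 1) & -align;
}
```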
If the instructions are compatible but non-matching (e.g., a
zext-select pair), there is no need to perform operand analysis; just
return that they are matching.
This PR allows the expand op converter to consider the NoNaN fastmath
attribute to disable the runtime checks for NaNs in E8M0 types. Default
behaviour is still the same.
The OCP document provides all-ones as NaN for E8M0, but for pre-MX I8
quantization, the checks for NaNs are prohibitively expensive,
especially if the hardware doesn't have native support for that type.
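For reference, the per-element runtime check that the NoNaN attribute lets the converter skip boils down to this sketch (the OCP spec cited above defines the all-ones E8M0 encoding as NaN):

```c
#include <stdbool.h>
#include <stdint.h>

/* E8M0 is an 8-bit exponent-only format; per the OCP spec, the all-ones
 * encoding (0xFF) is NaN, and every other value is a power of two. */
bool e8m0_is_nan(uint8_t bits) {
    return bits == 0xFF;
}
```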
The perfect-matching patch revealed another bug where recursive
instantiations could let SFINAE errors escape, as shown in the issue.
Fixes https://github.com/llvm/llvm-project/issues/179118
This adds support for two syntaxes in Python-defined dialects.
First, traits can now be declared in class parameters, e.g.
```python
class ParentIsIfTrait(DynamicOpTrait):  # define a Python-side trait
    @staticmethod
    def verify_invariants(op) -> bool:
        if not isinstance(op.parent.opview, IfOp):
            op.location.emit_error(
                f"{op.name} should be put inside {IfOp.OPERATION_NAME}"
            )
            return False
        return True


class YieldOp(  # attach two traits: IsTerminatorTrait, ParentIsIfTrait
    TestRegion.Operation, name="yield", traits=[IsTerminatorTrait, ParentIsIfTrait]
):
    ...
```
Second, users can directly define
`verify_invariants`/`verify_region_invariants` methods on the operation
to add additional custom verification logic. This is implemented via
traits.
```python
class YieldOp(TestRegion.Operation, name="yield", ...):
    value: Operand[Any]

    def verify_invariants(self) -> bool:  # define a method directly
        if self.parent.results[0].type != self.value.type:
            self.location.emit_error(
                "result type mismatch between YieldOp and its parent IfOp"
            )
            return False
        return True
```
Previously we used `verify`/`verify_region` as method names (in
yesterday's PR #179705), but in this PR they are renamed to
`verify_invariants`/`verify_region_invariants` because the newly added
`verify` method conflicted with the `ir.OpView.verify` method:
- `verify_invariants` only attaches **additional** verification logic,
while `OpView.verify` constructs an OperationVerifier and performs full
verification of an operation, so the semantics of the two are not the
same. We should not shadow the `OpView.verify` method by defining a
new, semantically different `verify` method.
- Two `verify` methods with different meanings would confuse users.
- If users did not define a `verify` method in their Python-defined
operation, `DynamicOpTraits.attach(opname, MyOpCls)` would still
perform the attaching (because `hasattr("verify")` returns `True`) and
segfault (because we cannot attach `OpView.verify`).
---------
Co-authored-by: Rolf Morel <rolfmorel@gmail.com>
For example:
```
error: 'tosa.dim' op illegal: requires [bf16, shape] but not included in the profile compliance [shape]
%0 = tosa.dim %arg0 {axis = 4 : i32} : (tensor<4x5x8x8x6x4xbf16>) -> !tosa.shape<1>
```
Here dim requires support to be declared for the BF16 and SHAPE
extensions, but only SHAPE was specified in the op declaration.
The checks created by LAA only compute a pointer difference and do not
need to capture provenance. Use SCEVPtrToAddr instead of SCEVPtrToInt
for computations.
To avoid regressions while parts of SCEV are migrated to use PtrToAddr,
this adds logic to rewrite all PtrToInt to PtrToAddr where possible in
the created expressions.
Similarly, if in the original IR we have a PtrToInt, SCEVExpander tries
to re-use it if possible when expanding PtrToAddr.
Depends on https://github.com/llvm/llvm-project/pull/178727.
Fixes https://github.com/llvm/llvm-project/issues/156978.
PR: https://github.com/llvm/llvm-project/pull/178861
Testing a bunch of random types has relatively little value. This
reduces the number of benchmarks so we can run them on a regular basis.
This saves ~90 seconds when running the benchmarks.