This is the first part of the effort to make parsing of clause modifiers
more uniform and robust. Currently, when multiple modifiers are allowed,
the parser will expect them to appear in a hard-coded order.
Additionally, modifier properties (such as "ultimate") are checked
separately for each case.
The overall plan is
1. Extract all modifiers into their own top-level classes, and then
equip them with sets of common properties that will allow performing the
property checks generically, without refering to the specific kind of
the modifier.
2. Define a parser (as a separate class) for each modifier.
3. For each clause define a union (std::variant) of all allowable
modifiers, and parse the modifiers as a list of these unions.
The intent is also to isolate parts of the code that could eventually be
auto-generated.
OpenMP modifier overhaul: #1/3
This PR uses the existing lifetime analysis for the `capture_by`
attribute.
The analysis is behind `-Wdangling-capture` warning and is disabled by
default for now. Once it is found to be stable, it will be default
enabled.
Planned followup:
- add implicit inference of this attribute on STL container methods like
`std::vector::push_back`.
- (consider) warning if capturing `X` cannot capture anything. It should
be a reference, pointer or a view type.
- refactoring temporary visitors and other related handlers.
- start discussing `__global` vs `global` in the annotation in a
separate PR.
---------
Co-authored-by: Boaz Brickner <brickner@google.com>
convertFixedMaskToScalableVector expects the mask input to honour the
BoolContents scheme employed by the target. For AArch64 this means a
mask should be zero or all ones, and thus when promoting a mask we must
use a sign extend.
Below is the error message for reference.
/llvm-project/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp: In static member
function 'static llvm::SmallVector<mlir::AffineMap>
mlir::linalg::MatmulOp::getDefaultIndexingMaps(mlir::MLIRContext*)':
/llvm-project/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp:3468:10: error:
could not convert 'indexingMaps' from 'SmallVector<[...],3>' to
'SmallVector<[...],6>'
3468 | return indexingMaps;
| ^~~~~~~~~~~~
| |
| SmallVector<[...],3>
Here is the link to the failure.
https://lab.llvm.org/buildbot/#/builders/117/builds/3919
...
#116419 assumed that getSHUFPDImm incorrectly hardcoded the mask size to 4 (cut+pasta typo from getV4X86ShuffleImm).
Waiting on reduced test case from @metaflow
This dissuades contributors from using contractions when writing
diagnostic wording for Clang. Contractions should be avoided because of
the potential for visual confusion with single quoting syntactic
constructs and because they can be harder to understand for non-native
English speakers.
Following #116547, this changes the result of `ARMISD::CMPFP*` and the
operand of `ARMISD::FMSTAT` from a special `Glue` type to a normal type.
This change allows comparisons to be CSEd and scheduled around as can be
seen in the test changes.
Note that `ARMISD::FMSTAT` is still glued to its consumer nodes; this is
going to be changed in a separate patch.
This patch also sets `CopyCost` of `cl_FPSCR_NZCV` register class to a
negative value. The reason is the same as for CCR register class: it
makes DAG scheduler and InstrEmitter try to avoid copies of `FPCSR_NZCV`
register to / from virtual registers. Previously, this was not
necessary, since no attempt was made to create copies in the first
place.
There might be a case when a copy can't be avoided (although not found
in existing tests). If a copy is necessary, the virtual register will be
created with `cl_FPSCR_NZCV` register class. If this register class is
inappropriate, `TRI::getCrossCopyRegClass` should be modified to return
the correct class.
Pull Request: https://github.com/llvm/llvm-project/pull/116676
Rename R_X86_64_REX2_GOTPCRELX to R_X86_64_CODE_4_GOTPCRELX, to align
with GCC/binutils and ABI.
GCC/binutils:
3d5a60de52
and
4a54cb0658
ABI:
357de358ba
* `MRS`, `PTEST` and FP comparisons were missing "flags" result, and
were sometimes created with invalid types (f32, Glue, Other).
* `REV16`, `REV32`, `REV64`, and `CMGEz` were sometimes created with an
extra operand.
* `TLSDESC_CALLSEQ` had `SDNPInGlue` property, but the node was never
created with a glue operand.
Here is the [merged
MR](https://github.com/llvm/llvm-project/pull/116007) which caused a
failure and [was
reverted](https://github.com/llvm/llvm-project/pull/116811).
Thanks to @joker-eph for the help, I fix it (miss constructing
`ModuleObject` with callback functions in
`mlir/lib/Target/LLVM/NVVM/Target.cpp`) and split unit tests from origin
test which don't need `ptxas` to make the test runs more widely.
Fixes#116809.
After running some passes (SimpleLoopUnswitch, LoopInstSimplify, etc.),
MemorySSA might be outdated, and the instruction `I` may have become a
non-memory touching instruction.
LICM has already handled this, but it does not pass
`CreationMustSucceed=false` to `createDefinedAccess`.
Ensure we use the SchedWriteVecMoveLSNT class for all (V)MOVNTDQA instructions, remove unnecessary scheduler overrides and adjust resource pipe usage to match uops.info/Agner numbers
As suggested by Craig, this tries to merge the two sets of register
classes created in #112983, GPRPair* and GPRF64Pair*.
- I added some explicit annotations to `RISCVInstrInfoD.td` which fixed
the type inference issues I was seeing from tablegen for select
patterns.
- I've had to make the behaviour of `splitValueIntoRegisterParts` and
`joinRegisterPartsIntoValue` cover more cases, because you cannot
bitcast to/from untyped (the bitcast would otherwise have been inserted
automatically by TargetLowering code).
- I apparently didn't need to change `getNumRegisters` again, which
continues to tell me there's a bug in the code for tied inputs. I added
some more test coverage of this case but it didn't seem to help find the
asserts I was finding before - I think the difference is between the
default behaviour for integers which doesn't apply to floats.
- There's still a difference between BuildGPRPair and BuildPairF64 (and
the same for SplitGPRPair and SplitF64). I'm not happy with this, I
think it's quite confusing, as they're very similar, just differing in
whether they give a `untyped` or a `f64`. I haven't really worked out
how the DAGCombiner copes if one meets the other, I know we have some of
this for the f64 variants already, but they're a lot more complex than
the GPRPair variants anyway.
Since cd16b07 (IR: introduce CmpInst::isEquivalence), there is now an
isEquivalence routine in CmpInst that we can use to determine
equivalence in foldSelectValueEquivalence. Implement this, extending it
to include floating-point equivalences as well.
When spilling a VGPR in `emitPrologue`, chain functions prefer to use
offsets to access the stack instead of the SP.
This patch fixes `emitEpilogue` to do the same. It also brings back some
test coverage that was lost in #93526, when WWM registers started being
shifted to the lowest available range (which meant that tests that were
originally spilling v8 would shift to spill v0, which is a scratch
register for chain functions and didn't get spilled).
Change-Id: Icb07fccd859b563cd45f74c25ae578ecb38bdeeb
Set the maximum interleaving factor to 4, aligning with the number of available
SIMD pipelines. This increases the number of vector instructions in the vectorised
loop body, enhancing performance during its execution. However, for very low
iteration counts, the vectorised body might not execute at all, leaving only the
epilogue loop to run. This issue affects e.g. cam4_r from SPEC FP, which
experienced a performance regression. To address this, the patch reduces the
minimum epilogue vectorisation factor from 16 to 8, enabling the epilogue to be
vectorised and largely mitigating the regression.
De-duplicate the functions getSignedPredicate and getUnsignedPredicate,
nearly identical versions of which were present in CmpInst and ICmpInst,
creating less confusion.
This commit refactors part of the code in preparation for the migration
of other *matmul* variants from OpDSL to ODS.
Moves getDefaultIndexingmaps() helper into the MatmulOp class.
For the situation where scope is implemented to 5.1 standard, check that
the 5.2 are still "not yet implemented" (or some other partial
implementation).
This prepares for using the DECLARE MAPPER construct.
A check in lowering will say "Not implemented" when trying to use a
mapper as some code is required to tie the mapper to the declared one.
Senantics check for the symbol generated.
This attempts to clean up and improve where we generate smull/umull
using known-bits. For v2i64 types (where no mul is present), we try to
create mull more aggressively to avoid scalarization.
When compiling LLVM with -std=c++ instead of -std=gnu we'd fail to detect many newer POSIX functions.
We define it for the whole of LLVM anyway so moving the definition to the top fixes detection of a bunch of these on such setups.
Keeping it at the top also avoids accidentally introducing new dependent checks before it being defined.
Certain tests (many are from lld/test) run `... '2>&1 | count 0` to
ensure that there is no stderr message.
GetRegistersAndSP may rarely fail, leading to
a spurious failure like (with a local hack to make `count` dump the
input):
```
+ /home/ray/llvm/out/asan/bin/ld.lld func1-gcs.o func2-gcs.o func3-gcs.o -o /dev/null -z gcs-report=warning -z gcs=never
+ /home/ray/llvm/out/asan/bin/count 0
Expected 0 lines, got 1.
==2403039==Unable to get registers from thread 2403018.
```
The failure can reliably be reproduced by running `ninja check-lld` a
few times under asan+lsan (see the bot
sanitizer-x86_64-linux-bootstrap-asan).
bogner is listed as the current SDAG maintainer, but mostly works on
DirectX nowadays and isn't directly involved with SDAG work anymore.
Add RKSimon and topperc as new SelectionDAG maintainers.
I'd like to use the name CaptureInfo to represent the new attribute
proposed at
https://discourse.llvm.org/t/rfc-improvements-to-capture-tracking/81420,
but it's already taken by AA, and I can't think of great alternatives
(CaptureEffects would be something of a stretch).
As such, I'd like to rename CaptureInfo -> CaptureAnalysis in AA, which
also seems like the more accurate terminology.
TF32 is a variant of F32 that is truncated to 19 bits. There used to be
special handling in `FloatType::getWidth()` so that TF32 was treated as
a 32-bit float in some places. (Some places use `FloatType::getWidth`,
others directly query the `APFloat` semantics.) This caused problems
because `FloatType::getWidth` did not agree with the underlying
`APFloat` semantics.
In particular, creating an elements attr / array attr with `tf32`
element type crashed. E.g.:
```
"foo"() {attr = dense<4.0> : tensor<tf32>} : () -> ()
mlir-opt: llvm-project/llvm/lib/Support/APFloat.cpp:4108: void llvm::detail::IEEEFloat::initFromAPInt(const fltSemantics *, const APInt &): Assertion `api.getBitWidth() == Sem->sizeInBits' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
```
```
"foo"() {f32attr = array<tf32: 1024.>} : () -> ()
mlir-opt: llvm-project/mlir/lib/AsmParser/AttributeParser.cpp:847: void (anonymous namespace)::DenseArrayElementParser::append(const APInt &): Assertion `data.getBitWidth() % 8 == 0' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
```
It is unclear why the special handling for TF32 is needed. For
reference: #107372
Use uppercase in the subvector description ("32x2" -> "32X4" etc.) - matches what we already do in VBROADCAST??X?, and we try to use uppercase for all x86 instruction mnemonics anyway (and lowercase just for the arg description suffix).
We got a bug report that this message is confusing. In this particular
case, the line zero was due to compiler tail merging (in optimized
code). The main issue was the "no source code" part: in this case it's
kind of incorrect because -- even though we can't really know that --
the address is arguably associated with *multiple* lines of source code.
I've tried to make the new wording more neutral, and added a wink
towards compiler optimizations. I left out the "compiler generated" part
of the message because I couldn't find a way to squeeze that in nicely.
I'm also not entirely sure what it was referring to -- if this was
(just) function prologue/epilogue, then maybe leaving it out is fine, as
we're not likely to stop there anyway (?)
I also left out the function name, because:
- for template functions it gets rather long
- it's already present in the message, potentially twice (once in the
"frame summary" line and once in the snippet of code we show for the
function declaration)
---------
Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
Updates one example so that:
* it uses `vector.mask`,
* upper loop bound is a multiple of the loop step,
* use `vector.outerproduct` instead of "test.some_computation".
This makes this example a bit closer to realistic cases, which has
always been the goal for this test.