This PR resolves#96322 and implements the `signbit` macro under a new
header `generic-math-macros.h`. This also removed the `TODO` in
`math-macros.h` and moves `isfinite`, `isinf`, and `isnan` to the same
generic maths header. Finally, a test file
`generic-math-macros_test.cpp` that adds coverage to the above 4 macros.
Fixes#96322.
Add tests which are not safe to vectorize because %indices are loaded in
the loop and the same indices could be loaded in later iterations.
Tests for https://github.com/llvm/llvm-project/issues/87189.
This addresses the build error introduced in #98287 where
src/errno/errno.h is included instead of the system errno.h.
We instead move the declaration to libc_errno.h.
Rather than selecting the errno implementation based on the platform
which doesn't provide the necessary flexibility, make it configurable.
The errno value location is returned by `int *__llvm_libc_errno()` which
is a common design used by other C libraries.
This PR reworks HLSL's implicit conversion sequences. Initially I was
seeking to match DXC's behavior more closely, but that was leading to a
pile of special case rules to tie-break ambiguous cases that should
really be left as ambiguous. We've decided that we're going to break
compatibility with DXC here, and we may port this new behavior over to
DXC instead.
This change is a bit closer to C++'s overload resolution rules, but it
does have a bit of nuance around how dimension adjustment conversions
are ranked. Conversion sequence ranks for HLSL are:
* Exact match
* Scalar Widening (i.e. splat)
* Promotion
* Scalar Widening with Promotion
* Conversion
* Scalar Widening with Conversion
* Dimension Reduction (i.e. truncation)
* Dimension Reduction with Promotion
* Dimension Reduction with Conversion
In this implementation I've folded the disambiguation into the
conversion sequence ranks which does add some complexity as compared to
C++, however this avoids needing to add special casing in
`CompareStandardConversionSequences`. I believe the added conversion
rank values provide a simpler approach, but feedback is appreciated.
The HLSL language spec updates are in the PR here:
https://github.com/microsoft/hlsl-specs/pull/261
The new transformation folds `umin(cttz(x), c)` to `cttz(x | (1 << c))`
and `umin(ctlz(x), c)` to `ctlz(x | ((1 << (bitwidth - 1)) >> c))`. The
transformation is only implemented for constant `c` to not increase the
number of instructions.
The idea of the transformation is to set the c-th lowest (for `cttz`) or
highest (for `ctlz`) bit in the operand. In this way, the `cttz` or
`ctlz` instruction always returns at most `c`.
Alive2 proofs: https://alive2.llvm.org/ce/z/y8Hdb8Fixes#90000
`linalg.matmul` already has an attribute for casts, defaults to signed
but allowed unsigned, so the operation `linalg.matmul_unsigned` is
redundant. The generalization test has an example on how to lower to
unsigned matmul in linalg.
This is the first PR in a list of many that will simplify the linalg
operations by using similar attributes.
Ref:
https://discourse.llvm.org/t/rfc-transpose-attribute-for-linalg-matmul-operations/80092
This adds an `emitc.member` and `emitc.member_of_ptr` operation for the
corresponding member access operators. Furthermore, `emitc.assign` is
adjusted to be used with the member access operators.
Since Linux 4.7, RLIMIT_DATA may result in mmap() returning ENOMEM.
Example:
$ clang -fsanitize=address -o hello hello.c
$ ulimit -d 100000
$ ./hello
==3349007==ERROR: AddressSanitizer failed to allocate 0x10000000
(268435456) bytes at address 7fff7000 (errno: 12)
==3349007==ReserveShadowMemoryRange failed while trying to map
0x10000000 bytes. Perhaps you're using ulimit -v
Suggest checking ulimit -d in addition to ulimit -v.
Constructs like `__is_pointer(Foo)` are never considered to be functions
declarations.
This matches usages in libstdc++, and we can hope
no one else redefine these reserved identifiers.
Fixes#95598
Summary:
This patch implements the `printf` family of functions on the GPU using
the new variadic support. This patch adapts the old handling in the
`rpc_fprintf` placeholder, but adds an extra RPC call to get the size of
the buffer to copy. This prevents the GPU from needing to parse the
string. While it's theoretically possible for the pass to know the size
of the struct, it's prohibitively difficult to do while maintaining ABI
compatibility with NVIDIA's varargs.
Depends on https://github.com/llvm/llvm-project/pull/96015.
With ptrauth-calls, function pointers are supposed to be signed.
On Darwin that includes the TLS indirection accessor (`_tlv_get_addr`).
We simply sign it with the plain function-pointer schema (IA,0), which
lets us do a `blraaz` when calling it.
Note that this doesn't have any kind of diversity, even when function
pointer diversity is enabled in the frontend. On arm64e this accessor
is never signed that way, but the obvious alternative where this (or
another backend-generated) function pointer needs to be diversified
would need more than the "ptrauth-calls" attribute as it exists today.
Fix an issue in #97618 - if the two basic blocks involved are not
predecessor / successor to each other, treat the candidate as illegal
for critical edge splitting.
Closes#98477 (checked in test copied from its comment).
For DXIL which is based on llvm 3.7, max supported behavior flag for
module flags is 6.
The commit will check all module flags, for behavior flag > 6, change it
to 2 (Warning).
This is to fix the behavior flag part for #96912.
Moved sys yaml files into the sys folder.
After CMake patch lands, will make appropriate changes to account for
yaml, header, and .h.def files that are located within the sys folder in
a separate patch.
Summary:
This patch implements support for variadic functions for NVPTX targets.
The implementation here mainly follows what was done to implement it for
AMDGPU in https://github.com/llvm/llvm-project/pull/93362.
We change the NVPTX codegen to lower all variadic arguments to functions
by-value. This creates a flattened set of arguments that the IR lowering
pass converts into a struct with the proper alignment.
The behavior of this function was determined by iteratively checking
what the NVCC copmiler generates for its output. See examples like
https://godbolt.org/z/KavfTGY93. I have noted the main methods that
NVIDIA uses to lower variadic functions.
1. All arguments are passed in a pointer to aggregate.
2. The minimum alignment for a plain argument is 4 bytes.
3. Alignment is dictated by the underlying type
4. Structs are flattened and do not have their alignment changed.
5. NVPTX never passes any arguments indirectly, even very large ones.
This patch passes the tests in the `libc` project currently, including
support for `sprintf`.
armv8m builtins aren't being built because `armv8m` doesn't match any of
the arm cpu strings in compiler-rt cmake files. Instead there's
`armv8m.main` and `armv8m.base`. We want to use the `armv8m.main`
version.
Summary:
This patch explicitly disables runtime calls to be emitted from the
NVPTX backend. This allows other utilities to know that we do not need
to worry about emitting these.
This patch fixes a problem that existed before where in some situations
a `UCMP`/`SCMP` node which operated on 1-element vectors had a legal
result type (i.e. `v1i64` on AArch64), but illegal operands (i.e.
`v1i65`). This meant that operand scalarization was performed on the
node and the operands were changed to a legal scalar type, but the
result wasn't. This then led to `UCMP`/`SCMP` nodes with different
vector-ness of operands and result appearing in the SDAG. This patch
addresses this issue by fully scalarizing the `UCMP`/`SCMP` node and
then turning its result back into a 1-element vector using a
`SCALAR_TO_VECTOR` node.
It also adds several assertions to `SelectionDAG::getNode()` to avoid
this or a similar issue arising in the future. I wasn't sure if these
two changes are unrelated enough to warrant two small separate PRs, but
I'm happy to split this PR into two if that's deemed more appropriate.