This PR adds support for HIP on macOS: Mach-O section naming, Darwin
host toolchain initialization guards, and HIPSPV behavior when Darwin is
the host.
This has been verified using chipStar on macOS via the PoCL OpenCL
implementation.
## Uninitialized target workaround
Darwin’s toolchain is only initialized when its own TranslateArgs runs.
For HIP/CUDA device jobs, Darwin is used as the HostTC and never gets
its args translated, so its target stays uninitialized. The new checks
avoid asserting on that uninitialized state. A better long-term fix is
to initialize Darwin earlier (see the FIXME in Driver.cpp
BuildJobsForAction).
- [ ] Initialize Darwin toolchain during construction instead of lazily
in TranslateArgs. See Driver.cpp BuildJobsForAction FIXME.
- [x] In Darwin’s addClangTargetOptions, skip host-stdlib flags when
DeviceOffloadKind != OFK_None so HIPSPV can safely delegate to the host.
`--icp=<value>`/`--indirect-call-promotion=<value>` crashes with
`UNIMPLEMENTED` when invoked, because it is not implemented for AArch64.
- Guard IndirectCallPromotion for non-X86
- Update unsupported-passes.test with expected error
Fixes https://github.com/llvm/llvm-project/issues/18707
During fixit recompile, the frontend was not reapplying command-line
diagnostic options, so the second pass could lose -Wno-* suppressions
and other warning configuration.
Added regression test to make sure that diagnostic options are properly
applied in the fixit-recompile path.
Functions with both `alwaysinline` and `flatten` attributes were
collected into the `NeedFlattening` worklist, then erased during
always-inline processing, leaving dangling pointers. Fix by collecting
flatten functions after the always-inline loop, and eliminate the
separate worklist by iterating the module directly.
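The ordering hazard can be shown outside LLVM. The sketch below is not the pass's code: `Function`, `run_always_inline`, and the dictionary standing in for the module are hypothetical. Python cannot produce a dangling pointer, so an `erased` flag marks entries that would have been freed. It shows why a worklist built before the erasing loop goes stale, while collecting afterwards only sees surviving functions.

```python
from dataclasses import dataclass, field


@dataclass
class Function:
    name: str
    attrs: set = field(default_factory=set)
    erased: bool = False  # stands in for "this memory was freed"


def run_always_inline(module: dict) -> None:
    """Hypothetical stand-in: erase every alwaysinline function."""
    doomed = [n for n, f in module.items() if "alwaysinline" in f.attrs]
    for name in doomed:
        module[name].erased = True
        del module[name]


def collect_flatten(module: dict) -> list:
    """Iterate the module directly; only live functions are visible."""
    return [f for f in module.values() if "flatten" in f.attrs]


def make_module() -> dict:
    funcs = [
        Function("both", {"alwaysinline", "flatten"}),
        Function("flat_only", {"flatten"}),
    ]
    return {f.name: f for f in funcs}


# Buggy order: collect first, erase second. The worklist keeps a stale entry.
module = make_module()
stale_worklist = collect_flatten(module)
run_always_inline(module)
stale_names = [f.name for f in stale_worklist if f.erased]

# Fixed order: erase first, then collect from what is still in the module.
module = make_module()
run_always_inline(module)
live_names = [f.name for f in collect_flatten(module)]
```

In the buggy order `stale_names` contains `both`; in the fixed order only `flat_only` is visited.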
Converted Lower/user-defined-operators.f90,
Lower/variable-inquiries.f90, Lower/where-allocatable-assignments.f90,
Lower/where.f90, and Transforms/constant-argument-globalisation.fir from
legacy lowering (-hlfir=false / -flang-deprecated-no-hlfir) to new
lowering (-emit-hlfir or no flag for FIR-input tests).
The function `symbolizeAddresses` is used by debugify to symbolize
addresses captured in the current invocation of LLVM, which it does by
executing llvm-symbolizer with temporary input and output files.
Creating the temporary files has an explicit sandbox exclusion, as
temporary files are necessarily not part of the compiler's formal
output, but attempting to read back the output file via MemoryBuffer
triggers a sandbox violation. Since we are always only operating on
temporary files within symbolizeAddresses, this patch disables the IO
sandbox in that function.
A variant of https://github.com/llvm/llvm-project/pull/176253 with a
change to reduce compile-time impact.
Since "llvm_unreachable" is actually allowed in constexpr functions,
simply emit the bodies of the selected functions in the header file.
In the previous PR the `isAllowedClauseForDirective` function was made
constexpr, but since it was very long it had a significant impact on
compilation time. In this PR that function is no longer constexpr.
This fixes a bug in the CIR LoweringPrepare pass where we were creating
multiple constant initializer global values with the same name, causing
references to them (specifically cir.get_global) to get the wrong value.
Assisted-by: Cursor / claude-4.7-opus-xhigh
I observed a crash in device OpenMP lowering when compiling with
`-fdefault-integer-8`. In `targetParallelCallback`, `NumThreads` can be
`i64`, but `__kmpc_parallel_60` expects an `i32` `num_threads`
parameter, which caused a bad-signature assertion during call creation.
The fix is to use `CreateZExtOrTrunc(..., Int32)` for the `num_threads`
argument before building the runtime call. This matches the handling
used in clang in `CGOpenMPRuntimeGPU::emitParallelCall`.
The problem can be seen with the following testcase when compiled with
`flang -fopenmp --offload-arch=gfx90a test.f90 -fdefault-integer-8`
```
program test
implicit none
integer :: nthreads
integer :: i
nthreads = 137
!$omp target teams distribute parallel do num_threads(nthreads)
do i = 1, 1
end do
!$omp end target teams distribute parallel do
end program test
```
Reland of #193837 (reverted in #193855), now using a marker op interface
to avoid the link cycle that broke `BUILD_SHARED_LIBS=ON` builds.
`SimplifyArrayCoorOp` folded `fir.rebox` into `fir.array_coor` across a
`cuf.kernel` boundary. CUF lowering needs the captured rebox to
materialize a managed-memory descriptor for the kernel; folding it away
makes the kernel dereference the host-side descriptor and crash with
`cudaErrorIllegalAddress`.
Fix is to add `fir::CUDAKernelOpInterface`, a marker op interface
defined in FIRDialect and implemented by `cuf.kernel`. The
canonicalization guard queries the interface, so the `TypeIDResolver`
symbol lives in `libFIRDialect.so` and no `FIR -> CUF` link edge is
introduced.
As per OpenMP 5.2/6.0 the below are valid device values in a `#pragma
omp target` directive:
- `omp_initial_device` (-1): refers to the host CPU.
- `omp_invalid_device` (-2): an intentionally invalid device, used to
  trigger a runtime error.
For the 2 values discussed above flang fails with:
```
error: The device expression of the DEVICE clause must be a positive integer expression
!$OMP TARGET DEVICE(-1)
error: Must have INTEGER type, but is REAL(4)
!$OMP TARGET DEVICE(OMP_INVALID_DEVICE)
```
Issue: https://github.com/llvm/llvm-project/issues/192989
Replace WideCanonicalIV with a ScalarIVSteps over the CanonicalIV when
only the first lane is used. This is a preparatory step in enabling
expansion of WideCanonicalIV into executable recipes.
This PR is part of a series of patches upgrading Lit's in-process
built-ins to be able to run with piped input/output and full redirection
support, and to allow custom in-process built-ins to be provided via the
Lit config. The remaining patches to Lit's test runner can be found here:
https://github.com/BStott6/llvm-project/compare/lit-inproc-builtins.
This is part of the Lit daemonized testing project:
https://discourse.llvm.org/t/88612
This PR makes Lit open all sub-processes with `text=False`, so that the
Python code will be able to read and write binary data to and from their
IO streams. This currently causes no functional change, as when Lit
reads output from the sub-processes, it already handles the case that
the read output is `bytes` by decoding it, but we will need to be able
to read binary data from a sub-process's STDOUT if its output, which may
be binary, is piped into an in-process built-in, and we will need to be
able to write binary data to a sub-process's STDIN if its input is
piped from an in-process built-in.
I have made sure that on Windows, when a sub-process invoked by Lit has
its output redirected to a file by Lit, the `\n -> \r\n` conversion is
performed as usual when writing to the file from the process - this
change only affects how the Python code interacts with the streams.
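As a minimal sketch of the pattern described above (not Lit's actual code; `run_and_capture` is a hypothetical helper), a sub-process opened with `text=False` hands back `bytes`, which the caller decodes only when it needs text:

```python
import subprocess
import sys


def run_and_capture(argv):
    """Run argv; return (raw stdout bytes, stdout decoded for display)."""
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=False,  # streams carry bytes, so binary output survives intact
    )
    out, _ = proc.communicate()
    # Decode only for display; undecodable bytes become U+FFFD.
    return out, out.decode("utf-8", errors="replace")


# A child process that writes bytes which are not valid UTF-8.
child = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(b'ok\\x00\\xff')",
]
raw, shown = run_and_capture(child)
```

Here `raw` keeps the exact bytes for piping into another consumer, while `shown` is the lossy text form used for reporting.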
The CamelCase-to-hyphenated conversion was incorrectly splitting
"OpenMP" and "OpenACC" into "open-mp" and "open-acc", producing wrong -W
flag names like -Wopen-mp-usage instead of -Wopenmp-usage. Fix the
conversion to treat these as compound names, keep the old spellings as
deprecated aliases, and emit a warning when deprecated spellings are
used.
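The behaviour can be sketched in a few lines of Python. This is an illustration under assumptions, not the patch itself: the feature names `OpenMPUsage` and `OpenACCUsage`, the `COMPOUND_NAMES` list, and all function names are hypothetical.

```python
import re
import warnings

# Assumed list of names that must not be split at their internal capitals.
COMPOUND_NAMES = ("OpenMP", "OpenACC")

# Word boundaries: lower/digit -> upper, or the end of an acronym before a word.
_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def naive_hyphenate(name: str) -> str:
    """Old behaviour: split at every CamelCase boundary."""
    return _BOUNDARY.sub("-", name).lower()


def hyphenate(name: str) -> str:
    """New behaviour: treat compound names as one word before splitting."""
    for compound in COMPOUND_NAMES:
        name = name.replace(compound, compound.capitalize())
    return naive_hyphenate(name)


def build_alias_table(camel_names):
    """Map each deprecated spelling to its replacement."""
    table = {}
    for name in camel_names:
        old, new = naive_hyphenate(name), hyphenate(name)
        if old != new:
            table[old] = new
    return table


def canonical_spelling(spelling: str, aliases: dict) -> str:
    """Accept a deprecated spelling, but warn and return the new one."""
    if spelling in aliases:
        new = aliases[spelling]
        warnings.warn(f"'-W{spelling}' is deprecated; use '-W{new}'")
        return new
    return spelling


aliases = build_alias_table(["OpenMPUsage", "OpenACCUsage", "Portability"])
```

With these inputs, `aliases` maps `open-mp-usage` to `openmp-usage` and `open-acc-usage` to `openacc-usage`; names without a compound part get no alias.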
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
This PR is the second in a series of patches upgrading Lit's in-process
built-ins to be able to run with piped input/output and full redirection
support, and to allow custom in-process built-ins to be provided via the
Lit config. The remaining patches to Lit's test runner can be found here:
https://github.com/BStott6/llvm-project/compare/lit-inproc-builtins.
This is part of the Lit daemonized testing project:
https://discourse.llvm.org/t/88612.
This PR makes Lit's `processRedirects` function open all input/output
files in binary mode. This makes sure that in-process builtins have the
expected behaviour when reading from and writing to them.
Newline translation is not required for any of the current in-process
built-ins. In fact, the in-process built-in for `echo`, which is the
only one that writes to `stdout`, explicitly re-opens the output file
with `newline=""` on Windows, to avoid newline translation. Also,
in-process builtins will eventually need to be able to read or write
binary data: for example, `opt` without `-S` running in daemon mode.
I believe this has no functional change for regular process invocations;
I have confirmed that programs invoked by Lit which write to files
opened in binary mode by Lit still have the newline translation
performed as normal on Windows, unless they change the mode of their
output stream themselves.
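The difference between the two modes can be reproduced on any platform. The sketch below is only an illustration, not code from `processRedirects`: it passes `newline="\r\n"` to imitate the translation that Windows text mode applies by default.

```python
import os
import tempfile

payload = b"line one\nline two\n"

with tempfile.TemporaryDirectory() as tmp:
    binary_path = os.path.join(tmp, "binary.out")
    text_path = os.path.join(tmp, "text.out")

    # Binary mode: bytes reach the file exactly as written.
    with open(binary_path, "wb") as f:
        f.write(payload)

    # Text mode with newline="\r\n" translates every "\n" on write,
    # which is what Windows text mode does by default.
    with open(text_path, "w", newline="\r\n") as f:
        f.write(payload.decode("ascii"))

    with open(binary_path, "rb") as f:
        binary_bytes = f.read()
    with open(text_path, "rb") as f:
        translated_bytes = f.read()
```

The binary-mode file is byte-identical to `payload`, while the text-mode file has grown by one `\r` per line.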
This PR ensures SYCL compilations default to C++17 when no explicit
standard is specified, and validates that user-provided standards meet
SYCL's C++17 minimum requirement. It also fixes Windows MSVC compilation
by enabling -fms-extensions for SYCL device code.
Remove redundant `Args.size()` assertions from `AMDGPUMCExpr` evaluate
functions (`evaluateExtraSGPRs`, `evaluateTotalNumVGPR`,
`evaluateAlignTo`, `evaluateOccupancy`).
These assertions are redundant with the `zip_equal` size checking
performed in the `evaluateMCExprs` helper function introduced in
#193859.
---
*This PR was developed with AI assistance (GitHub Copilot).*
Same V is commonly seen in multiple TEs (shared scalars), and the
expensive part of IsExternallyUsed walks V->users() with multiple
match() pattern checks plus per-user getTreeEntries lookups - all
V-only-dependent. Split out the V-dependent body and memoize by
Value pointer, leaving the TE-specific copyable check at the call
site. DeletedNodes is read-only during the cost loop, so caching
is safe.
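The shape of the change, sketched in Python under assumptions: `ExternalUseQuery`, the `ext:` user prefix, and the `entry_allows_copy` flag are hypothetical stand-ins for the value-only walk and the tree-entry-specific check. The cache is only valid because the user lists are not mutated while it is alive, mirroring the read-only `DeletedNodes` argument above.

```python
class ExternalUseQuery:
    def __init__(self, users_of):
        self._users_of = users_of  # value -> list of users (read-only here)
        self._cache = {}
        self.walks = 0  # counts how often the expensive walk really runs

    def _value_only_check(self, value):
        """Expensive part: depends on the value alone, so it can be cached."""
        self.walks += 1
        return any(user.startswith("ext:") for user in self._users_of[value])

    def is_externally_used(self, value, entry_allows_copy):
        if value not in self._cache:
            self._cache[value] = self._value_only_check(value)
        # The entry-specific condition stays at the call site, uncached.
        return self._cache[value] and not entry_allows_copy


users = {"%a": ["ext:store", "vec:add"], "%b": ["vec:mul"]}
query = ExternalUseQuery(users)

# The same value is queried from three different entries.
results = [query.is_externally_used("%a", copy) for copy in (False, True, False)]
walks_after_a = query.walks
```

Three queries for `%a` trigger a single walk of its users; only the cheap entry-specific condition is re-evaluated each time.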
Reviewers:
Pull Request: https://github.com/llvm/llvm-project/pull/194637
When looking to load an object at the start of a struct, the types do
not always match exactly. When we have an HLSL matrix the type in the
load will not match the type in memory. We need to improve the pointer
legalization pass to look for any "compatible" type at the start of an
aggregate.
Two types are compatible if the pass knows how to convert from one to
the other.
This involves a refactoring of the code to make the check more general.
Assisted-by: Gemini
Each V in VL is queried up to 3 times for MightBeIgnored (direct +
NeighborMightBeIgnored from both neighbors), and the underlying
areAllUsersVectorized walks the instruction's user list. Memoize per
Value pointer to avoid the redundant walks.
Reviewers:
Pull Request: https://github.com/llvm/llvm-project/pull/194619
Renamed sys/ucontext.h to sys/ucontext.h.def and created a corresponding
sys/ucontext.yaml, following the pattern used by sys/prctl. Updated
CMakeLists.txt to use add_header_macro.
Also removed the orphaned top-level ucontext.h.def which was never
referenced by ucontext.yaml.
The goal is to have the same attributes on ScopedSetting regardless of
whether this CMake setting is enabled.
Both of these should have the nodiscard and maybe_unused attributes.
When parsing an invalid `::template operator`, the parser incorrectly
kept the consumed tokens on error. This caused the token cache to go out
of sync and crash. This patch fixes it by reverting the tokens and
properly returning the error.
Fixes #186582
Hoist loop-invariant predicates and memoize per-UserTE
all_of(Scalars, isUsedOutsideBlock) in
isGatherShuffledSingleRegisterEntry and vectorizeTree to avoid
redundant walks over scalar user lists in the gather-shuffle hot path.
Reviewers:
Pull Request: https://github.com/llvm/llvm-project/pull/194612
Delay materialization of branches when building a local temporary
descriptor for OPTIONAL from hlfir-to-fir until pre-cg-rewrite.
This makes the IR easier to analyze with OPTIONAL (for instance alias
analysis does not need to handle the branches to find the source).
This is done by adding an "optional" attribute to fir.embox, fir.rebox,
and fir.rebox_assumed_rank to indicate that their code generation must be
conditional.
The conditional aspect is implemented in pre-cg-rewrite to avoid
complicating codegen and the fir.cg dialect.
Assisted by: Claude
Even though we have per-field lifetime information we did not previously
diagnose this test:
```c++
struct R {
struct Inner { constexpr int f() const { return 0; } };
int a = b.f();
Inner b;
};
constexpr R r;
```
because the lifetime was started by default.
This patch makes record members `Lifetime::NotStarted` by default
(unless they are primitive arrays) and then starts the lifetime in
`Pointer::initialize()`.
As discussed on #194473, add middle-end test coverage to ensure we're
creating vXi8/vXi16 llvm.vector.reduce calls that can be lowered to
PHMINPOS instructions.
This also demonstrates that we're still not matching partial reduction
patterns in VectorCombine.
Extend vsplat_uimm_{pow2,inv_pow2} matching to allow specifying an
explicit element bit width, enabling recognition of splat patterns whose
logical element size differs from the vector's native element type.
Introduce templated selectVSplatUimm{Pow2,InvPow2} helpers with an
optional EltSize parameter, and add corresponding ComplexPattern
definitions for i8/i16/i32 element widths. This allows TableGen patterns
to match cases such as operating on v8i32/v4i64 vectors with masks
derived from smaller element sizes.
With these changes, AND/OR/XOR operations using inverse power-of-two or
power-of-two splat masks are now correctly selected to VBITCLRI,
VBITSETI, and VBITREVI instructions instead of falling back to vector
logical operations with materialized constants.
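The arithmetic behind the matching can be sketched as follows. This is a Python illustration, not the TableGen or C++ from the patch; the function names are hypothetical, and the opcode-to-instruction pairing (AND with an inverse power of two clears a bit, OR with a power of two sets it, XOR flips it) is assumed from the description above.

```python
def pow2_bit_index(imm, elt_bits):
    """Return k if imm == 1 << k within elt_bits, else None."""
    imm &= (1 << elt_bits) - 1
    if imm != 0 and (imm & (imm - 1)) == 0:
        return imm.bit_length() - 1
    return None


def inv_pow2_bit_index(imm, elt_bits):
    """Return k if imm has every bit set except bit k, else None."""
    mask = (1 << elt_bits) - 1
    return pow2_bit_index(~imm & mask, elt_bits)


def select_bit_op(opcode, imm, elt_bits):
    """Pick a bit instruction for `vec <opcode> splat(imm)`, if one fits."""
    if opcode == "and":
        name, index = "VBITCLRI", inv_pow2_bit_index(imm, elt_bits)
    elif opcode == "or":
        name, index = "VBITSETI", pow2_bit_index(imm, elt_bits)
    elif opcode == "xor":
        name, index = "VBITREVI", pow2_bit_index(imm, elt_bits)
    else:
        return None
    return (name, index) if index is not None else None


# A 64-bit splat built from two 32-bit masks is not an inverse power of
# two at 64 bits, but each 32-bit half is.
as_i64 = select_bit_op("and", 0xFFFFFFEFFFFFFFEF, 64)
as_i32 = select_bit_op("and", 0xFFFFFFEF, 32)
```

The last two lines show why an explicit element width matters: the same mask fails to match at the vector's native 64-bit element size but matches once it is interpreted at 32 bits.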
This PR implements vector reduction and manipulation intrinsics.
Note that floating-point vector reduction intrinsics are not covered by
this change; they will be added in a follow-up PR after #188453 is
merged.