Add a `cooperative` UnitAttr to `gpu.launch_func` that enables
cooperative kernel launch semantics. Cooperative launches guarantee that
all thread blocks in the grid are co-resident on the GPU simultaneously,
enabling grid-wide synchronization patterns.
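As a sketch of the intended usage (the exact position of the `cooperative` keyword in the assembly format is defined by this patch; the kernel and operand names here are illustrative):

```mlir
// Request a cooperative launch: the runtime guarantees all blocks of the
// grid are resident at once, so grid-wide sync patterns are legal.
gpu.launch_func cooperative @kernels::@grid_reduce
    blocks in (%gx, %gy, %gz)
    threads in (%bx, %by, %bz)
    args(%buf : memref<?xf32>)
```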
## Implementation
When `cooperative` is set (with or without cluster sizes), the lowering
emits a call to the new `mgpuLaunchKernelCooperative` runtime function,
which uses `cuLaunchKernelEx` with a `CUlaunchConfig` and
`CU_LAUNCH_ATTRIBUTE_COOPERATIVE`. This API is guarded behind
`CUDA_VERSION >= 12000`. The HIP path instead calls
`hipModuleLaunchCooperativeKernel`.
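The CUDA 12+ driver-API call described above looks roughly like the following sketch (error handling elided; the wrapper's name and parameter list are assumptions, only the `CUlaunchConfig`/`CUlaunchAttribute` usage is per the CUDA driver API):

```c
#include <cuda.h>

// Hypothetical helper mirroring what mgpuLaunchKernelCooperative is described
// as doing: a cuLaunchKernelEx call carrying the COOPERATIVE launch attribute.
CUresult launchCooperative(CUfunction fn, unsigned gx, unsigned gy, unsigned gz,
                           unsigned bx, unsigned by, unsigned bz,
                           unsigned smem, CUstream stream, void **params) {
  CUlaunchAttribute attr;
  attr.id = CU_LAUNCH_ATTRIBUTE_COOPERATIVE;
  attr.value.cooperative = 1;  // require the whole grid to be co-resident

  CUlaunchConfig cfg = {0};
  cfg.gridDimX = gx;  cfg.gridDimY = gy;  cfg.gridDimZ = gz;
  cfg.blockDimX = bx; cfg.blockDimY = by; cfg.blockDimZ = bz;
  cfg.sharedMemBytes = smem;
  cfg.hStream = stream;
  cfg.attrs = &attr;
  cfg.numAttrs = 1;

  return cuLaunchKernelEx(&cfg, fn, params, /*extra=*/NULL);
}
```

The driver rejects the launch (rather than silently serializing blocks) if co-residency cannot be guaranteed, which is what makes grid-wide synchronization safe to rely on.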
## Changes
- **GPUOps.td**: add `cooperative` UnitAttr and assembly format keyword
- **SelectObjectAttr.cpp**: add `getKernelLaunchExFn()`, route
  cooperative and cluster launches through `mgpuLaunchKernelEx`
- **CudaRuntimeWrappers.cpp**: implement `mgpuLaunchKernelCooperative`
via `cuLaunchKernelEx` or `hipModuleLaunchCooperativeKernel`, depending
on platform
- **GPUToLLVMConversion.cpp**: propagate cooperative attribute through
the legalization pattern
- **test/Dialect/GPU/ops.mlir**: round-trip tests for cooperative
keyword with and without clusters
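The round-trip tests follow the usual ops.mlir pattern; a representative case might look like this (function and kernel names are illustrative, not copied from the patch):

```mlir
// CHECK-LABEL: func @cooperative_launch
func.func @cooperative_launch(%sz : index) {
  // CHECK: gpu.launch_func cooperative
  gpu.launch_func cooperative @kernels::@grid_reduce
      blocks in (%sz, %sz, %sz) threads in (%sz, %sz, %sz)
  return
}
```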
## Context
MLIR currently has no support for cooperative kernel launches. Flang
works around this with a CUF-specific attribute (PRs #124325, #124362),
but there is no first-class support in the GPU dialect. This patch adds
it at the `gpu.launch_func` level so all frontends can use it.
Assisted-by: Claude (Anthropic)