Jared Hoberock 4d33c692e9 [MLIR][GPU] Add cooperative launch support to gpu.launch_func (#190639)
Add a `cooperative` UnitAttr to `gpu.launch_func` that enables
cooperative kernel launch semantics. Cooperative launches guarantee that
all thread blocks in the grid are co-resident on the GPU simultaneously,
enabling grid-wide synchronization patterns.

## Implementation

When `cooperative` is set (with or without cluster sizes), the lowering
emits a call to the new `mgpuLaunchKernelCooperative` runtime function.
That function calls `cuLaunchKernelEx` with a `CUlaunchConfig` that carries
the `CU_LAUNCH_ATTRIBUTE_COOPERATIVE` attribute. This code path is guarded
by `CUDA_VERSION >= 12000`. On HIP, the wrapper uses
`hipModuleLaunchCooperativeKernel` instead.

## Changes

- **GPUOps.td**: add `cooperative` UnitAttr and assembly format keyword
- **SelectObjectAttr.cpp**: add `getKernelLaunchExFn()`, route
cooperative and/or cluster launches through `mgpuLaunchKernelEx`
- **CudaRuntimeWrappers.cpp**: implement `mgpuLaunchKernelCooperative`
via `cuLaunchKernelEx` or `hipModuleLaunchCooperativeKernel`, depending
on platform
- **GPUToLLVMConversion.cpp**: propagate cooperative attribute through
the legalization pattern
- **test/Dialect/GPU/ops.mlir**: round-trip tests for cooperative
keyword with and without clusters

## Context

MLIR currently has no support for cooperative kernel launches. Flang
works around this with a CUF-specific attribute (PRs #124325, #124362),
but there is no first-class support in the GPU dialect. This patch adds
it at the `gpu.launch_func` level so all frontends can use it.

Assisted-by: Claude (Anthropic)
2026-04-29 11:33:22 +02:00