llvm-project/llvm/lib/CodeGen/PostRAHazardRecognizer.cpp
Dark Steve 254cb2a326 [AMDGPU] Hoist WMMA coexecution hazard V_NOPs from loops to preheaders (#176895)
On GFX1250, V_NOPs inserted for WMMA coexecution hazards are placed at
the use-site. When the hazard-consuming instruction is inside a loop and
the WMMA is outside, these NOPs execute every iteration even though the
hazard only needs to be covered once.

This patch hoists the V_NOPs to the loop preheader, so they execute once
per loop entry instead of once per iteration (1 time instead of N).

```
Example (assuming a hazard requiring K V_NOPs):
  Before:
    bb.0 (preheader): WMMA writes vgpr0
    bb.1 (loop):      V_NOP xK, VALU reads vgpr0, branch bb.1
                      -> K NOPs executed per iteration

  After:
    bb.0 (preheader): WMMA writes vgpr0, V_NOP xK
    bb.1 (loop):      VALU reads vgpr0, branch bb.1
                      -> K NOPs executed once
```

For nested loops, V_NOPs are hoisted to the preheader of the outermost
loop in which no WMMA hazard exists.
Hoisting is restricted to strict preheaders (not any single predecessor)
to avoid introducing V_NOPs on unrelated control flow paths.

The optimization is controlled by `-amdgpu-wmma-vnop-hoisting` (default:
on).

Fixes: SWDEV-573407
2026-02-26 17:19:00 +05:30
