llvm-project/llvm/test/CodeGen/AMDGPU/kernel-args.ll
jian.wu ed395c894f [AMDGPU] Use value's DebugLoc for bitcast in performStoreCombine (#186766)
## Description

When `AMDGPUTargetLowering::performStoreCombine` inserts a synthetic
bitcast to convert vector types (e.g. `<1 x float>` → `i32`) for stores,
the bitcast inherits the **store's** SDLoc. When
`DAGCombiner::visitBITCAST` later folds `bitcast(load)` → `load`, the
resulting load loses its original debug location.

## Analysis

The bitcast is **not** present in the initial SelectionDAG — it is
inserted during DAGCombine by
`AMDGPUTargetLowering::performStoreCombine`. This can be observed with
`-debug-only=isel,dagcombine`:

```
Initial selection DAG: no bitcast, load is v1f32 directly used by store

Combining: t17: ch = store ... /tmp/beans.c:6:14
 ... into: t20: ch = store ... /tmp/beans.c:6:14

Combining: t19: i32 = bitcast [ORD=3] # D:1 t13, /tmp/beans.c:6:14
 ... into: t21: i32,ch = load ... /tmp/beans.c:6:14
```

In `performStoreCombine` (`AMDGPUISelLowering.cpp`):

```cpp
SDLoc SL(N);  // N = store node → SL has store's DebugLoc
...
SDValue CastVal = DAG.getNode(ISD::BITCAST, SL, NewVT, Val);
// bitcast gets store's DebugLoc, not load's
```

When `visitBITCAST` folds `bitcast(load)` → `load`, it uses `SDLoc(N)`
(the bitcast's loc = store's loc), so the resulting load loses its
original debug location.

```
Before (initial DAG):
  t13: v1f32 = load ...           line 2   ; original load
  t14: ch    = store t13, ...     line 3   ; store

After performStoreCombine:
  t13: v1f32 = load ...           line 2   ; original load
  t19: i32   = bitcast t13        line 3   ; synthetic bitcast (store's loc!)
  t20: ch    = store t19, ...     line 3

After visitBITCAST folds (incorrect):
  t21: i32 = load ...             line 0   ; lost debug location

After visitBITCAST folds (expected):
  t21: i32 = load ...             line 2   ; preserves load's location
```
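The propagation described above can be sketched with a small standalone model. This is only an illustration: `ToyNode`, `makeBitcastAt`, `makeBitcastFromOperand`, and `foldBitcastOfLoad` are hypothetical names, not LLVM types or APIs, and a plain line number stands in for a `DebugLoc`. The model tracks only which location each construction style hands to the folded load. It does not try to reproduce the line number that finally appears in the DAG dump.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for an SDNode; this is NOT an LLVM type.
struct ToyNode {
  std::string Opcode;     // "load", "store" or "bitcast"
  unsigned Line;          // source line standing in for a DebugLoc
  const ToyNode *Operand; // single value operand, or nullptr
};

// Models DAG.getNode(ISD::BITCAST, SL, ...): the caller supplies the
// location, which in performStoreCombine is the store's.
ToyNode makeBitcastAt(unsigned Line, const ToyNode &Val) {
  return ToyNode{"bitcast", Line, &Val};
}

// Models DAG.getBitcast(VT, V): the location is taken from the operand.
ToyNode makeBitcastFromOperand(const ToyNode &Val) {
  return ToyNode{"bitcast", Val.Line, &Val};
}

// Models the bitcast(load) -> load fold: the new load is created with the
// bitcast's location, mirroring the use of SDLoc(N) in visitBITCAST.
ToyNode foldBitcastOfLoad(const ToyNode &Cast) {
  assert(Cast.Opcode == "bitcast" && Cast.Operand &&
         Cast.Operand->Opcode == "load");
  return ToyNode{"load", Cast.Line, nullptr};
}
```

With a load on line 2 and a store on line 3, folding a bitcast built by `makeBitcastAt(Store.Line, Load)` gives a load tagged with the store's line, while folding one built by `makeBitcastFromOperand(Load)` keeps line 2.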

## Fix

The fix is target-specific and confined to `performStoreCombine` in
`AMDGPUISelLowering.cpp`: use `DAG.getBitcast()` instead of
`DAG.getNode(ISD::BITCAST, SL, ...)`. `getBitcast()` internally uses
`SDLoc(V)` (the value operand's SDLoc), so the synthetic bitcast inherits
the load's DebugLoc instead of the store's:

```cpp
// Before: both bitcasts take the store's SDLoc (SL).
SDValue CastVal = DAG.getNode(ISD::BITCAST, SL, NewVT, Val);
if (OtherUses) {
    SDValue CastBack = DAG.getNode(ISD::BITCAST, SL, VT, CastVal);
    // ...
}

// After: each bitcast takes the SDLoc of its value operand.
SDValue CastVal = DAG.getBitcast(NewVT, Val);
if (OtherUses) {
    SDValue CastBack = DAG.getBitcast(VT, CastVal);
    // ...
}
```

This is consistent with `performLoadCombine`, where the bitcast also uses
the load's `SDLoc`.
2026-04-11 18:16:52 +00:00
