Eric Feng ef739b97b1 [AMDGPU] Correct gfx950 smfmac sparse index verifier (#193541)
Originally, the smfmac verifier expected the sparse indices, which
describe the positions of the non-zero elements per lane, to have the
following form:

```
8 bit source -> require vector<2xi16>, ABID range [0, 1]
16 bit source -> require vector<4xi8>, ABID range [0, 3]
```

This is correct for CDNA3 and matches what the CDNA4 ISA description
states. However, the CDNA4 variants have double the K of the CDNA3
variants, so, e.g., for the 16 bit variants each lane carries 8 non-zero
values rather than 4. We therefore need 16 bit sparse indices to express
the full range of non-zero elements. This is in line with the layout
tables presented in the CDNA4 ISA.
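
The widths above can be checked with a little arithmetic. The sketch
below is illustrative only and is not the verifier code. It assumes a
32-bit sparse index operand per lane and 2 index bits per non-zero
element (a position within a group of 4). The per-lane counts for the
8 bit variants on gfx942 are inferred from the doubling of K, and
`index_layout` is a hypothetical helper name.

```python
INDEX_REGISTER_BITS = 32  # assumed: one 32-bit sparse index operand per lane
BITS_PER_NONZERO = 2      # assumed: position of a kept element within a group of 4


def index_layout(nonzeros_per_lane):
    """Return (bits per index set, maximum ABID) for a given lane load."""
    set_bits = nonzeros_per_lane * BITS_PER_NONZERO
    num_sets = INDEX_REGISTER_BITS // set_bits
    return set_bits, num_sets - 1


# 16 bit source on gfx942: 4 non-zeros per lane
print(index_layout(4))
# 16 bit source on gfx950 (or 8 bit source on gfx942): 8 non-zeros per lane
print(index_layout(8))
# 8 bit source on gfx950: 16 non-zeros per lane
print(index_layout(16))
```

Under these assumptions the 16 bit gfx950 case needs 16-bit sets with
ABID in [0, 1], and the 8 bit gfx950 case uses the whole 32 bits with
ABID fixed at 0.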

Direct comparison for 16 bit elements:
On gfx942, ABID selects one of four 8-bit sets of sparse indices. Each
set gives the locations of four non-zero values per 8 elements,
following 4:2 structured sparsity. For example:
```
a0 a1 0 0 a3 a4 0 0 | a5 a6 0 0 0 0 a7 a8 | a9 0 0 a10 0 a11 0 a12 | 0 a13 0 a14 0 a15 0
```

On gfx950, each lane carries 8 non-zero values, so the full range of 8
non-zero values per 16 elements can only be specified by one of two
16-bit sets. For example:
```
a0 a1 0 0 a3 a4 0 0 a5 a6 0 0 0 0 a7 a8 | a9 0 0 a10 0 a11 0 a12 0 a13 0 a14 0 a15 0
```
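
To make the difference in set width concrete, here is a rough sketch
that packs the positions of the kept elements into an index value. The
packing order (first element in the least significant bits) and the
helper name `pack_sparse_indices` are assumptions for illustration; the
actual hardware bit layout should be taken from the ISA layout tables.

```python
def pack_sparse_indices(row):
    """Pack 2-bit positions of kept elements, two per group of four.

    Assumption: the first kept element of the first group lands in the
    least significant bits. The real hardware bit order may differ.
    Returns (packed value, number of index bits used).
    """
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    packed = 0
    shift = 0
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        kept = [i for i, v in enumerate(group) if v != 0]
        if len(kept) > 2:
            raise ValueError("more than 2 non-zeros in a group of 4")
        # Pad with unused positions so every group contributes two indices.
        for i in range(4):
            if len(kept) == 2:
                break
            if i not in kept:
                kept.append(i)
        kept.sort()
        for pos in kept:
            packed |= pos << shift
            shift += 2
    return packed, shift


# 8 dense elements (4 kept): fits in an 8-bit set.
print(pack_sparse_indices([1, 2, 0, 0, 3, 4, 0, 0]))
# 16 dense elements (8 kept): needs a 16-bit set.
print(pack_sparse_indices([1, 2, 0, 0, 3, 4, 0, 0, 5, 6, 0, 0, 0, 0, 7, 8]))
```

With this packing, the 16-element row produces an index that does not
fit in 8 bits, which is why an 8-bit set is too narrow on gfx950.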

Similarly, for 8 bit variants on gfx950, we need the full 32 bits to
describe the locations of all 16 non-zero 8 bit elements. In this case,
there is no option to select from different sets of indices.

The issue arises in downstream use cases that pass a set of sparse
indices targeting gfx950: because the verifier currently does not let us
specify the full range of the non-zero values, we get numerical issues.
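
As a rough illustration of the failure mode, the sketch below scatters
8 compressed values back into a dense row of 16, once with a 16-bit
index set and once with only the low 8 bits. It assumes 2-bit positions
packed low bits first, two kept elements per group of four, and uses a
hypothetical helper `expand`. It does not model what the hardware does
with missing index bits; it only shows that an 8-bit set cannot place
all 8 values.

```python
def expand(values, packed, index_bits):
    """Scatter compressed values into a dense row using 2-bit positions.

    Assumption: low-bits-first packing, two kept elements per group of 4.
    Values without index bits are left unplaced (zero) in this sketch.
    """
    n_indexed = index_bits // 2
    dense = [0] * (2 * len(values))
    for i, v in enumerate(values):
        if i >= n_indexed:
            break  # no index bits left to place this value
        pos = (packed >> (2 * i)) & 0b11
        dense[(i // 2) * 4 + pos] = v
    return dense


values = [1, 2, 3, 4, 5, 6, 7, 8]
full = expand(values, 0xE444, 16)            # 16-bit set: all 8 values placed
truncated = expand(values, 0xE444 & 0xFF, 8)  # 8-bit set: only 4 values placed
print(full)
print(truncated)
```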

Assisted by: Claude

---------

Signed-off-by: Eric Feng <Eric.Feng@amd.com>
2026-04-24 13:47:17 -07:00