llvm-project

Files

Nicolai Hähnle 69589dd2c0 AMDGPU: Improve getShuffleCost accuracy for 8- and 16-bit shuffles (#168818)

These shuffles can always be implemented using v_perm_b32, and so this
rewrites the analysis from the perspective of "how many v_perm_b32s does
it take to assemble each register of the result?"
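
The counting idea can be illustrated with a small standalone sketch. This is not the code from the patch: the function name `estimatePermCount`, the mask encoding, and the rule "one v_perm_b32 per pair of 32-bit source registers feeding a result register" are assumptions made for illustration, based only on the fact that v_perm_b32 selects bytes from two 32-bit sources.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical helper, NOT the LLVM implementation.
// mask[i] is the source element index feeding result element i, or -1 if
// that element is undefined. Source elements are numbered across the
// concatenation of the shuffle inputs. elemBits is 8 or 16.
unsigned estimatePermCount(const std::vector<int> &mask, unsigned elemBits) {
  const unsigned elemsPerReg = 32 / elemBits; // elements per 32-bit register
  unsigned cost = 0;

  // Walk the result one 32-bit register at a time.
  for (std::size_t base = 0; base < mask.size(); base += elemsPerReg) {
    std::set<unsigned> srcRegs; // distinct source registers feeding this one
    bool inPlace = true;        // every defined element keeps its lane

    for (unsigned lane = 0;
         lane < elemsPerReg && base + lane < mask.size(); ++lane) {
      const int src = mask[base + lane];
      if (src < 0)
        continue; // undefined element: no constraint
      const unsigned srcIdx = static_cast<unsigned>(src);
      srcRegs.insert(srcIdx / elemsPerReg);
      if (srcIdx % elemsPerReg != lane)
        inPlace = false;
    }

    if (srcRegs.empty())
      continue; // fully undefined register: free
    if (srcRegs.size() == 1 && inPlace)
      continue; // whole-register reuse: assumed free in this sketch

    // Assumption: one v_perm_b32 combines bytes from up to two 32-bit
    // sources; each further source register needs one more.
    cost += srcRegs.size() <= 2
                ? 1u
                : static_cast<unsigned>(srcRegs.size()) - 1u;
  }
  return cost;
}
```

Under these assumptions, reversing four bytes within one register (`{3, 2, 1, 0}` with `elemBits = 8`) counts as one permute, while gathering one byte from each of four different source registers (`{0, 4, 8, 12}`) counts as three.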

The test changes in Transforms/SLPVectorizer/reduction.ll are
reasonable: VI (gfx8) has native f16 math, but not packed math.

2025-11-21 19:33:13 +00:00

add_sub_sat-inseltpoison.ll

…

add_sub_sat.ll

…

address-space-ptr-sze-gep-index-assert.ll

…

bswap-inseltpoison.ll

…

bswap.ll

…

crash_extract_subvector_cost.ll

…

external-shuffle.ll

…

extract-ordering.ll

…

horizontal-store.ll

…

invariant-load-no-alias-store.ll

…

lit.local.cfg

…

min_max.ll

…

packed-math.ll

…

phi-result-use-order.ll

…

reduction.ll

…

round-inseltpoison.ll

…

round.ll

…

slp-v2f16.ll

…

slp-v2f32.ll

…

vectorize-i8.ll

…