The convert-affine-for-to-gpu pass moved operations from the affine.for loop body to the GPU launch kernel, then erased the original loop. However, if the loop had iter_args (reduction loops), the moved operations could still reference the loop body's block arguments (the iter_args). When the loop was erased, those block arguments were destroyed while still having live uses, triggering a use_empty() assertion.

Fix this by detecting loops with iter_args in collectBounds and returning an error. Reduction loops cannot be trivially converted to GPU kernels without dedicated handling of the accumulator semantics.

Fixes #116044

Assisted-by: Claude Code
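
For illustration only, the failure mode and the guard described above can be modeled outside MLIR with a small toy sketch. Nothing here is the actual patch or the MLIR API: `ToyValue`, `ToyLoop`, `ToyBounds`, `collectBoundsSketch`, `safeToErase`, and the error string are hypothetical names invented for this example. The assumption is simply that a loop-carried block argument with remaining uses makes erasing the loop unsafe, so the bounds-collection step rejects such loops up front.

```cpp
#include <optional>
#include <string>
#include <vector>

// Toy stand-in for an SSA value that counts the operations using it.
// (Hypothetical type, not mlir::Value.)
struct ToyValue {
  int numUses = 0;
  bool useEmpty() const { return numUses == 0; }
};

// Toy stand-in for an affine.for loop: constant bounds plus optional
// loop-carried block arguments (the iter_args).
struct ToyLoop {
  long lowerBound = 0;
  long upperBound = 0;
  std::vector<ToyValue> iterArgs;
};

struct ToyBounds {
  long lower;
  long upper;
};

// Hypothetical analogue of the collectBounds check: reject any loop that
// carries iter_args instead of attempting the conversion.
std::optional<ToyBounds> collectBoundsSketch(const ToyLoop &loop,
                                             std::string &error) {
  if (!loop.iterArgs.empty()) {
    error = "loops with iter_args are not supported"; // assumed wording
    return std::nullopt;
  }
  return ToyBounds{loop.lowerBound, loop.upperBound};
}

// Erasing the loop destroys its block arguments, so it is only safe when
// none of them still has a use (the condition the assertion enforced).
bool safeToErase(const ToyLoop &loop) {
  for (const ToyValue &arg : loop.iterArgs)
    if (!arg.useEmpty())
      return false;
  return true;
}
```

In this model a plain loop yields its bounds, while a loop whose iter_arg is still used by a moved operation is rejected by `collectBoundsSketch` before anything is erased, which is the ordering the fix relies on.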