SIFixSGPRCopies incorrectly handled inline assembly operands with
SGPR ("s") constraints when the value came from a memory load (which
produces a VGPR). Instead of inserting the necessary v_readfirstlane
instruction, the pass passed the VGPR value directly to the operand.
Example:
  asm sideeffect "buffer_load_dwordx4 $0, $1, $2, 0", "=v,v,s,n"
Previously this generated:
  buffer_load_dwordx4 v[0:3], v0, v[8:11], 0 offen
where the third operand, v[8:11], is a VGPR tuple although an SGPR is
expected.
The fix inserts v_readfirstlane instructions during lowering whenever
a value is copied from a divergent register to an SGPR.
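
As a rough illustration of the rule described above, here is a toy C++
model. It is not the SIFixSGPRCopies code: `Bank`, `Copy`, `lowerCopy`
and `lowerWideCopy` are hypothetical names invented for this sketch,
and the per-dword split of a wide operand is an assumption based on
v_readfirstlane_b32 being a 32-bit instruction.

```cpp
#include <string>
#include <vector>

// Toy register banks. Illustrative only, not LLVM types.
enum class Bank { SGPR, VGPR };

// A copy between two named 32-bit registers (hypothetical model).
struct Copy {
    std::string dst;
    Bank dstBank;
    std::string src;
    Bank srcBank;
};

// Lower one 32-bit copy to a pseudo-assembly line.
// A VGPR -> SGPR copy cannot be a plain move because the source may hold
// a different value in each lane, so v_readfirstlane_b32 is emitted.
std::string lowerCopy(const Copy &c) {
    if (c.dstBank == Bank::SGPR && c.srcBank == Bank::VGPR)
        return "v_readfirstlane_b32 " + c.dst + ", " + c.src;
    if (c.dstBank == Bank::SGPR)
        return "s_mov_b32 " + c.dst + ", " + c.src;
    return "v_mov_b32 " + c.dst + ", " + c.src;
}

// Assumption: a multi-dword VGPR -> SGPR copy is split into one
// readfirstlane per dword, e.g. v[8:11] -> s[0:3] takes four of them.
std::vector<std::string> lowerWideCopy(int dstBase, int srcBase, int numDwords) {
    std::vector<std::string> out;
    for (int i = 0; i < numDwords; ++i) {
        Copy c{"s" + std::to_string(dstBase + i), Bank::SGPR,
               "v" + std::to_string(srcBase + i), Bank::VGPR};
        out.push_back(lowerCopy(c));
    }
    return out;
}
```

In this model, the buffer resource from the example would be lowered
with `lowerWideCopy(0, 8, 4)`, so the inline asm operand receives an
SGPR tuple rather than v[8:11].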
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>