On GPUs, `TTI::getRegisterBitWidth(TargetTransformInfo::RGK_FixedWidthVector)` returns the bit width of a single element rather than the size of the whole vector register, as it does on CPUs. This patch therefore replaces that call with `getLoadStoreVecRegBitWidth()`. Because that function takes an address space argument, the calculation also moves into the per-seed code, so each seed uses the width for its own address space. This patch also adds an AMDGPU lit test directory with a simple test.