Split the input vector with extend_low and extend_high, then split each of the two results again with extend_low and extend_high, for a total of 6 instructions. This removes 3 shuffles and a couple of extends.
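
The instruction count can be checked with a small scalar model. This is only an illustrative sketch, not the actual lowering code: it assumes a 16-lane input widened over two levels (for example 8-bit to 32-bit lanes), and the helper names `extend_low`, `extend_high`, and `widen_two_levels` are hypothetical stand-ins for the SIMD operations.

```python
def extend_low(lanes):
    # Model of a SIMD extend_low: take the lower half of the lanes.
    # Python ints are unbounded, so the widening itself is value-preserving.
    return list(lanes[: len(lanes) // 2])


def extend_high(lanes):
    # Model of a SIMD extend_high: take the upper half of the lanes.
    return list(lanes[len(lanes) // 2 :])


def widen_two_levels(v):
    # First level: 2 instructions.
    lo = extend_low(v)
    hi = extend_high(v)
    # Second level: 4 instructions, two per first-level result.
    return [extend_low(lo), extend_high(lo), extend_low(hi), extend_high(hi)]


# Assumed example input: 16 signed lanes.
v = list(range(-8, 8))
parts = widen_two_levels(v)
```

In this model the four output vectors hold the original lanes in order, and they are produced by 2 + 4 = 6 extend operations with no shuffles.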