This change introduces Gap Filling, an optimization that aims to fill in
holes in otherwise contiguous load/store chains to enable vectorization.
It also introduces Chain Extending, which extends the end of a chain to
the closest power of 2.
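
To make the two ideas concrete, here is a minimal standalone sketch, not
code from the LSV itself. `MaskedChainPlan`, `roundUpToPow2`, and
`planMaskedChain` are hypothetical names, and the sketch assumes that
"closest power of 2" means rounding the element count up. The real pass
also applies legality and profitability checks that are omitted here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration only; none of these names exist in LLVM.
struct MaskedChainPlan {
  std::size_t NumElts;    // vector width after gap filling and extending
  std::vector<bool> Mask; // true = lane backed by an original access
};

// Round N up to a power of 2 (returns 1 for N <= 1).
std::size_t roundUpToPow2(std::size_t N) {
  std::size_t P = 1;
  while (P < N)
    P <<= 1;
  return P;
}

// Lanes holds the element indices, relative to the chain start, that the
// original scalar loads/stores touch. Indices missing inside the range are
// gaps; indices between the last access and the power-of-2 width are the
// extension. Both kinds of "extra" lanes end up masked off.
MaskedChainPlan planMaskedChain(const std::vector<std::size_t> &Lanes) {
  assert(!Lanes.empty() && "a chain needs at least one access");
  std::size_t Last = *std::max_element(Lanes.begin(), Lanes.end());

  MaskedChainPlan Plan;
  Plan.NumElts = roundUpToPow2(Last + 1);
  Plan.Mask.assign(Plan.NumElts, false);
  for (std::size_t L : Lanes)
    Plan.Mask[L] = true;
  return Plan;
}
```

With accesses at lanes {0, 1, 3}, the sketch plans a 4-wide access with
mask {1, 1, 0, 1} (gap filling). With five contiguous accesses it plans
an 8-wide access whose last three lanes are masked off (chain extending).
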
This was originally motivated by the NVPTX target, but I tried to
generalize it so that it applies to any target that uses the
LoadStoreVectorizer (LSV). I'm more than willing to make adjustments to
make this change more target-agnostic. I fully expect there are some
issues and encourage feedback on how to improve things.
For both loads and stores we only perform the optimization when we can
generate a legal LLVM masked load/store intrinsic, masking off the
"extra" elements. Determining legality for stores is a little tricky
from the NVPTX side, because these intrinsics are only supported for
256-bit vectors. See the other PR I opened for the implementation of the
NVPTX lowering of masked store intrinsics, which includes NVPTX TTI
changes that return true for isLegalMaskedStore under certain
conditions: https://github.com/llvm/llvm-project/pull/159387. This
change is dependent on that backend change, but I predict this change
will require more discussion, so I am putting them both up at the same
time. The backend change will be merged first assuming both are
approved.
Edited: both stores _and loads_ must use masked intrinsics for this
optimization to be legal.
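
The legality gate described above can be sketched roughly as follows.
This is a toy model, not the real TTI interface: `ToyTarget`,
`AccessKind`, and `canUseMaskedChain` are hypothetical, the real
`isLegalMaskedLoad`/`isLegalMaskedStore` hooks take more context (such as
the data type and alignment), and the widths accepted below are
assumptions chosen only for illustration.

```cpp
// Hypothetical stand-in for a target's answers; not the real TTI API.
struct ToyTarget {
  // Assumption: masked stores are legal only for 256-bit vectors, loosely
  // mirroring the NVPTX restriction mentioned in the description.
  bool isLegalMaskedStore(unsigned VecBits) const { return VecBits == 256; }
  // Assumption made up for this example: masked loads at 128 or 256 bits.
  bool isLegalMaskedLoad(unsigned VecBits) const {
    return VecBits == 128 || VecBits == 256;
  }
};

enum class AccessKind { Load, Store };

// A chain with extra (gap or extension) lanes may be vectorized only if the
// widened access can be emitted as a legal masked intrinsic. This applies
// to loads as well as stores.
bool canUseMaskedChain(const ToyTarget &T, AccessKind Kind, unsigned VecBits,
                       bool HasExtraLanes) {
  if (!HasExtraLanes)
    return true; // fully contiguous chain: no masking is needed
  return Kind == AccessKind::Load ? T.isLegalMaskedLoad(VecBits)
                                  : T.isLegalMaskedStore(VecBits);
}
```

Under these assumptions, a 128-bit store chain with a gap is rejected,
while the same chain widened to 256 bits is accepted.
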
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>