For "!$omp parallel do simd reduction", ensure that the reduction over an
array section is done properly:
1) per-SIMD-lane reduction results are combined into the wsloop's
thread-local copies.
2) wsloop thread-local copies are combined across threads by the wsloop
reduction.
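
As a rough illustration of the ordering in the two steps above (not the actual lowering), here is a plain C sketch that simulates threads and SIMD lanes sequentially. All names and sizes (`two_stage_reduce`, `THREADS`, `LANES`, the section bounds) are hypothetical and chosen only for this example.

```c
#include <assert.h>

#define N 8        /* full array length (hypothetical) */
#define LO 2       /* first index of the reduced array section */
#define LEN 4      /* length of the reduced array section */
#define THREADS 2  /* simulated number of threads */
#define LANES 4    /* simulated SIMD width */
#define ITERS 16   /* loop trip count */

/* Add the contribution (i + k) of every iteration i to a[LO + k],
 * going through lane-private and thread-private copies. */
static void two_stage_reduce(int a[N]) {
    /* The simulated static schedule assumes an even split of iterations. */
    assert(ITERS % THREADS == 0);

    int thread_copy[THREADS][LEN] = {{0}};

    for (int t = 0; t < THREADS; ++t) {
        int lane_copy[LANES][LEN] = {{0}};
        int begin = t * (ITERS / THREADS);
        int end = begin + ITERS / THREADS;

        /* Each iteration accumulates into the private copy of its lane. */
        for (int i = begin; i < end; ++i)
            for (int k = 0; k < LEN; ++k)
                lane_copy[i % LANES][k] += i + k;

        /* Step 1: combine the per-lane results into the thread-local copy. */
        for (int l = 0; l < LANES; ++l)
            for (int k = 0; k < LEN; ++k)
                thread_copy[t][k] += lane_copy[l][k];
    }

    /* Step 2: combine the thread-local copies into the original section. */
    for (int t = 0; t < THREADS; ++t)
        for (int k = 0; k < LEN; ++k)
            a[LO + k] += thread_copy[t][k];
}
```

Only elements `a[LO]` through `a[LO + LEN - 1]` are written; the rest of the array is left untouched, which is the property an array-section reduction has to preserve.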
The issue is tracked in [#192077](https://github.com/llvm/llvm-project/issues/192077).
---------
Co-authored-by: Sunil Kuravinakop <kuravina@pe31.hpc.amslabs.hpecorp.net>