For `!$omp parallel do simd` with a `reduction` clause, ensure that the reduction over an array section is done properly:

1) Per-SIMD-lane reduction results are combined into the wsloop's thread-local copies.
2) The wsloop's thread-local copies are combined across threads by the wsloop reduction.

The issue is tracked in [192077](https://github.com/llvm/llvm-project/issues/192077).

---------

Co-authored-by: Sunil Kuravinakop <kuravina@pe31.hpc.amslabs.hpecorp.net>
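
For reviewers, the two-step combine order described above can be modelled in plain C. This is only an illustrative sketch, not the code generated by this change: the function names `combine_lanes` and `combine_threads`, the flat buffer layout, and the choice of `+` as the reduction operator are all assumptions made for the example, and lanes and threads are simulated sequentially.

```c
#include <assert.h>
#include <stddef.h>

/* Step 1 (hypothetical helper): fold every SIMD lane's private partial
 * result into one thread-local copy of the array section.
 * lane_priv holds `lanes` consecutive blocks of `len` elements. */
void combine_lanes(const double *lane_priv, size_t lanes, size_t len,
                   double *thread_copy) {
    assert(lane_priv != NULL && thread_copy != NULL);
    for (size_t l = 0; l < lanes; ++l)
        for (size_t i = 0; i < len; ++i)
            thread_copy[i] += lane_priv[l * len + i];
}

/* Step 2 (hypothetical helper): fold every thread-local copy into the
 * array section of the original variable. `section` points at the first
 * element of the section, so elements outside it are never touched. */
void combine_threads(const double *thread_copies, size_t threads, size_t len,
                     double *section) {
    assert(thread_copies != NULL && section != NULL);
    for (size_t t = 0; t < threads; ++t)
        for (size_t i = 0; i < len; ++i)
            section[i] += thread_copies[t * len + i];
}
```

In this model, each simulated thread first calls `combine_lanes` on its own lane buffers, and only then are the per-thread results merged with `combine_threads`, so lane-level results are never written directly to the shared array.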