Implement new ACLE matrix multiply-accumulate intrinsics for Armv9.7:

```c
// 16-bit floating-point matrix multiply-accumulate.
// Only if __ARM_FEATURE_SVE_B16MM.
// Variant also available for _f16 if (__ARM_FEATURE_SVE2p2 && __ARM_FEATURE_F16MM).
// [_bf16] is ACLE notation: both svmmla_bf16 and the overloaded svmmla are provided.
svbfloat16_t svmmla[_bf16](svbfloat16_t zda, svbfloat16_t zn, svbfloat16_t zm);

// Half-precision matrix multiply-accumulate, widening to single precision.
// Requires the +f16f32mm architecture extension.
float32x4_t vmmlaq_f32_f16(float32x4_t r, float16x8_t a, float16x8_t b);

// Non-widening half-precision matrix multiply-accumulate.
// Requires the +f16mm architecture extension.
float16x8_t vmmlaq_f16_f16(float16x8_t r, float16x8_t a, float16x8_t b);
```
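A minimal scalar sketch of what the widening `vmmlaq_f32_f16` computes, assuming it uses the same 2x2 += 2x4 × 4x2 layout as the existing BFMMLA-based `vbfmmlaq_f32`. The function `mmla_f32_ref` is hypothetical (it is not part of ACLE), its inputs are plain `float` values standing in for already-converted `float16_t` lanes, and the hardware's intermediate rounding is not guaranteed to match this sequential sum bit for bit.

```c
/* Hypothetical scalar reference model, not an ACLE API.
 * Assumed layout (mirroring BFMMLA / vbfmmlaq_f32):
 *   r: 2x2 single-precision accumulator, row-major:  r[2*i + j]
 *   a: 2x4 matrix, row-major:                        a[4*i + k]
 *   b: 4x2 matrix stored column by column:           b[4*j + k]
 * Each output element is the dot product of row i of a with
 * column j of b, added to the existing accumulator value. */
static void mmla_f32_ref(float r[4], const float a[8], const float b[8])
{
    for (int i = 0; i < 2; ++i) {
        for (int j = 0; j < 2; ++j) {
            float sum = r[2 * i + j];
            for (int k = 0; k < 4; ++k) {
                sum += a[4 * i + k] * b[4 * j + k];
            }
            r[2 * i + j] = sum;
        }
    }
}
```

For example, if `b` holds the first two columns of the 4x4 identity matrix (`{1,0,0,0, 0,1,0,0}`), then each `r[2*i + j]` simply gains `a[4*i + j]`, which makes the row/column indexing easy to check by hand.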