Add a new `arith.flush_denormals` operation. The operation takes a floating-point value as input and returns zero if the value is denormal. If the input is not denormal, the operation passes through the input. This commit also adds support to the `ArithToAPFloat` infrastructure. Running example: ```mlir %flush_a = arith.flush_denormals %a : f32 %flush_b = arith.flush_denormals %b : f32 %res = arith.addf %flush_a, %flush_b : f32 %flush_res = arith.flush_denormals %res : f32 ``` The exact lowering path depends on the backend and is not implemented as part of this PR: - Per-instruction mode. E.g., on NVIDIA architectures, the above example can lower to `add.ftz.f32 dest, a, b`. - Global status register. E.g., on `x86_64`, the above example can lower to `_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON); _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON); r = a + b; _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_OFF); _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_OFF);`. Subsequent ON-OFF-ON switches can be folded away. - Emulation via integer arithmetics. Check the bit pattern of the input float (depending on the specific FP type) and pass-through either the input or a zero constant. This lowering approach works on all architectures. Assisted-by: claude-opus-4.7-thinking-high
12 KiB
12 KiB