Use the concrete types `I32` (the default) and `I64` (for `clock64` and
`globaltimer`) instead of the generic `LLVM_Type` for special-register
op results. The dialect verifier now rejects mismatched result types up
front, and the Python op-binding
generator emits the inferred-result form, so callers can write
`nvvm.ThreadIdXOp()` with no arguments. This only tightens the
constraint: no valid existing IR is rejected.
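
A minimal sketch of the no-argument form from Python, assuming the
upstream MLIR Python bindings are installed as the `mlir` package with
the NVVM dialect registered. The helper name `build_tid_x_module` is
hypothetical and only serves the illustration; the check on the result
type reflects the `I32` default described above.

```python
def build_tid_x_module() -> str:
    """Build a module with one special-register read and return its text."""
    # Local imports so this sketch can be loaded without the bindings present.
    # Assumption: the upstream MLIR Python package layout (`mlir.ir`,
    # `mlir.dialects.nvvm`).
    from mlir import ir
    from mlir.dialects import nvvm

    with ir.Context(), ir.Location.unknown():
        module = ir.Module.create()
        with ir.InsertionPoint(module.body):
            # No result type is passed; it is inferred from the op definition.
            tid = nvvm.ThreadIdXOp()
            # With this change the inferred result type should be i32.
            assert tid.result.type == ir.IntegerType.get_signless(32)
        return str(module)
```

Before this change a caller would have had to pass the result type
explicitly, since `LLVM_Type` gives the generator nothing to infer from.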