Sairudra More 111bafff9b [flang] Add runtime trampoline pool for W^X compliance (#183108)
Flang currently lowers internal procedures passed as actual arguments
using LLVM's `llvm.init.trampoline` / `llvm.adjust.trampoline`
intrinsics, which require an executable stack. On modern Linux
toolchains and security-hardened kernels that enforce W^X (Write XOR
Execute), this causes link-time failures (`ld.lld: error: ... requires
an executable stack`) or runtime `SEGV` from NX violations.

This patch introduces a runtime trampoline pool that allocates
trampolines from a dedicated `mmap`'d region instead of the stack. The
pool toggles page permissions between writable (for patching) and
executable (for dispatch), so the stack stays non-executable throughout.
On macOS, `MAP_JIT` and `pthread_jit_write_protect_np` are used for the
same effect. An i-cache flush (`__builtin___clear_cache` on Linux,
`sys_icache_invalidate` on macOS) is performed after each write→exec
transition.
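
The write/execute toggle described above can be sketched roughly as follows. This is a minimal illustration for Linux, not the code from the patch: `WxRegion`, `CreateRegion`, and `PatchRegion` are hypothetical names, the macOS `MAP_JIT` path is left out, and the sketch ignores the locking needed when other threads may be executing from the region.

```cpp
#include <sys/mman.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical type for illustration; not part of flang-rt.
struct WxRegion {
  void *base = nullptr;
  std::size_t size = 0;
};

// Map an anonymous region that starts out read+write and non-executable.
bool CreateRegion(WxRegion &r, std::size_t bytes) {
  const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  const std::size_t rounded = (bytes + page - 1) / page * page;
  void *p = mmap(nullptr, rounded, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) return false;
  r.base = p;
  r.size = rounded;
  return true;
}

// Copy n bytes into the region. The pages are writable only while the copy
// runs and are switched to read+execute afterwards, so they are never
// writable and executable at the same time.
bool PatchRegion(WxRegion &r, std::size_t offset, const void *src,
                 std::size_t n) {
  if (offset > r.size || n > r.size - offset) return false;
  if (mprotect(r.base, r.size, PROT_READ | PROT_WRITE) != 0) return false;
  char *begin = static_cast<char *>(r.base) + offset;
  std::memcpy(begin, src, n);
  if (mprotect(r.base, r.size, PROT_READ | PROT_EXEC) != 0) return false;
  // Make the freshly written bytes visible to instruction fetch.
  __builtin___clear_cache(begin, begin + n);
  return true;
}
```

After `PatchRegion` returns, the region is readable and executable but no longer writable, so a later patch has to go through the same toggle again.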

The feature is gated behind a new driver flag, `-fsafe-trampoline` (off
by default), which threads through the frontend into the
`BoxedProcedurePass`. When enabled, the pass emits calls to
`_FortranATrampolineInit`, `_FortranATrampolineAdjust`, and
`_FortranATrampolineFree` instead of the legacy intrinsics. The legacy
path is completely untouched when the flag is off.

The pool is a singleton with a fixed capacity (default 1024 slots,
overridable via `FLANG_TRAMPOLINE_POOL_SIZE`). Slot size varies by
target (32 bytes on x86-64/AArch64, 48 on PPC64, 64 fallback). Each slot
holds a small architecture-specific stub, currently implemented for x86-64 (17 bytes,
using `r10` as the nest/static-chain register) and AArch64 (24 bytes,
using `x15`). The implementation compiles on all architectures but will
crash at runtime with a clear diagnostic if trampoline emission is
actually attempted on an unsupported target. This avoids breaking the
flang-rt build on e.g. RISC-V or PPC64.

Freed slots are poisoned (the callee pointer is overwritten with a
sentinel) and recycled into a freelist, so the pool can sustain
long-running programs that repeatedly create and destroy closures.
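
The poison-and-recycle scheme can be pictured with a small fixed-capacity pool. The sketch below is only an assumption about the general shape: `SlotPool`, `Slot`, `kPoison`, and `kNoSlot` are hypothetical names, the sentinel value is arbitrary, and real slots would hold machine code instead of plain fields.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical constants for illustration only.
constexpr std::size_t kNoSlot = static_cast<std::size_t>(-1);
constexpr std::uintptr_t kPoison = static_cast<std::uintptr_t>(0xDEADBEEFu);

struct Slot {
  std::uintptr_t callee;   // target address, or kPoison while the slot is free
  std::uintptr_t context;  // static-chain pointer handed to the callee
  std::size_t nextFree;    // index of the next free slot, kNoSlot at the end
};

template <std::size_t N> class SlotPool {
  static_assert(N > 0, "pool needs at least one slot");

public:
  SlotPool() : head_(0) {
    // Chain every slot into the freelist, poisoned from the start.
    for (std::size_t i = 0; i < N; ++i) {
      slots_[i] = {kPoison, 0, i + 1 < N ? i + 1 : kNoSlot};
    }
  }

  // Pop a slot off the freelist; returns kNoSlot when the pool is exhausted.
  std::size_t Acquire(std::uintptr_t callee, std::uintptr_t context) {
    if (head_ == kNoSlot) return kNoSlot;
    const std::size_t i = head_;
    head_ = slots_[i].nextFree;
    slots_[i] = {callee, context, kNoSlot};
    return i;
  }

  // Poison the slot and push it back so a later Acquire can reuse it.
  void Release(std::size_t i) {
    assert(i < N && slots_[i].callee != kPoison && "bad index or double free");
    slots_[i] = {kPoison, 0, head_};
    head_ = i;
  }

  bool IsLive(std::size_t i) const {
    return i < N && slots_[i].callee != kPoison;
  }

private:
  Slot slots_[N];
  std::size_t head_;
};
```

Because freed slots go back to the head of the list, the slot released most recently is the first one reused, and a stale handle is detectable through the poisoned callee field.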

A few design choices worth calling out:

The runtime avoids all C++ runtime dependencies: no `std::mutex`, no
`operator new`, no function-local statics with hidden guard variables.
Locking is via flang-rt's own `Lock` / `CriticalSection`, memory is via
`AllocateMemoryOrCrash` / `FreeMemory`, and the singleton uses explicit
double-checked locking with a raw pointer. This was done so the
trampoline pool links cleanly in minimal / freestanding flang-rt
configurations.
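
As a rough picture of explicit double-checked locking with a raw pointer, consider the sketch below. It is not the flang-rt code: `SpinLock` is a stand-in for flang-rt's `Lock`, whose interface is not shown here, `std::malloc` stands in for `AllocateMemoryOrCrash`, and `Pool` and `GetPool` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstdlib>
#include <new>

// Hypothetical pool type; only the capacity is modelled.
struct Pool {
  explicit Pool(int c) : capacity(c) {}
  int capacity;
};

// Stand-in for a runtime-provided lock: a minimal spin lock built on an
// atomic flag, so no std::mutex is needed.
class SpinLock {
public:
  void Acquire() {
    while (locked_.exchange(true, std::memory_order_acquire)) {
    }
  }
  void Release() { locked_.store(false, std::memory_order_release); }

private:
  std::atomic<bool> locked_{false};
};

// Namespace-scope objects with constant initialization, so there is no
// function-local static and no hidden guard variable.
static SpinLock poolLock;
static std::atomic<Pool *> poolInstance{nullptr};

Pool &GetPool() {
  // First check, without the lock: the common case after initialization.
  Pool *p = poolInstance.load(std::memory_order_acquire);
  if (!p) {
    poolLock.Acquire();
    // Second check, under the lock: another thread may have won the race.
    p = poolInstance.load(std::memory_order_relaxed);
    if (!p) {
      void *raw = std::malloc(sizeof(Pool));
      if (!raw) std::abort();
      p = new (raw) Pool(1024);  // placement new, no allocating operator new
      poolInstance.store(p, std::memory_order_release);
    }
    poolLock.Release();
  }
  return *p;
}
```

The instance is never destroyed in this sketch, which keeps it usable during program shutdown at the cost of one intentional leak.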

`_FortranATrampolineFree` calls are inserted immediately before every
`func.return` in the enclosing host function. This is a conservative but
correct strategy. The trampoline handle cannot outlive the host's stack
frame since the closure captures the host's local variables by
reference.

The GNU_STACK note is verified via a dedicated integration test
(`safe-trampoline-gnustack.f90`) that compiles and links a Fortran
program using the runtime path, then inspects the ELF with
`llvm-readelf` to confirm the stack segment is `RW` (not `RWE`).

**Test coverage:**

- `flang/test/Driver/fsafe-trampoline.f90` — flag forwarding (on, off,
default)
- `flang/test/Fir/boxproc-safe-trampoline.fir` — FIR-level FileCheck for
emitted runtime calls
- `flang/test/Lower/safe-trampoline.f90` — end-to-end lowering
- `flang-rt/test/Driver/safe-trampoline-gnustack.f90` — GNU_STACK ELF
verification

Closes #182813

Co-authored-by: Sairudra More <moresair@pe31.hpc.amslabs.hpecorp.net>
2026-03-10 16:16:05 +05:30

Fortran Runtime (Flang-RT)

Flang-RT is the runtime library for code emitted by the Flang compiler (https://flang.llvm.org).

Getting Started

Flang-RT has two build modes: the bootstrap build, also called the in-tree build, and the runtime-only build, also called the out-of-tree build. These terms should not be confused with in-source and out-of-source builds as defined by CMake. In an in-source build, the source directory and the build directory are identical, whereas in an out-of-source build the build artifacts are stored somewhere else, possibly in a subdirectory of the source directory. LLVM does not support in-source builds.

Requirements

Requirements:

Bootstrapping Runtimes Build

The bootstrapping build first builds Clang and Flang, then uses these compilers to compile Flang-RT. CMake creates a secondary build tree configured to use the just-built compilers. The secondary build reuses the build options of the primary build (flags, Debug/Release, ...). It also ensures that, once built, Flang-RT is found by Flang from either the build prefix or the install prefix. To enable it, add flang-rt to LLVM_ENABLE_RUNTIMES:

cmake -S <path-to-llvm-project-source>/llvm \
  -GNinja                                   \
  -DLLVM_ENABLE_PROJECTS="clang;flang"      \
  -DLLVM_ENABLE_RUNTIMES=flang-rt           \
  ...

It is recommended to build OpenMP alongside Flang and Flang-RT as well. This builds omp_lib.mod, which is required to use OpenMP from Fortran. Building Compiler-RT may also be required, particularly on platforms that do not provide all C-ABI functionality (such as Windows).

cmake -S <path-to-llvm-project-source>/llvm            \
  -GNinja                                              \
  -DCMAKE_BUILD_TYPE=Release                           \
  -DLLVM_ENABLE_PROJECTS="clang;flang"                 \
  -DLLVM_ENABLE_RUNTIMES="compiler-rt;flang-rt;openmp" \
  ...

By default, the enabled runtimes will only be built for the host platform (-DLLVM_RUNTIME_TARGETS=default). To add additional targets to support cross-compilation via flang --target=<target-triple>, add more triples to LLVM_RUNTIME_TARGETS, such as -DLLVM_RUNTIME_TARGETS="default;aarch64-linux-gnu".

After configuration, build, test, and install the runtime(s) via

$ ninja flang-rt
$ ninja check-flang-rt
$ ninja install

Standalone Runtimes Build

Instead of building Clang and Flang from scratch, the standalone runtimes build uses CMake's environment introspection to find a C, C++, and Fortran compiler. The compilers to be used can be controlled using CMake's standard mechanisms such as CMAKE_C_COMPILER, CMAKE_CXX_COMPILER, and CMAKE_Fortran_COMPILER. CMAKE_Fortran_COMPILER must be a flang built from the same Git commit as Flang-RT to ensure both use the same ABI. The C and C++ compilers can be any compilers supporting the same ABI.

In addition to the compiler, the build must be able to find LLVM development tools such as lit and FileCheck, which are not found in an LLVM install directory. Use LLVM_BINARY_DIR to point to the directory where LLVM has been built. When building Flang as part of a bootstrapping build (LLVM_ENABLE_PROJECTS=flang), Flang-RT is automatically added unless configured with -DFLANG_ENABLE_FLANG_RT=OFF. Add that option to avoid having two conflicting versions of the same library.

A simple build configuration might look like the following:

cmake -S <path-to-llvm-project-source>/runtimes              \
  -GNinja                                                    \
  -DLLVM_BINARY_DIR=<path-to-llvm-builddir>                  \
  -DCMAKE_Fortran_COMPILER=<path-to-llvm-builddir>/bin/flang \
  -DCMAKE_Fortran_COMPILER_WORKS=yes                         \
  -DLLVM_ENABLE_RUNTIMES=flang-rt                            \
  ...

The CMAKE_Fortran_COMPILER_WORKS parameter must be set because CMake would otherwise test whether the Fortran compiler can compile and link programs, and that test fails while no runtime library is available yet.

When building Flang-RT for cross-compilation, the target triple can be selected using LLVM_DEFAULT_TARGET_TRIPLE and LLVM_RUNTIMES_TARGET. Flang-RT can be built multiple times with different build configurations, but the resulting libraries then have to be located manually with the -L option when used with the Flang driver.

After configuration, build, test, and install the runtime via

$ ninja
$ ninja check-flang-rt
$ ninja install

Configuration Option Reference

Flang-RT has the following configuration options. These are in addition to the build options that the LLVM_ENABLE_RUNTIMES mechanism and CMake itself provide.

  • FLANG_RT_INCLUDE_TESTS (boolean; default: ON)

    When OFF, no tests or unit tests are added, and the check-flang-rt build target does nothing.

  • FLANG_RUNTIME_F128_MATH_LIB (default: "")

    Determines the implementation of REAL(16) math functions. If set to libquadmath, uses quadmath.h and -lquadmath typically distributed with gcc. If empty, disables REAL(16) support. For any other value, introspects the compiler for __float128 or 128-bit long double support. More details.

  • FLANG_RT_EXPERIMENTAL_OFFLOAD_SUPPORT (values: "CUDA", ""; default: "")

    When set to CUDA, builds Flang-RT with experimental support for GPU accelerators using CUDA. CMAKE_CUDA_COMPILER must be set if not automatically detected by CMake. nvcc as well as clang are supported.

  • FLANG_RT_INCLUDE_CUF (bool, default: OFF)

    Compiles the libflang_rt.cuda_<CUDA-version>.a/.so library. This is independent of FLANG_RT_EXPERIMENTAL_OFFLOAD_SUPPORT=CUDA and only requires a CUDA Toolkit installation (no CMAKE_CUDA_COMPILER).

Experimental CUDA Support

With -DFLANG_RT_EXPERIMENTAL_OFFLOAD_SUPPORT=CUDA, the following additional configuration options become available.

  • FLANG_RT_LIBCUDACXX_PATH (path, default: "")

    Path to libcu++ package installation.

  • FLANG_RT_CUDA_RUNTIME_PTX_WITHOUT_GLOBAL_VARS (boolean, default: OFF)

    Do not compile global variable definitions when producing the PTX library. The default is OFF, meaning global variable definitions are compiled.

GPU Offloading Support

Flang-RT can be built for GPU targets (AMDGPU, NVPTX) using the LLVM runtimes build infrastructure. The easiest way to configure a build for GPU offloading is via the CMake cache file at offload/cmake/caches/FlangOffload.cmake.