Flang currently lowers internal procedures passed as actual arguments using LLVM's `llvm.init.trampoline` / `llvm.adjust.trampoline` intrinsics, which require an executable stack. On modern Linux toolchains and security-hardened kernels that enforce W^X (Write XOR Execute), this causes link-time failures (`ld.lld: error: ... requires an executable stack`) or runtime `SEGV` from NX violations.

This patch introduces a runtime trampoline pool that allocates trampolines from a dedicated `mmap`'d region instead of the stack. The pool toggles page permissions between writable (for patching) and executable (for dispatch), so the stack stays non-executable throughout. On macOS, `MAP_JIT` and `pthread_jit_write_protect_np` are used for the same effect. An i-cache flush (`__builtin___clear_cache` on Linux, `sys_icache_invalidate` on macOS) is performed after each write→exec transition.

The feature is gated behind a new driver flag, `-fsafe-trampoline` (off by default), which threads through the frontend into the `BoxedProcedurePass`. When enabled, the pass emits calls to `_FortranATrampolineInit`, `_FortranATrampolineAdjust`, and `_FortranATrampolineFree` instead of the legacy intrinsics. The legacy path is completely untouched when the flag is off.

The pool is a singleton with a fixed capacity (default 1024 slots, overridable via `FLANG_TRAMPOLINE_POOL_SIZE`). Slot size varies by target (32 bytes on x86-64/AArch64, 48 on PPC64, 64 fallback). Each slot holds a small architecture-specific stub, currently x86-64 (17 bytes, using `r10` as the nest/static-chain register) and AArch64 (24 bytes, using `x15`). The implementation compiles on all architectures but will crash at runtime with a clear diagnostic if trampoline emission is actually attempted on an unsupported target. This avoids breaking the flang-rt build on e.g. RISC-V or PPC64.
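To make the write/execute toggle concrete, here is a minimal Linux-only sketch of the idea, not the code in this patch. All names (`Pool`, `PoolSlots`, `CreatePool`, `PatchSlot`) are hypothetical. The sketch copies opaque bytes rather than a real x86-64 or AArch64 stub, flips permissions on the whole region for simplicity, and ignores locking and the macOS `MAP_JIT` path.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical names; illustrative only, not the flang-rt implementation.
constexpr std::size_t kSlotSize = 32;       // slot size quoted for x86-64/AArch64
constexpr std::size_t kDefaultSlots = 1024; // default pool capacity

// Slot count from FLANG_TRAMPOLINE_POOL_SIZE, or the default if unset/invalid.
std::size_t PoolSlots() {
  const char *env = std::getenv("FLANG_TRAMPOLINE_POOL_SIZE");
  if (!env) return kDefaultSlots;
  char *end = nullptr;
  unsigned long n = std::strtoul(env, &end, 10);
  return (end != env && *end == '\0' && n > 0) ? n : kDefaultSlots;
}

struct Pool {
  unsigned char *base = nullptr;
  std::size_t bytes = 0;
};

// Map a dedicated region read+write; it is never writable and executable
// at the same time.
bool CreatePool(Pool &pool, std::size_t slots) {
  std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  std::size_t bytes = ((slots * kSlotSize + page - 1) / page) * page;
  void *p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) return false;
  pool.base = static_cast<unsigned char *>(p);
  pool.bytes = bytes;
  return true;
}

// Patch one slot: make writable, copy the stub, make executable, flush i-cache.
bool PatchSlot(Pool &pool, std::size_t index, const void *stub,
               std::size_t len) {
  assert(pool.base != nullptr);
  if (len > kSlotSize || (index + 1) * kSlotSize > pool.bytes) return false;
  if (mprotect(pool.base, pool.bytes, PROT_READ | PROT_WRITE) != 0)
    return false;
  unsigned char *slot = pool.base + index * kSlotSize;
  std::memcpy(slot, stub, len);
  if (mprotect(pool.base, pool.bytes, PROT_READ | PROT_EXEC) != 0)
    return false;
  __builtin___clear_cache(reinterpret_cast<char *>(slot),
                          reinterpret_cast<char *>(slot + len));
  return true;
}
```

A caller would run `CreatePool(pool, PoolSlots())` once and `PatchSlot` for each new closure. Flipping the whole region is the simplest form of the toggle; a real pool would also have to coordinate patching with threads that are executing other slots, which this sketch does not attempt.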
Freed slots are poisoned (the callee pointer is overwritten with a sentinel) and recycled into a freelist, so the pool can sustain long-running programs that repeatedly create and destroy closures.

A few design choices worth calling out:

- The runtime avoids all C++ runtime dependencies: no `std::mutex`, no `operator new`, and no function-local statics with hidden guard variables. Locking is via flang-rt's own `Lock` / `CriticalSection`, memory is via `AllocateMemoryOrCrash` / `FreeMemory`, and the singleton uses explicit double-checked locking with a raw pointer. This was done so the trampoline pool links cleanly in minimal / freestanding flang-rt configurations.
- `_FortranATrampolineFree` calls are inserted immediately before every `func.return` in the enclosing host function. This is a conservative but correct strategy. The trampoline handle cannot outlive the host's stack frame since the closure captures the host's local variables by reference.
- The GNU_STACK note is verified via a dedicated integration test (`safe-trampoline-gnustack.f90`) that compiles and links a Fortran program using the runtime path, then inspects the ELF with `llvm-readelf` to confirm the stack segment is `RW` (not `RWE`).

**Test coverage:**

- `flang/test/Driver/fsafe-trampoline.f90`: flag forwarding (on, off, default)
- `flang/test/Fir/boxproc-safe-trampoline.fir`: FIR-level FileCheck for emitted runtime calls
- `flang/test/Lower/safe-trampoline.f90`: end-to-end lowering
- `flang-rt/test/Driver/safe-trampoline-gnustack.f90`: GNU_STACK ELF verification

Closes #182813

Co-authored-by: Sairudra More <moresair@pe31.hpc.amslabs.hpecorp.net>
Fortran Runtime (Flang-RT)
Flang-RT is the runtime library for code emitted by the Flang compiler (https://flang.llvm.org).
Getting Started
There are two build modes for Flang-RT: the bootstrap build, also called the in-tree build, and the runtime-only build, also called the out-of-tree build. These terms are not to be confused with in-source and out-of-source builds as defined by CMake. In an in-source build, the source directory and the build directory are identical, whereas in an out-of-source build the build artifacts are stored somewhere else, possibly in a subdirectory of the source directory. LLVM does not support in-source builds.
Requirements
Requirements:
Bootstrapping Runtimes Build
The bootstrapping build will first build Clang and Flang, then use these
compilers to compile Flang-RT. CMake will create a secondary build tree
configured to use these just-built compilers. The secondary build will reuse
the same build options (Flags, Debug/Release, ...) as the primary build.
It will also ensure that once built, Flang-RT is found by Flang from either
the build- or install-prefix. To enable, add flang-rt to
LLVM_ENABLE_RUNTIMES:
cmake -S <path-to-llvm-project-source>/llvm \
-GNinja \
-DLLVM_ENABLE_PROJECTS="clang;flang" \
-DLLVM_ENABLE_RUNTIMES=flang-rt \
...
It is recommended to enable building OpenMP alongside Flang and Flang-RT
as well. This will build omp_lib.mod required to use OpenMP from Fortran.
Building Compiler-RT may also be required, particularly on platforms that do
not provide all C-ABI functionality (such as Windows).
cmake -S <path-to-llvm-project-source>/llvm \
-GNinja \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_PROJECTS="clang;flang" \
-DLLVM_ENABLE_RUNTIMES="compiler-rt;flang-rt;openmp" \
...
By default, the enabled runtimes will only be built for the host platform
(-DLLVM_RUNTIME_TARGETS=default). To add additional targets to support
cross-compilation via flang --target=<target-triple>, add more triples to
LLVM_RUNTIME_TARGETS, such as
-DLLVM_RUNTIME_TARGETS="default;aarch64-linux-gnu".
After configuration, build, test, and install the runtime(s) via
$ ninja flang-rt
$ ninja check-flang-rt
$ ninja install
Standalone Runtimes Build
Instead of building Clang and Flang from scratch, the standalone Runtime build
uses CMake's environment introspection to find a C, C++, and Fortran compiler.
The compiler to be used can be controlled using CMake's standard mechanisms such
as CMAKE_C_COMPILER, CMAKE_CXX_COMPILER, and CMAKE_Fortran_COMPILER.
CMAKE_Fortran_COMPILER must be flang built from the same Git commit as
Flang-RT to ensure they are using the same ABI. The C and C++ compiler
can be any compiler supporting the same ABI.
In addition to the compiler, the build must be able to find LLVM development
tools such as lit and FileCheck that are not found in an LLVM install
directory. Use LLVM_BINARY_DIR to point to the directory where LLVM has
been built. When building Flang as part of a bootstrapping build
(LLVM_ENABLE_PROJECTS=flang), Flang-RT is automatically added
unless configured with -DFLANG_ENABLE_FLANG_RT=OFF. Add that option to avoid
having two conflicting versions of the same library.
A simple build configuration might look like the following:
cmake -S <path-to-llvm-project-source>/runtimes \
-GNinja \
-DLLVM_BINARY_DIR=<path-to-llvm-builddir> \
-DCMAKE_Fortran_COMPILER=<path-to-llvm-builddir>/bin/flang \
-DCMAKE_Fortran_COMPILER_WORKS=yes \
-DLLVM_ENABLE_RUNTIMES=flang-rt \
...
The CMAKE_Fortran_COMPILER_WORKS parameter must be set because otherwise CMake
will test whether the Fortran compiler can compile and link programs, which
fails as long as no runtime library is available yet.
When building Flang-RT for a cross-compilation target, the target triple can
be selected using LLVM_DEFAULT_TARGET_TRIPLE and LLVM_RUNTIMES_TARGET.
Flang-RT can be built multiple times with different build configurations,
but the resulting libraries then have to be located manually by passing the
-L option to the Flang driver.
After configuration, build, test, and install the runtime via
$ ninja
$ ninja check-flang-rt
$ ninja install
Configuration Option Reference
Flang-RT has the following configuration options. These are in addition to the build options that the LLVM_ENABLE_RUNTIMES mechanism and CMake itself provide.
- FLANG_RT_INCLUDE_TESTS (boolean; default: ON)
  When OFF, does not add any tests and unittests. The check-flang-rt build target will do nothing.
- FLANG_RUNTIME_F128_MATH_LIB (default: "")
  Determines the implementation of REAL(16) math functions. If set to libquadmath, uses quadmath.h and -lquadmath typically distributed with gcc. If empty, disables REAL(16) support. For any other value, introspects the compiler for __float128 or 128-bit long double support.
- FLANG_RT_EXPERIMENTAL_OFFLOAD_SUPPORT (values: "CUDA", ""; default: "")
  When set to CUDA, builds Flang-RT with experimental support for GPU accelerators using CUDA. CMAKE_CUDA_COMPILER must be set if not automatically detected by CMake. nvcc as well as clang are supported.
- FLANG_RT_INCLUDE_CUF (bool; default: OFF)
  Compiles the libflang_rt.cuda_<CUDA-version>.a/.so library. This is independent of FLANG_RT_EXPERIMENTAL_OFFLOAD_SUPPORT=CUDA and only requires a CUDA Toolkit installation (no CMAKE_CUDA_COMPILER).
Experimental CUDA Support
With -DFLANG_RT_EXPERIMENTAL_OFFLOAD_SUPPORT=CUDA, the following
additional configuration options become available.
- FLANG_RT_LIBCUDACXX_PATH (path; default: "")
  Path to libcu++ package installation.
- FLANG_RT_CUDA_RUNTIME_PTX_WITHOUT_GLOBAL_VARS (boolean; default: OFF)
  Do not compile global variables' definitions when producing PTX library. Default is OFF, meaning global variable definitions are compiled by default.
GPU Offloading Support
Flang-RT can be built for GPU targets (AMDGPU, NVPTX) using the LLVM
runtimes build infrastructure. The easiest way to configure a build for
GPU offloading is via the CMake cache file at
offload/cmake/caches/FlangOffload.cmake.