This adds a document describing the implementation of LLVM IR generation for exceptions and C++ cleanup handling. This will be used as a point of reference for future CIR exception handling design work. This document was generated using AI, with some manual modifications afterwards.
252 lines
11 KiB
ReStructuredText
252 lines
11 KiB
ReStructuredText
========================================
|
|
LLVM IR Generation for EH and Cleanups
|
|
========================================
|
|
|
|
.. contents::
|
|
:local:
|
|
|
|
Overview
|
|
========
|
|
|
|
This document describes how Clang's LLVM IR generation represents exception
|
|
handling (EH) and C++ cleanups. It focuses on the data structures and control
|
|
flow patterns used to model normal and exceptional exits, and it outlines how
|
|
the generated IR differs across common ABI models.
|
|
|
|
For details on the LLVM IR representation of exception handling, see
|
|
`LLVM Exception Handling <https://llvm.org/docs/ExceptionHandling.html>`_.
|
|
|
|
Core Model
|
|
==========
|
|
|
|
EH and cleanup handling is centered around an ``EHScopeStack`` that records
|
|
nested scopes for:
|
|
|
|
- **Cleanups**, which run on normal control flow, exceptional control flow, or
|
|
both. These are used for destructors, full-expression cleanups, and other
|
|
scope-exit actions.
|
|
- **Catch scopes**, which represent ``try``/``catch`` handlers.
|
|
- **Filter scopes**, used to model dynamic exception specifications and some
|
|
platform-specific filters.
|
|
- **Terminate scopes**, used for ``noexcept`` and similar termination paths.
|
|
|
|
Each cleanup is a small object with an ``Emit`` method. When a cleanup scope is
|
|
popped, the IR generator decides whether it must materialize a normal cleanup
|
|
block (for fallthrough, branch-through, or unresolved ``goto`` fixups) and/or an
|
|
EH cleanup entry (when exceptional control flow can reach the cleanup). This
|
|
results in a flattened CFG where cleanup lifetime is represented by the blocks
|
|
and edges that flow into those blocks.
|
|
|
|
Key Components
|
|
==============
|
|
|
|
The LLVM IR generation for EH and cleanups is spread across several core
|
|
components:
|
|
|
|
- ``CodeGenModule`` owns module-wide state such as the LLVM module, target
|
|
information, and the selected EH personality function. It provides access to
|
|
ABI helpers via ``CGCXXABI`` and target-specific hooks.
|
|
- ``CodeGenFunction`` manages per-function state and IR building. It owns the
|
|
``EHScopeStack``, tracks the current insertion point, and emits blocks, calls,
|
|
and branches. Most cleanup and EH control flow is built here.
|
|
- ``EHScopeStack`` is the central stack of scopes used to model EH and cleanup
|
|
semantics. It stores ``EHCleanupScope`` entries for cleanups, along with
|
|
``EHCatchScope``, ``EHFilterScope``, and ``EHTerminateScope`` for handlers and
|
|
termination logic.
|
|
- ``EHCleanupScope`` stores the cleanup object plus state data (active flags,
|
|
fixup depth, and enclosing scope links). When a cleanup scope is popped,
|
|
``CodeGenFunction`` decides whether to emit a normal cleanup block, an EH
|
|
cleanup entry, or both.
|
|
- Cleanup emission helpers implement the mechanics of branching through
|
|
cleanups, threading fixups, and emitting cleanup blocks.
|
|
- Exception emission helpers implement landing pads, dispatch blocks,
|
|
personality selection, and helper routines for try/catch, filters, and
|
|
terminate handling.
|
|
- ``CGCXXABI`` (and its ABI-specific implementations such as
|
|
``ItaniumCXXABI`` and ``MicrosoftCXXABI``) provide ABI-specific lowering for
|
|
throws, catch handling, and destructor emission details.
|
|
- The cleanup and exception handling code generation is driven by the flow of
|
|
``CodeGenFunction`` and its helper classes traversing the AST to emit IR for
|
|
C++ expressions, classes, and statements.
|
|
|
|
AST traversal in ``CodeGenFunction`` emits code and pushes cleanups or EH scopes,
|
|
``EHScopeStack`` records scope nesting, cleanup and exception helpers materialize
|
|
the CFG as scopes are popped, and ``CGCXXABI`` supplies ABI-specific details for
|
|
landing pads or funclets.
|
|
|
|
Cleanup Destination Routing
|
|
===========================
|
|
|
|
When multiple control flow exits (``return``, ``break``, ``continue``,
|
|
fallthrough) pass through the same cleanup, the generated IR shares a single
|
|
cleanup block among them. Before entering the cleanup, each exit path stores a
|
|
unique index into a "cleanup destination" slot. After the cleanup code runs, a
|
|
``switch`` instruction loads this index and dispatches to the appropriate final
|
|
destination. This avoids duplicating cleanup code for each exit while preserving
|
|
correct control flow.
|
|
|
|
For example, if a function has both a ``return`` and a ``break`` that exit
|
|
through the same destructor cleanup, both paths branch to the shared cleanup
|
|
block after storing their respective destination indices. The cleanup epilogue
|
|
then switches on the stored index to reach either the return block or the
|
|
loop-exit block.
|
|
|
|
When only a single exit passes through a cleanup (the common case), the switch
|
|
is unnecessary and the cleanup block branches directly to its sole destination.
|
|
|
|
Branch Fixups for Forward Gotos
|
|
-------------------------------
|
|
|
|
A ``goto`` statement that jumps forward to a label not yet seen poses a special
|
|
problem. The destination's enclosing cleanup scope is unknown at the point the
|
|
``goto`` is emitted. This is handled by emitting an optimistic branch and
|
|
recording a "fixup." When the cleanup scope is later popped, any recorded fixups
|
|
are resolved by rewriting the branch to thread through the cleanup block and
|
|
adding the destination to the cleanup's switch.
|
|
|
|
Exceptional Cleanups and EH Dispatch
|
|
====================================
|
|
|
|
Exceptional exits (``throw``, ``invoke`` unwinds) are routed through EH cleanup
|
|
entries, which are reached via a landing pad or a funclet dispatch block,
|
|
depending on the target ABI.
|
|
|
|
For Itanium-style EH (such as is used on x86-64 Linux), the IR uses ``invoke``
|
|
to call potentially-throwing operations and a ``landingpad`` instruction to
|
|
capture the exception and selector values. The landing pad aggregates any
|
|
catch and cleanup clauses for the current scope, and branches to a dispatch
|
|
block that compares the selector to type IDs and jumps to the appropriate
|
|
handler.
|
|
|
|
For Windows, LLVM IR uses funclet-style EH: ``catchswitch`` and ``catchpad`` for
|
|
handlers, and ``cleanuppad`` for cleanups, with ``catchret`` and ``cleanupret``
|
|
edges to resume normal flow. The personality function determines how these pads
|
|
are interpreted by the backend.
|
|
|
|
Personality and ABI Selection
|
|
=============================
|
|
|
|
Each function with exception handling constructs is associated with a
|
|
personality function (e.g. __gxx_personality_v0 for C++ on Linux). The
|
|
personality function determines the ABI-specifc EH behavior of the
|
|
function. The IR generation selects a personality function based on language
|
|
options and the target ABI (e.g., Itanium, MSVC SEH, SJLJ, Wasm EH). This
|
|
decision affects:
|
|
|
|
- Whether the IR uses landing pads or funclet pads.
|
|
- The shape of dispatch logic for catch and filter scopes.
|
|
- How termination or rethrow paths are modeled.
|
|
- Whether certain helper functions such as exception filters must be outlined.
|
|
|
|
Because the personality choice is made during IR generation, the CFG shape
|
|
directly reflects ABI-specific details.
|
|
|
|
Example: Array of Objects with Throwing Constructor
|
|
===================================================
|
|
|
|
Consider:
|
|
|
|
.. code-block:: c++
|
|
|
|
class MyClass {
|
|
public:
|
|
MyClass(); // may throw
|
|
~MyClass();
|
|
};
|
|
void doSomething(); // may throw
|
|
void f() {
|
|
MyClass arr[4];
|
|
doSomething();
|
|
}
|
|
|
|
High-level behavior
|
|
-------------------
|
|
|
|
- Construction of ``arr`` proceeds element-by-element. If an element constructor
|
|
throws, destructors must run for any elements that were successfully
|
|
constructed before the throw in reverse order of construction.
|
|
- After full construction, the call to ``doSomething`` may throw, in which case
|
|
the destructors for all constructed elements must run, in reverse order.
|
|
- On normal exit, destructors for all elements run in reverse order.
|
|
|
|
Codegen flow and key components
|
|
-------------------------------
|
|
|
|
- The surrounding compound statement enters a ``CodeGenFunction::LexicalScope``,
|
|
which is a ``RunCleanupsScope`` and is responsible for popping local cleanups
|
|
at the end of the block.
|
|
- ``CodeGenFunction::EmitDecl`` routes the local variable to
|
|
``CodeGenFunction::EmitVarDecl`` and then ``CodeGenFunction::EmitAutoVarDecl``,
|
|
which in turn calls ``EmitAutoVarAlloca``, ``EmitAutoVarInit``, and
|
|
``EmitAutoVarCleanups``.
|
|
- ``CodeGenFunction::EmitCXXAggrConstructorCall`` emits the array constructor
|
|
loop. While emitting the loop body, it enters a ``RunCleanupsScope`` and uses
|
|
``CodeGenFunction::pushRegularPartialArrayCleanup`` to register a
|
|
cleanup before calling ``CodeGenFunction::EmitCXXConstructorCall`` for one
|
|
element in the loop iteration. If this constructor were to throw an exception,
|
|
the cleanup handler would destroy the previously constructed elements in
|
|
reverse order.
|
|
- ``CodeGenFunction::EmitAutoVarCleanups`` calls ``emitAutoVarTypeCleanup``,
|
|
which ultimately registers a ``DestroyObject`` cleanup via
|
|
``CodeGenFunction::pushDestroy`` / ``pushFullExprCleanup`` for the full-array
|
|
destructor path.
|
|
- ``DestroyObject`` uses ``CodeGenFunction::destroyCXXObject``, which emits the
|
|
actual destructor call via ``CodeGenFunction::EmitCXXDestructorCall``.
|
|
- Cleanup emission helpers (e.g., ``CodeGenFunction::PopCleanupBlock`` and
|
|
``CodeGenFunction::EmitBranchThroughCleanup``) thread both normal and EH exits
|
|
through the cleanup blocks as scopes are popped.
|
|
- The cleanup is represented as an ``EHCleanupScope`` on ``EHScopeStack``, and
|
|
its ``Emit`` method generates a loop that calls the destructor on the
|
|
initialized range in reverse order.
|
|
|
|
The above function names and flow are accurate as of LLVM 22.0, but this is
|
|
subject to change as the code evolves, and this document might not be updated to
|
|
reflect the exact functions used.
|
|
|
|
Example: Temporary object materialization
|
|
=========================================
|
|
|
|
Consider:
|
|
|
|
.. code-block:: c++
|
|
|
|
class MyClass {
|
|
public:
|
|
MyClass();
|
|
~MyClass();
|
|
};
|
|
void useMyClass(MyClass &);
|
|
void f() {
|
|
useMyClass(MyClass());
|
|
}
|
|
|
|
High-level behavior
|
|
-------------------
|
|
|
|
- The temporary ``MyClass`` is materialized for the call argument.
|
|
- The temporary must be destroyed at the end of the full-expression, both on
|
|
the normal path and on the exceptional path if ``useMyClass`` throws.
|
|
- If the constructor throws, the temporary is not considered constructed and no
|
|
destructor runs.
|
|
|
|
Codegen flow and key functions
|
|
------------------------------
|
|
|
|
- ``CodeGenFunction::EmitExprWithCleanups`` wraps the full-expression in a
|
|
``RunCleanupsScope`` so that full-expression cleanups are run after the call.
|
|
- ``CodeGenFunction::EmitMaterializeTemporaryExpr`` creates storage for the
|
|
temporary via ``createReferenceTemporary`` and initializes it. For record
|
|
temporaries this flows through ``EmitAnyExprToMem`` and
|
|
``CodeGenFunction::EmitCXXConstructExpr``, which calls
|
|
``CodeGenFunction::EmitCXXConstructorCall``.
|
|
- ``pushTemporaryCleanup`` registers the destructor as a full-expression
|
|
cleanup by calling ``CodeGenFunction::pushDestroy`` for
|
|
``SD_FullExpression`` temporaries.
|
|
- The cleanup ultimately uses ``DestroyObject`` and
|
|
``CodeGenFunction::destroyCXXObject``, which emits
|
|
``CodeGenFunction::EmitCXXDestructorCall``.
|
|
|
|
The above function names and flow are accurate as of LLVM 22.0, but this is
|
|
subject to change as the code evolves, and this document might not be updated to
|
|
reflect the exact functions used.
|