The MAI member is non-null. #194280 made this clearer by making the
MCContext constructor take MCAsmInfo by reference. Convert getAsmInfo to
return const MCAsmInfo & and the member to a reference.
Both MCContext::MCContext and TargetMachine::getMCAsmInfo treat
MCAsmInfo as a pointer that must be non-null. Make the contract
explicit:
* MCContext's constructor takes `const MCAsmInfo &MAI`.
* TargetMachine::getMCAsmInfo returns `const MCAsmInfo &`.
Make this change now since the MCContext ctor has recently been updated.
Since #180464 the canonical MCTargetOptions pointer is stored in
MCAsmInfo, but it is bound after construction via `setTargetOptions`
called from TargetRegistry::createMCAsmInfo.
Direct constructions in unit tests can leave the pointer null, leading
to a runtime assert failure. Add MCTargetOptions to every MCAsmInfo
subclass constructor, store it as a reference in MCAsmInfo, and remove
`setTargetOptions()`.
BOLT hardcoded 4-byte LSDA (exception table) encoding for x86-64. This
is insufficient for large code model binaries where functions in .ltext
sections may be placed at addresses above 2GB, exceeding the range of
DW_EH_PE_udata4/DW_EH_PE_sdata4 encodings.
Detect large code model by checking for .ltext sections
(SHF_X86_64_LARGE) and update LSDAEncoding to use 8-byte pointers:
- Non-PIC: DW_EH_PE_absptr (8-byte absolute)
- PIC: DW_EH_PE_pcrel | DW_EH_PE_sdata8 (8-byte PC-relative)
This was pulled out from
https://github.com/llvm/llvm-project/pull/190637
`AddressSize` parameter is not used by `DataExtractor` and will be
removed in the future. See #190519 for more context.
I took the liberty of switching from using the `StringRef` constructor
overload to `ArrayRef` where appropriate.
Most clients don't have a notion of "address" and pass arbitrary values
(including `0` and `sizeof(void *)`) to `DataExtractor` constructors.
This makes address-extraction methods dangerous to use.
Those clients that do have a notion of address can use other methods
like `getUnsigned()` to extract an address, or they can derive from
`DataExtractor` and add convenience methods if extracting an address is
routine. `DWARFDataExtractor` is an example, where the removed methods
were actually moved.
This does not remove `AddressSize` argument of `DataExtractor`
constructors yet, but makes it unused and overloads constructors in
preparation for their deletion. I'll be removing uses of the
to-be-deleted constructors in follow-up patches.
Except MC-internal `MCAsmInfo()` uses, MCAsmInfo is always constructed
with `const MCTargetOptions &` via `TargetRegistry::createMCAsmInfo`
(https://reviews.llvm.org/D41349). Store the pointer in MCAsmInfo and
change `MCContext::getTargetOptions()` to retrieve it from there,
removing the `MCTargetOptions const *TargetOptions` member from
MCContext.
MCContext's constructor still accepts an MCTargetOptions parameter
for now but is often omitted by call sites.
A subsequent change will remove this parameter and update all callers.
The "private global" terminology, likely came from
llvm/lib/IR/Mangler.cpp, is misleading: "private" is the opposite of
"global", and these prefixed symbols are not global in the object file
format sense (e.g. ELF has STB_GLOBAL while these symbols are always
STB_LOCAL). The term "internal symbol" better describes their purpose:
symbols for internal use by compilers and assemblers, not meant to be
visible externally.
This rename is a step toward adopting the "internal symbol prefix"
terminology agreed with GNU as
(https://sourceware.org/pipermail/binutils/2026-March/148448.html).
There are cases in which `getEntryIDForSymbol` is called, where the
given Symbol is in a constant island, and so BOLT can not find its
function. This causes BOLT to reach `llvm_unreachable("symbol not
found")` and crash. This patch adds a check that avoids this crash.
BOLT currently strips all STT_NOTYPE STB_LOCAL zero-sized symbols
that fall inside function bodies. Certain such symbols are named
labels (loop markers and subroutine entry points) or local function
symbols in hand-written assembly. We now keep them in local symbol
table in BOLT processed binaries for better symbolication.
`BinaryFunction::translateInputToOutputAddress()` contains fallback
logic in case that querying `IOAddressMap` doesn't yield an output
address. Because this function could be called in scenarios where
`IOAddressMap` won't be set up, we should check if the map actually
exists before lookup.
In relocation mode, keep folded functions in the BinaryFunctions map
instead of erasing them. Mark them as folded using setFolded() and skip
emitting them.
PLT entries are PseudoFunctions, and are not disassembled or emitted.
For BTI, we need to check the first MCInst of PLT entries, to see
if indirectly calling them is safe or not.
This patch disassembles PLTs for binaries using BTI, while not changing
the behaviour for binaries without BTI.
The PLTs are only disassembled, not emitted.
---------
Co-authored-by: Paschalis Mpeis <paschalis.mpeis@arm.com>
Corrected various spelling mistakes such as 'occurred', 'receiver',
'initialized', 'length', and others in comments, variable names,
function names, and documentation throughout the project. These
changes improve code readability and maintain consistency in naming
and documentation.
Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
There is no guarantee that PatchOffset is suitably aligned for uint32_t,
and in BOLT's own tests, it is not aligned for uint32_t.
Fixes test failures seen with LLVM_USE_SANITIZER=Undefined.
Add hashing support for MCSpecifierExpr, which is frequently used as an
operand in AArch64 instructions. Proper hashing reduces collisions when
placing functions into buckets, resulting in shorter Identical Code
Folding (ICF) pass runtime.
In one benchmark, the ICF wall time decreased from 272s to 124s.
To gain better control over the functions that go into the output file
and their order, introduce `BinaryContext::getOutputBinaryFunctions()`.
The new API returns a modifiable list of functions in output order.
This list is filled by a new `PopulateOutputFunctions` pass and includes
emittable functions from the input file, plus functions added by BOLT
(injected functions).
The new functionality allows to freely intermix input functions with
injected ones in the output, which will be used in new PRs.
The new function replaces `BinaryContext::getSortedFunctions()`, but
unlike its predecessor, it includes injected functions in the returned
list.
Assign output sections for injected functions explicitly, and don't
reassign in AssignSections pass.
This change is a prerequisite for further PRs where veneer functions are
created as injected functions and their code section depends on their
placement.
If a veneer is not disassembled in lite mode, the veneer elimination
pass will not recognize it as such and the call to such veneer will
remain unchanged.
Later, we may need to insert a new veneer for such code ending up with a
double veneer.
To avoid such suboptimal code generation, always disassemble veneers and
guarantee that they are converted to direct calls in BOLT.
In some edge cases, a binary may contain direct `branch` or `call`
instructions whose target do not point to a valid executable
instruction. This can occur due to compiler bugs, hand-written assembly,
obfuscation technique, **or when control flow targets a data by
mistake.**
We also encountered the problems as described in this
[issue](https://github.com/llvm/llvm-project/issues/149382), where "data
in code" within OpenSSL's hand-written assembly was misidentified as
instructions(island identification seems fail due to the absence of a
corresponding data symbol). The problem occurred because a data sequence
was incorrectly disassembled as a "jb" instruction.
The point here is that the data should not be pointed to by any edge, so
this patch tries to address this by validating the destination address
for **direct branches and calls**. If the target instruction is
invalid(implies a corrupted control flow), this function will be set
ignored.
Although this approach appears helpful for addressing the 'data in code'
problem, its validation might be compromised if the data can be
disassembled as normal instruction.
Validation of data relocations targeting internals of a function was
happening based on offsets inside a function. As a result, if multiple
relocations were targeting the same offset, and one of the relocations
was verified, e.g. as belonging to a jump table, then all relocations
targeting the offset would be considered verified and valid.
Now that we are tracking relocations pointing inside every function, we
can do a better validation based on the location of the relocation.
E.g., if a relocation belongs to a jump table only that relocation will
be accounted for and other relocations pointing to the same address will
be evaluated independently.
Rename passes to names that better reflect their intent,
and describe their relationship to each other.
InsertNegateRAStatePass renamed to PointerAuthCFIFixup,
MarkRAStates renamed to PointerAuthCFIAnalyzer.
Added the --print-<passname> flags for these passes.
Add more heuristics to check if a basic block is an AArch64 epilogue. We
assume instructions that load from stack or adjust stack pointer as
valid epilogue code sequence if and only if they immediately precede the
branch instruction that ends the basic block.
- unify isRAStateSigned and isRAStateUnsigned to a common getRAState,
- unify setRASigned and setRAUnsigned into setRAState(MCInst, bool),
- update users of these to match the new implementations.
In addition to tracking offsets inside a `BinaryFunction` that are
referenced by data relocations, we need to track those relocations too.
Plus, we will need to map symbols referenced by such relocations back to
the containing function.
This change introduces `BinaryFunction::InternalRefDataRelocations` to
track the aforementioned relocations and expands
`BinaryContext::SymbolToFunctionMap` to include local/temp symbols
involved in relocation processing.
There is no functional change introduced that should affect the output.
Future PRs will use the new tracking capabilities.
Remove internal undefined symbol tracking and instead rely on the
emission state of `MCSymbol` while processing data-to-code relocations.
Note that `CleanMCState` pass resets the state of all `MCSymbol`s prior
to code emission.
In C++17, static constexpr members are implicitly inline, so they no
longer require an out-of-line definition.
Identified with readability-redundant-declaration.
We are skipping four zero's at a time when validating code padding in
case that the next zero would be part of an instruction or constant
island, and for functions that have large amount of padding (like due to
hugify), this could be very slow. We now change the validation to skip
as many as possible but still need to be 4's exact multiple number of
zero's. No valid instruction has encoding as 0x00000000 and even if we
stumble into some constant island, the API
`BinaryFunction::isInConstantIsland()` has been made to find the size
between the asked address and the end of island (#164037), so this
should be safe.
Enumeration of relocation types is not always sequential, e.g. on
AArch64 the first real relocation type is 0x101. As such, the existing
code in `Relocation::print()` was crashing while printing AArch64
relocations. Fix it by using `llvm::object::getELFRelocationTypeName()`.
Avoid cloning constant island helps to reduce app size, especially for
BOLT optimization in which cloning would happen when a function is split
into multiple fragments. Add an option to make the cloning optional, and
we will introduce a new pass to handle the reference too far error that
may result from disabling constant island cloning (#165787).
The [previous patch](https://github.com/llvm/llvm-project/pull/163418)
has added a check to prevent adding an entry point into a constant
island, but only for successfully disassembled functions.
Because scanExternalRefs() is also called when a function fails to be
disassembled or is skipped, it can still attempt to add an entry point
at constant islands. The same issue may occur if without a check for it
So, this patch complements the 'constant island' check in
scanExternalRefs().
There are cases where `addEntryPointAtOffset` is called with a given
`Offset` that points to an address within a constant island. This
triggers `assert(!isInConstantIsland(EntryPointAddress)` and causes BOLT
to crash. This patch adds a check which ignores functions that would add
such entry points and warns the user.
Reapply "[BOLT][AArch64] Handle OpNegateRAState to enable optimizing
binaries with pac-ret hardening (#120064)" (#162353)
This reverts commit c7d776b068.
#120064 was reverted for breaking builders.
Fix: changed the mismatched type in MarkRAStates.cpp to `auto`.
---
Original message:
OpNegateRAState is an AArch64-specific DWARF CFI used to change the value
of the RA_SIGN_STATE pseudoregister. The RA_SIGN_STATE register records
whether the current return address has been signed with PAC.
OpNegateRAState requires special handling in BOLT because its placement
depends on the function layout. Since BOLT reorders basic blocks during
optimization, these CFIs must be regenerated after layout is finalized.
This patch introduces two new passes:
- MarkRAStates (runs before optimizations): assigns a signedness annotation to each
instruction based on OpNegateRAState CFIs in the input binary.
- InsertNegateRAStates (runs after optimizations): reads the annotations and emits
new OpNegateRAState CFIs where RA state changes between instructions.
Design details are described in: `bolt/docs/PacRetDesign.md`.
OpNegateRAState is an AArch64-specific DWARF CFI used to change the value
of the RA_SIGN_STATE pseudoregister. The RA_SIGN_STATE register records
if the current return address has been signed with PAC.
OpNegateRAState requires special handling in BOLT because its placement
depends on the function layout. Since BOLT reorders basic blocks during
optimization, these CFIs must be regenerated after layout is finalized.
This patch introduces two new passes:
- MarkRAStates (runs before optimizations): assigns a signedness annotation to each
instruction based on OpNegateRAState CFIs in the input binary.
- InsertNegateRAStates (runs after optimizations): reads the annotations and emits
new OpNegateRAState CFIs where RA state changes between instructions.
Design details are described in: `bolt/docs/PacRetDesign.md`.
R_AARCH64_TLSDESC_CALL is a relocation emitted as a hint for a linker to
replace `blr r` instruction with nop. BOLT does not currently require
any special handling for it.
Note that previously existing extraction of the relocated value was
incorrect.