`getEntryIDForSymbol` can be called with a symbol that lies in a
constant island, in which case BOLT cannot find the function the symbol
belongs to. BOLT then reaches `llvm_unreachable("symbol not found")` and
crashes. This patch adds a check that avoids the crash.
`BinaryFunction::translateInputToOutputAddress()` contains fallback
logic for the case where querying `IOAddressMap` doesn't yield an output
address. Because this function can be called in scenarios where
`IOAddressMap` is not set up, we should check that the map actually
exists before the lookup.
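A minimal sketch of the guarded lookup, under stated assumptions: `FunctionSketch`, its members, and the offset-preserving fallback are hypothetical stand-ins for illustration and are not BOLT's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical, simplified stand-in for BinaryFunction.
struct FunctionSketch {
  uint64_t InputAddress = 0;
  uint64_t OutputAddress = 0;
  // The map is not set up in every scenario, hence optional.
  std::optional<std::map<uint64_t, uint64_t>> IOAddressMap;

  uint64_t translateInputToOutputAddress(uint64_t Address) const {
    // Only query the map if it actually exists.
    if (IOAddressMap) {
      auto It = IOAddressMap->find(Address);
      if (It != IOAddressMap->end())
        return It->second;
    }
    // Fallback (assumed here): keep the offset from the function start.
    return OutputAddress + (Address - InputAddress);
  }
};
```

With no map present, or with a map that lacks the queried address, the call falls through to the fallback instead of dereferencing a missing map.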
In relocation mode, keep folded functions in the BinaryFunctions map
instead of erasing them. Mark them as folded using setFolded() and skip
emitting them.
PLT entries are PseudoFunctions, and are not disassembled or emitted.
For BTI, we need to check the first MCInst of PLT entries, to see
if indirectly calling them is safe or not.
This patch disassembles PLTs for binaries using BTI, while not changing
the behaviour for binaries without BTI.
The PLTs are only disassembled, not emitted.
---------
Co-authored-by: Paschalis Mpeis <paschalis.mpeis@arm.com>
If a veneer is not disassembled in lite mode, the veneer elimination
pass will not recognize it as such and the call to such veneer will
remain unchanged.
Later, we may need to insert a new veneer for such code ending up with a
double veneer.
To avoid such suboptimal code generation, always disassemble veneers and
guarantee that they are converted to direct calls in BOLT.
In some edge cases, a binary may contain direct `branch` or `call`
instructions whose targets do not point to a valid executable
instruction. This can occur due to compiler bugs, hand-written assembly,
obfuscation techniques, **or when control flow targets data by
mistake.**
We also encountered the problem described in this
[issue](https://github.com/llvm/llvm-project/issues/149382), where "data
in code" within OpenSSL's hand-written assembly was misidentified as
instructions (island identification seems to fail due to the absence of
a corresponding data symbol). The problem occurred because a data
sequence was incorrectly disassembled as a "jb" instruction.
The point here is that the data should not be pointed to by any edge, so
this patch tries to address this by validating the destination address
for **direct branches and calls**. If the target instruction is
invalid (which implies corrupted control flow), the function is marked
as ignored.
Although this approach appears helpful for addressing the 'data in code'
problem, its validation might be compromised if the data can be
disassembled as a valid instruction.
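The check can be sketched roughly as follows. This is a simplified model, not BOLT's implementation: `FuncModel`, its fields, and `validateDirectTargets` are hypothetical names, and the function is reduced to a set of offsets at which successfully decoded instructions start.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical model of a disassembled function.
struct FuncModel {
  uint64_t Size = 0;
  std::set<uint64_t> InstructionOffsets;   // starts of decoded instructions
  std::vector<uint64_t> DirectTargets;     // targets of direct branches/calls
  bool Ignored = false;
};

// Marks the function as ignored if any direct target inside the function
// does not land on the start of a decoded instruction.
bool validateDirectTargets(FuncModel &F) {
  for (uint64_t Target : F.DirectTargets) {
    if (Target >= F.Size)
      continue; // Outside this function: not checked in this sketch.
    if (F.InstructionOffsets.count(Target) == 0) {
      F.Ignored = true;
      return false;
    }
  }
  return true;
}
```

As noted above, a check of this kind cannot catch data that happens to decode as a valid instruction at the targeted offset.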
Validation of data relocations targeting internals of a function was
happening based on offsets inside a function. As a result, if multiple
relocations were targeting the same offset, and one of the relocations
was verified, e.g. as belonging to a jump table, then all relocations
targeting the offset would be considered verified and valid.
Now that we are tracking relocations pointing inside every function, we
can do a better validation based on the location of the relocation.
E.g., if a relocation belongs to a jump table only that relocation will
be accounted for and other relocations pointing to the same address will
be evaluated independently.
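To illustrate the difference, here is a small sketch contrasting the two bookkeeping schemes. All type and member names are hypothetical and chosen for this example only.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

struct DataReloc {
  uint64_t Location;     // where the relocation lives (e.g. a jump table slot)
  uint64_t TargetOffset; // offset inside the function it points to
};

// Old scheme (as described above): verification is keyed by target offset.
struct OffsetKeyedValidation {
  std::set<uint64_t> Verified;
  void markVerified(const DataReloc &R) { Verified.insert(R.TargetOffset); }
  bool isVerified(const DataReloc &R) const {
    return Verified.count(R.TargetOffset) != 0;
  }
};

// New scheme: verification is keyed by the location of the relocation.
struct LocationKeyedValidation {
  std::set<uint64_t> Verified;
  void markVerified(const DataReloc &R) { Verified.insert(R.Location); }
  bool isVerified(const DataReloc &R) const {
    return Verified.count(R.Location) != 0;
  }
};
```

With two relocations targeting the same offset, verifying one of them implicitly verifies the other under the offset-keyed scheme but not under the location-keyed one.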
Add more heuristics to check if a basic block is an AArch64 epilogue. We
treat instructions that load from the stack or adjust the stack pointer
as a valid epilogue code sequence if and only if they immediately
precede the branch instruction that ends the basic block.
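A rough sketch of such a rule, assuming instructions have already been classified into a few kinds. The `Kind` enum and `looksLikeEpilogue` are hypothetical and much simpler than what a real `MCInst`-based check would look like.

```cpp
#include <cassert>
#include <vector>

// Hypothetical instruction classification for this sketch.
enum class Kind { Other, LoadFromStack, AdjustSP, Branch };

// A block looks like an epilogue only if it ends in a branch and the
// instruction immediately before that branch loads from the stack or
// adjusts the stack pointer.
bool looksLikeEpilogue(const std::vector<Kind> &Block) {
  if (Block.size() < 2 || Block.back() != Kind::Branch)
    return false;
  const Kind Prev = Block[Block.size() - 2];
  return Prev == Kind::LoadFromStack || Prev == Kind::AdjustSP;
}
```

A stack load that appears earlier in the block, separated from the terminating branch by unrelated instructions, does not qualify under this rule.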
Remove internal undefined symbol tracking and instead rely on the
emission state of `MCSymbol` while processing data-to-code relocations.
Note that `CleanMCState` pass resets the state of all `MCSymbol`s prior
to code emission.
The [previous patch](https://github.com/llvm/llvm-project/pull/163418)
added a check that prevents adding an entry point into a constant
island, but only for successfully disassembled functions.
Because scanExternalRefs() is also called when a function fails to be
disassembled or is skipped, it can still attempt to add an entry point
at a constant island when no such check is in place.
This patch therefore adds the same 'constant island' check to
scanExternalRefs().
Reapply "[BOLT][AArch64] Handle OpNegateRAState to enable optimizing
binaries with pac-ret hardening (#120064)" (#162353)
This reverts commit c7d776b068.
#120064 was reverted for breaking builders.
Fix: changed the mismatched type in MarkRAStates.cpp to `auto`.
---
Original message:
OpNegateRAState is an AArch64-specific DWARF CFI used to change the value
of the RA_SIGN_STATE pseudoregister. The RA_SIGN_STATE register records
whether the current return address has been signed with PAC.
OpNegateRAState requires special handling in BOLT because its placement
depends on the function layout. Since BOLT reorders basic blocks during
optimization, these CFIs must be regenerated after layout is finalized.
This patch introduces two new passes:
- MarkRAStates (runs before optimizations): assigns a signedness annotation to each
instruction based on OpNegateRAState CFIs in the input binary.
- InsertNegateRAStates (runs after optimizations): reads the annotations and emits
new OpNegateRAState CFIs where RA state changes between instructions.
Design details are described in: `bolt/docs/PacRetDesign.md`.
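The two-pass idea can be sketched on a toy instruction list. This is an illustration under simplifying assumptions, not the code of the actual passes: `Inst` and both function names are hypothetical, the function is a flat list of instructions, and the RA state is assumed to be unsigned at function entry.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Inst {
  int Id;
  bool NegateBefore; // input: an OpNegateRAState CFI precedes this instruction
  bool Signed;       // annotation filled in by markRAStates
};

// Before optimizations: walk the input order and annotate every instruction
// with the RA signing state in effect when it executes.
void markRAStates(std::vector<Inst> &Insts) {
  bool State = false; // assumed unsigned at function entry
  for (Inst &I : Insts) {
    if (I.NegateBefore)
      State = !State;
    I.Signed = State;
  }
}

// After layout is final: return the positions before which a new
// OpNegateRAState is needed, i.e. where the annotated state changes.
std::vector<std::size_t> insertNegateRAStates(const std::vector<Inst> &Layout) {
  std::vector<std::size_t> Positions;
  bool State = false;
  for (std::size_t I = 0; I < Layout.size(); ++I) {
    if (Layout[I].Signed != State) {
      Positions.push_back(I);
      State = Layout[I].Signed;
    }
  }
  return Positions;
}
```

Because the annotations travel with the instructions, reordering them changes where the state flips, and the second pass places the CFIs according to the new layout rather than the original one.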
A constant island embedded in a text section carries no alignment
information from the input binary, and we currently set its alignment to
8 bytes. A constant island might be given a much larger alignment for
performance or other reasons, so this change adds heuristics to
determine its alignment based on its size, its original address in the
input binary, and its owning section's alignment.
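One possible shape of such a heuristic is sketched below. The exact rule used by the patch is not stated here, so this is purely an assumption: `guessIslandAlignment` is a hypothetical helper that picks the largest power of two that divides the original address while staying within both the island size and the section alignment (assumed to be a power of two).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical heuristic: grow the alignment while the original address
// stays aligned, capped by the island size and the section alignment.
uint64_t guessIslandAlignment(uint64_t Address, uint64_t Size,
                              uint64_t SectionAlign) {
  uint64_t Align = 1;
  while (Align * 2 <= SectionAlign && Align * 2 <= Size &&
         Address % (Align * 2) == 0)
    Align *= 2;
  return Align;
}
```

For example, a 64-byte island at address `0x1040` in a section aligned to 64 bytes yields 64, while the same island at `0x1048` yields 8.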
Iterator implementation of PR #156243:
This improves BOLT runtime when optimizing rustc_driver.so from 15
minutes to 7 minutes (or 49 minutes to 37 minutes of userspace time).
Co-authored-by: Mark-Simulacrum <mark.simulacrum@gmail.com>
This patch fixes a bug in BOLT's debug line emission where functions
that belong to multiple compilation units (such as inline functions in
header files) were not handled correctly. Previously, BOLT incorrectly
assumed that a binary function could belong to only one compilation
unit, leading to incomplete or incorrect debug line information.
### **Problem**
A function can appear in multiple compilation units. Common scenarios
include:
* Template instantiated functions
* Inline functions defined in header files included by multiple source
files
In such cases, BOLT would only emit debug line information for one
compilation unit, losing debug information for the other CUs where the
function was compiled.
This resulted in incomplete debugging information and could cause
debuggers to fail to set breakpoints or show incorrect source locations.
### **Root Cause**
The issue was in BOLT's assumption that each binary function maps to
exactly one compilation unit. However, when the same function (e.g., an
inline function from a header) is compiled into multiple object files,
it legitimately belongs to multiple CUs in the final binary.
Jump tables may contain entries that point immediately past the end of
their parent function. Normally, such entries are generated by the
compiler for a `__builtin_unreachable()` case. We used to replace
those entries with a label belonging to their parent function, assuming
the destination doesn't matter since reaching it is undefined behavior.
However, if such an entry is at the end of the jump table, it could be a
real function pointer, not a jump table entry. We rely on heuristics to
detect such cases and can drop the trailing function pointer entries
from the table.
The problem arises when the "unreachable" ambiguous entry is followed
by another ambiguous entry corresponding to the start of the parent
function. In this case we accept pointers as entries and may incorrectly
update the function pointer.
The solution is to keep ambiguous "unreachable" jump table entries
identical to the original input, i.e. point to the same function. This
change does not affect CFG, but results in the entries being updated
with the new function address if it gets relocated.
The MCSymbolRefExpr::create overload with the specifier parameter is
discouraged and being phased out. Expressions with relocation specifiers
should use MCSpecifierExpr instead.
Record the number of function invocations from external code, i.e. code
outside the binary, which may include JIT code and DSOs. Accounting for
external entry counts improves the fidelity of call graph flow
conservation analysis.
Test Plan: updated shrinkwrapping.test
When we call setIgnored() on functions that already have CFG built,
these functions are not going to get emitted and we risk missing
external function references being updated.
To mitigate the potential issues, run scanExternalRefs() on such
functions to create patches/relocations.
Since scanExternalRefs() relies on function relocations, we have to
preserve relocations until the function is emitted. As a result, the
memory overhead without debug info update could reach up to 2%.
We should never call fixBranches() on a function with invalid CFG. E.g.,
ValidateInternalCalls modifies CFG for its internal analysis purposes.
At the same time, it marks the function as non-simple with an assumption
that fixBranches() will never run on that function.
However, calculateEmittedSize() by default calls fixBranches() which can
lead to all sorts of issues, including assertions firing in
fixBranches().
The fix is to use the original size for non-simple functions in
calculateEmittedSize() since we are supposed to emit the function
unmodified. Additionally, add an assertion at the start of
fixBranches().
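A minimal sketch of the fix, assuming a stripped-down function object. `FunctionSketch` and its fields are hypothetical, and the exact condition of the added assertion is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for BinaryFunction.
struct FunctionSketch {
  bool Simple;
  uint64_t OriginalSize;
  uint64_t RewrittenSize; // size after branches are fixed up (assumed known)
  int FixBranchesCalls;
};

void fixBranches(FunctionSketch &F) {
  // Guard in the spirit of the assertion added by the patch.
  assert(F.Simple && "fixBranches() requires a valid CFG");
  ++F.FixBranchesCalls;
}

uint64_t calculateEmittedSize(FunctionSketch &F) {
  // Non-simple functions are emitted unmodified, so reuse the original
  // size and never touch their possibly invalid CFG.
  if (!F.Simple)
    return F.OriginalSize;
  fixBranches(F);
  return F.RewrittenSize;
}
```

The early return keeps `fixBranches()` from ever running on a non-simple function, and the assertion catches any other caller that tries.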
When a conditional tail call is located in old code while BOLT is
operating in lite mode, the call requires an optional pending
relocation of a type that is currently not supported, resulting in a
build-time crash.
Until a proper fix is implemented, ignore conditional tail calls for
relocation purposes and mark their target functions to be patched, i.e.
to serve as veneers/thunks.
Sample is a general term covering both basic (IP) and branch (LBR)
profiles. Find and replace ambiguous uses of sample in a basic sample
sense.
Rename `RawBranchCount` into `RawSampleCount` reflecting its use for
both kinds of profile.
Rename `PF_LBR` profile type as `PF_BRANCH` reflecting non-LBR based
branch profiles (non-brstack SPE, synthesized brstack ETM/PT).
Follow-up to #137644.
Test Plan: NFC
On AArch64, we create optional/weak relocations that may not be
processed when the relocated value overflows. When the overflow
happened, we used to enforce patching for all functions in the binary
via the --force-patch option. This PR relaxes the requirement and
enforces patching only for functions that are targets of optional
relocations.
Moreover, if the compact code model is used, the relocation overflow is
guaranteed not to happen and the patching will be skipped.
Some functions have a size of zero in the input binary's symbol
table, like those compiled by an assembler. When figuring out function
sizes, we may create a label symbol if it doesn't point to any constant
island. However, before the function size is known, a marker symbol
cannot be correctly associated with a function, so all such checks
fail. We could end up adding a code label pointing to a constant island
as a secondary entry point and later mistakenly marking the function as
not simple.
Querying the global marker symbol array has a big throughput overhead.
Instead, we can run an extra check when post-processing entry points
to identify label symbols that actually point to constant islands.
When a pending relocation is created, it is also marked as optional or
not. It can be optional when the relocation is added as part of an
optimization (i.e., `scanExternalRefs`).
When BOLT tries to `flushPendingRelocations`, it safely skips any
optional relocations that cannot be encoded due to being out of
range. A prerequisite for that is the use of the `-force-patch`
flag. Otherwise, BOLT will bail out with a relevant message.
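The flushing rule can be sketched as below. This is a simplified model with hypothetical types. It assumes, for illustration only, that every pending relocation is an AArch64 `B`/`BL` displacement, and it models "bail out" as an empty `std::optional`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

struct PendingReloc {
  int64_t ByteOffset; // displacement the relocation must encode
  bool Optional;
};

// AArch64 B/BL encode a signed 26-bit word offset, i.e. +/-128 MiB.
bool fitsBranch26(int64_t ByteOffset) {
  const int64_t Limit = int64_t(1) << 27;
  return ByteOffset >= -Limit && ByteOffset < Limit;
}

// Returns the number of relocations applied, or std::nullopt to model
// BOLT bailing out on a relocation that cannot be encoded.
std::optional<std::size_t>
flushPendingRelocations(const std::vector<PendingReloc> &Relocs,
                        bool ForcePatch) {
  std::size_t Applied = 0;
  for (const PendingReloc &R : Relocs) {
    if (fitsBranch26(R.ByteOffset)) {
      ++Applied;
      continue;
    }
    if (R.Optional && ForcePatch)
      continue; // Safe to skip: the target function is patched instead.
    return std::nullopt;
  }
  return Applied;
}
```

An out-of-range relocation is skipped only when it is optional and force-patching is enabled. In every other case the sketch reports failure.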
Background:
BOLT, as part of scanExternalRefs, identifies external references from
calls and creates pending relocations for them. When flushed, these
relocations update the references to point to the optimized functions.
This optimization can be disabled using `--no-scan`.
BOLT can assert if any of these pending relocations cannot be encoded.
This patch does not disable this optimization but instead selectively
applies it given that a pending relocation is optional and `-force-patch`
was enabled.
In lite mode, we only emit code for a subset of functions while
preserving the original code in .bolt.org.text. This requires updating
code references in non-emitted functions to ensure that:
* Non-optimized versions of the optimized code never execute.
* Function pointer comparison semantics is preserved.
On x86-64, we can update code references in-place using "pending
relocations" added in scanExternalRefs(). However, on AArch64, this is
not always possible due to address range limitations and linker address
"relaxation".
There are two types of code-to-code references: control transfer (e.g.,
calls and branches) and function pointer materialization.
AArch64-specific control transfer instructions are covered by #116964.
For function pointer materialization, simply changing the immediate
field of an instruction is not always sufficient. In some cases, we need
to modify a pair of instructions, such as undoing linker relaxation and
converting NOP+ADR into ADRP+ADD sequence.
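To see why a single `ADR` may no longer be enough after functions move, the address arithmetic behind the two forms can be sketched as follows. This only models the operand computation, not instruction encoding or BOLT's patching code, and all helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// ADR encodes a signed 21-bit byte offset: +/-1 MiB from the instruction.
bool fitsAdr(uint64_t PC, uint64_t Target) {
  const int64_t Delta = static_cast<int64_t>(Target - PC);
  const int64_t Limit = int64_t(1) << 20;
  return Delta >= -Limit && Delta < Limit;
}

struct AdrpAddOperands {
  int64_t PageDelta; // ADRP immediate, in 4 KiB pages
  uint32_t Lo12;     // ADD immediate: low 12 bits of the target
};

// Split a target address into the operands of an ADRP+ADD pair.
AdrpAddOperands splitForAdrpAdd(uint64_t PC, uint64_t Target) {
  const int64_t PageDelta = static_cast<int64_t>(Target >> 12) -
                            static_cast<int64_t>(PC >> 12);
  return {PageDelta, static_cast<uint32_t>(Target & 0xFFF)};
}

// What the ADRP+ADD pair computes at run time.
uint64_t materialize(uint64_t PC, AdrpAddOperands Ops) {
  const uint64_t Page =
      static_cast<uint64_t>(static_cast<int64_t>(PC >> 12) + Ops.PageDelta)
      << 12;
  return Page + Ops.Lo12;
}
```

When the new target is more than 1 MiB away, `fitsAdr` fails and the pair is needed, which is why both instructions of a relaxed NOP+ADR sequence have to be rewritten.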
To achieve this, we use the instruction patch mechanism instead of
pending relocations. Instruction patches are emitted via the regular MC
layer, just like regular functions. However, they have a fixed address
and do not have an associated symbol table entry. This allows us to make
more complex changes to the code, ensuring that function pointers are
correctly updated. Such mechanism should also be portable to RISC-V and
other architectures.
To summarize, for AArch64, we extend the scanExternalRefs() process to
undo linker relaxation and use instruction patches to partially
overwrite unoptimized code.
Instead of filtering and modifying relocations in readRelocations(),
preserve the relocation info and use it in the symbolizing disassembler.
This change mostly affects AArch64, where we need to look at original
linker relocations in order to properly symbolize instruction operands.
Add AArch64MCSymbolizer that symbolizes `MCInst` operands during
disassembly. The symbolization was previously done in
`BinaryFunction::disassemble()`, but it is also required by
`scanExternalRefs()` for "lite" mode functionality. Hence, similar to
x86, I've implemented the symbolizer interface that uses
`BinaryFunction` relocations to properly create instruction operands. I
expect the result of the disassembly to be identical after the change.
AArch64 disassembler was not calling `tryAddingSymbolicOperand()` for
`MOV` instructions. Fix that. Additionally, the disassembler marks `ldr`
instructions as branches by setting `IsBranch` parameter to true. Ignore
the parameter and rely on `MCPlusBuilder` interface instead.
I've modified the `--check-encoding` flag to check symbolization of
operands of instructions that have relocations against them.
In analyzeInstructionForFuncReference(), use MCPlusBuilder interface
while scanning symbolic operands of MCInst. Should be NFC on x86, but
will make the function work on other architectures. Note that it's
currently unused on non-x86 as its functionality is exclusive to safe
ICF that runs on x86 only.
BOLT used to mark multi-entry functions non-simple in non-relocation
mode with the reasoning that we can't move them due to potentially
undetected references. However, in aggregation mode it doesn't apply as
BOLT doesn't perform optimizations.
Relax this constraint in case of an aggregation job.
Test Plan: added entry-point-fallthru.s
Sometimes we need to know the size of a symbol besides its address, so
maybe we can start using the existing `BOLTLinker::lookupSymbolInfo()`
(that returns symbol address and size) and remove
`BOLTLinker::lookupSymbol()` (that only returns symbol address). And for
both we need to check return value as it is wrapped in `std::optional<>`,
which makes the difference even smaller.
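A sketch of what call sites could look like with a single lookup that returns both values. `LinkerSketch` and `SymbolInfo` are simplified, hypothetical stand-ins here, and the real `BOLTLinker` interface may differ in parameter and field types.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

struct SymbolInfo {
  uint64_t Address;
  uint64_t Size;
};

// Hypothetical, simplified stand-in for the linker interface.
class LinkerSketch {
  std::map<std::string, SymbolInfo> Symbols;

public:
  void define(const std::string &Name, uint64_t Address, uint64_t Size) {
    Symbols[Name] = SymbolInfo{Address, Size};
  }

  // One lookup that yields both address and size. Callers must check the
  // optional either way, just as they would for an address-only lookup.
  std::optional<SymbolInfo> lookupSymbolInfo(const std::string &Name) const {
    auto It = Symbols.find(Name);
    if (It == Symbols.end())
      return std::nullopt;
    return It->second;
  }
};
```

A caller that only needs the address reads `Info->Address` after the same `has_value()` check it would have written before.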