1414 Commits

Author SHA1 Message Date
Amina Chabane
3232d38a59 [BOLT][AArch64] Refuse to run RegReAssign pass (#194866)
RegReAssign hits an unreachable on AArch64 because it is a pass
(conceptually) specific to X86.

- Add a guard to RegReAssign for non-X86
- Update unsupported-passes.test
2026-04-30 14:37:27 +01:00
Amina Chabane
dddd0da8e6 [BOLT][AArch64] Refuse to run IndirectCallPromotion pass (#194363)
`--icp=<value>`/`--indirect-call-promotion=<value>` results in an
`UNIMPLEMENTED` crash when invoked, as the pass is unimplemented on AArch64.

- Guard IndirectCallPromotion for non-X86
- Update unsupported-passes.test with expected error
2026-04-28 18:39:15 +01:00
Fangrui Song
c8ff86259b [CodeGen] Make AsmPrinter::MAI a reference. NFC (#194538)
AsmPrinter::MAI is non-null. This is made more explicit after
PR #194523 changed TargetMachine::getMCAsmInfo to return a reference
with recent MCAsmInfo/MCTargetOptions related refactoring.

Convert the member from const MCAsmInfo * to const MCAsmInfo & and
update all consumers.
2026-04-28 05:27:22 +00:00
Fangrui Song
f61e1e46ff [MC] Make MCContext::getAsmInfo return a reference. NFC (#194523)
The MAI member is non-null. #194280 made this clearer by making the
MCContext constructor take MCAsmInfo by reference. Convert getAsmInfo to
return const MCAsmInfo & and the member to a reference.
2026-04-28 03:57:47 +00:00
Amina Chabane
bd7cd403db [BOLT][AArch64] Refuse to run CMOVConversion pass (#193998)
`--cmov-conversion` is unsupported on AArch64 because
convertMoveToConditionalMove() is only overridden for X86.

- Add a guard for non-X86
- Update unsupported-passes.test with expected error
2026-04-27 12:09:56 +01:00
Fangrui Song
13e98d8341 [MC] Take MCAsmInfo by reference in MCContext and TargetMachine. NFC (#194280)
Both MCContext::MCContext and TargetMachine::getMCAsmInfo treat
MCAsmInfo as a pointer that must be non-null. Make the contract
explicit:

* MCContext's constructor takes `const MCAsmInfo &MAI`.
* TargetMachine::getMCAsmInfo returns `const MCAsmInfo &`.

Make this change now since the MCContext ctor has recently been updated.
2026-04-27 07:48:54 +00:00
Fangrui Song
33f2036f35 [MC] Add MCTargetOptions to MCAsmInfo constructor. NFC (#194200)
Since #180464 the canonical MCTargetOptions pointer is stored in
MCAsmInfo, but it is bound after construction via `setTargetOptions`
called from TargetRegistry::createMCAsmInfo.

Direct constructions in unit tests can leave the pointer null, leading
to a runtime assert failure. Add MCTargetOptions to every MCAsmInfo
subclass constructor, store it as a reference in MCAsmInfo, and remove
`setTargetOptions()`.
2026-04-26 05:52:32 +00:00
Amir Ayupov
46154fef0e [BOLT] Support negative hex in pre-aggregated profile (#192391)
Handle signed values in parseHexField by falling back to int64_t parsing
when uint64_t fails. This allows pre-aggregated profile tools to use -1
for BR_ONLY, -2 for FT_EXTERNAL_ORIGIN, -3 for FT_EXTERNAL_RETURN.

Guard the external address reset loop in parseAggregatedLBREntry to
preserve sentinel values (offsets >= FT_EXTERNAL_RETURN).

Add tests for -1/-2/-3 in parseHexField and T entries with -1,
ffffffffffffffff, and buildid:-1 as BR_ONLY.
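A minimal sketch of the "unsigned first, signed fallback" parse described above. `parseHexFieldSketch` is a hypothetical stand-in, not BOLT's `parseHexField`. It uses `strtoull`/`strtoll` and branches on a leading `-` because `strtoull` would otherwise silently negate such input.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical sketch: parse a hex field as uint64_t; for a leading '-',
// fall back to a signed int64_t parse and return its two's-complement bits.
std::optional<uint64_t> parseHexFieldSketch(const std::string &Field) {
  if (Field.empty())
    return std::nullopt;
  char *End = nullptr;
  if (Field[0] != '-') {
    // Unsigned interpretation, e.g. "401000" or "ffffffffffffffff".
    errno = 0;
    unsigned long long U = std::strtoull(Field.c_str(), &End, 16);
    if (errno == 0 && *End == '\0')
      return static_cast<uint64_t>(U);
    return std::nullopt;
  }
  // Signed fallback for sentinels such as "-1", "-2", "-3".
  errno = 0;
  long long S = std::strtoll(Field.c_str(), &End, 16);
  if (errno != 0 || *End != '\0')
    return std::nullopt;
  return static_cast<uint64_t>(S);
}
```

With this approach `-1` and `ffffffffffffffff` produce the same 64-bit pattern, which is why both can be treated as BR_ONLY.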
2026-04-25 23:07:07 +00:00
Amina Chabane
8baf33522d [BOLT][AArch64] Refuse to run JTFootprintReduction pass (#193946)
JTFootprintReduction results in a no-op on AArch64 because it emits
createIJmp32Frag(), which is unimplemented for AArch64 and is only
overridden by x86.

- Add a guard for non-x86
- Update unsupported-passes.test with expected error message
2026-04-24 14:50:02 +01:00
Amina Chabane
6a06c8bdcb [BOLT][AArch64] Refuse to run ThreeWayBranch pass (#193252)
On AArch64, `--three-way-branch` crashes because the pass is not
implemented. This patch adds a guard and updates the relevant test(s).
2026-04-23 09:00:26 +01:00
Farid Zakaria
6ef1b80fef [BOLT] Fix null pointer dereference in DWP processing with split DWARF (#191474)
Fix two null pointer dereferences in BOLT's DWP processing path that
cause SIGSEGV in worker threads when -update-debug-sections is used with
a co-located .dwp file.

1. getSliceData() in updateDebugData() dereferences the result of
getContribution() without checking for null. getContribution() returns
nullptr when the requested section kind (e.g. DW_SECT_LINE) is not
present as a column in the DWP CU index. When BOLT processes a DWP where
certain section kinds are absent from the index, every worker thread
that hits this path crashes simultaneously.

2. processSplitCU() dereferences getUnitDIEbyUnit() without checking for
null. If buildDWOUnit() fails for a CU, the returned DIE* is null and
the dereference crashes.

Crash signature from dmesg:
```
  llvm-worker-*: segfault at 8 ip <offset> error 4 in llvm-bolt
  (multiple worker threads crash at the same instruction)
```
The faulting address 0x8 corresponds to accessing the Length field
(offset 8) of a null `DWARFUnitIndex::Entry::SectionContribution*`.
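The sketch below shows why the fault lands at address 0x8 and what the null check amounts to. `SectionContributionSketch` and `getSliceLengthSketch` are hypothetical. The struct is an assumed mirror with two 64-bit fields, not LLVM's actual definition.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical mirror of a DWP index section contribution. With two 64-bit
// fields, Length sits at byte offset 8.
struct SectionContributionSketch {
  uint64_t Offset;
  uint64_t Length;
};

// Return the contribution length, or std::nullopt when the section kind is
// absent from the index (Contribution == nullptr), instead of reading
// Contribution->Length through a null pointer.
std::optional<uint64_t>
getSliceLengthSketch(const SectionContributionSketch *Contribution) {
  if (!Contribution)
    return std::nullopt;
  return Contribution->Length;
}

// Reading Length through a null pointer touches address 0 + 8 == 0x8.
static_assert(offsetof(SectionContributionSketch, Length) == 8,
              "Length is expected at offset 8");
```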

At Meta, I reproduced this while building hhvm with a co-located .dwp file
and the flags `-update-debug-sections -debug-thread-count=80 -lite=0` with
profile data.

I confirmed that the unfixed BOLT crashes deterministically whereas the
fixed BOLT completes successfully.
2026-04-22 13:02:57 -07:00
Hemant Kulkarni
c8b526f76b [bolt] AArch64: Fix TLSDESC to LE relaxation by mold (#190370)
The mold linker relaxes TLSDESC to LE (lld relaxes it to IE) using the
sequence NOP+NOP+MOVZ+MOVK. This in itself is not an issue, but when
--emit-relocs is added, the R_AARCH64_TLSDESC_ADD_LO12 and
R_AARCH64_TLSDESC_CALL relocations are attached to the useful MOVW
instructions. However, BOLT does not check for R_AARCH64_TLSDESC_ADD_LO12
in adjustRelocation() when disassembling the file. This later triggers a
bug when the relocation is patched: the movk is patched with the S_LO12
fixup kind, which is invalid.

Refer to bug: https://github.com/llvm/llvm-project/issues/190366 for
details.
2026-04-22 14:59:11 +01:00
Rafael Auler
11515959b5 [BOLT] Fix stream position before appendPadding in writeEHFrameHeader (#193126)
When writeEHFrameHeader needs to allocate new space for .eh_frame_hdr
(because the old section is too small), it calls appendPadding to align
NextAvailableAddress. appendPadding writes zero bytes at the current
stream position, but after the section write loop in rewriteFile the
stream is positioned at the end of the last section written in
BinarySection::operator< order — not at the file offset corresponding to
NextAvailableAddress.

In the common case (single loadObject call) the write order matches file
offset order, so the stream happens to be in the right place. But when a
runtime library adds sections via additional loadObject calls, the
operator< iteration order (code-before-data) can diverge from file
offset order: a runtime library code section may have a higher file
offset than a runtime library data section that comes after it in the
write loop. The stream then ends at a lower offset than expected, and
appendPadding's zeros overwrite the beginning of the code section.

Fix by seeking to the correct file offset before calling appendPadding.
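A small model of the bug and the fix, using `std::ostream` in place of BOLT's output stream. `appendPaddingSketch` and `padAtOffsetSketch` are hypothetical helpers. Like the description above, the padding writes zeros at the current stream position, so the fix is to `seekp` to the intended offset first.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical analogue of appendPadding: write zero bytes at the *current*
// stream position until Offset reaches a multiple of Alignment (> 0).
uint64_t appendPaddingSketch(std::ostream &OS, uint64_t Offset,
                             uint64_t Alignment) {
  uint64_t Aligned = (Offset + Alignment - 1) / Alignment * Alignment;
  for (uint64_t I = Offset; I < Aligned; ++I)
    OS.put('\0');
  return Aligned;
}

// The fix in miniature: seek to the file offset matching NextAvailableOffset
// instead of trusting where the last section write left the stream.
uint64_t padAtOffsetSketch(std::ostream &OS, uint64_t NextAvailableOffset,
                           uint64_t Alignment) {
  OS.seekp(static_cast<std::streamoff>(NextAvailableOffset), std::ios::beg);
  return appendPaddingSketch(OS, NextAvailableOffset, Alignment);
}
```

Without the `seekp`, a stream left at offset 2 would receive the zeros there and clobber bytes 2..9, which mirrors the overwritten code section described above.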
2026-04-21 13:10:52 -07:00
David CARLIER
2c56a63b49 [BOLT][Passes] switch remaining Instrumentation containers to ADT. (#192525)
Follow-up to #192289. Swap the remaining `std::unordered_set`/
`std::unordered_map` containers in `Instrumentation.cpp` for `DenseSet`/
`DenseMap`: the `BBToSkip` param and `Visited` local in
`hasAArch64ExclusiveMemop`, and `BBToSkip`, `BBToID`, `VisitedSet` in
`instrumentFunction`. Drop the now-unused `<unordered_set>` include.

The swap removes per-element heap allocations on the hot path, stops
inserting empty buckets on probes where a miss is possible, and replaces
hashed-bucket traversal over node-based storage with lookups over inline
`DenseMap` storage. `BBToID` reads keep `operator[]` since the map is
pre-populated for every basic block of the function, so no
default-construct path is ever taken. NFC.

Measured on `llvm-bolt -instrument` against a relocations-linked
clang-23: -1.3% instrumentation-pass wall time, peak RSS unchanged
(dominated by instrumentation output size).
2026-04-16 21:44:59 +01:00
Farid Zakaria
ec1e3aef9a [BOLT] Update LSDA encoding for x86-64 large code model (#190685)
BOLT hardcoded 4-byte LSDA (exception table) encoding for x86-64. This
is insufficient for large code model binaries where functions in .ltext
sections may be placed at addresses above 2GB, exceeding the range of
DW_EH_PE_udata4/DW_EH_PE_sdata4 encodings.

Detect large code model by checking for .ltext sections
(SHF_X86_64_LARGE) and update LSDAEncoding to use 8-byte pointers:
- Non-PIC: DW_EH_PE_absptr (8-byte absolute)
- PIC: DW_EH_PE_pcrel | DW_EH_PE_sdata8 (8-byte PC-relative)
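A sketch of the encoding selection above, with the DW_EH_PE constant values from the DWARF exception-header encoding. `selectLSDAEncodingSketch` and `fitsInSData4` are hypothetical. The 4-byte choices for the non-large case (`udata4` for non-PIC, `pcrel|sdata4` for PIC) are an assumption about the previous hardcoded defaults.

```cpp
#include <cassert>
#include <cstdint>

// DWARF EH pointer-encoding constants.
constexpr uint8_t DW_EH_PE_absptr = 0x00;
constexpr uint8_t DW_EH_PE_udata4 = 0x03;
constexpr uint8_t DW_EH_PE_sdata4 = 0x0b;
constexpr uint8_t DW_EH_PE_sdata8 = 0x0c;
constexpr uint8_t DW_EH_PE_pcrel = 0x10;

// Hypothetical helper: choose an LSDA encoding for x86-64. The small-model
// branch reflects an assumed previous default, not confirmed BOLT code.
uint8_t selectLSDAEncodingSketch(bool IsPIC, bool IsLargeCodeModel) {
  if (IsLargeCodeModel)
    return IsPIC ? static_cast<uint8_t>(DW_EH_PE_pcrel | DW_EH_PE_sdata8)
                 : DW_EH_PE_absptr; // 8-byte absolute pointer on x86-64
  return IsPIC ? static_cast<uint8_t>(DW_EH_PE_pcrel | DW_EH_PE_sdata4)
               : DW_EH_PE_udata4;
}

// A 4-byte signed value only reaches +/-2 GiB.
bool fitsInSData4(int64_t Value) {
  return Value >= INT32_MIN && Value <= INT32_MAX;
}
```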

This was pulled out from
https://github.com/llvm/llvm-project/pull/190637
2026-04-16 00:34:08 -07:00
David CARLIER
5b979f51e3 [BOLT][Passes] use ADT containers for instrumentation spanning tree. (#192289)
Swap `std::unordered_map<…, std::set<…>>` for
`DenseMap<…, SmallVector<…>>` in `Instrumentation::instrumentFunction`
and switch read paths from `STOutSet[&BB]` to `find()`. This removes
per-set heap allocations, stops inserting empty buckets on every probe,
and replaces linear `is_contained()` scans over a red-black tree with
linear scans over inline `SmallVector` storage (most basic blocks have
at most a couple of spanning-tree out-edges). NFC.
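The "empty buckets on every probe" point comes from `operator[]` semantics, which `DenseMap` shares with the standard maps. This sketch uses `std::unordered_map` so it stays self-contained. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

using OutEdgeMap = std::unordered_map<int, std::vector<int>>;

// Reading through operator[] default-constructs (and inserts) an empty
// entry for a missing key, so a read-only probe grows the map.
std::size_t countOutEdgesWithSubscript(OutEdgeMap &M, int BB) {
  return M[BB].size();
}

// Reading through find() leaves the map untouched on a miss.
std::size_t countOutEdgesWithFind(const OutEdgeMap &M, int BB) {
  auto It = M.find(BB);
  return It == M.end() ? 0 : It->second.size();
}
```

The `find()` version can also take the map by `const &`, which makes the no-insertion guarantee visible in the signature.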
2026-04-15 23:16:57 +01:00
Sergei Barannikov
f4e1a51d10 [bolt] Remove unused argument of DataExtractor constructor (NFC) (#191841)
`AddressSize` parameter is not used by `DataExtractor` and will be
removed in the future. See #190519 for more context.

I took the liberty of switching from using the `StringRef` constructor
overload to `ArrayRef` where appropriate.
2026-04-14 08:13:54 +03:00
Sergei Barannikov
b6ff43f1ec [Support] Remove address-extraction methods from DataExtractor (NFC) (#190519)
Most clients don't have a notion of "address" and pass arbitrary values
(including `0` and `sizeof(void *)`) to `DataExtractor` constructors.
This makes address-extraction methods dangerous to use.

Those clients that do have a notion of address can use other methods
like `getUnsigned()` to extract an address, or they can derive from
`DataExtractor` and add convenience methods if extracting an address is
routine. `DWARFDataExtractor` is an example, where the removed methods
were actually moved.

This does not remove `AddressSize` argument of `DataExtractor`
constructors yet, but makes it unused and overloads constructors in
preparation for their deletion. I'll be removing uses of the
to-be-deleted constructors in follow-up patches.
2026-04-13 16:44:51 +03:00
Brian Cain
8215fb02a6 [BOLT] Fix iterator bugs (#190978)
Fix iterator misuse in four BOLT passes, caught by _GLIBCXX_DEBUG
(enabled via LLVM_ENABLE_EXPENSIVE_CHECKS=ON).

* AllocCombiner: combineAdjustments() erases instructions while
iterating in reverse via llvm::reverse(BB), invalidating the reverse
iterator. Defer erasures to after the loop using a SmallVector.
* ShrinkWrapping: processDeletions() uses
std::prev(BB.eraseInstruction(II)) which is undefined when II ==
begin(). Restructure to standard forward iteration with erase.
* DataflowAnalysis: run() unconditionally dereferences BB->rbegin(),
which crashes on empty basic blocks (possible after the ShrinkWrapping
fix). Guard with an emptiness check.
* IndirectCallPromotion: rewriteCall() dereferences the end iterator via
&(*IndCallBlock.end()). Replace with &IndCallBlock.back().
* TailDuplication: constantAndCopyPropagate() uses
std::prev(OriginalBB.eraseInstruction(Itr)) which is undefined when Itr
== begin(). Restructure to standard forward iteration with erase.
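The two safe patterns named above (forward iteration with `erase`, and deferring erasures until after a reverse walk) can be sketched on a `std::list<int>` standing in for a basic block. These helpers are hypothetical, not BOLT code.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Forward iteration with erase: erase() returns the iterator after the
// removed element, so std::prev() (undefined at begin()) is never needed.
void eraseForward(std::list<int> &BB, bool (*ShouldErase)(int)) {
  for (auto It = BB.begin(); It != BB.end();) {
    if (ShouldErase(*It))
      It = BB.erase(It);
    else
      ++It;
  }
}

// Reverse walk with deferred erasure: collect victims first, erase after
// the traversal so no reverse iterator is invalidated mid-loop.
void eraseDeferredReverse(std::list<int> &BB, bool (*ShouldErase)(int)) {
  std::vector<std::list<int>::iterator> ToErase;
  for (auto RIt = BB.rbegin(); RIt != BB.rend(); ++RIt)
    if (ShouldErase(*RIt))
      ToErase.push_back(std::prev(RIt.base())); // element RIt refers to
  for (auto It : ToErase)
    BB.erase(It); // list erase leaves other iterators valid
}
```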
2026-04-10 14:17:28 +00:00
wangjue
adb986a71c [BOLT][RISCV] Fix the inaccurate profile data check (#189338)
2026-04-10 08:47:41 +03:00
Amir Ayupov
10353899af [BOLT] Use identify_magic for shared library detection (#190902)
Replace the fragile filename-based check (ends_with(".so")) with
identify_magic()/file_magic::elf_shared_object to reliably detect
shared libraries when filtering pre-aggregated profile data by
build ID.
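Content-based detection looks at the ELF header instead of the file name. This is a hypothetical sketch of that idea, not LLVM's `identify_magic`. It checks the ELF magic and the `e_type` field at offset 16, honoring `EI_DATA` endianness. `ET_DYN` (3) marks a shared object, and it is also the type used for PIE executables.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch: classify a file as an ELF shared object from its
// header bytes rather than from a ".so" suffix.
bool isElfSharedObjectSketch(const std::vector<uint8_t> &Header) {
  if (Header.size() < 18)
    return false;
  if (Header[0] != 0x7f || Header[1] != 'E' || Header[2] != 'L' ||
      Header[3] != 'F')
    return false;
  // EI_DATA (offset 5): 2 = big-endian, otherwise treat as little-endian.
  uint16_t Type = Header[5] == 2
                      ? static_cast<uint16_t>((Header[16] << 8) | Header[17])
                      : static_cast<uint16_t>(Header[16] | (Header[17] << 8));
  return Type == 3; // ET_DYN
}
```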

Test Plan: pre-aggregated-perf-shlib.test
2026-04-09 15:48:37 -07:00
Brian Cain
5fe235b986 [BOLT] Fix strict weak ordering in getCodeSections comparator (#190905)
The compareSections lambda in getCodeSections() violates the strict weak
ordering requirement: when A == B, the comparator can return true (e.g.
via the HotText mover name check), which triggers a _GLIBCXX_DEBUG
assertion on self-comparison.

Add an early identity check to satisfy irreflexivity.
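A sketch of a comparator in the spirit of `compareSections()`. `SectionSketch`, `compareSectionsSketch` and the `HotName` rule are hypothetical. The early identity check makes `Comp(A, A)` false. This is a full strict weak ordering only under the stated assumption that at most one section carries `HotName`.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct SectionSketch {
  std::string Name;
  unsigned long Address;
};

// Hypothetical comparator: the section named HotName sorts first, the rest
// by address. Assumes at most one section is named HotName.
bool compareSectionsSketch(const SectionSketch *A, const SectionSketch *B,
                           const std::string &HotName) {
  if (A == B)
    return false; // irreflexivity: never "less" than itself
  if (A->Name == HotName)
    return true;
  if (B->Name == HotName)
    return false;
  return A->Address < B->Address;
}
```

Without the identity check, `Comp(Hot, Hot)` would return true, which is the kind of self-comparison a debug-mode `std::sort` asserts on.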
2026-04-08 08:56:21 -05:00
Fangrui Song
1578bc684e [MC] Move MCTargetOptions pointer from MCContext to MCAsmInfo (#180464)
Except MC-internal `MCAsmInfo()` uses, MCAsmInfo is always constructed
with `const MCTargetOptions &` via `TargetRegistry::createMCAsmInfo`
(https://reviews.llvm.org/D41349). Store the pointer in MCAsmInfo and
change `MCContext::getTargetOptions()` to retrieve it from there,
removing the `MCTargetOptions const *TargetOptions` member from
MCContext.

MCContext's constructor still accepts an MCTargetOptions parameter
for now, but it is often omitted by call sites.
A subsequent change will remove this parameter and update all callers.
2026-04-08 04:35:58 +00:00
Shanzhi Chen
c7c902574c [BOLT][AArch64] Optimize the mov-imm-to-reg operation (#189304)
On AArch64, logical immediate instructions are used to encode some
special immediate values. Even at `-O0`, the AArch64 backend would not
generate 4 instructions (movz, movk, movk, movk) to move such a special
value into a 64-bit register.

For example, to move the 64-bit value `0x0001000100010001` to `x0`, the
AArch64 backend would not choose a 4-instruction sequence like
```
movz x0, 0x0001
movk x0, 0x0001, lsl 16
movk x0, 0x0001, lsl 32
movk x0, 0x0001, lsl 48
```
Actually, the AArch64 backend would choose to generate one instruction
```
mov x0, 0x0001000100010001
```
which is essentially
```
orr x0, xzr, 0x0001000100010001
```

We could refer to `AArch64ExpandPseudoImpl::expandMOVImm` and
`AArch64_IMM::expandMOVImm` for related implementation.

Therefore, this patch leverages `expandMOVImm` in LLVM to optimize the
mov-imm-to-reg operation in BOLT, which should help speed up
BOLT-instrumented binaries.
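The single `orr` works because `0x0001000100010001` is a "bitmask immediate": a replicated 2/4/8/16/32/64-bit element that is a rotated run of contiguous ones. The following is a minimal sketch of that test, not LLVM's implementation. `isLogicalImm64Sketch` is a hypothetical name.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Sketch of the AArch64 logical-immediate test: Imm must replicate an
// element of 2..64 bits whose bits form one (possibly wrapping) run of ones.
bool isLogicalImm64Sketch(uint64_t Imm) {
  if (Imm == 0 || Imm == ~UINT64_C(0))
    return false; // all-zeros and all-ones are not encodable
  // Shrink the element size while both halves of the element match.
  unsigned Size = 64;
  while (Size > 2) {
    unsigned Half = Size / 2;
    uint64_t HalfMask = (UINT64_C(1) << Half) - 1;
    if ((Imm & HalfMask) != ((Imm >> Half) & HalfMask))
      break;
    Size = Half;
  }
  uint64_t Mask = Size == 64 ? ~UINT64_C(0) : (UINT64_C(1) << Size) - 1;
  uint64_t Elt = Imm & Mask;
  // Rotate the element right by one bit. A single run of ones differs from
  // its rotation in exactly two bit positions (its two boundaries).
  uint64_t Rot = ((Elt >> 1) | (Elt << (Size - 1))) & Mask;
  return std::bitset<64>(Elt ^ Rot).count() == 2;
}
```

For `0x0001000100010001` the element is the 16-bit value `0x0001`, a single run of ones, so one `orr` suffices. A value like `0x1234` fails the test and still needs a movz/movk sequence.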
2026-04-07 12:28:36 -07:00
Amir Ayupov
a8cf1a0352 [BOLT] Allow empty buildid in pre-aggregated profile addresses (#190675)
Allow `parseString()` to return an empty `StringRef` when the delimiter
appears at position 0. This enables parsing pre-aggregated profile
addresses with an omitted buildid but preserved colon (`:addr` format),
where the empty buildid corresponds to the main binary.

Previously, `parseString()` rejected zero-length fields by treating
`StringEnd == 0` the same as `StringRef::npos` (delimiter not found).
These are distinct situations: `npos` means no delimiter exists, while
`0` means the field before the delimiter is empty. The fix removes the
`StringEnd == 0` sub-condition so only the missing-delimiter case
errors.
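The distinction between `npos` and `0` can be sketched with `std::string_view`. `parseFieldSketch` is hypothetical, not BOLT's `parseString`. It errors only when the delimiter is missing, and returns an empty field when the delimiter is at position 0.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical sketch: split off the field before Delim and consume it.
// Missing delimiter -> error; delimiter at position 0 -> empty field.
std::optional<std::string_view> parseFieldSketch(std::string_view &Input,
                                                 char Delim) {
  std::size_t End = Input.find(Delim);
  if (End == std::string_view::npos)
    return std::nullopt; // no delimiter at all
  std::string_view Field = Input.substr(0, End); // empty when End == 0
  Input.remove_prefix(End + 1);
  return Field;
}
```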

The existing test for buildid-prefixed addresses is extended to also
verify that `:addr` input produces identical output to the plain-address
and non-empty-buildid variants.

Test Plan:
Added empty-buildid input file and extended
`pre-aggregated-perf-buildid.test` to run perf2bolt with `:addr` format
and diff the fdata output against the existing buildid-prefixed result.
2026-04-06 14:41:21 -07:00
Yashwant Singh
5e14916fa6 Early exit llvm-bolt when coming across empty data files (#176859)
perf2bolt generates empty fdata files for small binaries, and BOLT
currently detects this only while parsing, by checking
`(!hasBranchData() && !hasMemData())`. Instead, exit early with an error
message as soon as the data file has been read into the buffer and is
found to be empty.
2026-04-06 09:37:05 +05:30
Brian Cain
98ced6cfd0 [BOLT] Template patchELFPHDRTable and rewriteNoteSections for ELF32 (#189715)
Template patchELFPHDRTable, rewriteNoteSections, markGnuRelroSections,
and discoverStorage to support both ELF32LE and ELF64LE binaries.
Previously these functions were hardcoded for ELF64LE, causing crashes
when processing 32-bit ELF binaries.

The RewriteInstance constructor now accepts ELF32LE objects in addition
to ELF64LE. The ELF_FUNCTION macro is reused (and moved earlier in the
header) to dispatch to the correct template instantiation.

These changes are preparation for adding Hexagon architecture support
to BOLT.
2026-04-03 15:16:31 -05:00
Rafael Auler
7da3a66c06 [BOLT] Check for write errors before keeping output file (#190359)
Summary:
When the disk runs out of space during output file writing, BOLT would
crash with SIGSEGV/SIGABRT because raw_fd_ostream silently records write
errors and only reports them via abort() in its destructor. This made it
difficult to distinguish real BOLT bugs from infrastructure issues in
production monitoring.

Add an explicit error check on the output stream before calling
Out->keep(), so BOLT exits cleanly with exit code 1 and a clear error
message instead.

Test: manually verified with a full filesystem that BOLT now prints
"BOLT-ERROR: failed to write output file: No space left on device" and
exits with code 1.
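A minimal sketch of the check-before-keep idea, with `std::ostream` standing in for LLVM's `raw_fd_ostream`. `finishOutputSketch` is hypothetical. It flushes, inspects the stream state, and turns a failed write into an error message and exit code 1 rather than an abort.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical sketch: check the output stream's error state before
// committing the file, so a failed write becomes a clean error.
int finishOutputSketch(std::ostream &Out, std::ostream &Err) {
  Out.flush();
  if (!Out) {
    Err << "BOLT-ERROR: failed to write output file\n";
    return 1;
  }
  // The real code would keep the output file at this point.
  return 0;
}
```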
2026-04-03 10:02:36 -07:00
Alexandros Lamprineas
64b728128d [BOLT][AArch64] Add minimal support for liveness analysis. (#183298)
In this patch I am adding the missing target hooks required for the
liveness analysis to run on AArch64. These are
 - getFlagsReg()
 - getRegsUsedAsParams()
 - getDefaultLiveOut()
 - getGPRegs()
 - isCleanRegXOR()

I am also introducing the following API in LivenessAnalysis
 - BitVector getLiveIn/Out(const MCInst &)
 - MCPhysReg scavengeRegFromState(BitVector &)
 
My intention is to allow the LongJmp pass to scavenge usable registers
when injecting code.
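Register scavenging from a liveness state can be sketched as "first candidate that is not live". `scavengeRegSketch` and the 32-register numbering are hypothetical, and this is not the `scavengeRegFromState` API above. It also marks the chosen register live, which is an assumption made so that repeated requests return distinct registers.

```cpp
#include <bitset>
#include <cassert>
#include <optional>

constexpr unsigned NumRegsSketch = 32; // hypothetical register numbering

// Return the first candidate register that is dead in Live, and mark it
// live; return std::nullopt if every candidate is live.
std::optional<unsigned>
scavengeRegSketch(std::bitset<NumRegsSketch> &Live,
                  const std::bitset<NumRegsSketch> &Candidates) {
  for (unsigned R = 0; R < NumRegsSketch; ++R) {
    if (Candidates[R] && !Live[R]) {
      Live.set(R);
      return R;
    }
  }
  return std::nullopt; // caller must fall back (e.g. spill or trampoline)
}
```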
2026-04-02 11:59:59 +01:00
wangjue
8c2feea2f7 [BOLT] Delete unnecessary instructions (#189297)
2026-04-02 06:48:38 +03:00
Alexandros Lamprineas
abc0674f83 [BOLT][AArch64] Handle irreversible branches in compact-code-model (#186850)
When the compact-code-model is used, LongJmpPass::relaxLocalBranches
calls reverseBranchCondition without first checking isReversibleBranch,
resulting in a runtime error. With this patch I am adding an additional
trampoline to handle irreversible FEAT_CMPBR branches.

In the future the plan is to use liveness analysis and replace the
irreversible branch with compare followed by branch (see #185731) as
long as the condition flags are dead, or emit the additional trampoline
otherwise.
2026-03-27 13:41:58 +00:00
Amir Ayupov
2fafeb0509 [BOLT] Support buildid in pre-aggregated profile (#186931)
Sample addresses belonging to external DSOs (buildid doesn't match the
current file) are treated as external (0).

Buildid for the main binary is expected to be omitted.
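The classification rule above can be sketched in a few lines. `resolveAddressSketch` is hypothetical. An omitted or matching buildid keeps the address, and any other buildid marks the address as external (0).

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Hypothetical sketch: addresses tagged with a foreign buildid belong to
// another DSO and are treated as external (0); an omitted buildid means
// the main binary.
uint64_t resolveAddressSketch(std::string_view BuildID, uint64_t Addr,
                              std::string_view MainBuildID) {
  if (BuildID.empty() || BuildID == MainBuildID)
    return Addr;
  return 0; // external
}
```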

Test Plan:
added pre-aggregated-perf-buildid.test
2026-03-24 15:15:08 -07:00
Amir Ayupov
2e247a1d54 Revert "[BOLT] Support buildid in pre-aggregated profile"
Accidentally pushed unreviewed version.

This reverts commit fce6895804.
2026-03-24 15:13:14 -07:00
Amir Ayupov
fce6895804 [BOLT] Support buildid in pre-aggregated profile
Sample addresses belonging to external DSOs (buildid doesn't match the
current file) are treated as external (0).

Buildid for the main binary is expected to be omitted.

Test Plan: added pre-aggregated-perf-buildid.test

Reviewers:
paschalis-mpeis, maksfb, yavtuk, ayermolo, yozhu, rafaelauler, yota9

Reviewed By: paschalis-mpeis

Pull Request: https://github.com/llvm/llvm-project/pull/186931
2026-03-24 15:05:33 -07:00
Fangrui Song
d1b9b4c548 [MC] Remove unused NoExecStack parameter from MCStreamer::initSections. NFC (#188184)
Unused after commit 34bc5d580b
2026-03-24 07:42:09 +00:00
Ádám Kallai
733bc3409b [BOLT][Perf2bolt] Add support to generate pre-parsed perf data (#171144)
Adding a generator to Perf2bolt is the initial step toward supporting
large end-to-end tests for Arm SPE. This functionality provides a unified
pre-parsed profile format that Perf2bolt is able to consume.

Why does the test need to have a textual format SPE profile?

* Collecting an Arm SPE profile with Linux Perf requires an Arm
development device with SPE support.
* Decoding SPE data also requires a suitable version of Linux Perf.
* The minimum required version of Linux Perf is v6.15.

To bypass these technical difficulties, it is easier to provide
a pre-generated textual profile.

The generator relies on the existing aggregator logic to spawn the
required perf-script jobs based on the aggregation type, and merges the
results of the perf-script jobs into a single file.
This hybrid profile will contain all required events such as BuildID,
MMAP, TASK, BRSTACK, or MEM event for the aggregation.

Below are two examples of generating pre-parsed perf data. For Arm SPE
aggregation:

`perf2bolt -p perf.data BINARY -o perf.text --spe --generate-perf-script`

Or for basic aggregation:

`perf2bolt -p perf.data BINARY -o perf.text --ba --generate-perf-script`
2026-03-23 12:03:52 +01:00
Shanzhi Chen
de514fbaba [BOLT] Remove some unused code (NFC) (#183880)
Remove some unused code in BOLT:
- `RewriteInstance::linkRuntime` is declared but not defined
- `BranchContext` typedef is never used
- `FuncBranchData::getBranch` is defined but never used
- `FuncBranchData::getDirectCallBranch` is defined but never used
2026-03-23 09:13:00 +00:00
YongKang Zhu
b7d97d9e8d [BOLT] Remove outdated assertion from local symtab update logic (#187409)
The assert condition (the function is not split, or is split
into fewer than three fragments) no longer always holds now
that we emit more local symbols due to #184074.
2026-03-21 13:15:49 -07:00
Vasily Leonenko
51fd033521 [BOLT] Enable compatibility of instrumentation-file-append-pid with instrumentation-sleep-time (#183919)
This commit makes the instrumentation-file-append-pid and
instrumentation-sleep-time options compatible. This also requires keeping
the counters mapping between the watcher process and the instrumented
binary process in shared mode. This is useful when we instrument a shared
library that is used by several tasks running on the target system. When
we cannot wait for every task to complete, we must use the sleep-time
option. Without the append-pid option, profiles collected from different
tasks would overwrite each other at the same path, leading to unexpected
or suboptimal optimization effects.

Co-authored-by: Vasily Leonenko <vasily.leonenko@huawei.com>
2026-03-18 09:14:03 +03:00
YongKang Zhu
037c2095e6 Add hybrid function ordering support (#186003)
Allow `--function-order` to be combined with `--reorder-functions`
algorithms. Functions listed in the order file are pinned first
(indices 0..N-1), then the selected algorithm orders remaining
functions starting at index N.
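A minimal sketch of hybrid ordering, with descending execution count standing in for the selected `--reorder-functions` algorithm. `hybridOrderSketch` and the count-based ordering are hypothetical. Functions from the order file keep indices 0..N-1, and everything else follows from index N.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical sketch: pin order-file functions first, then append the
// remaining functions sorted by a stand-in algorithm (hotness, then name).
std::vector<std::string>
hybridOrderSketch(const std::vector<std::string> &OrderFile,
                  const std::unordered_map<std::string, unsigned> &ExecCount) {
  std::vector<std::string> Result;
  std::unordered_set<std::string> Pinned;
  for (const std::string &F : OrderFile)
    if (ExecCount.count(F) && Pinned.insert(F).second)
      Result.push_back(F); // indices 0..N-1
  std::vector<std::string> Rest;
  for (const auto &KV : ExecCount)
    if (!Pinned.count(KV.first))
      Rest.push_back(KV.first);
  std::sort(Rest.begin(), Rest.end(),
            [&](const std::string &A, const std::string &B) {
              unsigned CA = ExecCount.at(A), CB = ExecCount.at(B);
              return CA != CB ? CA > CB : A < B;
            });
  Result.insert(Result.end(), Rest.begin(), Rest.end()); // from index N
  return Result;
}
```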
2026-03-17 11:12:54 -07:00
Anatoly Trosinenko
481da949a4 [BOLT] Gadget scanner: implement finer-grained --scanners=... argument (#176135)
Add separate options to enable each of the available gadget detectors.
Furthermore, add two meta-options enabling all PtrAuth scanners and all
available scanners of any type (which is only PtrAuth for now, though).

This commit renames `pacret` option to `ptrauth-pac-ret` and `pauth` to
`ptrauth-all`.
2026-03-13 15:03:25 +00:00
Ádám Kallai
fd225e296f [BOLT] Spawn buildid-list perf job at perf2bolt start. NFC (#185865)
Launch this perf job with the others at the beginning of the aggregation
process.

Extracting buildid-list from perf data is not a costly process, so it
can be performed by default. This provides a distinct advantage when
this dataset is required in other perf2bolt stages as well.

Please see PR #171144.
2026-03-12 10:24:09 +01:00
Amina Chabane
498906f2df [BOLT] Error out on SHF_COMPRESSED debug sections (#185662)
Some binaries are built using `-gz=zstd`, but when using
`--update-debug-sections` on said binaries BOLT crashes.

This patch fixes this issue by recognising compressed debug sections in
binaries via their flag `SHF_COMPRESSED` and appropriately erroring out.

Legacy GNU-style compression is not handled.
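The detection described above reduces to a flag test on `.debug_*` sections. `isCompressedDebugSectionSketch` is hypothetical. `SHF_COMPRESSED` is 0x800 in the ELF specification. Legacy GNU-style `.zdebug_*` sections are deliberately not matched, consistent with the note above.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

constexpr uint64_t SHF_COMPRESSED_FLAG = 0x800; // ELF SHF_COMPRESSED

// Hypothetical check: report a .debug_* section that carries
// SHF_COMPRESSED instead of trying to update it.
bool isCompressedDebugSectionSketch(std::string_view Name, uint64_t Flags) {
  return Name.substr(0, 7) == ".debug_" &&
         (Flags & SHF_COMPRESSED_FLAG) != 0;
}
```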
2026-03-10 10:18:12 -07:00
Fangrui Song
c889454f1d [MC] Rename PrivateGlobalPrefix to InternalSymbolPrefix. NFC (#185164)
The "private global" terminology, which likely came from
llvm/lib/IR/Mangler.cpp, is misleading: "private" is the opposite of
"global", and these prefixed symbols are not global in the object file
format sense (e.g. ELF has STB_GLOBAL while these symbols are always
STB_LOCAL). The term "internal symbol" better describes their purpose:
symbols for internal use by compilers and assemblers, not meant to be
visible externally.

This rename is a step toward adopting the "internal symbol prefix"
terminology agreed with GNU as
(https://sourceware.org/pipermail/binutils/2026-March/148448.html).
2026-03-10 01:03:27 -07:00
Asher Dobrescu
7bce678ec1 [BOLT] Check if symbol is in data area of function (#160143)
In some cases, `getEntryIDForSymbol` is called with a Symbol that lies
in a constant island, so BOLT cannot find its function. This causes BOLT
to reach `llvm_unreachable("symbol not found")` and crash. This patch
adds a check that avoids this crash.
2026-03-06 10:37:54 +00:00
YongKang Zhu
95685ca52e [BOLT] Retain certain local symbols (#184074)
BOLT currently strips all STT_NOTYPE STB_LOCAL zero-sized symbols
that fall inside function bodies. Certain such symbols are named
labels (loop markers and subroutine entry points) or local function
symbols in hand-written assembly. We now keep them in local symbol
table in BOLT processed binaries for better symbolication.
2026-03-05 00:34:36 -08:00
YongKang Zhu
14bcb1a009 [BOLT] Make sure IOAddressMap exist before lookup (NFC) (#183184)
`BinaryFunction::translateInputToOutputAddress()` contains fallback
logic in case querying `IOAddressMap` doesn't yield an output
address. Because this function could be called in scenarios where
`IOAddressMap` won't be set up, we should check if the map actually
exists before lookup.
2026-03-01 23:27:39 -08:00
Gergely Bálint
9d762ad279 [BOLT][BTI] Patch ignored functions in place when targeting them with indirect branches (#177165)
When applying BTI fixups to indirect branch targets, ignored functions
are considered as a special case:
- these hold no instructions,
- have no CFG,
- and are not emitted in the new text section.

The solution is to patch the entry points in the original location.

If such a situation occurs in a binary, recompilation using the
-fpatchable-function-entry flag is required. This will place a nop at
all function starts, which BOLT can use to patch the original section.

Without the extra nop, BOLT cannot safely patch the original .text
section.

An alternative solution could be to also ignore the function from which
the stub starts. This has not been tried because the LongJmp pass, where
most stubs are inserted, is currently not equipped to ignore functions.

Testing: both the success and failure cases are covered with lit tests.
2026-02-24 11:09:42 +01:00
Maksim Panchenko
7063b22c63 [BOLT] Always place new PT_LOAD after existing ones (#182642)
Insert new PT_LOAD segments right after the last existing PT_LOAD in the
program header table, instead of before PT_DYNAMIC or at the end. This
maintains the ascending p_vaddr order required by the ELF specification.

Previously, new segments could end up breaking PT_LOAD p_vaddr order
when PT_LOAD segments followed PT_DYNAMIC or PT_GNU_STACK. This led the
runtime loader to incorrectly assess the dynamic object's size and
silently corrupt memory.
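A sketch of the insertion rule, modelling only `p_type`. `insertNewLoadSketch` is hypothetical. It assumes the new segment has the highest `p_vaddr`, so placing it right after the last existing PT_LOAD keeps PT_LOAD entries ascending even when PT_DYNAMIC sits between loads.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

constexpr uint32_t PT_LOAD_TYPE = 1;
constexpr uint32_t PT_DYNAMIC_TYPE = 2;
constexpr uint32_t PT_PHDR_TYPE = 6;
constexpr uint32_t PT_GNU_STACK_TYPE = 0x6474e551;

// Hypothetical sketch: insert a new PT_LOAD right after the last existing
// PT_LOAD (or append if there is none).
void insertNewLoadSketch(std::vector<uint32_t> &PhdrTypes) {
  auto InsertPos = PhdrTypes.end();
  for (auto It = PhdrTypes.begin(); It != PhdrTypes.end(); ++It)
    if (*It == PT_LOAD_TYPE)
      InsertPos = std::next(It);
  PhdrTypes.insert(InsertPos, PT_LOAD_TYPE);
}
```

Inserting before PT_DYNAMIC instead would place the new, higher-addressed PT_LOAD ahead of the trailing PT_LOAD, which breaks the ascending order.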
2026-02-21 14:09:36 -08:00
Amir Ayupov
393adaac1d [BOLT] Mark BOLTReserved segment executable (#181606)
Summary:
When the .bolt_reserved section is defined in the linker script, there's
no way to mark the containing segment executable other than via the PHDRS
command, which overrides program headers entirely and is impractical.

Since .bolt_reserved contains executable code, mark segment executable
in BOLT.

Test Plan: bolt-reserved.test
2026-02-19 15:07:50 -08:00