Commit Graph

73 Commits

Author SHA1 Message Date
Maksim Panchenko
34a7608fad [BOLT] Drop -znow requirement for PLT optimization on x86-64 (#178758)
On x86-64, PLT optimization does not require the binary to be linked
with -znow because indirect calls through GOT work correctly with lazy
binding. At runtime, the dynamic linker's resolver will populate the GOT
entry on the first call, just like with a regular PLT call.

This change removes the -znow requirement specifically for x86-64 while
keeping it for other architectures. I haven't checked RISC-V, but it's
still necessary on AArch64.
2026-01-29 16:10:43 -08:00
Gergely Bálint
a25e3674ae [BOLT] Rename Pointer Auth DWARF rewriter passes (#164622)
Rename passes to names that better reflect their intent, 
and describe their relationship to each other.

InsertNegateRAStatePass renamed to PointerAuthCFIFixup,
MarkRAStates renamed to PointerAuthCFIAnalyzer.

Added the --print-<passname> flags for these passes.
2025-12-04 11:29:40 +01:00
Akshiitaa06
4289849931 Improve formatting in BAT.md (#170254)
Make "Header" a subheading to improve readability in the Functions table
section.
2025-12-02 12:57:53 +00:00
Gergely Bálint
29fef3a51e [BOLT] Improve DWARF CFI generation for pac-ret binaries (#163381)
During the InsertNegateRAState pass, we check the annotations on
instructions to decide where to generate the OpNegateRAState CFIs in the
output binary.

As only instructions in the input binary were annotated, we have to make
a judgement on instructions generated by other BOLT passes.
Incorrect placement may cause issues when an (async) unwind request
is received during the new "unknown" instructions.

This patch adds more logic to make a more informed decision by taking
the following into account:
- Unknown instructions in a BasicBlock with other instructions have the
  same RAState. Previously, if the BasicBlock started with an unknown
  instruction, the RAState was copied from the preceding block. Now, the
  RAState is copied from the succeeding instructions in the same block.
- Some BasicBlocks may contain only instructions with unknown RAState.
  As explained in issue #160989, these blocks already have incorrect
  unwind info. Because of this, the last known RAState based on the
  layout order is copied.
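
As a rough illustration of the two rules above, here is a minimal Python sketch. It is not BOLT's implementation: the list-of-lists block model, the `fill_unknown_ra_states` name, and the `initial` parameter are hypothetical, and the handling of unknown instructions in the middle of a block (reuse the nearest preceding known state in the same block) is an assumption made for the example.

```python
from typing import List, Optional

# Hypothetical model: a basic block is a list of per-instruction RA states.
# True = return address signed, False = unsigned, None = unknown
# (for example, an instruction created by another pass).
Block = List[Optional[bool]]


def fill_unknown_ra_states(layout: List[Block], initial: bool = False) -> List[List[bool]]:
    """Resolve unknown RA states for blocks given in layout order."""
    resolved: List[List[bool]] = []
    last_known = initial
    for block in layout:
        known = [state for state in block if state is not None]
        if not known:
            # Rule 2: a block with only unknown instructions copies the
            # last known state in layout order.
            filled = [last_known] * len(block)
        else:
            # Rule 1: leading unknown instructions copy the state of the
            # first known instruction that follows them in the same block.
            current = known[0]
            filled = []
            for state in block:
                if state is not None:
                    current = state
                filled.append(current)
        if filled:
            last_known = filled[-1]
        resolved.append(filled)
    return resolved


layout = [[None, True, None], [None, None], [None, False]]
resolved = fill_unknown_ra_states(layout)
```

In this toy layout, the middle block contains only unknown instructions, so it inherits the state that ends the first block.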

Updated bolt/docs/PacRetDesign.md to reflect changes.
2025-12-01 12:00:31 +01:00
Vasily Leonenko
a751ed97ac [BOLT] Support runtime library hook via DT_INIT_ARRAY (#167467)
The major part of this PR is a commit implementing support for
DT_INIT_ARRAY for BOLT runtime library initialization. It also adds a
related hook-init test and fixes a couple of X86 instrumentation tests.

This commit follows implementation of instrumentation hook via
DT_FINI_ARRAY (https://github.com/llvm/llvm-project/pull/67348) and
extends it for BOLT runtime libraries (including instrumentation
library) initialization hooking.

Initialization has differences compared to finalization:
- Executables always use the ELF entry point address. The update code
checks it and updates the init_array entry if the ELF is a shared library
(has no interp entry) and has no DT_INIT entry. This commit also
introduces the "runtime-lib-init-hook" option to select the primary
initialization hook (entry_point, init, init_array), with fallback to the
next available hook in the input binary. For example, in the case of libc
we can explicitly set it to init_array.
- Relocations for shared library init_array entries usually have the
R_AARCH64_ABS64 type on AArch64 binaries. We check the relocation type
and adjust the methods for reading init_array relocations in the
discovery and update methods.

---------

Co-authored-by: Vasily Leonenko <vasily.leonenko@huawei.com>
2025-12-01 10:55:00 +03:00
Jinjie Huang
f7be258c28 [BOLT][NFC] Clean up the outdated option --write-dwp in doc (#166150)
Since the "--write-dwp" option has been removed in
[PR](https://github.com/llvm/llvm-project/pull/100771), this patch also
cleans up the corresponding document and test to avoid misleading
issues.
2025-11-04 18:27:53 +08:00
Paschalis Mpeis
ae6cb98b29 [BOLT] Add --ba flag to deprecate --nl (#164257)
The `--nl` flag, originally for Non-LBR mode, is deprecated and will be
replaced by `--basic-events` (alias `--ba`).

`--nl` remains as a deprecated alias for backward compatibility.
2025-10-23 10:13:28 +01:00
Paschalis Mpeis
96688d4b3c [BOLT][NFC] Use brstack in guides and user outputs (#163950)
Update guides to use brstack, with a mention to BRBE for AArch64. Use
brstack in user-facing outputs.

---------

Co-authored-by: Amir Ayupov <aaupov@fb.com>
2025-10-20 09:30:06 +00:00
Christian Clauss
0fc05aa1c6 [bolt] Fix typos discovered by codespell (#124726)
https://github.com/codespell-project/codespell
```bash
codespell bolt --skip="*.yaml,Maintainers.txt" --write-changes \
    --ignore-words-list=acount,alledges,ans,archtype,defin,iself,mis,mmaped,othere,outweight,vas
```
2025-10-14 14:45:40 +02:00
Gergely Bálint
889bfd9172 Reapply "[BOLT][AArch64] Handle OpNegateRAState to enable optimizing binaries with pac-ret hardening" (#162353) (#162435)
Reapply "[BOLT][AArch64] Handle OpNegateRAState to enable optimizing
binaries with pac-ret hardening (#120064)" (#162353)

This reverts commit c7d776b068.

#120064 was reverted for breaking builders.

Fix: changed the mismatched type in MarkRAStates.cpp to `auto`.

---

Original message:

OpNegateRAState is an AArch64-specific DWARF CFI used to change the value
of the RA_SIGN_STATE pseudoregister. The RA_SIGN_STATE register records
whether the current return address has been signed with PAC.

OpNegateRAState requires special handling in BOLT because its placement
depends on the function layout. Since BOLT reorders basic blocks during
optimization, these CFIs must be regenerated after layout is finalized.

This patch introduces two new passes:

- MarkRAStates (runs before optimizations): assigns a signedness annotation to each
  instruction based on OpNegateRAState CFIs in the input binary.

- InsertNegateRAStates (runs after optimizations): reads the annotations and emits
  new OpNegateRAState CFIs where RA state changes between instructions.

Design details are described in: `bolt/docs/PacRetDesign.md`.
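
The two-pass idea can be sketched in a few lines of Python. This is a toy model, not the code from the patch: instructions are plain strings, `NEGATE` stands in for the OpNegateRAState CFI, the function names are illustrative, and the assumption that every function starts in the unsigned state is made only for the example.

```python
from typing import List, Tuple

NEGATE = "negate_ra_state"  # stand-in for the OpNegateRAState CFI


def mark_ra_states(stream: List[str]) -> List[Tuple[str, bool]]:
    """Before optimizations: annotate each instruction with its RA state."""
    signed = False  # assumption: the function is entered in the unsigned state
    annotated = []
    for item in stream:
        if item == NEGATE:
            signed = not signed  # the CFI toggles the state for what follows
        else:
            annotated.append((item, signed))
    return annotated


def insert_negate_ra_states(layout: List[Tuple[str, bool]]) -> List[str]:
    """After layout is final: emit a CFI wherever the state flips."""
    out = []
    signed = False
    for insn, state in layout:
        if state != signed:
            out.append(NEGATE)
            signed = state
        out.append(insn)
    return out


original = ["sign", NEGATE, "work", "auth", NEGATE, "ret"]
annotated = mark_ra_states(original)
# Hypothetical reordering: the "ret" instruction is moved ahead of "work".
reordered = [annotated[0], annotated[3], annotated[1], annotated[2]]
regenerated = insert_negate_ra_states(reordered)
```

With the layout unchanged, the second pass reproduces the original CFI placement; after reordering, only one CFI is needed because the state flips only once.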
2025-10-08 11:05:41 +02:00
Gergely Bálint
c7d776b068 Revert "[BOLT][AArch64] Handle OpNegateRAState to enable optimizing binaries with pac-ret hardening" (#162353)
Reverts llvm/llvm-project#120064.

@gulfemsavrun reported that the patch broke toolchain builders.
2025-10-07 21:59:18 +02:00
Gergely Bálint
32eaf5b59c [BOLT][AArch64] Handle OpNegateRAState to enable optimizing binaries with pac-ret hardening (#120064)
OpNegateRAState is an AArch64-specific DWARF CFI used to change the value
of the RA_SIGN_STATE pseudoregister. The RA_SIGN_STATE register records
if the current return address has been signed with PAC.

OpNegateRAState requires special handling in BOLT because its placement
depends on the function layout. Since BOLT reorders basic blocks during
optimization, these CFIs must be regenerated after layout is finalized.

This patch introduces two new passes:

- MarkRAStates (runs before optimizations): assigns a signedness annotation to each
  instruction based on OpNegateRAState CFIs in the input binary.

- InsertNegateRAStates (runs after optimizations): reads the annotations and emits
  new OpNegateRAState CFIs where RA state changes between instructions.

Design details are described in: `bolt/docs/PacRetDesign.md`.
2025-10-07 10:22:14 +02:00
YafetBeyene
244588b9d7 [BOLT][AArch64] Inlining of Memcpy (#154929)
The pass for inlining memcpy in BOLT was previously X86-specific and
used the instruction `rep movsb`.

This patch implements a static size analysis system for AArch64 memcpy
inlining that extracts copy sizes from preceding instructions and uses
them to generate the optimal width-specific load/store sequences.
2025-09-09 14:09:23 +01:00
YafetBeyene
fda24dbc16 [BOLT] Add dump-dot-func option for selective function CFG dumping (#153007)
## Change:
* Added `--dump-dot-func` command-line option that allows users to dump
CFGs only for specific functions instead of dumping all functions (the
current only available option being `--dump-dot-all`)

## Usage:
* Users can now specify function names or regex patterns (e.g.,
`--dump-dot-func=main,helper` or `--dump-dot-func="init.*"`) to generate
.dot files only for functions of interest
* Aims to save time when analysing specific functions in large binaries
(e.g., only dumping graphs for performance-critical functions identified
through profiling) and to reduce the output clutter from generating
thousands of unnecessary .dot files

## Testing
The introduced test `dump-dot-func.test` confirms the new option does
the following:

- [x] 1. `dump-dot-func` can correctly filter a specified function
- [x] 2. Can achieve the above with regexes
- [x] 3. Can do 1. with a list of functions
- [x] No option specified creates no dot files
- [x] Passing in a non-existent function generates no dumping messages
- [x] `dump-dot-all` continues to work as expected
2025-08-22 10:51:09 +01:00
Amir Ayupov
5047a33cd8 [BOLT][heatmap] Produce zoomed-out heatmaps (#140153)
Add a capability to produce multiple heatmaps with given bucket sizes.

The default heatmap block size (64B) could be too fine-grained for
large binaries. Extend the option `block-size` to accept a list of
bucket sizes for additional heatmaps with coarser granularity. The
heatmap is simply rescaled so provided sizes should be multiples of
each other. Human-readable suffixes can be used, e.g. 4K, 16kb, 1MiB.

New defaults: 64B (base bucket size), 4KB (default page size),
256KB (for large binaries).

Test Plan: updated heatmap-preagg.test
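
To make the rescaling concrete, here is a small Python sketch of parsing human-readable sizes and coarsening a heatmap. It is only an illustration: the `parse_size` and `rescale` helpers, the accepted suffix set, and the use of 1024-based units are assumptions for this example, not the tool's actual parsing rules.

```python
import re
from typing import List

# Assumed suffix table (case-insensitive, 1024-based).
_UNITS = {
    "": 1, "b": 1,
    "k": 1024, "kb": 1024, "kib": 1024,
    "m": 1024 ** 2, "mb": 1024 ** 2, "mib": 1024 ** 2,
}


def parse_size(text: str) -> int:
    """Convert strings such as '64B', '4K', '16kb', '1MiB' to bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]*)\s*", text)
    if match is None or match.group(2).lower() not in _UNITS:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2).lower()]


def rescale(counts: List[int], base: int, coarse: int) -> List[int]:
    """Merge adjacent base-size buckets into coarser buckets by summing."""
    if coarse % base != 0:
        raise ValueError("coarse size must be a multiple of the base size")
    factor = coarse // base
    return [sum(counts[i:i + factor]) for i in range(0, len(counts), factor)]


sizes = [parse_size(s) for s in ("64B", "4K", "16kb", "1MiB")]
coarse_counts = rescale([1, 2, 3, 4, 5], base=64, coarse=128)
```

Because the coarser map is produced by summing whole groups of base buckets, a coarse size that is not a multiple of the base size cannot be represented, which is why the sketch rejects it.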
2025-05-30 16:20:19 -07:00
Gergely Bálint
5b20b5721a [BOLT][AArch64] Allow binary-analysis and heatmap tool to run with pac-ret binaries (#136664)
OpNegateRAState support is only needed for tools that produce binaries.
2025-04-30 13:41:11 +01:00
cor3ntin
320ec7fa7f [Documentation] Always use SVG for dot-generated doxygen images. (#136843)
Despite our attempt (build-docs.sh) to build the documentation with SVG,
it still uses PNG (https://llvm.org/doxygen/classllvm_1_1StringRef.html),
and that renders terribly on any high-DPI display.

SVG leads to a smaller installation and works fine on all browsers (that
has been true for _a while_, https://caniuse.com/svg), so this patch just
unconditionally builds all dot graphs as SVG in all subprojects and
removes the option.
2025-04-25 14:13:17 +02:00
Kristof Beyls
850b492976 [BOLT][binary-analysis] Add initial pac-ret gadget scanner (#122304)
This adds an initial pac-ret gadget scanner to the
llvm-bolt-binary-analysis-tool.

The scanner is taken from the prototype that was published last year at
https://github.com/llvm/llvm-project/compare/main...kbeyls:llvm-project:bolt-gadget-scanner-prototype,
and has been discussed in RFC

https://discourse.llvm.org/t/rfc-bolt-based-binary-analysis-tool-to-verify-correctness-of-security-hardening/78148
and in the EuroLLVM 2024 keynote "Does LLVM implement security
hardenings correctly? A BOLT-based static analyzer to the rescue?"
[Video](https://youtu.be/Sn_Fxa0tdpY)
[Slides](https://llvm.org/devmtg/2024-04/slides/Keynote/Beyls_EuroLLVM2024_security_hardening_keynote.pdf)

In the spirit of incremental development, this PR aims to add a minimal
implementation that is "fully working" on its own, but has major
limitations, as described in the bolt/docs/BinaryAnalysis.md
documentation in this proposed commit. These and other limitations will
be fixed in follow-on PRs, mostly based on code already existing in the
prototype branch. I hope incrementally upstreaming will make it easier
to review the code.

Note that I believe that this could also form the basis of a scanner to
analyze correct implementation of PAuthABI.
2025-02-24 07:26:28 +00:00
Davide Italiano
62c39d7734 [BOLT/docs] The support for macro-op fusion was removed. (#121158)
Update the documentation accordingly.
2024-12-26 11:18:12 -08:00
Alexander Yermolovich
3c357a49d6 [BOLT] Add support for safe-icf (#116275)
Identical Code Folding (ICF) folds functions that are identical into one
function, and updates symbol addresses to the new address. This reduces
the size of a binary, but can lead to problems, for example when
function pointers are compared. Such comparisons can appear either
explicitly in the code or in IR generated by optimization passes like
Indirect Call Promotion (ICP). After ICF, what used to be two different
addresses become the same address. This can lead to a different code
path being taken.

This is where safe ICF comes in. The linker (LLD) does it using the
address-significant section generated by clang. If a symbol is in it, or
an object doesn't have this section, symbols are not folded.

BOLT does not have the information regarding which objects do not have
this section, so it can't re-use this mechanism.

This implementation scans the code section and conservatively marks
function symbols as unsafe. It treats symbols as unsafe if they are
used in a non-control-flow instruction. It also scans through the data
relocation sections and does the same for relocations that reference a
function symbol. The latter handles the case when a function pointer is
stored in a local or global variable, etc. If a relocation address
points within a vtable, these symbols are skipped.
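
The conservative marking described above can be sketched in Python as follows. This is a toy model with hypothetical inputs (tuples for code references and data relocations, address ranges for vtables) and illustrative function names; BOLT's real data structures and folding logic are different.

```python
from typing import Dict, Iterable, List, Set, Tuple


def find_unsafe_symbols(
    code_refs: Iterable[Tuple[str, bool]],       # (symbol, used by control-flow insn?)
    data_relocs: Iterable[Tuple[int, str]],      # (relocation address, symbol)
    vtable_ranges: List[Tuple[int, int]],        # [start, end) address ranges
) -> Set[str]:
    """Collect function symbols whose address may be observed."""
    unsafe: Set[str] = set()
    for symbol, is_control_flow in code_refs:
        if not is_control_flow:
            unsafe.add(symbol)  # address taken by a non-control-flow instruction
    for address, symbol in data_relocs:
        in_vtable = any(start <= address < end for start, end in vtable_ranges)
        if not in_vtable:
            unsafe.add(symbol)  # pointer stored in data outside a vtable
    return unsafe


def fold_identical(bodies: Dict[str, bytes], unsafe: Set[str]) -> Dict[str, str]:
    """Map every function to the function it is folded into."""
    representative: Dict[bytes, str] = {}
    folded: Dict[str, str] = {}
    for name in sorted(bodies):
        if name in unsafe:
            folded[name] = name  # unsafe functions keep their own address
        else:
            folded[name] = representative.setdefault(bodies[name], name)
    return folded


bodies = {"a": b"x", "b": b"x", "c": b"x", "d": b"y"}
unsafe = find_unsafe_symbols(
    code_refs=[("a", True), ("b", False)],
    data_relocs=[(0x100, "c"), (0x2000, "d")],
    vtable_ranges=[(0x100, 0x200)],
)
folded = fold_identical(bodies, unsafe)
```

Here `b` stays unfolded because its address is used by a non-control-flow instruction, while `c` is still folded into `a` because its only data reference lies inside a vtable.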
2024-12-16 21:49:53 -08:00
Kristof Beyls
ceb7214be0 [BOLT] Introduce binary analysis tool based on BOLT (#115330)
This initial commit does not add any specific binary analyses yet, it
merely contains the boilerplate to introduce a new BOLT-based tool.

This basically combines the first 4 patches from the prototype pac-ret
and stack-clash binary analyzer discussed in RFC
https://discourse.llvm.org/t/rfc-bolt-based-binary-analysis-tool-to-verify-correctness-of-security-hardening/78148
and published at
https://github.com/llvm/llvm-project/compare/main...kbeyls:llvm-project:bolt-gadget-scanner-prototype

The introduction of such a BOLT-based binary analysis tool was proposed
and discussed in at least the following places:
- The RFC pointed to above
- EuroLLVM 2024 round table
https://discourse.llvm.org/t/summary-of-bolt-as-a-binary-analysis-tool-round-table-at-eurollvm/78441
The round table showed quite a few people interested in being able to
build a custom binary analysis quickly with a tool like this.
- Also at the US LLVM dev meeting a few weeks ago, I heard interest from
a few people, asking when the tool would be available upstream.
- The presentation "Adding Pointer Authentication ABI support for your
ELF platform"
(https://llvm.swoogo.com/2024devmtg/session/2512720/adding-pointer-authentication-abi-support-for-your-elf-platform)
explicitly mentioned interest to extend the prototype tool to verify
correct implementation of pauthabi.
2024-12-12 10:06:27 +00:00
Peter Jung
c1912b4dd7 [BOLT][docs] Fix typo (#98640)
Typo:

`chwon` --> `chown`

Signed-off-by: Peter Jung <admin@ptr1337.dev>
2024-08-08 18:05:41 -07:00
Sayhaan Siddiqui
6aad62cf5b [BOLT][DWARF] Add parallelization for processing of DWO debug information (#100282)
Enables parallelization for the processing of DWO CUs.
2024-08-08 16:41:51 -07:00
Jordan Brantner
d251a328b8 [BOLT] Fix typo from alterantive to alternative (#99704)
Fix typo from `alterantive` -> `alternative`

Signed-off-by: Jordan Brantner <brantnej@oregonstate.edu>
2024-07-22 18:35:20 -07:00
Eisuke Kawashima
8bc02bf5c6 fix(bolt/**.py): fix comparison to None (#94012)
from PEP8
(https://peps.python.org/pep-0008/#programming-recommendations):

> Comparisons to singletons like None should always be done with is or
is not, never the equality operators.
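
A short illustration of why the recommendation matters: `==` dispatches to a user-defined `__eq__`, while `is` checks identity and cannot be overridden. The `AlwaysEqual` class is a contrived example, not code from the BOLT scripts.

```python
class AlwaysEqual:
    """Contrived class whose equality operator claims to equal anything."""

    def __eq__(self, other):
        return True


value = AlwaysEqual()

# Equality is delegated to AlwaysEqual.__eq__, so it reports True.
equals_none = (value == None)  # noqa: E711 (deliberately the discouraged form)

# Identity compares the objects themselves and is not overridable.
is_none = (value is None)
```

After running this, `equals_none` is `True` even though `value` is clearly not `None`, whereas `is_none` is `False`.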

Co-authored-by: Eisuke Kawashima <e-kwsm@users.noreply.github.com>
2024-07-19 16:59:56 -07:00
Itis-hard2name
7f563232d6 [bolt][Docs] fix missing option in cmake of stage3 in OptimizingClang.md (#93684)
Fixes #93681
2024-07-19 16:55:21 -07:00
Sayhaan Siddiqui
d54ec64f67 [BOLT][DWARF] Remove deprecated opt (#99575)
Remove deprecated DeterministicDebugInfo option and its uses.
2024-07-19 14:03:50 -07:00
Shaw Young
296a956369 [BOLT] Match functions with call graph (#98125)
Implemented call graph function matching. First, two call graphs are
constructed for both profiled and binary functions. Then functions are
hashed based on the names of their callee/caller functions. Finally,
functions are matched based on these neighbor hashes and the 
longest common prefix of their names. The `match-with-call-graph` 
flag turns this matching on.
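
A minimal Python sketch of the neighbor-hash idea follows. It is an illustration under assumptions, not the patch's algorithm: call graphs are plain dictionaries from caller to callees, the hash is SHA-1 over sorted neighbor names, and ties are broken by the longest common name prefix and then alphabetically.

```python
import hashlib
from os.path import commonprefix
from typing import Dict, List, Set

CallGraph = Dict[str, Set[str]]  # caller name -> set of callee names


def neighbor_hashes(graph: CallGraph) -> Dict[str, str]:
    """Hash every function by the names of its callers and callees."""
    callers: Dict[str, Set[str]] = {}
    for caller, callees in graph.items():
        callers.setdefault(caller, set())
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)
    hashes = {}
    for fn, fn_callers in callers.items():
        callees = graph.get(fn, set())
        key = "|".join(sorted(fn_callers)) + "->" + "|".join(sorted(callees))
        hashes[fn] = hashlib.sha1(key.encode()).hexdigest()
    return hashes


def match_with_call_graph(profile: CallGraph, binary: CallGraph) -> Dict[str, str]:
    """Match profiled functions to binary functions with equal neighbor hashes."""
    by_hash: Dict[str, List[str]] = {}
    for fn, digest in neighbor_hashes(binary).items():
        by_hash.setdefault(digest, []).append(fn)
    matches = {}
    for fn, digest in neighbor_hashes(profile).items():
        candidates = sorted(by_hash.get(digest, []))
        if candidates:
            # Prefer the candidate sharing the longest name prefix.
            matches[fn] = max(candidates, key=lambda c: len(commonprefix([fn, c])))
    return matches


profile_graph = {"main": {"foo_v1"}, "foo_v1": {"helper"}}
binary_graph = {"main": {"foo_v2"}, "foo_v2": {"helper"}}
matches = match_with_call_graph(profile_graph, binary_graph)
```

In this example only the renamed function is recovered, because its neighbors kept their names; `main` and `helper` get different neighbor hashes on each side and would have to be matched by name instead.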

Test Plan: Added match-with-call-graph.test. Matched 164 functions 
in a large binary with 10171 profiled functions.
2024-07-19 14:00:28 -07:00
Paschalis Mpeis
b037d0f0e5 [BOLT][docs] Expand Heatmaps.md (#98162)
Improve documentation on heatmaps.
Add example for X axis labels.
2024-07-15 08:53:27 +01:00
Paschalis Mpeis
34433fdceb [BOLT] Add -print-mappings option to heatmaps (#97567)
Emit a mapping in the legend between the characters/buckets and the text
sections, using:

```sh
llvm-heatmap-bolt -print-mappings ..
```

Example:
```
Legend:
..
Sections:
  a/A : .init      0x00000100-0x00000200
  b/B : .plt       0x00000200-0x00000500
  c/C : .text      0x00010000-0x000a0000
  d/D : .fini      0x000a0000-0x000f0000
..
```
2024-07-15 08:23:06 +01:00
Maksim Panchenko
a0c6b8aef8 [BOLT][docs] Add merge-fdata to Linux optimization guide (#97659) 2024-07-03 17:30:37 -07:00
Shaw Young
97dc50882c [BOLT] Match functions with name similarity (#95884)
A mapping from namespace to associated binary functions is used to
match function profiles to the binary, based on the edit distance
threshold set by the '--name-similarity-function-matching-threshold'
flag. The flag is set to 0 (exact name matching) by default, as
similarity matching is expensive, requiring the processing of all BFs.
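
The matching step can be sketched with a standard Levenshtein distance in Python. This is a simplified illustration: the helper names are hypothetical, the namespace is taken to be everything before the last `::`, and ties are broken alphabetically, none of which is confirmed by the commit.

```python
from typing import Dict, Iterable, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def namespace_of(name: str) -> str:
    """Assumed convention: the namespace is the text before the last '::'."""
    return name.rpartition("::")[0]


def match_by_name(profile_names: Iterable[str],
                  binary_names: Iterable[str],
                  threshold: int) -> Dict[str, str]:
    """Match each profiled name to the closest binary name in its namespace."""
    by_namespace: Dict[str, List[str]] = {}
    for name in binary_names:
        by_namespace.setdefault(namespace_of(name), []).append(name)
    matches = {}
    for name in profile_names:
        candidates = by_namespace.get(namespace_of(name), [])
        if not candidates:
            continue
        best = min(candidates, key=lambda c: (edit_distance(name, c), c))
        if edit_distance(name, best) <= threshold:
            matches[name] = best
    return matches


binary = ["ns::foo_v2", "ns::unrelated", "zzz::bar"]
matches = match_by_name(["ns::foo_v1", "other::bar"], binary, threshold=2)
```

Grouping candidates by namespace keeps the number of distance computations down, but every profiled name still has to be compared against all functions in its namespace, which is consistent with the cost noted above.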

Test Plan: Added name-similarity-function-matching.test. On a binary
with 5M functions, rewrite passes took ~520s without the flag and
~2018s with the flag set to 20.
2024-07-03 11:39:18 -07:00
Shaw Young
49fdbbcfed [BOLT] Match functions with exact hash (#96572)
Added flag '--match-profile-with-function-hash' to match functions 
based on exact hash. After identical and LTO name matching, more 
functions can be recovered for inference with exact hash, in the case
of function renaming with no functional changes. Collisions are 
possible in the unlikely case where multiple functions share the same
exact hash. The flag is off by default because it requires processing
all binary functions and is consequently expensive.

Test Plan: added hashing-based-function-matching.test.
2024-06-29 21:19:00 -07:00
Maksim Panchenko
ec2fb59e6c [BOLT][docs] Add Linux kernel optimization guide (#96669)
Describe steps for optimizing the Linux kernel with BOLT.
2024-06-25 12:09:04 -07:00
shawbyoung
902952ae04 Revert "[𝘀𝗽𝗿] initial version"
This reverts commit bb5ab1ffe7.
2024-06-25 08:30:29 -07:00
shawbyoung
bb5ab1ffe7 [𝘀𝗽𝗿] initial version
Created using spr 1.3.4
2024-06-25 08:05:29 -07:00
shaw young
32e4906c28 Revert "[BOLT] Hash-based function matching" (#96568)
Reverts llvm/llvm-project#95821
2024-06-24 18:44:24 -04:00
shaw young
5e097c79d8 [BOLT] Hash-based function matching (#95821)
Using the hashes of binary and profiled functions
to recover functions with changed names.

Test Plan: added 
hashing-based-function-matching.test.
2024-06-24 15:29:44 -07:00
shaw young
75ac887a30 [BOLT][NFC] Sync CommandLineArgumentReference with options (#96563) 2024-06-24 15:16:52 -07:00
shaw young
68fc8dffe4 [BOLT] Drop high discrepancy profiles in matching (#95156)
Summary: Functions with high discrepancy 
(measured by matched function blocks) 
can be ignored with an added command line 
argument for better performance.

Test Plan: Added 
stale-matching-min-matched-block.test

---------

Co-authored-by: Amir Ayupov <aaupov@fb.com>
2024-06-17 15:14:35 -07:00
Elvina Yakubova
765ce86991 [BOLT][DOC] Add script for automatic user guide generation (#93822) 2024-05-31 13:50:51 +01:00
Michael Kruse
c5a3f664fe [BOLT] Revise IDE folder structure (#89742)
Update the folder titles for targets in the monorepository that have not
been taken care of for some time. These are the folders in which targets
are organized in Visual Studio and Xcode (`set_property(TARGET <target>
PROPERTY FOLDER "<title>")`) when using the respective CMake IDE
generator.

 * Ensure that every target is in a folder
 * Use a folder hierarchy with each LLVM subproject as a top-level folder
 * Use consistent folder names between subprojects
 * When using target-creating functions from AddLLVM.cmake, automatically
deduce the folder. This reduces the number of
`set_property`/`set_target_property` calls, but they are still necessary when
`add_custom_target`, `add_executable`, `add_library`, etc. are used. A
LLVM_SUBPROJECT_TITLE definition is used for that in each subproject's
root CMakeLists.txt.
2024-05-25 17:15:37 +02:00
Amir Ayupov
d1d9545ed3 [BOLT][BAT] Add entries for deleted basic blocks
Deleted basic blocks are required for correct mapping of branches
modified by SCTC.

Increases BAT size, bytes:
- large binary: 8622496 -> 8703244.
- small binary (X86/bolt-address-translation.test): 928 -> 940.

Test Plan: updated bb-with-two-tail-calls.s

Reviewers: ayermolo, dcci, maksfb, rafaelauler

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/91906
2024-05-23 19:19:07 -07:00
Elvina Yakubova
dcf3102be8 [BOLT][NFC] Add documentation on BOLT options (#92117)
Add .md file documentation with all BOLT options to display it more
conveniently.
2024-05-15 16:16:39 +01:00
Amir Ayupov
b79b6f9cf0 [BOLT] Use offset deduplication for cold fragments
Apply deduplication for uniformity and BAT section size reduction.

Changes BAT section size to:
- large binary: 39541552 bytes (1.02x original),
- medium binary: 3828996 bytes (0.64x),
- small binary: 928 bytes (0.65x).

Test Plan: Updated bolt-address-translation.test

Reviewers: rafaelauler, dcci, ayermolo, JDevlieghere, maksfb

Reviewed By: maksfb

Pull Request: https://github.com/llvm/llvm-project/pull/87853
2024-04-15 09:50:12 +02:00
Amir Ayupov
1b763f230a [BOLT] Add secondary entry points to BAT
Provide secondary entry points for `EntryDiscriminator` call info field
in YAML profile.

Increases BAT section size to:
- large binary: 39655300 bytes (1.03x the original),
- medium binary: 3834328 bytes (0.65x),
- small binary: 924 bytes (0.64x).

Depends on: https://github.com/llvm/llvm-project/pull/76911

Test Plan:
- Updated bolt-address-translation{,-yaml}.test
- Added openssl test: https://github.com/rafaelauler/bolt-tests/pull/30

Reviewers: dcci, rafaelauler, maksfb, ayermolo

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/86218
2024-03-25 15:14:33 -07:00
Amir Ayupov
ceba3a38e8 [BOLT] Add number of basic blocks to BAT
YAML profile reader checks the number of basic blocks in regular,
no-stale-matching mode. Add it to BAT.

This increases the size of BAT section to:
- large binary: 39583080 bytes (1.02x of the original),
- medium binary: 3816492 bytes (0.64x),
- small binary: 920 bytes (0.64x, no change due to alignment).

Test Plan: Updated bolt-address-translation-yaml.test

Reviewers: rafaelauler, ayermolo, maksfb, dcci

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/86045
2024-03-22 08:46:48 -07:00
Amir Ayupov
b0e23639c5 [BOLT] Add BB index to BAT
Add input basic block index to BAT metadata. This addresses the case
where some basic blocks are eliminated, and output index is not equal
to the input block index. These indices are used in non-stale-matching
mode.

Increases BAT section size to:
- large binary: 39521512 bytes (1.02x original),
- medium binary: 3799988 bytes (0.64x),
- small binary: 920 bytes (0.64x).

Test Plan:
Updated bolt-address-translation{,-yaml}.test

Pull Request: https://github.com/llvm/llvm-project/pull/86044
2024-03-22 08:42:58 -07:00
Amir Ayupov
f66d631bf8 Revert "[BOLT] Add BB index to BAT (#86044)"
This reverts commit 3b3de48fd8.
2024-03-22 08:38:40 -07:00
Amir Ayupov
3b3de48fd8 [BOLT] Add BB index to BAT (#86044) 2024-03-22 06:07:17 -07:00