llvm-project

Author	SHA1	Message	Date
Maksim Panchenko	34a7608fad	[BOLT] Drop -znow requirement for PLT optimization on x86-64 (#178758 ) On x86-64, PLT optimization does not require the binary to be linked with -znow because indirect calls through GOT work correctly with lazy binding. At runtime, the dynamic linker's resolver will populate the GOT entry on the first call, just like with a regular PLT call. This change removes the -znow requirement specifically for x86-64 while keeping it for other architectures. I haven't checked RISC-V, but it's still necessary on AArch64.	2026-01-29 16:10:43 -08:00
Gergely Bálint	a25e3674ae	[BOLT] Rename Pointer Auth DWARF rewriter passes (#164622 ) Rename passes to names that better reflect their intent, and describe their relationship to each other. InsertNegateRAStatePass renamed to PointerAuthCFIFixup, MarkRAStates renamed to PointerAuthCFIAnalyzer. Added the --print-<passname> flags for these passes.	2025-12-04 11:29:40 +01:00
Akshiitaa06	4289849931	Improve formatting in BAT.md (#170254 ) Make "Header" a subheading to improve readability in the Functions table section.	2025-12-02 12:57:53 +00:00
Gergely Bálint	29fef3a51e	[BOLT] Improve DWARF CFI generation for pac-ret binaries (#163381 ) During the InsertNegateRAState pass we check the annotations on instructions to decide where to generate the OpNegateRAState CFIs in the output binary. As only instructions in the input binary were annotated, we have to make a judgement on instructions generated by other BOLT passes. Incorrect placement may cause issues when an (async) unwind request is received during the new "unknown" instructions. This patch adds more logic to make a more informed decision by taking into account: - Unknown instructions in a BasicBlock with other instructions have the same RAState. Previously, if the BasicBlock started with an unknown instruction, the RAState was copied from the preceding block. Now, the RAState is copied from the succeeding instructions in the same block. - Some BasicBlocks may only contain instructions with unknown RAState. As explained in issue #160989, these blocks already have incorrect unwind info. Because of this, the last known RAState based on the layout order is copied. Updated bolt/docs/PacRetDesign.md to reflect changes.	2025-12-01 12:00:31 +01:00
Vasily Leonenko	a751ed97ac	[BOLT] Support runtime library hook via DT_INIT_ARRAY (#167467 ) The major part of this PR is a commit implementing support for DT_INIT_ARRAY for BOLT runtime library initialization. It also adds a related hook-init test and fixes a couple of X86 instrumentation tests. This commit follows the implementation of the instrumentation hook via DT_FINI_ARRAY (https://github.com/llvm/llvm-project/pull/67348) and extends it to hook initialization of BOLT runtime libraries (including the instrumentation library). Initialization has differences compared to finalization: - Executables always use the ELF entry point address. The update code checks it and updates the init_array entry if the ELF is a shared library (has no interp entry) and has no DT_INIT entry. This commit also introduces the "runtime-lib-init-hook" option to select the primary initialization hook (entry_point, init, init_array), with fallback to the next available hook in the input binary; e.g. in the case of libc we can explicitly set it to init_array. - Relocations for shared library init_array entries usually have the R_AARCH64_ABS64 type on AArch64 binaries. We check the relocation type and adjust the methods for reading init_array relocations in the discovery and update methods. --------- Co-authored-by: Vasily Leonenko <vasily.leonenko@huawei.com>	2025-12-01 10:55:00 +03:00
Jinjie Huang	f7be258c28	[BOLT][NFC] Clean up the outdated option --write-dwp in doc (#166150 ) Since the "--write-dwp" option has been removed in [PR](https://github.com/llvm/llvm-project/pull/100771), this patch also cleans up the corresponding document and test to avoid misleading issues.	2025-11-04 18:27:53 +08:00
Paschalis Mpeis	ae6cb98b29	[BOLT] Add --ba flag to deprecate --nl (#164257 ) The `--nl` flag, originally for Non-LBR mode, is deprecated and will be replaced by `--basic-events` (alias `--ba`). `--nl` remains as a deprecated alias for backward compatibility.	2025-10-23 10:13:28 +01:00
Paschalis Mpeis	96688d4b3c	[BOLT][NFC] Use brstack in guides and user outputs (#163950 ) Update guides to use brstack, with a mention to BRBE for AArch64. Use brstack in user-facing outputs. --------- Co-authored-by: Amir Ayupov <aaupov@fb.com>	2025-10-20 09:30:06 +00:00
Christian Clauss	0fc05aa1c6	[bolt] Fix typos discovered by codespell (#124726 ) https://github.com/codespell-project/codespell ```bash codespell bolt --skip="*.yaml,Maintainers.txt" --write-changes \ --ignore-words-list=acount,alledges,ans,archtype,defin,iself,mis,mmaped,othere,outweight,vas ```	2025-10-14 14:45:40 +02:00
Gergely Bálint	889bfd9172	Reapply "[BOLT][AArch64] Handle OpNegateRAState to enable optimizing binaries with pac-ret hardening" (#162353 ) (#162435 ) Reapply "[BOLT][AArch64] Handle OpNegateRAState to enable optimizing binaries with pac-ret hardening (#120064)" (#162353) This reverts commit `c7d776b068`. #120064 was reverted for breaking builders. Fix: changed the mismatched type in MarkRAStates.cpp to `auto`. --- Original message: OpNegateRAState is an AArch64-specific DWARF CFI used to change the value of the RA_SIGN_STATE pseudoregister. The RA_SIGN_STATE register records whether the current return address has been signed with PAC. OpNegateRAState requires special handling in BOLT because its placement depends on the function layout. Since BOLT reorders basic blocks during optimization, these CFIs must be regenerated after layout is finalized. This patch introduces two new passes: - MarkRAStates (runs before optimizations): assigns a signedness annotation to each instruction based on OpNegateRAState CFIs in the input binary. - InsertNegateRAStates (runs after optimizations): reads the annotations and emits new OpNegateRAState CFIs where RA state changes between instructions. Design details are described in: `bolt/docs/PacRetDesign.md`.	2025-10-08 11:05:41 +02:00
Gergely Bálint	c7d776b068	Revert "[BOLT][AArch64] Handle OpNegateRAState to enable optimizing binaries with pac-ret hardening" (#162353 ) Reverts llvm/llvm-project#120064. @gulfemsavrun reported that the patch broke toolchain builders.	2025-10-07 21:59:18 +02:00
Gergely Bálint	32eaf5b59c	[BOLT][AArch64] Handle OpNegateRAState to enable optimizing binaries with pac-ret hardening (#120064 ) OpNegateRAState is an AArch64-specific DWARF CFI used to change the value of the RA_SIGN_STATE pseudoregister. The RA_SIGN_STATE register records if the current return address has been signed with PAC. OpNegateRAState requires special handling in BOLT because its placement depends on the function layout. Since BOLT reorders basic blocks during optimization, these CFIs must be regenerated after layout is finalized. This patch introduces two new passes: - MarkRAStates (runs before optimizations): assigns a signedness annotation to each instruction based on OpNegateRAState CFIs in the input binary. - InsertNegateRAStates (runs after optimizations): reads the annotations and emits new OpNegateRAState CFIs where RA state changes between instructions. Design details are described in: `bolt/docs/PacRetDesign.md`.	2025-10-07 10:22:14 +02:00
YafetBeyene	244588b9d7	[BOLT][AArch64] Inlining of Memcpy (#154929 ) The pass for inlining memcpy in BOLT was X86-specific and used the instruction `rep movsb`. This patch implements a static size analysis system for AArch64 memcpy inlining that extracts copy sizes from preceding instructions and uses them to generate the optimal width-specific load/store sequences.	2025-09-09 14:09:23 +01:00
YafetBeyene	fda24dbc16	[BOLT] Add dump-dot-func option for selective function CFG dumping (#153007 ) ## Change: * Added `--dump-dot-func` command-line option that allows users to dump CFGs only for specific functions instead of dumping all functions (the only currently available option being `--dump-dot-all`) ## Usage: * Users can now specify function names or regex patterns (e.g., `--dump-dot-func=main,helper` or `--dump-dot-func="init.`") to generate .dot files only for functions of interest. This aims to save time when analysing specific functions in large binaries (e.g., only dumping graphs for performance-critical functions identified through profiling) and reduces output clutter from generating thousands of unnecessary .dot files when analysing large binaries ## Testing The introduced test `dump-dot-func.test` confirms the new option does the following: - [x] 1. `dump-dot-func` can correctly filter a specified function - [x] 2. Can achieve the above with regexes - [x] 3. Can do 1. with a list of functions - [x] No option specified creates no dot files - [x] Passing in a non-existent function generates no dumping messages - [x] `dump-dot-all` continues to work as expected	2025-08-22 10:51:09 +01:00
Amir Ayupov	5047a33cd8	[BOLT][heatmap] Produce zoomed-out heatmaps (#140153 ) Add a capability to produce multiple heatmaps with given bucket sizes. The default heatmap block size (64B) could be too fine-grained for large binaries. Extend the option `block-size` to accept a list of bucket sizes for additional heatmaps with coarser granularity. The heatmap is simply rescaled so provided sizes should be multiples of each other. Human-readable suffixes can be used, e.g. 4K, 16kb, 1MiB. New defaults: 64B (base bucket size), 4KB (default page size), 256KB (for large binaries). Test Plan: updated heatmap-preagg.test	2025-05-30 16:20:19 -07:00
Gergely Bálint	5b20b5721a	[BOLT][AArch64] Allow binary-analysis and heatmap tool to run with pac-ret binaries (#136664 ) OpNegateRAState support is only needed for tools that produce binaries.	2025-04-30 13:41:11 +01:00
cor3ntin	320ec7fa7f	[Documentation] Always use SVG for dot-generated doxygen images. (#136843 ) Despite our attempt (build-docs.sh) to build the documentation with SVG, it still uses PNG https://llvm.org/doxygen/classllvm_1_1StringRef.html, and that renders terribly on any high dpi display. SVG leads to a smaller installation and works fine on all browsers (that has been true for _a while_ https://caniuse.com/svg), so this patch just unconditionally builds all dot graphs as SVG in all subprojects and removes the option.	2025-04-25 14:13:17 +02:00
Kristof Beyls	850b492976	[BOLT][binary-analysis] Add initial pac-ret gadget scanner (#122304 ) This adds an initial pac-ret gadget scanner to the llvm-bolt-binary-analysis-tool. The scanner is taken from the prototype that was published last year at https://github.com/llvm/llvm-project/compare/main...kbeyls:llvm-project:bolt-gadget-scanner-prototype, and has been discussed in RFC https://discourse.llvm.org/t/rfc-bolt-based-binary-analysis-tool-to-verify-correctness-of-security-hardening/78148 and in the EuroLLVM 2024 keynote "Does LLVM implement security hardenings correctly? A BOLT-based static analyzer to the rescue?" [Video](https://youtu.be/Sn_Fxa0tdpY) [Slides](https://llvm.org/devmtg/2024-04/slides/Keynote/Beyls_EuroLLVM2024_security_hardening_keynote.pdf) In the spirit of incremental development, this PR aims to add a minimal implementation that is "fully working" on its own, but has major limitations, as described in the bolt/docs/BinaryAnalysis.md documentation in this proposed commit. These and other limitations will be fixed in follow-on PRs, mostly based on code already existing in the prototype branch. I hope incrementally upstreaming will make it easier to review the code. Note that I believe that this could also form the basis of a scanner to analyze correct implementation of PAuthABI.	2025-02-24 07:26:28 +00:00
Davide Italiano	62c39d7734	[BOLT/docs] The support for macro-op fusion was removed. (#121158 ) Update the documentation accordingly.	2024-12-26 11:18:12 -08:00
Alexander Yermolovich	3c357a49d6	[BOLT] Add support for safe-icf (#116275 ) Identical Code Folding (ICF) folds functions that are identical into one function, and updates symbol addresses to the new address. This reduces the size of a binary, but can lead to problems. For example when function pointers are compared. This can be done either explicitly in the code or generated IR by optimization passes like Indirect Call Promotion (ICP). After ICF what used to be two different addresses become the same address. This can lead to a different code path being taken. This is where safe ICF comes in. Linker (LLD) does it using address significant section generated by clang. If symbol is in it, or an object doesn't have this section symbols are not folded. BOLT does not have the information regarding which objects do not have this section, so can't re-use this mechanism. This implementation scans code section and conservatively marks functions symbols as unsafe. It treats symbols as unsafe if they are used in non-control flow instruction. It also scans through the data relocation sections and does the same for relocations that reference a function symbol. The latter handles the case when function pointer is stored in a local or global variable, etc. If a relocation address points within a vtable these symbols are skipped.	2024-12-16 21:49:53 -08:00
Kristof Beyls	ceb7214be0	[BOLT] Introduce binary analysis tool based on BOLT (#115330 ) This initial commit does not add any specific binary analyses yet, it merely contains the boilerplate to introduce a new BOLT-based tool. This basically combines the 4 first patches from the prototype pac-ret and stack-clash binary analyzer discussed in RFC https://discourse.llvm.org/t/rfc-bolt-based-binary-analysis-tool-to-verify-correctness-of-security-hardening/78148 and published at https://github.com/llvm/llvm-project/compare/main...kbeyls:llvm-project:bolt-gadget-scanner-prototype The introduction of such a BOLT-based binary analysis tool was proposed and discussed in at least the following places: - The RFC pointed to above - EuroLLVM 2024 round table https://discourse.llvm.org/t/summary-of-bolt-as-a-binary-analysis-tool-round-table-at-eurollvm/78441 The round table showed quite a few people interested in being able to build a custom binary analysis quickly with a tool like this. - Also at the US LLVM dev meeting a few weeks ago, I heard interest from a few people, asking when the tool would be available upstream. - The presentation "Adding Pointer Authentication ABI support for your ELF platform" (https://llvm.swoogo.com/2024devmtg/session/2512720/adding-pointer-authentication-abi-support-for-your-elf-platform) explicitly mentioned interest to extend the prototype tool to verify correct implementation of pauthabi.	2024-12-12 10:06:27 +00:00
Peter Jung	c1912b4dd7	[BOLT][docs] Fix typo (#98640 ) Typo: `chwon` --> `chown` Signed-off-by: Peter Jung <admin@ptr1337.dev>	2024-08-08 18:05:41 -07:00
Sayhaan Siddiqui	6aad62cf5b	[BOLT][DWARF] Add parallelization for processing of DWO debug information (#100282 ) Enables parallelization for the processing of DWO CUs.	2024-08-08 16:41:51 -07:00
Jordan Brantner	d251a328b8	[BOLT] Fix typo from alterantive to alternative (#99704 ) Fix typo from `alterantive` -> `alternative` Signed-off-by: Jordan Brantner <brantnej@oregonstate.edu>	2024-07-22 18:35:20 -07:00
Eisuke Kawashima	8bc02bf5c6	fix(bolt/**.py): fix comparison to None (#94012 ) from PEP8 (https://peps.python.org/pep-0008/#programming-recommendations): > Comparisons to singletons like None should always be done with is or is not, never the equality operators. Co-authored-by: Eisuke Kawashima <e-kwsm@users.noreply.github.com>	2024-07-19 16:59:56 -07:00
Itis-hard2name	7f563232d6	[bolt][Docs] fix missing option in cmake of stage3 in OptimizingClang.md (#93684 ) Fixes #93681	2024-07-19 16:55:21 -07:00
Sayhaan Siddiqui	d54ec64f67	[BOLT][DWARF] Remove deprecated opt (#99575 ) Remove deprecated DeterministicDebugInfo option and its uses.	2024-07-19 14:03:50 -07:00
Shaw Young	296a956369	[BOLT] Match functions with call graph (#98125 ) Implemented call graph function matching. First, two call graphs are constructed for both profiled and binary functions. Then functions are hashed based on the names of their callee/caller functions. Finally, functions are matched based on these neighbor hashes and the longest common prefix of their names. The `match-with-call-graph` flag turns this matching on. Test Plan: Added match-with-call-graph.test. Matched 164 functions in a large binary with 10171 profiled functions.	2024-07-19 14:00:28 -07:00
Paschalis Mpeis	b037d0f0e5	[BOLT][docs] Expand Heatmaps.md (#98162 ) Improve documentation on heatmaps. Add example for X axis labels.	2024-07-15 08:53:27 +01:00
Paschalis Mpeis	34433fdceb	[BOLT] Add -print-mappings option to heatmaps (#97567 ) Emit a mapping in the legend between the characters/buckets and the text sections, using: ```sh llvm-heatmap-bolt -print-mappings .. ``` Example: ``` Legend: .. Sections: a/A : .init 0x00000100-0x00000200 b/B : .plt 0x00000200-0x00000500 c/C : .text 0x00010000-0x000a0000 d/D : .fini 0x000a0000-0x000f0000 .. ```	2024-07-15 08:23:06 +01:00
Maksim Panchenko	a0c6b8aef8	[BOLT][docs] Add merge-fdata to Linux optimization guide (#97659 )	2024-07-03 17:30:37 -07:00
Shaw Young	97dc50882c	[BOLT] Match functions with name similarity (#95884 ) A mapping - from namespace to associated binary functions - is used to match function profiles to binary based on the '--name-similarity-function-matching-threshold' flag set edit distance threshold. The flag is set to 0 (exact name matching) by default as it is expensive, requiring the processing of all BFs. Test Plan: Added name-similarity-function-matching.test. On a binary with 5M functions, rewrite passes took ~520s without the flag and ~2018s with the flag set to 20.	2024-07-03 11:39:18 -07:00
Shaw Young	49fdbbcfed	[BOLT] Match functions with exact hash (#96572 ) Added flag '--match-profile-with-function-hash' to match functions based on exact hash. After identical and LTO name matching, more functions can be recovered for inference with exact hash, in the case of function renaming with no functional changes. Collisions are possible in the unlikely case where multiple functions share the same exact hash. The flag is off by default as it requires the processing of all binary functions and subsequently is expensive. Test Plan: added hashing-based-function-matching.test.	2024-06-29 21:19:00 -07:00
Maksim Panchenko	ec2fb59e6c	[BOLT][docs] Add Linux kernel optimization guide (#96669 ) Describe steps for optimizing the Linux kernel with BOLT.	2024-06-25 12:09:04 -07:00
shawbyoung	902952ae04	Revert "[𝘀𝗽𝗿] initial version" This reverts commit `bb5ab1ffe7`.	2024-06-25 08:30:29 -07:00
shawbyoung	bb5ab1ffe7	[𝘀𝗽𝗿] initial version Created using spr 1.3.4	2024-06-25 08:05:29 -07:00
shaw young	32e4906c28	Revert "[BOLT] Hash-based function matching" (#96568 ) Reverts llvm/llvm-project#95821	2024-06-24 18:44:24 -04:00
shaw young	5e097c79d8	[BOLT] Hash-based function matching (#95821 ) Using the hashes of binary and profiled functions to recover functions with changed names. Test Plan: added hashing-based-function-matching.test.	2024-06-24 15:29:44 -07:00
shaw young	75ac887a30	[BOLT][NFC] Sync CommandLineArgumentReference with options (#96563 )	2024-06-24 15:16:52 -07:00
shaw young	68fc8dffe4	[BOLT] Drop high discrepancy profiles in matching (#95156 ) Summary: Functions with high discrepancy (measured by matched function blocks) can be ignored with an added command line argument for better performance. Test Plan: Added stale-matching-min-matched-block.test --------- Co-authored-by: Amir Ayupov <aaupov@fb.com>	2024-06-17 15:14:35 -07:00
Elvina Yakubova	765ce86991	[BOLT][DOC] Add script for automatic user guide generation (#93822 )	2024-05-31 13:50:51 +01:00
Michael Kruse	c5a3f664fe	[BOLT] Revise IDE folder structure (#89742 ) Update the folder titles for targets in the monorepository that have not been taken care of for some time. These are the folders in which targets are organized in Visual Studio and XCode (`set_property(TARGET <target> PROPERTY FOLDER "<title>")`) when using the respective CMake's IDE generator. * Ensure that every target is in a folder * Use a folder hierarchy with each LLVM subproject as a top-level folder * Use consistent folder names between subprojects * When using target-creating functions from AddLLVM.cmake, automatically deduce the folder. This reduces the number of `set_property`/`set_target_property` calls, but they are still necessary when `add_custom_target`, `add_executable`, `add_library`, etc. are used. An LLVM_SUBPROJECT_TITLE definition is used for that in each subproject's root CMakeLists.txt.	2024-05-25 17:15:37 +02:00
Amir Ayupov	d1d9545ed3	[BOLT][BAT] Add entries for deleted basic blocks Deleted basic blocks are required for correct mapping of branches modified by SCTC. Increases BAT size, bytes: - large binary: 8622496 -> 8703244. - small binary (X86/bolt-address-translation.test): 928 -> 940. Test Plan: updated bb-with-two-tail-calls.s Reviewers: ayermolo, dcci, maksfb, rafaelauler Reviewed By: rafaelauler Pull Request: https://github.com/llvm/llvm-project/pull/91906	2024-05-23 19:19:07 -07:00
Elvina Yakubova	dcf3102be8	[BOLT][NFC] Add documentation on BOLT options (#92117 ) Add .md file documentation with all BOLT options to display it more conveniently.	2024-05-15 16:16:39 +01:00
Amir Ayupov	b79b6f9cf0	[BOLT] Use offset deduplication for cold fragments Apply deduplication for uniformity and BAT section size reduction. Changes BAT section size to: - large binary: 39541552 bytes (1.02x original), - medium binary: 3828996 bytes (0.64x), - small binary: 928 bytes (0.65x). Test Plan: Updated bolt-address-translation.test Reviewers: rafaelauler, dcci, ayermolo, JDevlieghere, maksfb Reviewed By: maksfb Pull Request: https://github.com/llvm/llvm-project/pull/87853	2024-04-15 09:50:12 +02:00
Amir Ayupov	1b763f230a	[BOLT] Add secondary entry points to BAT Provide secondary entry points for `EntryDiscriminator` call info field in YAML profile. Increases BAT section size to: - large binary: 39655300 bytes (1.03x the original), - medium binary: 3834328 bytes (0.65x), - small binary: 924 bytes (0.64x). Depends on: https://github.com/llvm/llvm-project/pull/76911 Test Plan: - Updated bolt-address-translation{,-yaml}.test - Added openssl test: https://github.com/rafaelauler/bolt-tests/pull/30 Reviewers: dcci, rafaelauler, maksfb, ayermolo Reviewed By: rafaelauler Pull Request: https://github.com/llvm/llvm-project/pull/86218	2024-03-25 15:14:33 -07:00
Amir Ayupov	ceba3a38e8	[BOLT] Add number of basic blocks to BAT YAML profile reader checks the number of basic blocks in regular, no-stale-matching mode. Add it to BAT. This increases the size of BAT section to: - large binary: 39583080 bytes (1.02x of the original), - medium binary: 3816492 bytes (0.64x), - small binary: 920 bytes (0.64x, no change due to alignment). Test Plan: Updated bolt-address-translation-yaml.test Reviewers: rafaelauler, ayermolo, maksfb, dcci Reviewed By: rafaelauler Pull Request: https://github.com/llvm/llvm-project/pull/86045	2024-03-22 08:46:48 -07:00
Amir Ayupov	b0e23639c5	[BOLT] Add BB index to BAT Add input basic block index to BAT metadata. This addresses the case where some basic blocks are eliminated, and output index is not equal to the input block index. These indices are used in non-stale-matching mode. Increases BAT section size to: - large binary: 39521512 bytes (1.02x original), - medium binary: 3799988 bytes (0.64x), - small binary: 920 bytes (0.64x). Test Plan: Updated bolt-address-translation{,-yaml}.test Pull Request: https://github.com/llvm/llvm-project/pull/86044	2024-03-22 08:42:58 -07:00
Amir Ayupov	f66d631bf8	Revert "[BOLT] Add BB index to BAT (#86044 )" This reverts commit `3b3de48fd8`.	2024-03-22 08:38:40 -07:00
Amir Ayupov	3b3de48fd8	[BOLT] Add BB index to BAT (#86044 )	2024-03-22 06:07:17 -07:00

73 Commits