llvm-project

Author	SHA1	Message	Date
Asher Dobrescu	7bce678ec1	[BOLT] Check if symbol is in data area of function (#160143 ) There are cases in which `getEntryIDForSymbol` is called, where the given Symbol is in a constant island, and so BOLT can not find its function. This causes BOLT to reach `llvm_unreachable("symbol not found")` and crash. This patch adds a check that avoids this crash.	2026-03-06 10:37:54 +00:00
YongKang Zhu	14bcb1a009	[BOLT] Make sure IOAddressMap exist before lookup (NFC) (#183184 ) `BinaryFunction::translateInputToOutputAddress()` contains fallback logic in case that querying `IOAddressMap` doesn't yield an output address. Because this function could be called in scenarios where `IOAddressMap` won't be set up, we should check if the map actually exists before lookup.	2026-03-01 23:27:39 -08:00
Maksim Panchenko	f80e3b3d7e	[BOLT] Keep folded functions in BinaryFunctions map. NFC (#180392 ) In relocation mode, keep folded functions in the BinaryFunctions map instead of erasing them. Mark them as folded using setFolded() and skip emitting them.	2026-02-10 14:56:26 -08:00
Gergely Bálint	4193c404ca	[BOLT][BTI] Disassemble PLT entries when processing BTI binaries (#169663 ) PLT entries are PseudoFunctions, and are not disassembled or emitted. For BTI, we need to check the first MCInst of PLT entries, to see if indirectly calling them is safe or not. This patch disassembles PLTs for binaries using BTI, while not changing the behaviour for binaries without BTI. The PLTs are only disassembled, not emitted. --------- Co-authored-by: Paschalis Mpeis <paschalis.mpeis@arm.com>	2026-01-16 07:40:05 +01:00
Maksim Panchenko	28ff941a2c	[BOLT][AArch64] Always cover veneers in lite mode (#171534 ) If a veneer is not disassembled in lite mode, the veneer elimination pass will not recognize it as such and the call to such veneer will remain unchanged. Later, we may need to insert a new veneer for such code ending up with a double veneer. To avoid such suboptimal code generation, always disassemble veneers and guarantee that they are converted to direct calls in BOLT.	2025-12-10 09:54:37 -08:00
Jinjie Huang	33e0301b07	[BOLT] Add validation for direct call/branch targets (#165406 ) In some edge cases, a binary may contain direct `branch` or `call` instructions whose targets do not point to a valid executable instruction. This can occur due to compiler bugs, hand-written assembly, obfuscation techniques, or when control flow targets data by mistake. We also encountered the problem described in this [issue](https://github.com/llvm/llvm-project/issues/149382), where "data in code" within OpenSSL's hand-written assembly was misidentified as instructions (island identification seems to fail due to the absence of a corresponding data symbol). The problem occurred because a data sequence was incorrectly disassembled as a "jb" instruction. The point here is that the data should not be pointed to by any edge, so this patch tries to address this by validating the destination address for direct branches and calls. If the target instruction is invalid (which implies corrupted control flow), the function is marked as ignored. Although this approach appears helpful for addressing the 'data in code' problem, its validation might be compromised if the data can be disassembled as a normal instruction.	2025-12-09 16:17:19 +08:00
Maksim Panchenko	02482f4273	[BOLT] Properly validate relocations against internals of a function (#167451 ) Validation of data relocations targeting internals of a function was happening based on offsets inside a function. As a result, if multiple relocations were targeting the same offset, and one of the relocations was verified, e.g. as belonging to a jump table, then all relocations targeting the offset would be considered verified and valid. Now that we are tracking relocations pointing inside every function, we can do a better validation based on the location of the relocation. E.g., if a relocation belongs to a jump table only that relocation will be accounted for and other relocations pointing to the same address will be evaluated independently.	2025-12-06 14:39:08 -08:00
Maksim Panchenko	47de55f284	[BOLT] Minor code refactoring. NFC (#170746 )	2025-12-04 14:17:58 -08:00
YongKang Zhu	ac6daa8181	[BOLT][print] Add option '--print-only-file' (NFC) (#168023 ) With this option we can pass to BOLT names of functions to be printed through a file instead of specifying them all on command line.	2025-11-14 10:26:21 -08:00
YongKang Zhu	4cd16f2a0c	[BOLT][AArch64] Add more heuristics on epilogue determination (#167077 ) Add more heuristics to check if a basic block is an AArch64 epilogue. We assume instructions that load from stack or adjust stack pointer as valid epilogue code sequence if and only if they immediately precede the branch instruction that ends the basic block.	2025-11-10 09:50:44 -08:00
Maksim Panchenko	7af2b56dd5	[BOLT] Refactor undefined symbols handling. NFCI (#167075 ) Remove internal undefined symbol tracking and instead rely on the emission state of `MCSymbol` while processing data-to-code relocations. Note that `CleanMCState` pass resets the state of all `MCSymbol`s prior to code emission.	2025-11-07 19:42:05 -08:00
Maksim Panchenko	7c01a90545	[BOLT] Refactor handling of branch targets. NFCI (#165828 ) Refactor code that verifies external branch destinations and creates secondary entry points.	2025-10-31 08:56:30 -07:00
Jinjie Huang	6ba2127a5c	[BOLT] Add constant island check in scanExternalRefs() (#165577 ) The [previous patch](https://github.com/llvm/llvm-project/pull/163418) added a check to prevent adding an entry point into a constant island, but only for successfully disassembled functions. Because scanExternalRefs() is also called when a function fails to be disassembled or is skipped, it can still attempt to add an entry point at a constant island. The same issue may occur there without a check, so this patch adds the 'constant island' check to scanExternalRefs().	2025-10-31 10:29:00 +08:00
Maksim Panchenko	cd27741c11	[BOLT] Remove CreatePastEnd parameter in getOrCreateLocalLabel(). NFC (#165065 ) CreatePastEnd parameter had no effect on the label creation. Remove it.	2025-10-25 22:16:15 -07:00
YongKang Zhu	e1ae126401	[BOLT][AArch64] Validate code padding (#164037 ) Check whether AArch64 function code padding is valid, and add an option to treat invalid code padding as error.	2025-10-22 20:25:06 -07:00
Christian Clauss	0fc05aa1c6	[bolt] Fix typos discovered by codespell (#124726 ) https://github.com/codespell-project/codespell ```bash codespell bolt --skip="*.yaml,Maintainers.txt" --write-changes \ --ignore-words-list=acount,alledges,ans,archtype,defin,iself,mis,mmaped,othere,outweight,vas ```	2025-10-14 14:45:40 +02:00
Gergely Bálint	889bfd9172	Reapply "[BOLT][AArch64] Handle OpNegateRAState to enable optimizing binaries with pac-ret hardening" (#162353 ) (#162435 ) Reapply "[BOLT][AArch64] Handle OpNegateRAState to enable optimizing binaries with pac-ret hardening (#120064)" (#162353) This reverts commit `c7d776b068`. #120064 was reverted for breaking builders. Fix: changed the mismatched type in MarkRAStates.cpp to `auto`. --- Original message: OpNegateRAState is an AArch64-specific DWARF CFI used to change the value of the RA_SIGN_STATE pseudoregister. The RA_SIGN_STATE register records whether the current return address has been signed with PAC. OpNegateRAState requires special handling in BOLT because its placement depends on the function layout. Since BOLT reorders basic blocks during optimization, these CFIs must be regenerated after layout is finalized. This patch introduces two new passes: - MarkRAStates (runs before optimizations): assigns a signedness annotation to each instruction based on OpNegateRAState CFIs in the input binary. - InsertNegateRAStates (runs after optimizations): reads the annotations and emits new OpNegateRAState CFIs where RA state changes between instructions. Design details are described in: `bolt/docs/PacRetDesign.md`.	2025-10-08 11:05:41 +02:00
Gergely Bálint	c7d776b068	Revert "[BOLT][AArch64] Handle OpNegateRAState to enable optimizing binaries with pac-ret hardening" (#162353 ) Reverts llvm/llvm-project#120064. @gulfemsavrun reported that the patch broke toolchain builders.	2025-10-07 21:59:18 +02:00
Gergely Bálint	32eaf5b59c	[BOLT][AArch64] Handle OpNegateRAState to enable optimizing binaries with pac-ret hardening (#120064 ) OpNegateRAState is an AArch64-specific DWARF CFI used to change the value of the RA_SIGN_STATE pseudoregister. The RA_SIGN_STATE register records if the current return address has been signed with PAC. OpNegateRAState requires special handling in BOLT because its placement depends on the function layout. Since BOLT reorders basic blocks during optimization, these CFIs must be regenerated after layout is finalized. This patch introduces two new passes: - MarkRAStates (runs before optimizations): assigns a signedness annotation to each instruction based on OpNegateRAState CFIs in the input binary. - InsertNegateRAStates (runs after optimizations): reads the annotations and emits new OpNegateRAState CFIs where RA state changes between instructions. Design details are described in: `bolt/docs/PacRetDesign.md`.	2025-10-07 10:22:14 +02:00
YongKang Zhu	1e302e942c	[BOLT] Add heuristics to determine constant island's alignment (#159486 ) Constant island embedded in text section doesn't have its alignment information from input binary and we currently set its alignment as 8 bytes. Constant island might be given a much larger alignment due to performance or other reasons, so this change adds some heuristics to determine its alignment based on its size, original address from input binary and its owning section's alignment.	2025-09-25 13:23:42 -07:00
Maksim Panchenko	9469ea216b	[BOLT] Avoid n^2 complexity in fixBranches(). NFCI (#160407 ) Iterator implementation of PR #156243: This improves BOLT runtime when optimizing rustc_driver.so from 15 minutes to 7 minutes (or 49 minutes to 37 minutes of userspace time). Co-authored-by: Mark-Simulacrum <mark.simulacrum@gmail.com>	2025-09-23 16:07:54 -07:00
Grigory Pastukhov	8c0f3b6e8f	[BOLT] Fix debug line emission for functions in multiple compilation units (#151230 ) This patch fixes a bug in BOLT's debug line emission where functions that belong to multiple compilation units (such as inline functions in header files) were not handled correctly. Previously, BOLT incorrectly assumed that a binary function could belong to only one compilation unit, leading to incomplete or incorrect debug line information. ### Problem When a function appears in multiple compilation units (common scenarios include): * Template instantiated functions * Inline functions defined in header files included by multiple source files BOLT would only emit debug line information for one compilation unit, losing debug information for other CUs where the function was compiled. This resulted in incomplete debugging information and could cause debuggers to fail to set breakpoints or show incorrect source locations. ### Root Cause The issue was in BOLT's assumption that each binary function maps to exactly one compilation unit. However, when the same function (e.g., an inline function from a header) is compiled into multiple object files, it legitimately belongs to multiple CUs in the final binary.	2025-09-11 10:41:11 -07:00
YongKang Zhu	a9c1ae8672	[BOLT][AArch64] Fix another cause of extra entry point misidentification (#155055 )	2025-08-27 00:15:46 -07:00
Maksim Panchenko	e665cf3976	[BOLT] Fix handling of ambiguous jump table entries (#155291 ) Jump tables may contain entries that point immediately past the end of their parent function. Normally, such entries are generated by the compiler as a result of builtin_unreachable() case. We used to replace those entries with a label belonging to their parent function assuming the destination doesn't matter if it's an undefined behavior. However, if such entry is at the end of the jump table, it could be a real function pointer, not a jump table entry. We rely on heuristics to detect such cases and can drop the trailing function pointer entries from the table. The problem presents when the "unreachable" ambiguous entry is followed by another ambiguous entry corresponding to the start of the parent function. In this case we accept pointers as entries and may incorrectly update the function pointer. The solution is to keep ambiguous "unreachable" jump table entries identical to the original input, i.e. point to the same function. This change does not affect CFG, but results in the entries being updated with the new function address if it gets relocated.	2025-08-25 17:13:30 -07:00
YongKang Zhu	5c4f506cca	[BOLT] Validate extra entry point by querying data marker symbols (#154611 ) Look up marker symbols and decide whether candidate is really extra entry point in `adjustFunctionBoundaries()`.	2025-08-20 14:18:56 -07:00
Fangrui Song	109b7d965c	MC: Remove unneeded VK_None argument to MCSymbolRefExpr::create calls The MCSymbolRefExpr::create overload with the specifier parameter is discouraged and being phased out. Expressions with relocation specifiers should use MCSpecifierExpr instead.	2025-06-27 21:22:46 -07:00
Amir Ayupov	0c77468288	[BOLT] Expose external entry count for functions (#141674 ) Record the number of function invocations from external code - code outside the binary, which may include JIT code and DSOs. Accounting external entry counts improves the fidelity of call graph flow conservation analysis. Test Plan: updated shrinkwrapping.test	2025-06-10 14:31:22 -07:00
Maksim Panchenko	06f13f8684	[BOLT] Fix references in ignored functions in CFG state (#140678 ) When we call setIgnored() on functions that already have CFG built, these functions are not going to get emitted and we risk missing external function references being updated. To mitigate the potential issues, run scanExternalRefs() on such functions to create patches/relocations. Since scanExternalRefs() relies on function relocations, we have to preserve relocations until the function is emitted. As a result, the memory overhead without debug info update could reach up to 2%.	2025-06-02 12:33:54 -07:00
Maksim Panchenko	778801cc84	[BOLT] Never call fixBranches() on non-simple functions (#141112 ) We should never call fixBranches() on a function with invalid CFG. E.g., ValidateInternalCalls modifies CFG for its internal analysis purposes. At the same time, it marks the function as non-simple with an assumption that fixBranches() will never run on that function. However, calculateEmittedSize() by default calls fixBranches() which can lead to all sorts of issues, including assertions firing in fixBranches(). The fix is to use the original size for non-simple functions in calculateEmittedSize() since we are supposed to emit the function unmodified. Additionally, add an assertion at the start of fixBranches().	2025-05-22 14:01:54 -07:00
Kazu Hirata	7c8b39740b	[BOLT] Use llvm::is_contained (NFC) (#140984 )	2025-05-21 20:32:09 -07:00
Maksim Panchenko	51e222ef48	[BOLT][AArch64] Fix crash for conditional tail calls (#140669 ) When conditional tail call is located in old code while BOLT is operating in lite mode, the call will require optional pending relocation with a type that is currently not supported resulting in a build-time crash. Before a proper fix is implemented, ignore conditional tail calls for relocation purposes and mark their target functions to be patched, i.e. to be served as veneers/thunks.	2025-05-20 10:38:00 -07:00
Kazu Hirata	e401fb8c47	[BOLT] Use llvm::replace (NFC) (#140199 )	2025-05-16 07:30:29 -07:00
Amir Ayupov	0289ca09be	[BOLT] Print heatmap from perf2bolt (#139194 ) Add perf2bolt `--heatmap` option to produce heatmaps during profile aggregation. Distinguish exclusive mode (`llvm-bolt-heatmap`) and optional mode (`perf2bolt --heatmap`), which impacts perf.data handling: exclusive mode covers all addresses, whereas optional mode consumes attached profile only covering function addresses. Test Plan: updated perf2bolt tests: - pre-aggregated-perf.test: pre-aggregated data, - bolt-address-translation-yaml.test: pre-aggregated + BOLTed input, - perf_test.test: no-LBR perf data.	2025-05-13 13:23:18 -07:00
Amir Ayupov	e039d16ee5	[BOLT][NFC] Disambiguate sample as basic sample (#139350 ) Sample is a general term covering both basic (IP) and branch (LBR) profiles. Find and replace ambiguous uses of sample in a basic sample sense. Rename `RawBranchCount` into `RawSampleCount` reflecting its use for both kinds of profile. Rename `PF_LBR` profile type as `PF_BRANCH` reflecting non-LBR based branch profiles (non-brstack SPE, synthesized brstack ETM/PT). Follow-up to #137644. Test Plan: NFC	2025-05-12 17:15:16 -07:00
Maksim Panchenko	254c13d872	[BOLT][AArch64] Patch functions targeted by optional relocs (#138750 ) On AArch64, we create optional/weak relocations that may not be processed due to the relocated value overflow. When the overflow happens, we used to enforce patching for all functions in the binary via the --force-patch option. This PR relaxes the requirement and enforces patching only for functions that are targets of optional relocations. Moreover, if the compact code model is used, the relocation overflow is guaranteed not to happen and the patching will be skipped.	2025-05-08 10:53:47 -07:00
Gergely Bálint	5b20b5721a	[BOLT][AArch64] Allow binary-analysis and heatmap tool to run with pac-ret binaries (#136664 ) OpNegateRAState support is only needed for tools that produce binaries.	2025-04-30 13:41:11 +01:00
Kazu Hirata	c6e7bb19f7	[BOLT] Use llvm::unique (NFC) (#136513 )	2025-04-20 18:29:51 -07:00
YongKang Zhu	823adc7a2d	[BOLT] Validate secondary entry point (#135731 ) Some functions have their sizes as zero in input binary's symbol table, like those compiled by assembler. When figuring out function sizes, we may create label symbol if it doesn't point to any constant island. However, before function size is known, marker symbol can not be correctly associated to a function and therefore all such checks would fail and we could end up adding a code label pointing to constant island as secondary entry point and later mistakenly marking the function as not simple. Querying the global marker symbol array has big throughput overhead. Instead we can run an extra check when post processing entry points to identify such label symbols that actually point to constant islands.	2025-04-15 13:19:15 -07:00
Paschalis Mpeis	3d24046b33	[BOLT] Skip out-of-range pending relocations (#116964 ) When a pending relocation is created, it is also marked as optional or not. It can be optional when such a relocation is added as part of an optimization (i.e., `scanExternalRefs`). When BOLT tries to `flushPendingRelocations`, it safely skips any optional relocations that cannot be encoded due to being out of range. A prerequisite for that is the usage of the `-force-patch` flag. Alternatively, BOLT will bail out with a relevant message. Background: BOLT, as part of scanExternalRefs, identifies external references from calls and creates some pending relocations for them. When flushed, those will update references to point to the optimized functions. This optimization can be disabled using `--no-scan`. BOLT can assert if any of these pending relocations cannot be encoded. This patch does not disable this optimization but instead selectively applies it given that a pending relocation is optional and `-force-patch` was enabled.	2025-04-04 17:31:14 +01:00
Alexey Moksyakov	19a319667b	[bolt][aarch64] Adding test with unsupported indirect branches (#127655 ) This test contains a set of common indirect branch patterns. Support for them will be added step by step.	2025-04-01 13:49:09 +03:00
Kazu Hirata	0c7be9392f	[BOLT] Use *Set::insert_range (NFC) (#133601 )	2025-03-29 16:52:16 -07:00
Maksim Panchenko	96e5ee23a7	[BOLT][AArch64] Add partial support for lite mode (#133014 ) In lite mode, we only emit code for a subset of functions while preserving the original code in .bolt.org.text. This requires updating code references in non-emitted functions to ensure that: * Non-optimized versions of the optimized code never execute. * Function pointer comparison semantics is preserved. On x86-64, we can update code references in-place using "pending relocations" added in scanExternalRefs(). However, on AArch64, this is not always possible due to address range limitations and linker address "relaxation". There are two types of code-to-code references: control transfer (e.g., calls and branches) and function pointer materialization. AArch64-specific control transfer instructions are covered by #116964. For function pointer materialization, simply changing the immediate field of an instruction is not always sufficient. In some cases, we need to modify a pair of instructions, such as undoing linker relaxation and converting NOP+ADR into ADRP+ADD sequence. To achieve this, we use the instruction patch mechanism instead of pending relocations. Instruction patches are emitted via the regular MC layer, just like regular functions. However, they have a fixed address and do not have an associated symbol table entry. This allows us to make more complex changes to the code, ensuring that function pointers are correctly updated. Such mechanism should also be portable to RISC-V and other architectures. To summarize, for AArch64, we extend the scanExternalRefs() process to undo linker relaxation and use instruction patches to partially overwrite unoptimized code.	2025-03-27 21:33:25 -07:00
Maksim Panchenko	bac21719a8	[BOLT] Pass unfiltered relocations to disassembler. NFCI (#131202 ) Instead of filtering and modifying relocations in readRelocations(), preserve the relocation info and use it in the symbolizing disassembler. This change mostly affects AArch64, where we need to look at original linker relocations in order to properly symbolize instruction operands.	2025-03-14 18:44:33 -07:00
Paschalis Mpeis	2f9d94981c	[BOLT] Change Relocation Type to 32-bit NFCI (#130792 )	2025-03-14 18:15:59 +00:00
chrisPyr	038fff3f24	[NFC][BOLT] Make file-local cl::opt global variables static (#126472 ) #125983	2025-03-05 22:11:05 -08:00
Maksim Panchenko	b971d4d7c8	[BOLT][AArch64] Add symbolizer for AArch64 disassembler. NFCI (#127969 ) Add AArch64MCSymbolizer that symbolizes `MCInst` operands during disassembly. The symbolization was previously done in `BinaryFunction::disassemble()`, but it is also required by `scanExternalRefs()` for "lite" mode functionality. Hence, similar to x86, I've implemented the symbolizer interface that uses `BinaryFunction` relocations to properly create instruction operands. I expect the result of the disassembly to be identical after the change. AArch64 disassembler was not calling `tryAddingSymbolicOperand()` for `MOV` instructions. Fix that. Additionally, the disassembler marks `ldr` instructions as branches by setting `IsBranch` parameter to true. Ignore the parameter and rely on `MCPlusBuilder` interface instead. I've modified `--check-encoding` flag to check symbolization of operands of instructions that have relocations against them.	2025-03-03 12:44:28 -08:00
Maksim Panchenko	074c2c6713	[BOLT] Refactor MCInst target symbol lookup. NFCI (#129131 ) In analyzeInstructionForFuncReference(), use MCPlusBuilder interface while scanning symbolic operands of MCInst. Should be NFC on x86, but will make the function work on other architectures. Note that it's currently unused on non-x86 as its functionality is exclusive to safe ICF that runs on x86 only.	2025-02-28 17:57:54 -08:00
Amir Ayupov	3968ebd00d	[BOLT] Keep multi-entry functions simple in aggregation mode (#128253 ) BOLT used to mark multi-entry functions non-simple in non-relocation mode with the reasoning that we can't move them due to potentially undetected references. However, in aggregation mode it doesn't apply as BOLT doesn't perform optimizations. Relax this constraint in case of an aggregation job. Test Plan: added entry-point-fallthru.s	2025-02-25 10:53:45 -08:00
YongKang Zhu	9fa77c1854	[BOLT][Linker][NFC] Remove lookupSymbol() in favor of lookupSymbolInfo() (#128070 ) Sometimes we need to know the size of a symbol besides its address, so maybe we can start using the existing `BOLTLinker::lookupSymbolInfo()` (that returns symbol address and size) and remove `BOLTLinker::lookupSymbol()` (that only returns symbol address). And for both we need to check return value as it is wrapped in `std::optional<>`, which makes the difference even smaller.	2025-02-20 17:14:33 -08:00
Maksim Panchenko	0ba391a85f	[BOLT] Improve constant island disassembly (#127971 ) * Add label that identifies constant island. * Support cases where the island is located after the function.	2025-02-20 11:16:01 -08:00


212 Commits