When building the combined summary index during a thin link, we already performed a memory optimization for non-prevailing copies of a function by not recording their allocation and callsite info in the associated function summary. We can reduce thin link time as well by not building the memprof summary structures at all in the non-prevailing case, instead of building them just to throw them away later.

The reason we were eagerly building these structures is that the memprof summary records *precede* the corresponding function summary record, and we don't know whether this is the prevailing copy of the function until we parse the function summary record.

To facilitate the new handling, we now emit the memprof summary records *after* the corresponding function summary record. The bitcode summary version is bumped, and the reader is changed to support both orderings for backwards compatibility. Note that there is already a memprof test that covers an older record type and will now also cover reading of the legacy ordering: llvm/test/ThinLTO/X86/memprof-old-alloc-context-summary.ll.

To make the new handling even more efficient, the lookup/insertion of stack IDs in the combined summary index, and the caching of their corresponding stack index in the StackIdToIndex map, are made lazy.

This resulted in a 27% reduction in thin link time for a large target (21% without the lazy insertion change).
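The effect of the record reordering can be illustrated with a toy reader. This is only a sketch under stated assumptions, not the actual bitcode reader: `Record`, `Kind`, `ParseResult`, and `parseSummary` are hypothetical names, records are reduced to strings, and the version threshold of 13 is assumed from the `VERSION` value checked in the updated test. With the legacy ordering, alloc structures must be built before the prevailing status is known; with the new ordering, they can be skipped for non-prevailing copies.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified record kinds (not the real bitcode record codes).
enum class Kind { Function, Alloc };

struct Record {
  Kind K;
  std::string Name; // function name, or a label standing in for alloc info
  bool Prevailing;  // only meaningful for Function records
};

struct FunctionSummary {
  std::string Name;
  std::vector<std::string> Allocs;
};

struct ParseResult {
  std::vector<FunctionSummary> Functions;
  unsigned MemprofStructsBuilt = 0; // how many alloc structures were built
};

ParseResult parseSummary(const std::vector<Record> &Records, unsigned Version) {
  // Assumption for this sketch: the new ordering starts at version 13.
  const bool MemprofAfterFunction = Version >= 13;
  ParseResult Result;
  std::vector<std::string> Pending; // legacy: allocs seen before their function
  bool CurrentPrevailing = false;   // new: status of the last function record

  for (const Record &R : Records) {
    if (R.K == Kind::Function) {
      FunctionSummary FS{R.Name, {}};
      if (!MemprofAfterFunction) {
        // Legacy ordering: the allocs were already built; keep them only if
        // this copy prevails, otherwise they are thrown away.
        if (R.Prevailing)
          FS.Allocs = std::move(Pending);
        Pending.clear();
      }
      CurrentPrevailing = R.Prevailing;
      Result.Functions.push_back(std::move(FS));
      continue;
    }
    if (!MemprofAfterFunction) {
      // Legacy ordering: must build eagerly, prevailing status is unknown.
      Pending.push_back(R.Name);
      ++Result.MemprofStructsBuilt;
    } else if (CurrentPrevailing && !Result.Functions.empty()) {
      // New ordering: build only for a prevailing function.
      Result.Functions.back().Allocs.push_back(R.Name);
      ++Result.MemprofStructsBuilt;
    }
  }
  return Result;
}
```

In this sketch both orderings produce the same summaries, but the new ordering never builds the alloc structure for the non-prevailing copy, which is the source of the saving described above.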
; Check summary versioning
; RUN: opt -module-summary %s -o - | llvm-bcanalyzer -dump | FileCheck %s

; CHECK: <GLOBALVAL_SUMMARY_BLOCK
; CHECK: <VERSION op0=13/>

; Need a function for the summary to be populated.
define void @foo() {
  ret void
}