llvm-project/llvm/test/Bitcode/thinlto-unicode-module-paths.test
Ben Dunbobbin 7ab6bc066c [ThinLTO] Preserve Unicode characters in module paths when writing the combined-index (#194320)
`IndexBitcodeWriter::writeModStrings()` serializes module path strings
into a `SmallVector<unsigned>` before emitting `MST_CODE_ENTRY` records.
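When a path contains UTF-8 bytes with the high bit set, appending from
`StringRef::begin()/end()` can be incorrect: those iterators yield
`char`, which may be signed, so such a byte can be sign-extended when it
is widened to `unsigned`. Instead, append the module path through
`bytes_begin()/bytes_end()`, so the bitcode writer always serializes
unsigned bytes.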

Fixes: https://github.com/llvm/llvm-project/issues/194318 (#194318)

Based on work by @kbelochapka and @romanova-ekaterina.
2026-04-29 09:11:41 +01:00
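
The commit message above does not show the failure mode, so here is a minimal standalone sketch of it. It assumes the issue is sign extension of a signed `char` when it is widened to `unsigned`. It uses `std::string` and `std::vector<unsigned>` in place of `StringRef` and `SmallVector<unsigned>`, and the function names `widenAsSignedChar` and `widenAsUnsignedByte` are hypothetical, not taken from LLVM.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the pre-fix behaviour: every byte is viewed as a
// signed char before being widened to unsigned. The explicit signed char cast
// makes the effect reproducible even on targets where plain char is unsigned.
std::vector<unsigned> widenAsSignedChar(const std::string &Path) {
  std::vector<unsigned> Vals;
  for (char C : Path)
    Vals.push_back(static_cast<unsigned>(static_cast<signed char>(C)));
  return Vals;
}

// Hypothetical stand-in for the post-fix behaviour: every byte is viewed as an
// unsigned char, so widening keeps it in the range 0..255.
std::vector<unsigned> widenAsUnsignedByte(const std::string &Path) {
  std::vector<unsigned> Vals;
  for (char C : Path)
    Vals.push_back(static_cast<unsigned char>(C));
  return Vals;
}

// Usage example on the UTF-8 encoding of "α.bc" (CE B1 2E 62 63).
void checkAlphaPath() {
  const std::string Path = "\xCE\xB1.bc";
  const std::vector<unsigned> Signed = widenAsSignedChar(Path);
  const std::vector<unsigned> Unsigned = widenAsUnsignedByte(Path);
  assert(Unsigned[0] == 206 && Unsigned[1] == 177);
  assert(Signed[0] != 206 && Signed[1] != 177); // high-bit bytes are mangled
  assert(Signed[2] == Unsigned[2]);             // ASCII bytes are unaffected
}
```

Only bytes at or above 0x80 differ between the two helpers, which is presumably why plain ASCII module paths never exposed the problem.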

## Test that ThinLTO combined-index handles Unicode characters in module paths.
RUN: rm -rf %t && split-file %s %t && cd %t
RUN: opt -module-summary α.ll -o α.bc
RUN: llvm-lto -thinlto-action=thinlink -o index.bc α.bc
RUN: llvm-bcanalyzer -dump index.bc | FileCheck %s
CHECK: <MODULE_STRTAB_BLOCK
CHECK-NEXT: <ENTRY abbrevid=
## UTF-8 for "α.bc" is CE B1 2E 62 63.
CHECK-SAME: op1=206 op2=177 op3=46 op4=98 op5=99/>
#--- α.ll
target triple = "x86_64-unknown-linux-gnu"
define i32 @f() {
ret i32 0
}
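
The operand values in the `CHECK-SAME` line can be derived from the UTF-8 bytes of the file name. The sketch below shows one way to do that. The helper `formatOps` is hypothetical and only mimics the `opN=value` spelling seen in the test; it is not part of `llvm-bcanalyzer`.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: renders each byte of Path as "opN=<decimal value>",
// separated by spaces, reading the bytes as unsigned values.
std::string formatOps(const std::string &Path) {
  std::string Out;
  unsigned Idx = 1;
  for (unsigned char Byte : Path) {
    if (!Out.empty())
      Out += ' ';
    Out += "op" + std::to_string(Idx++) + "=" +
           std::to_string(static_cast<unsigned>(Byte));
  }
  return Out;
}

// Usage example: "α.bc" is CE B1 2E 62 63 in UTF-8.
void checkExpectedOps() {
  assert(formatOps("\xCE\xB1.bc") == "op1=206 op2=177 op3=46 op4=98 op5=99");
}
```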