`IndexBitcodeWriter::writeModStrings()` serializes module path strings into a `SmallVector<unsigned>` before emitting `MST_CODE_ENTRY` records. When a path contains UTF-8 bytes with the high bit set, appending from `StringRef::begin()/end()` iterates over `char`, which is signed on many targets, so those bytes can be sign-extended into large `unsigned` values instead of staying in the 128-255 range. Instead, append the module path through `bytes_begin()/bytes_end()`, so the bitcode writer always serializes unsigned bytes.

Fixes: https://github.com/llvm/llvm-project/issues/194318 (#194318)

Based on work by @kbelochapka and @romanova-ekaterina.
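The difference between the two iteration styles can be reproduced outside LLVM. The following is a minimal standalone sketch, not the actual patch: `appendAsChars` and `appendAsBytes` are hypothetical helper names, `std::string` and `std::vector<unsigned>` stand in for `StringRef` and `SmallVector<unsigned>`, and the `reinterpret_cast` is assumed to mirror what `bytes_begin()/bytes_end()` provide.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for appending via StringRef::begin()/end():
// each element is a `char`, so on targets where `char` is signed a byte
// such as 0xCE is negative and converts to a very large unsigned value.
std::vector<unsigned> appendAsChars(const std::string &Path) {
  std::vector<unsigned> Vals;
  Vals.insert(Vals.end(), Path.begin(), Path.end());
  return Vals;
}

// Hypothetical stand-in for appending via bytes_begin()/bytes_end():
// each element is an `unsigned char`, so every value stays in 0-255
// regardless of the signedness of plain `char`.
std::vector<unsigned> appendAsBytes(const std::string &Path) {
  const auto *Begin = reinterpret_cast<const unsigned char *>(Path.data());
  return std::vector<unsigned>(Begin, Begin + Path.size());
}
```

For the UTF-8 path `"\xCE\xB1.bc"`, `appendAsBytes` yields 206, 177, 46, 98, 99, matching the operands the test checks for. `appendAsChars` gives the same result only for pure-ASCII paths or on targets where plain `char` is unsigned.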
## Test that ThinLTO combined-index handles Unicode characters in module paths.

RUN: rm -rf %t && split-file %s %t && cd %t
RUN: opt -module-summary α.ll -o α.bc
RUN: llvm-lto -thinlto-action=thinlink -o index.bc α.bc
RUN: llvm-bcanalyzer -dump index.bc | FileCheck %s

CHECK: <MODULE_STRTAB_BLOCK
CHECK-NEXT: <ENTRY abbrevid=
## UTF-8 for "α.bc" is CE B1 2E 62 63.
CHECK-SAME: op1=206 op2=177 op3=46 op4=98 op5=99/>

#--- α.ll
target triple = "x86_64-unknown-linux-gnu"

define i32 @f() {
  ret i32 0
}