llvm-project/lld/ELF/InputFiles.cpp
Fangrui Song 83f8eee57d [ELF] Parallelize input file loading (#191690)
During `createFiles()`, `addFile()` records a `LoadJob` for each
non-script input (archive, relocatable, DSO, bitcode, binary) together
with a state-machine snapshot (`inWholeArchive`, `inLib`, `asNeeded`,
`withLOption`, `groupId`). `loadFiles()` then expands the recorded jobs
on worker threads. Linker scripts are still processed inline because
their `INPUT()` and `GROUP()` commands recursively call `addFile()`.
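
The record-then-expand split can be sketched roughly as follows. This is a
simplified illustration, not lld's code: `DriverState`, `LoadedFile`, the
field subset, and the one-id-per-file `groupId` rule are all assumptions made
for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-ins; these are not lld's real types.
struct LoadJob {
  std::string path;
  bool inWholeArchive; // snapshot of --whole-archive state
  bool asNeeded;       // snapshot of --as-needed state
  unsigned groupId;    // claimed during the serial walk
};

struct LoadedFile {
  std::string path;
  bool asNeeded = false;
  unsigned groupId = 0;
};

// Serial phase: flags mutate this state while the command line is walked.
struct DriverState {
  bool inWholeArchive = false;
  bool asNeeded = false;
  unsigned nextGroupId = 0;
  std::vector<LoadJob> jobs;

  // Record a job carrying a copy of the current state instead of loading now.
  // Simplification: one fresh id per file; real group rules are not modeled.
  void addFile(const std::string &path) {
    jobs.push_back({path, inWholeArchive, asNeeded, nextGroupId++});
  }
};

// Parallel phase: each worker writes only to its own result slots, so the
// result vector stays in command-line order without any locking.
std::vector<LoadedFile> loadFiles(const std::vector<LoadJob> &jobs,
                                  unsigned numThreads) {
  assert(numThreads > 0 && "need at least one worker");
  std::vector<LoadedFile> results(jobs.size());
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < numThreads; ++t)
    workers.emplace_back([&, t] {
      for (std::size_t i = t; i < jobs.size(); i += numThreads)
        results[i] = {jobs[i].path, jobs[i].asNeeded, jobs[i].groupId};
    });
  for (std::thread &w : workers)
    w.join();
  return results;
}
```

Because each job carries its own snapshot, a worker never needs to read the
mutable driver state, which keeps the parallel phase free of ordering
dependencies.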

Outside `createFiles()`, `loadFiles()` is called with a single job and
drained immediately (`deferLoad` is false). Two cases:
- `addDependentLibrary()`: `.deplibs` sections trigger `addFile()`
  during the serial `doParseFiles()` loop.
- `--just-symbols`: pushes files directly, bypassing
  `addFile`/`LoadJob`.
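
A minimal sketch of the deferred versus immediate behaviour described above,
assuming a toy `Loader` type (hypothetical, not lld's) in which "loading" is
just moving a path from a pending queue to a loaded list:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of the two modes; the names mirror the commit message,
// but the bodies are illustrative only.
struct Loader {
  bool deferLoad = false;
  std::vector<std::string> pending; // recorded jobs not yet expanded
  std::vector<std::string> loaded;  // expanded files, in order

  // Expand every pending job (serially here; a real loader could use
  // worker threads at this point).
  void loadFiles() {
    for (std::string &p : pending)
      loaded.push_back(std::move(p));
    pending.clear();
  }

  // Record a job; outside createFiles() it is drained immediately.
  void addFile(const std::string &path) {
    pending.push_back(path);
    if (!deferLoad) {
      loadFiles();
      assert(pending.empty());
    }
  }

  // Command-line walk: defer all jobs, then expand them in one batch.
  void createFiles(const std::vector<std::string> &args) {
    deferLoad = true;
    for (const std::string &a : args)
      addFile(a);
    deferLoad = false;
    loadFiles();
  }
};
```

In this model a later call such as `addFile("libdep.so")`, made after
`createFiles()` has returned, sees `deferLoad == false` and is loaded as a
batch of one.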

Thread-safety:
- A mutex serializes `BitcodeFile` / fatLTO constructors that call
  `ctx.saver` / `ctx.uniqueSaver`. Zero contention on pure ELF links.
- Thin-archive member buffers accumulate in per-job `SmallVector`s and
  are merged into `ctx.memoryBuffers` in command-line order.
- `groupId` is pre-claimed during the serial walk and written to each
  produced file after construction (the `InputFile` constructor no
  longer reads `nextGroupId`).
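
The first two points can be illustrated with a small sketch. Everything here
is assumed for the example: `SharedSaver` stands in for a shared,
non-thread-safe allocator, member names ending in `.bc` stand in for bitcode
members, and one thread per job replaces a real thread pool.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical shared resource that is only touched for bitcode members.
struct SharedSaver {
  std::mutex mu;
  std::set<std::string> strings;
  void save(const std::string &s) {
    std::lock_guard<std::mutex> lock(mu); // serialize all writers
    strings.insert(s);
  }
};

// Toy classification: treat a ".bc" suffix as "bitcode".
static bool isBitcodeName(const std::string &name) {
  assert(!name.empty());
  return name.size() >= 3 && name.compare(name.size() - 3, 3, ".bc") == 0;
}

// Each job appends to its own buffer list; the lists are merged serially in
// job order afterwards, so the result does not depend on thread timing.
std::vector<std::string>
loadAndMerge(const std::vector<std::vector<std::string>> &archives,
             SharedSaver &saver) {
  std::vector<std::vector<std::string>> perJob(archives.size());
  std::vector<std::thread> workers;
  for (std::size_t i = 0; i < archives.size(); ++i)
    workers.emplace_back([&, i] {
      for (const std::string &member : archives[i]) {
        if (isBitcodeName(member))
          saver.save(member); // the only cross-thread write, under the mutex
        perJob[i].push_back(member);
      }
    });
  for (std::thread &w : workers)
    w.join();

  std::vector<std::string> merged; // stand-in for a global buffer list
  for (std::vector<std::string> &bufs : perJob)
    for (std::string &b : bufs)
      merged.push_back(std::move(b));
  return merged;
}
```

Inputs with no bitcode members never take the lock in this sketch, which is
the sense in which a pure ELF link sees no contention.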

Performance (--threads=8):

```
  clang-relassert (267 thin archives, 10 .o, 2 .so):
    965 +/- 32 ms -> 924 +/- 24 ms (1.05x, 80 runs)

    (Apple M4) 249.7ms ± 2.5ms -> 221.2ms ± 1.4ms (1.13x, 10 runs)

  chromium (532 .a, 3314 .o, 343 .so):
    8.071 +/- 0.472 s -> 7.370 +/- 0.198 s (1.10x, 20 runs)
```

Parallelizing all file kinds (not just archives) matters for
.o-dominated workloads such as chromium, where archive-only
parallelization shows no improvement.

Output is byte-identical to the old lld and deterministic across
`--threads` values.
2026-04-20 21:07:34 -07:00

77 KiB