Also move the equivalent helper function from ValueObjectConstResult to
ValueObject.h where it more properly belongs.
This patch is necessary in order to use ValueObjectMemory for a
synthetic child. There are no current uses of this sort in lldb,
though there are on the swift fork.
The DW_OP_piece case was the deepest-nested body in
DWARFExpression::Evaluate, with a switch inside a switch. Move it to a
static Evaluate_DW_OP_piece helper, following the pattern already used
for Evaluate_DW_OP_deref_size and Evaluate_DW_OP_entry_value.
This PR makes it more robust to detect when the user's intent is to treat
an OpenACC region as sequential. It does so in the following ways:
- Ensure that a `seq` acc.par_width is used explicitly when the region is
serial. Previously no acc.par_width was assigned, which was ambiguous:
an explicitly serial region was indistinguishable from a region that
still needs implicitly assigned parallelism.
- Treat `acc parallel` and `acc kernels` with `num_gangs(1)`
`num_workers(1)` `vector_length(1)` exactly the same as `acc serial`.
This is because these are all parallelism dimensions expressible with
OpenACC clauses and being all set to 1 makes the semantics consistent
with those defined for `acc serial`.
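The equivalence in the second bullet can be sketched as a small predicate. This is only an illustration, not the pass's code: `ComputeConstruct`, `ComputeKind`, and `isEffectivelySerial` are hypothetical names, and clause values are modeled as plain integers where 0 means "clause absent".

```cpp
#include <cassert>

// Hypothetical model of an OpenACC compute construct (names are assumptions).
enum class ComputeKind { Parallel, Kernels, Serial };

struct ComputeConstruct {
  ComputeKind kind;
  int numGangs;     // value of num_gangs(N), 0 if the clause is absent
  int numWorkers;   // value of num_workers(N), 0 if the clause is absent
  int vectorLength; // value of vector_length(N), 0 if the clause is absent
};

// True when the region should be marked explicitly sequential: either it is
// `acc serial`, or every expressible parallelism dimension is pinned to 1.
inline bool isEffectivelySerial(const ComputeConstruct &c) {
  assert(c.numGangs >= 0 && c.numWorkers >= 0 && c.vectorLength >= 0);
  if (c.kind == ComputeKind::Serial)
    return true;
  return c.numGangs == 1 && c.numWorkers == 1 && c.vectorLength == 1;
}
```

A region with no clauses at all is deliberately not classified as serial here, since that is the case that still needs implicitly assigned parallelism.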
Using Python's built-in help shows an empty screen when the statusline
is enabled. The issue is the pydoc pager (e.g. less), which doesn't play
nicely with the statusline. Use the "plain" pager instead.
I considered making this conditional on the statusline, but to do that
right you would need to register a callback that toggles it every time
the setting changes and that doesn't seem worth the complexity.
Fixes #166610
This is extremely similar to getcpu, but was available in a much earlier
glibc, so a lot more code depends on it. Do a similar implementation. We
can only have a simple smoke test as the only documented failure mode in
the man page is running on a kernel that does not support the system
call, and such kernels (<2.6) are ancient at this point.
This patch is the second part in a patch series that will allow
enabling/disabling InstrumentationRuntime plugins in a running debug
session.
This patch contains a set of NFC changes that will be needed when
implementing support for enabling/disabling InstrumentationRuntime
plugins in the `debugger` and `target` domains.
In particular:
* In several places the `PluginManager` was modified to get access to
information about disabled plugins. Previously interfaces would only
return information about enabled plugins.
* `InstrumentationRuntimeInstances` is now a struct rather than a typedef
so it can contain a helper method that allows looking up an
`InstrumentationRuntimeGetType` callback based on the plugin name. This
is very similar to the existing `PluginsInstances::GetCallbackForName`
method.
rdar://167725878
This patch is the first part in a patch series that will allow
enabling/disabling InstrumentationRuntime plugins in a running debug
session.
This part adds the `--domain` flag to the `enable`, `disable`, and `list`
subcommands of the `plugin` shell command and plumbs the value of this
flag to where it will be needed in a subsequent patch. From the user's
perspective the flag does nothing useful yet because all values passed
to the flag except `global` (the default and what represents LLDB's
existing behavior) are rejected. Subsequent patches will allow the flag
to do something useful.
The `--domain` flag adds a notion of "domain" to plugins with respect to
their enablement. Previously all plugins were treated as global and had
their enablement stored globally. This is despite the fact that some
plugins clearly are not global. For example the
`instrumentation-runtime` plugins clearly exist on a per-target basis
(the instances of the `InstrumentationRuntime` exist in each process).
In addition, plugins being "global" means that `Debugger` instances are
not properly isolated from each other. This PR
is a stepping stone towards fixing these design problems. The PR
introduces three different domains for plugins:
* `global` - Enablement of the plugin can be controlled globally. This
is the existing behavior of all LLDB plugins.
* `debugger` - Enablement of the plugin can be controlled on a per
`Debugger` basis.
* `target` - Enablement of the plugin can be controlled on a per
`Target` basis.
These values are encoded in the new `PluginDomainKind` enum.
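As a rough sketch of how such domains might be represented, the following models each domain as one bit so that a plugin namespace can advertise several at once. The layout and every name here (`DomainKind`, `NamespaceDomains`, `parseDomain`) are assumptions for illustration; the actual `PluginDomainKind` definition may look different.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for PluginDomainKind: one bit per domain.
enum DomainKind : std::uint8_t {
  kDomainNone = 0,
  kDomainGlobal = 1 << 0,
  kDomainDebugger = 1 << 1,
  kDomainTarget = 1 << 2,
};

// Hypothetical record of which domains a plugin namespace supports.
struct NamespaceDomains {
  std::uint8_t supported; // bitwise OR of DomainKind values

  bool supports(DomainKind d) const {
    assert((d & (d - 1)) == 0 && "expected at most one domain bit");
    return d != kDomainNone && (supported & d) != 0;
  }
};

// Maps the text passed to a `--domain`-style flag to a DomainKind.
// Unknown names map to kDomainNone so the caller can reject them.
inline DomainKind parseDomain(const std::string &name) {
  if (name == "global")
    return kDomainGlobal;
  if (name == "debugger")
    return kDomainDebugger;
  if (name == "target")
    return kDomainTarget;
  return kDomainNone;
}
```

A global-only namespace would then be `NamespaceDomains{kDomainGlobal}`, and a request such as `parseDomain("target")` against it would be rejected by `supports`.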
It is important to note that the design in this PR means a plugin can
support more than one domain. In particular in future patches when
`instrumentation-runtime` plugins gain support for more than just the
`global` domain they will support the `debugger` and `target` domain as
well. The key reason that the `instrumentation-runtime` plugins need to
support more than one domain is that the plugins need a default
enablement value **before** the target exists. That default value will
need to come from the `global` domain. Architecturally it should
probably come from the `debugger` domain instead but refactoring
enablement into Debugger instances is much too large a refactor for this
patch series and is a problem that can be tackled later.
This patch modifies the `PluginNamespace` struct to:
* Store the set of domains supported by the namespace and provide some
helper methods to determine what is supported.
* Store one of two callbacks. Either `SetPluginEnabledGlobalDomain` (the
existing function interface used by most plugins) or
`SetPluginEnabledAllDomains` (a new interface used by
`InstrumentationRuntime` plugins).
In this patch the `InstrumentationRuntime` plugins use the new
`SetPluginEnabledAllDomains` function interface for enablement (i.e. the
interface of `PluginManager::SetInstrumentationRuntimePluginEnabled` has
changed) which passes the `Debugger` instance that made the request and
the domain the user provided to the `plugin enable` or `plugin disable`
command.
To make this patch easier to review, the
`PluginManager::SetInstrumentationRuntimePluginEnabled` function
actually rejects all domains except `global` to keep the behavior change
down to a minimum. Proper support for enabling/disabling
instrumentation-runtime plugins in the `target` and `debugger` domains
will be implemented in a subsequent patch.
The `plugin list` command implementations also reject any domain that
isn't `global`. Support for other domains will be added in the
subsequent patch that adds support for other domains in the
`instrumentation-runtime` plugins.
The `plugin enable`, `plugin disable`, `plugin list` commands will use
the `global` domain by default so that there is no behavior change for
existing workflows.
Two new shell tests are included that exercise the new code paths:
* `command-plugin-enable-disable-domain-flag.test` validates that
`--domain global` works for both global-only and multi-domain plugin
namespaces, and that `--domain debugger` and `--domain target` are
correctly rejected for now.
* `command-plugin-list-domain-flag.test` validates the same behavior for
the list command in both text and JSON output modes.
I am not experienced at adding flags to LLDB shell commands so I had
Claude Code write that part and also help write test cases.
Assisted-by: Claude Code
rdar://167725878
Reverts llvm/llvm-project#194501 as this is triggering an assert in CI:
```
Assertion failed: ((!HasTemplateParamsInName || Tag != dwarf::DW_TAG_subprogram) && "subprogram with template-like name should have a linkage name"), function getChildDeclContext, file DWARFLinkerDeclContext.cpp, line 114.
```
Unfortunately, this still ends up in a slightly awkward place between
Sema and CG, since a few CG phases create implicit parameters (e.g. for
`this`) which also need to be deduced into the correct address space by
Sema. This is intended to be clearly extensible for other targets that
also need this.
Makes the ImplicitParamDecl constructor non-public again by declaring it
`protected` (like all the other VarDecl subtypes), so that all users go
through the Create method.
The memory is later cleaned up by the ASTContext bump allocator, and
since the stack is basically also a bump allocator, it is typically
equally fast. (Reverts 550d13aebb)
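The pattern described here (a non-public constructor plus a `Create` factory that allocates from a bump allocator) can be sketched in isolation. This is a toy model, not Clang's code: `Arena` stands in for the ASTContext allocator and `ParamDecl` for ImplicitParamDecl, and both names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <type_traits>
#include <vector>

// Toy bump allocator (assumed stand-in for the ASTContext allocator).
class Arena {
  std::vector<unsigned char> storage;
  std::size_t used = 0;

public:
  explicit Arena(std::size_t bytes) : storage(bytes) {}

  void *allocate(std::size_t size, std::size_t align) {
    // Round the current offset up to the requested alignment.
    std::size_t start = (used + align - 1) / align * align;
    assert(start + size <= storage.size() && "arena exhausted");
    used = start + size;
    return storage.data() + start;
  }
};

class ParamDecl {
  int id;

protected:
  // Not public: outside code cannot construct one on the stack.
  explicit ParamDecl(int id) : id(id) {}

public:
  // The only way in: placement-new into arena-owned memory.
  static ParamDecl *Create(Arena &arena, int id) {
    void *mem = arena.allocate(sizeof(ParamDecl), alignof(ParamDecl));
    return new (mem) ParamDecl(id);
  }

  int getId() const { return id; }
};

// Direct construction from outside the class hierarchy is rejected.
static_assert(!std::is_constructible<ParamDecl, int>::value,
              "ParamDecl must be created through Create()");
```

The object is never destroyed individually; its memory goes away with the arena, which is acceptable here because the type is trivially destructible.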
If I extracted this correctly from my commit stack, this should allow
removing the special cases for OpenCL from EmitParmDecl once
https://github.com/llvm/llvm-project/pull/181390 lands, since this
aligns the behavior of that function with the declared intent of each
VarDecl.
This changed many tests because previously OpenCL just assumed that
allocations for parameters were actually made in the addrspace of Ty,
but didn't actually check against that properly, resulting in some
unnecessary copies. That will be fixed more completely in
https://github.com/llvm/llvm-project/pull/181390, which also removes
the unnecessary addrspace casts.
#125791 introduced the `strict-pack-match` flag for
`ClassTemplateSpecializationDecl`, and covered it with AST dump tests in
`ast-dump-template.cpp`, in both textual and JSON formats. However, the
JSON test was generated by a script, which made it overspecified. This PR
extracts the relevant part from ≈9200 lines of FileCheck directives.
When not in compact-code-model, the longjump pass may consider certain
branches in range, but later, at JITLink, hugify forces them out of
range, probably because it aligns hot code at runtime.
Adds missing description to the decompose interface transform op.
The main objective is to clarify which values the op's return handle
contains without having to cross-reference AggregatedOpInterface.
Assisted-by: Claude
Add ABITypeMapper and ABIRewriteContext as the dialect-agnostic
bridge between MLIR dialects and the LLVM ABI Lowering Library.
ABITypeMapper maps MLIR built-in types (integer, float, vector,
index, memref) to abi::Type* using DataLayout for sizes and
alignment. Dialect-specific types fall back to integer mapping
via DataLayoutTypeInterface.
ABIRewriteContext defines the abstract interface that each dialect
(CIR, FIR) implements to rewrite function definitions and call
sites after ABI classification. See the CIR ABI lowering design
document (clang/docs/ClangIRABILowering.md, Section 4) for the
architectural context.
Unit tests for both components (18 test cases).
When debugging PExpect tests, the 60 second timeout can make that
process rather tedious. For TestStatusline, I used a class variable to
easily override it while iterating but the idea is applicable more
generally.
The parallel DWARF linker deduplicates types across compile units using
a shared TypePool. When multiple CUs define the same type,
allocateTypeDie uses compare_exchange_strong to race for setting the
canonical DIE. The first thread to succeed stores the DIE and clones its
attributes, while subsequent threads use the canonical one. Which
thread wins depends on OS thread scheduling, making the output
non-deterministic.
This PR fixes the non-determinism by assigning each CompileUnit a
priority based on its position in the link order (object file index, CU
index within the file). When a CU wants to mark a DIE as canonical, it
acquires the spinlock, and only stores its DIE if its priority is
strictly lower than that of the current canonical DIE. This ensures that
the canonical DIE always comes from the CU with the lowest priority
value (i.e. the first CU in link order) that defines
that type. The replaced DIE is leaked into the bump allocator and the
existing DebugTypeDeclFilePatch and accelerator record filters skip the
orphaned DIEs via getFinalDie() checks.
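The selection rule can be sketched independently of the DWARF linker. In this minimal model, which is an assumption-laden illustration rather than the patch's code, `TypeEntry`, `Priority`, and `tryClaim` are hypothetical names, the DIE is an opaque pointer, and an `std::atomic_flag` plays the role of the spinlock.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <utility>

// (object file index, CU index within the file). std::pair compares
// lexicographically, so a smaller value means earlier in link order.
using Priority = std::pair<std::uint32_t, std::uint32_t>;

// RAII guard around a simple test-and-set spinlock.
class SpinLockGuard {
  std::atomic_flag &flag;

public:
  explicit SpinLockGuard(std::atomic_flag &f) : flag(f) {
    while (flag.test_and_set(std::memory_order_acquire)) {
      // Busy-wait until the current holder clears the flag.
    }
  }
  ~SpinLockGuard() { flag.clear(std::memory_order_release); }
};

struct TypeEntry {
  std::atomic_flag busy = ATOMIC_FLAG_INIT;
  const void *canonical = nullptr; // opaque stand-in for the canonical DIE
  Priority owner{};                // priority of the CU that owns it

  // Returns true if `die` became the canonical DIE. A later call with a
  // strictly lower priority replaces the earlier winner.
  bool tryClaim(const void *die, Priority prio) {
    assert(die && "a candidate DIE is required");
    SpinLockGuard guard(busy);
    if (canonical && !(prio < owner))
      return false;
    canonical = die;
    owner = prio;
    return true;
  }
};
```

Whatever order the claims arrive in, the entry ends up holding the candidate with the smallest `(file, CU)` pair, which is what makes the result independent of thread scheduling.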
This PR also removes the AllowNonDeterministicOutput option, which was
never set in the first place, and is now obsolete.
`DW_OP_addr_sect_offset4` is not a real DWARF opcode; it was a
proprietary LLDB proposal that was never adopted (and has no llvm::dwarf
constant). The same shared-library sliding problem is handled today by
evaluating DW_OP_addr as a FileAddress and converting via
Value::ConvertToLoadAddress.
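The sliding arithmetic behind a file-address to load-address conversion can be illustrated with a minimal sketch. This is not the implementation of Value::ConvertToLoadAddress; `LoadedSection` and `toLoadAddress` are hypothetical names, and the model assumes a single contiguous section.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of one section of a loaded shared library.
struct LoadedSection {
  std::uint64_t fileAddr; // address of the section in the object file
  std::uint64_t loadAddr; // address where the loader actually mapped it
  std::uint64_t size;     // section size in bytes
};

// Converts a file address (as found in a DW_OP_addr operand) to the
// address it has in the running process by applying the section's slide.
inline std::uint64_t toLoadAddress(const LoadedSection &s,
                                   std::uint64_t fileAddress) {
  assert(fileAddress >= s.fileAddr && fileAddress - s.fileAddr < s.size &&
         "address is outside the section");
  return s.loadAddr + (fileAddress - s.fileAddr);
}
```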
Convert five tests to use new HLFIR lowering instead of legacy FIR
lowering:
Lower/Intrinsics/c_f_pointer.f90, Lower/Intrinsics/c_loc.f90,
Lower/default-initialization-globals.f90, Lower/cray-pointer.f90,
Lower/loops.f90
SIFixSGPRCopies was incorrectly handling inline assembly operands with
SGPR ("s") constraints when the value came from a memory load (which
produces a VGPR). The pass would fail to insert the necessary
v_readfirstlane instruction and instead passed the VGPR value directly.
Example:
asm sideeffect buffer_load_dwordx4 $0, $1, $2, 0 =v,v,s,n
Previously it generated:
buffer_load_dwordx4 v[0:3], v0, v[8:11] (but sgpr is expected), 0 offen
The fix adds readfirstlanes during lowering when there is a copy from a
divergent register to an SGPR.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Uses `FindAvailableLoadedValue` to resolve load instructions in call
arguments to constants before inline cost analysis. This gives the
inliner a more precise cost estimate and the option to inline functions
which would not be inlined otherwise.
At `-O3`, empty `std::set` and `std::map` are not inlined because node
deletion is recursive. The inliner doesn't know that `nullptr` is passed
in, as it is a `load` from a member.
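The shape of the problem can be reproduced with a toy container. This sketch is an assumption about the general pattern, not the actual `libstdc++` or `libc++` code: `Node`, `eraseSubtree`, and `Tree` are hypothetical names.

```cpp
#include <cassert>

struct Node {
  Node *left;
  Node *right;
};

// Recursive teardown, similar in spirit to a red-black tree's erase helper.
// Its recursion makes it look expensive to the inliner unless the argument
// is known to be null. Returns the number of nodes freed.
inline int eraseSubtree(Node *n) {
  int freed = 0;
  while (n) {
    freed += eraseSubtree(n->right);
    Node *left = n->left;
    delete n;
    ++freed;
    n = left;
  }
  return freed;
}

struct Tree {
  Node *root = nullptr; // the constructor's store of nullptr

  // `root` is loaded from the member here, so without store-to-load
  // forwarding the callee's argument is not visibly a constant.
  int clear() {
    int freed = eraseSubtree(root);
    root = nullptr;
    return freed;
  }

  ~Tree() { clear(); }
};

// Usage: for an empty tree the whole teardown is a no-op.
inline void demoEmptyTree() {
  Tree t;
  assert(t.clear() == 0);
}
```

Once the constructor is inlined, the store of `nullptr` and the load in `clear()` sit in the same function, which is exactly the situation where resolving the load to a constant lets the cost analysis see that the loop body is dead.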
This addresses both `libstdc++` and `libc++`:
- `libstdc++` - `FindAvailableLoadedValue` requires `MaxInstToScan=0`,
because the relevant store is 7 instructions away and `DefMaxInstsToScan =
6`. Benchmarking on large LLVM TUs showed no measurable compile-time
difference between limit=6 and scanning the whole basic block.
- `libc++` - uses `memset` to zero all members in the ctor. This patch
handles only `memset` to zero (the type mismatch case), which could be
generalized but seems very rare.
The store-to-load pattern is created and consumed within the same CGSCC
inliner invocation: the ctor is inlined first (creating stores to the
object), and then the dtor's inline cost is evaluated (seeing loads from
the same object). No pass has an opportunity to simplify the IR in
between.
The `-flto` build eliminates empty `std::set` because the IR is
simplified enough in the regular optimization pass. However, when the
code is not header-only and lives in a different TU, `-flto` doesn't help.
The change is much more general than just `std::set` and `std::map`. I
saw several impacts of it on the LLVM codebase with `-O3`. Some functions
shrink due to better dead-code elimination, some grow due to
more aggressive inlining opportunities, and some are greatly simplified.
In my experiments I saw no measurable regression in compile times
when compiling many large LLVM TUs. I measured ~1% faster compilation due
to the following opt passes being faster. However, this needs more
benchmarks.
Closes #183994
While adding implementation status for nl_types.h, I noticed docgen
resolves it to nl-types.h instead of nl_types.h. As a result, headers
with underscores are not matched correctly and their implementation
status is not marked.
This patch fixes the handling of underscored header names in docgen so
they are processed consistently.
Added ErrorOr-returning syscall wrappers for mmap, munmap, mprotect, and
pkey_mprotect in src/__support/OSUtil/linux/syscall_wrappers/. Migrated
the sys/mman Linux entrypoint implementations to use them, following the
design in libc/docs/dev/syscall_wrapper_refactor.rst.
Removed the shared mprotect_common.h in favour of per-syscall wrapper
headers. Added hdr/sys_mman_macros.h proxy header.
Deleted check.rst, Helpers/Styles.rst, dev/cmake_build_rules.rst, and
dev/clang_tidy_checks.rst. Moved the |check| substitution into
rst_prolog in conf.py so it is available globally without per-file
include directives.
Removed all '.. include:: check.rst' lines from hand-written header docs
and from the docgen.py generator that emits them for auto-generated
header pages.
Merged the clang-tidy checks documentation into code_style.rst under a
new 'Static Analysis & Clang-Tidy' section, preserving the
_clang_tidy_checks label for existing cross-references.
Updated code examples in both libc docs and the upstream clang-tidy
check docs to replace the stale LLVM_LIBC_ENTRYPOINT macro with the
current LLVM_LIBC_FUNCTION macro.
Updated dev/index.rst to drop the two deleted toctree entries.
The extern "C" declaration of aligned_alloc in the proxy header lacked a
noexcept specifier, producing warnings when compiled as C++. Added a
__cplusplus guard so C++ gets noexcept while C compilation remains
unaffected.
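A guard of this kind might look like the sketch below. To avoid clashing with the real libc declaration, the example uses a hypothetical function name (`demo_aligned_alloc`) and a hypothetical macro (`DEMO_NOEXCEPT`); the actual proxy header may be structured differently.

```cpp
#include <stddef.h>

// In C++ the declaration carries noexcept; in C the macro expands to nothing.
#ifdef __cplusplus
#define DEMO_NOEXCEPT noexcept
extern "C" {
#else
#define DEMO_NOEXCEPT
#endif

// Hypothetical stand-in for the aligned_alloc declaration.
void *demo_aligned_alloc(size_t alignment, size_t size) DEMO_NOEXCEPT;

#ifdef __cplusplus
} // extern "C"

// Compile-time check that the C++ view of the declaration is noexcept.
// The operand of noexcept is unevaluated, so no definition is required.
static_assert(noexcept(demo_aligned_alloc(16, 64)),
              "declaration should be noexcept when compiled as C++");
#endif
```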
getMemBuffer() has a parameter RequiresNullTerminator that defaults to
true.
In ModuleCache, MemoryBuffer::getOpenFile is called with /*
RequiresNullTerminator=*/false. This means that the initial contents of
the MemoryBuffer may not have a trailing 0x0 at the end of the file.
When assertions are enabled and RequiresNullTerminator is true the
MemoryBuffer will trigger a "Buffer is not null terminated!" assertion
failure if BufEnd[0] != 0.
We have one build with assertions enabled that is triggering this
MemoryBuffer assertion failure in the check-clang tests:
* ClangScanDeps/modules-dep-args.c
* Driver/modules-driver-import-std.cpp
The failure is specific to one particular machine; we have not been able
to reproduce it locally. It is possible that the failure depends on the
filesystem type or the path length.
Changing the RequiresNullTerminator in getMemBuffer to false to match
the value of RequiresNullTerminator in getOpenFile fixes the problem and
all tests pass.
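The condition behind the assertion can be modeled in a few lines. This is a simplified sketch, not LLVM's MemoryBuffer code: `satisfiesTerminatorRequirement` is a hypothetical helper, and it assumes the byte just past the buffer is readable.

```cpp
#include <cassert>
#include <cstddef>

// Returns true when a buffer of `size` bytes starting at `data` meets the
// requirement. `data` must have at least `size + 1` readable bytes, because
// the check looks at the byte one past the end (BufEnd[0]).
inline bool satisfiesTerminatorRequirement(const char *data, std::size_t size,
                                           bool requiresNullTerminator) {
  assert(data && "buffer must be non-null");
  const char *bufEnd = data + size;
  return !requiresNullTerminator || bufEnd[0] == 0;
}
```

With `requiresNullTerminator` set to false the check passes regardless of what follows the buffer, which mirrors why aligning the two settings avoids the failure.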
`WAIT_ASYNCMARK` emits no bytes but was inheriting `Size = 4` from
`SOPP_Pseudo`.
Without the fix, #194362 causes: `Size mismatch for: WAIT_ASYNCMARK 1
Expected exact size: 4 Actual size: 0`
---------
Signed-off-by: Yu-Zhewen <zhewenyu@amd.com>
Teach `DAGCombiner::reduceLoadWidth` to look through freeze SDNodes when
narrowing loads. The narrowed result is then wrapped in freeze to
preserve the original semantics. Currently, several folds are blocked
by the freeze:
```
and(freeze(load), 0xff) -> AssertZext(freeze(zextload, i8))
trunc(freeze(load i32), i8) -> freeze(load i8)
sext_inreg(freeze(load), i8) -> AssertSext(freeze(sextload, i8))
```
and many other patterns, because the legalizer or upstream IR passes
insert freeze. This generally has the positive effect of narrowing the
load type.
Add support for specifying the null pointer bit representation per
address space in DataLayout via new pointer spec flags:
- 'z': null pointer is all-zeros
- 'o': null pointer is all-ones
When neither flag is present, the null pointer value is zero.
No target DataLayout strings are updated in this change. This is pure
infrastructure for a future ConstantPointerNull semantic change to
support targets with non-zero null pointers (e.g. AMDGPU).
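The meaning of the two flags can be summarized with a small helper that computes the null pointer bit pattern for a given pointer width. This is a sketch only: `nullPointerBits` is a hypothetical function, not a DataLayout API, and it is limited to pointers of at most 64 bits.

```cpp
#include <cassert>
#include <cstdint>

// flag: 'z' = all-zeros, 'o' = all-ones, '\0' = no flag (defaults to zero).
inline std::uint64_t nullPointerBits(char flag, unsigned pointerBits) {
  assert(pointerBits >= 1 && pointerBits <= 64 &&
         "sketch supports 1 to 64 bit pointers");
  assert((flag == 'z' || flag == 'o' || flag == '\0') && "unknown flag");
  if (flag != 'o')
    return 0;
  // Avoid shifting by the full width, which would be undefined behavior.
  if (pointerBits == 64)
    return ~static_cast<std::uint64_t>(0);
  return (static_cast<std::uint64_t>(1) << pointerBits) - 1;
}
```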
The fix does the following in expandPartwordCmpXchg and
insertRMWCmpXchgLoop:
- Issues volatile operations in the emulation loops if the original
operation is volatile.
- A preheader load initializes the "cmp" and "new" values of the cmpxchg
in the loop. This load is now atomic. This is done under a
target hook (`issueAtomicInitLoadForAtomicEmulation()`) to allow
backends to migrate independently.
- `processAtomicInstr` is called on this load, to massage it into
something that can be lowered in SelectionDAG / GISel.
- This caused 3 kinds of failures:
1. Failures caused by changes to codegen: updated these tests either
using the scripts, or mechanically (using claude) to match the new codegen.
2. Crashes caused by newly created atomic loads not being processed by
AtomicExpandPass. (The atomic load does not cause a crash when tested in
an independent test.) To fix these, added recursive calls to
processAtomicInstr on the newly created atomic loads. These calls
convert the loads to libcalls, or cast them to integer types.
3. Crashes in X86, AMDGPU, and AArch64 caused by unhandled vector types.
These loads crash even with upstream LLVM, due to the lack of support in
these targets for vector atomic loads (the corresponding vector
atomicrmw instructions are supported). Disabled issuing atomic loads for
these backends. Will follow up with individual PRs to revert to default
behavior.
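The loop structure being emitted corresponds to the familiar source-level pattern of emulating a read-modify-write with compare-exchange. The sketch below uses `std::atomic` to show that shape, with the preheader load done atomically; `atomicRMWViaCmpXchg` is a hypothetical helper and says nothing about how AtomicExpandPass builds the IR.

```cpp
#include <atomic>
#include <cassert>

// Emulates an arbitrary atomic read-modify-write using a cmpxchg loop.
// Returns the value that was stored before the update.
template <typename T, typename Op>
T atomicRMWViaCmpXchg(std::atomic<T> &obj, Op op) {
  // Preheader: an atomic load seeds the first "cmp" value.
  T expected = obj.load(std::memory_order_relaxed);
  // Loop: compute the "new" value and retry until no other thread changed
  // the object in between. On failure, `expected` is refreshed with the
  // value currently stored.
  while (!obj.compare_exchange_weak(expected, op(expected),
                                    std::memory_order_seq_cst,
                                    std::memory_order_relaxed)) {
  }
  return expected;
}

// Usage: an atomic "max" built from the generic loop.
inline int atomicMax(std::atomic<int> &obj, int value) {
  int old = atomicRMWViaCmpXchg(
      obj, [value](int cur) { return cur > value ? cur : value; });
  assert(obj.load() >= value);
  return old;
}
```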