Files
llvm-project/llvm/lib/TableGen/TGParser.h
Henrich Lauko 89d150a797 [TableGen] Add let append/prepend syntax for field concatenation (#182382)
## Motivation

LLVM TableGen currently lacks a way to **accumulate** field values
across class hierarchies. When a derived class sets a field via `let`,
it completely replaces the parent's value. This forces users into
verbose workarounds like:

```tablegen
class Op { // This is generic MLIR Base 
  code extraClassDeclaration = ?;
}

// Some Generic shared base
class MyShared1OpClass : Op {
  code shared1ExtraClassDeclaration = [{ some generic code 1 }];
}

class MyShared2OpClass : MyShared1OpClass {
  code shared2ExtraClassDeclaration = [{ some generic code 2 }];
}

def MyOp : MyShared2OpClass {
  // need to manually concatenate shared code
  let extraClassDeclaration =   
      shared1ExtraClassDeclaration
    # shared2ExtraClassDeclaration
    # [{ additional specialized code }]; 
}
```

Instead I propose a more natural incremental solution without
unnecessery intermediate definitions:

```
class Op {
  code extraClassDeclaration = ?;
}

class MyShared1OpClass : Op {
  let append extraClassDeclaration = [{ some generic code 1 }];
}

class MyShared2OpClass : MyShared1OpClass {
  let append extraClassDeclaration = [{ some generic code 2 }];
}

def MyOp : MyShared2OpClass {
  let append extraClassDeclaration = [{ additional specialized code }]; 
}
```

This is especially painful in MLIR, where dialect authors want base
op/type/attribute classes to inject shared C++ declarations into all
derived definitions. I attempted to solve this in PR
https://github.com/llvm/llvm-project/pull/182265 with MLIR-specific
`inheritableExtraClassDeclaration`/`Definition` fields, but as
@joker-eph [pointed
out](https://github.com/llvm/llvm-project/pull/182265#discussion_r2098718600),
this is ad-hoc -- the same inheritance problem exists for `traits`,
`arguments`, `results`, and any other list/string/dag field. Rather than
adding `inheritable*` variants per field, we should solve this at the
language level.

## Design

This PR adds two new modifiers to the `let` statement: **`append`** and
**`prepend`**.

```tablegen
class Base {
  list<int> items = [1, 2];
  string text = "hello";
  dag d = (op);
}

def Example : Base {
  let append items = [3, 4];    // items = [1, 2, 3, 4]
  let prepend items = [0];      // items = [0, 1, 2]
  let append text = " world";   // text = "hello world"
  let prepend text = "say ";    // text = "say hello"
  let append d = (op 3:$a);     // d = (op 3:$a)
}
```

### Supported types

| Field type | Operation | Concat operator |
|---|---|---|
| `list<T>` | append/prepend | `!listconcat` |
| `string` / `code` | append/prepend | `!strconcat` |
| `dag` | append/prepend | `!con` |
| Other (`bit`, `int`, `bits`) | -- | Error |

### Semantics

- **`let append`** concatenates the new value **after** the current
value
- **`let prepend`** concatenates the new value **before** the current
value
- If the current value is **unset** (`?`), the new value is used
directly
- A plain **`let`** (without modifier) still replaces, allowing opt-out
from accumulated values
- Works in both **body-level** (`def Foo { let append ... }`) and
**top-level** (`let append ... in { }`) contexts

### Multi-level inheritance

Accumulation works naturally across inheritance chains:

```tablegen
class Base {
  list<int> items = [1, 2];
}

class Middle : Base {
  let append items = [3];    // items = [1, 2, 3]
}

def Leaf : Middle {
  let append items = [4];    // items = [1, 2, 3, 4]
}
```

### Multiple inheritance

TableGen supports multiple inheritance (`def D : A, B { ... }`), where
parent classes are processed left to right and the **last parent class's
value wins** for any shared field. `let append`/`let prepend` operates
on whatever value the field has *after* inheritance resolution — it does
not accumulate across sibling parents:

```tablegen
class A { list<int> items = [1, 2]; }
class B { list<int> items = [3, 4]; }

def D : A, B {
  let append items = [5];  // items = [3, 4, 5]  (A's value is lost)
}
```

This also applies to diamond inheritance:

```tablegen
class Base  { list<int> items = [1]; }
class Left  : Base { let append items = [2]; }  // [1, 2]
class Right : Base { let append items = [3]; }  // [1, 3]

def D : Left, Right {
  let append items = [4];  // items = [1, 3, 4]  (Left's [2] is lost)
}
```

This is consistent with how plain `let` works with multiple inheritance
— it is the standard last-writer-wins rule. Users who need accumulation
from multiple parents should use a single-inheritance chain instead.

## Backward compatibility

This proposal is **fully backward compatible**. The keywords `append`
and `prepend` are implemented as **context-sensitive keywords** — they
are only recognized as modifiers when they appear immediately after
`let` (in both body-level and top-level contexts). In all other
positions, `append` and `prepend` remain valid identifiers and can be
used as field names, class names, def names, etc. This means:

- No existing `.td` files (in-tree or out-of-tree) will break
- Fields named `append` or `prepend` continue to work: `let append
append = [5];` is valid (the first `append` is the modifier, the second
is the field name)
- The parser checks for the identifier string value after `let`, not for
a reserved token

RFC:
https://discourse.llvm.org/t/rfc-tablegen-add-let-append-prepend-syntax-for-field-concatenation/89924/
2026-03-09 18:54:08 +01:00

346 lines
13 KiB
C++

//===- TGParser.h - Parser for TableGen Files -------------------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This class represents the Parser for tablegen files.
//
//===----------------------------------------------------------------------===//
#ifndef LLVM_LIB_TABLEGEN_TGPARSER_H
#define LLVM_LIB_TABLEGEN_TGPARSER_H
#include "TGLexer.h"
#include "llvm/TableGen/Error.h"
#include "llvm/TableGen/Record.h"
#include <map>
namespace llvm {
class SourceMgr;
class Twine;
struct ForeachLoop;
struct MultiClass;
struct SubClassReference;
struct SubMultiClassReference;
/// Specifies how a 'let' assignment interacts with the existing field value.
/// - Replace: overwrite the field (default behavior).
/// - Append: concatenate the new value after the existing value.
/// - Prepend: concatenate the new value before the existing value.
enum class LetMode { Replace, Append, Prepend };
/// Parsed let mode keyword and field name (e.g. `let append x` yields
/// Mode=Append, Name="x"; plain `let x` yields Mode=Replace, Name="x").
struct LetModeAndName {
LetMode Mode;
SMLoc Loc; // Source location of the field name.
std::string Name; // The field name being assigned.
};
struct LetRecord {
const StringInit *Name;
std::vector<unsigned> Bits;
const Init *Value;
SMLoc Loc;
LetMode Mode;
LetRecord(const StringInit *N, ArrayRef<unsigned> B, const Init *V, SMLoc L,
LetMode M = LetMode::Replace)
: Name(N), Bits(B), Value(V), Loc(L), Mode(M) {}
};
/// RecordsEntry - Holds exactly one of a Record, ForeachLoop, or
/// AssertionInfo.
struct RecordsEntry {
std::unique_ptr<Record> Rec;
std::unique_ptr<ForeachLoop> Loop;
std::unique_ptr<Record::AssertionInfo> Assertion;
std::unique_ptr<Record::DumpInfo> Dump;
void dump() const;
RecordsEntry() = default;
RecordsEntry(std::unique_ptr<Record> Rec);
RecordsEntry(std::unique_ptr<ForeachLoop> Loop);
RecordsEntry(std::unique_ptr<Record::AssertionInfo> Assertion);
RecordsEntry(std::unique_ptr<Record::DumpInfo> Dump);
};
/// ForeachLoop - Record the iteration state associated with a for loop.
/// This is used to instantiate items in the loop body.
///
/// IterVar is allowed to be null, in which case no iteration variable is
/// defined in the loop at all. (This happens when a ForeachLoop is
/// constructed by desugaring an if statement.)
struct ForeachLoop {
SMLoc Loc;
const VarInit *IterVar;
const Init *ListValue;
std::vector<RecordsEntry> Entries;
void dump() const;
ForeachLoop(SMLoc Loc, const VarInit *IVar, const Init *LValue)
: Loc(Loc), IterVar(IVar), ListValue(LValue) {}
};
struct DefsetRecord {
SMLoc Loc;
const RecTy *EltTy = nullptr;
SmallVector<Init *, 16> Elements;
};
struct MultiClass {
Record Rec; // Placeholder for template args and Name.
std::vector<RecordsEntry> Entries;
void dump() const;
MultiClass(StringRef Name, SMLoc Loc, RecordKeeper &Records)
: Rec(Name, Loc, Records, Record::RK_MultiClass) {}
};
class TGVarScope {
public:
enum ScopeKind { SK_Local, SK_Record, SK_ForeachLoop, SK_MultiClass };
private:
ScopeKind Kind;
std::unique_ptr<TGVarScope> Parent;
// A scope to hold variable definitions from defvar.
std::map<std::string, const Init *, std::less<>> Vars;
Record *CurRec = nullptr;
ForeachLoop *CurLoop = nullptr;
MultiClass *CurMultiClass = nullptr;
public:
TGVarScope(std::unique_ptr<TGVarScope> Parent)
: Kind(SK_Local), Parent(std::move(Parent)) {}
TGVarScope(std::unique_ptr<TGVarScope> Parent, Record *Rec)
: Kind(SK_Record), Parent(std::move(Parent)), CurRec(Rec) {}
TGVarScope(std::unique_ptr<TGVarScope> Parent, ForeachLoop *Loop)
: Kind(SK_ForeachLoop), Parent(std::move(Parent)), CurLoop(Loop) {}
TGVarScope(std::unique_ptr<TGVarScope> Parent, MultiClass *Multiclass)
: Kind(SK_MultiClass), Parent(std::move(Parent)),
CurMultiClass(Multiclass) {}
std::unique_ptr<TGVarScope> extractParent() {
// This is expected to be called just before we are destructed, so
// it doesn't much matter what state we leave 'parent' in.
return std::move(Parent);
}
const Init *getVar(RecordKeeper &Records, MultiClass *ParsingMultiClass,
const StringInit *Name, SMRange NameLoc,
bool TrackReferenceLocs) const;
bool varAlreadyDefined(StringRef Name) const {
// When we check whether a variable is already defined, for the purpose of
// reporting an error on redefinition, we don't look up to the parent
// scope, because it's all right to shadow an outer definition with an
// inner one.
return Vars.find(Name) != Vars.end();
}
void addVar(StringRef Name, const Init *I) {
bool Ins = Vars.try_emplace(Name.str(), I).second;
(void)Ins;
assert(Ins && "Local variable already exists");
}
bool isOutermost() const { return Parent == nullptr; }
};
class TGParser {
TGLexer Lex;
std::vector<SmallVector<LetRecord, 4>> LetStack;
std::map<std::string, std::unique_ptr<MultiClass>> MultiClasses;
std::map<std::string, const RecTy *> TypeAliases;
/// Loops - Keep track of any foreach loops we are within.
///
std::vector<std::unique_ptr<ForeachLoop>> Loops;
SmallVector<DefsetRecord *, 2> Defsets;
/// CurMultiClass - If we are parsing a 'multiclass' definition, this is the
/// current value.
MultiClass *CurMultiClass;
/// CurScope - Innermost of the current nested scopes for 'defvar' variables.
std::unique_ptr<TGVarScope> CurScope;
// Record tracker
RecordKeeper &Records;
// A "named boolean" indicating how to parse identifiers. Usually
// identifiers map to some existing object but in special cases
// (e.g. parsing def names) no such object exists yet because we are
// in the middle of creating in. For those situations, allow the
// parser to ignore missing object errors.
enum IDParseMode {
ParseValueMode, // We are parsing a value we expect to look up.
ParseNameMode, // We are parsing a name of an object that does not yet
// exist.
};
bool NoWarnOnUnusedTemplateArgs = false;
bool TrackReferenceLocs = false;
public:
TGParser(SourceMgr &SM, ArrayRef<std::string> Macros, RecordKeeper &records,
const bool NoWarnOnUnusedTemplateArgs = false,
const bool TrackReferenceLocs = false)
: Lex(SM, Macros), CurMultiClass(nullptr), Records(records),
NoWarnOnUnusedTemplateArgs(NoWarnOnUnusedTemplateArgs),
TrackReferenceLocs(TrackReferenceLocs) {}
/// ParseFile - Main entrypoint for parsing a tblgen file. These parser
/// routines return true on error, or false on success.
bool ParseFile();
bool Error(SMLoc L, const Twine &Msg) const {
PrintError(L, Msg);
return true;
}
bool TokError(const Twine &Msg) const { return Error(Lex.getLoc(), Msg); }
const TGLexer::DependenciesSetTy &getDependencies() const {
return Lex.getDependencies();
}
TGVarScope *PushScope() {
CurScope = std::make_unique<TGVarScope>(std::move(CurScope));
// Returns a pointer to the new scope, so that the caller can pass it back
// to PopScope which will check by assertion that the pushes and pops
// match up properly.
return CurScope.get();
}
TGVarScope *PushScope(Record *Rec) {
CurScope = std::make_unique<TGVarScope>(std::move(CurScope), Rec);
return CurScope.get();
}
TGVarScope *PushScope(ForeachLoop *Loop) {
CurScope = std::make_unique<TGVarScope>(std::move(CurScope), Loop);
return CurScope.get();
}
TGVarScope *PushScope(MultiClass *Multiclass) {
CurScope = std::make_unique<TGVarScope>(std::move(CurScope), Multiclass);
return CurScope.get();
}
void PopScope(TGVarScope *ExpectedStackTop) {
assert(ExpectedStackTop == CurScope.get() &&
"Mismatched pushes and pops of local variable scopes");
CurScope = CurScope->extractParent();
}
private: // Semantic analysis methods.
bool AddValue(Record *TheRec, SMLoc Loc, const RecordVal &RV);
/// Set the value of a RecordVal within the given record. If `OverrideDefLoc`
/// is set, the provided location overrides any existing location of the
/// RecordVal. An optional `Mode` specifies append/prepend concatenation.
bool SetValue(Record *TheRec, SMLoc Loc, const Init *ValName,
ArrayRef<unsigned> BitList, const Init *V,
bool AllowSelfAssignment = false, bool OverrideDefLoc = true,
LetMode Mode = LetMode::Replace);
bool AddSubClass(Record *Rec, SubClassReference &SubClass);
bool AddSubClass(RecordsEntry &Entry, SubClassReference &SubClass);
bool AddSubMultiClass(MultiClass *CurMC,
SubMultiClassReference &SubMultiClass);
using SubstStack = SmallVector<std::pair<const Init *, const Init *>, 8>;
bool addEntry(RecordsEntry E);
bool resolve(const ForeachLoop &Loop, SubstStack &Stack, bool Final,
std::vector<RecordsEntry> *Dest, SMLoc *Loc = nullptr);
bool resolve(const std::vector<RecordsEntry> &Source, SubstStack &Substs,
bool Final, std::vector<RecordsEntry> *Dest,
SMLoc *Loc = nullptr);
bool addDefOne(std::unique_ptr<Record> Rec);
using ArgValueHandler = std::function<void(const Init *, const Init *)>;
bool resolveArguments(
const Record *Rec, ArrayRef<const ArgumentInit *> ArgValues, SMLoc Loc,
ArgValueHandler ArgValueHandler = [](const Init *, const Init *) {});
bool resolveArgumentsOfClass(MapResolver &R, const Record *Rec,
ArrayRef<const ArgumentInit *> ArgValues,
SMLoc Loc);
bool resolveArgumentsOfMultiClass(SubstStack &Substs, MultiClass *MC,
ArrayRef<const ArgumentInit *> ArgValues,
const Init *DefmName, SMLoc Loc);
private: // Parser methods.
bool consume(tgtok::TokKind K);
bool ParseObjectList(MultiClass *MC = nullptr);
bool ParseObject(MultiClass *MC);
bool ParseClass();
bool ParseMultiClass();
bool ParseDefm(MultiClass *CurMultiClass);
bool ParseDef(MultiClass *CurMultiClass);
bool ParseDefset();
bool ParseDeftype();
bool ParseDefvar(Record *CurRec = nullptr);
bool ParseDump(MultiClass *CurMultiClass, Record *CurRec = nullptr);
bool ParseForeach(MultiClass *CurMultiClass);
bool ParseIf(MultiClass *CurMultiClass);
bool ParseIfBody(MultiClass *CurMultiClass, StringRef Kind);
bool ParseAssert(MultiClass *CurMultiClass, Record *CurRec = nullptr);
bool ParseTopLevelLet(MultiClass *CurMultiClass);
LetModeAndName ParseLetModeAndName();
void ParseLetList(SmallVectorImpl<LetRecord> &Result);
bool ParseObjectBody(Record *CurRec);
bool ParseBody(Record *CurRec);
bool ParseBodyItem(Record *CurRec);
bool ParseTemplateArgList(Record *CurRec);
const Init *ParseDeclaration(Record *CurRec, bool ParsingTemplateArgs);
const VarInit *ParseForeachDeclaration(const Init *&ForeachListValue);
SubClassReference ParseSubClassReference(Record *CurRec, bool isDefm);
SubMultiClassReference ParseSubMultiClassReference(MultiClass *CurMC);
const Init *ParseIDValue(Record *CurRec, const StringInit *Name,
SMRange NameLoc, IDParseMode Mode = ParseValueMode);
const Init *ParseSimpleValue(Record *CurRec, const RecTy *ItemType = nullptr,
IDParseMode Mode = ParseValueMode);
const Init *ParseValue(Record *CurRec, const RecTy *ItemType = nullptr,
IDParseMode Mode = ParseValueMode);
void ParseValueList(SmallVectorImpl<const Init *> &Result, Record *CurRec,
const RecTy *ItemType = nullptr);
bool ParseTemplateArgValueList(SmallVectorImpl<const ArgumentInit *> &Result,
SmallVectorImpl<SMLoc> &ArgLocs,
Record *CurRec, const Record *ArgsRec);
void ParseDagArgList(
SmallVectorImpl<std::pair<const Init *, const StringInit *>> &Result,
Record *CurRec);
bool ParseOptionalRangeList(SmallVectorImpl<unsigned> &Ranges);
bool ParseOptionalBitList(SmallVectorImpl<unsigned> &Ranges);
const TypedInit *ParseSliceElement(Record *CurRec);
const TypedInit *ParseSliceElements(Record *CurRec, bool Single = false);
void ParseRangeList(SmallVectorImpl<unsigned> &Result);
bool ParseRangePiece(SmallVectorImpl<unsigned> &Ranges,
const TypedInit *FirstItem = nullptr);
const RecTy *ParseType();
const Init *ParseOperation(Record *CurRec, const RecTy *ItemType);
const Init *ParseOperationSubstr(Record *CurRec, const RecTy *ItemType);
const Init *ParseOperationFind(Record *CurRec, const RecTy *ItemType);
const Init *ParseOperationForEachFilter(Record *CurRec,
const RecTy *ItemType);
const Init *ParseOperationCond(Record *CurRec, const RecTy *ItemType);
const RecTy *ParseOperatorType();
const Init *ParseObjectName(MultiClass *CurMultiClass);
const Record *ParseClassID();
MultiClass *ParseMultiClassID();
bool ApplyLetStack(Record *CurRec);
bool ApplyLetStack(RecordsEntry &Entry);
bool CheckTemplateArgValues(SmallVectorImpl<const ArgumentInit *> &Values,
ArrayRef<SMLoc> ValuesLocs,
const Record *ArgsRec);
};
} // end namespace llvm
#endif