Files
llvm-project/llvm/utils/TableGen/DecoderTree.h
Sergei Barannikov 60bdf09654 [TableGen][DecoderEmitter] Rework table construction/emission (#155889)
### Current state

We have FilterChooser class, which can be thought of as a **tree of
encodings**. Tree nodes are instances of FilterChooser itself, and come
in two types:

* A node containing single encoding that has *constant* bits in the
specified bit range, a.k.a. singleton node.
* A node containing only child nodes, where each child represents a set
of encodings that have the same *constant* bits in the specified bit
range.

Either of these nodes can have an additional child, which represents a
set of encodings that have some *unknown* bits in the same bit range.

As can be seen, the **data structure is very high level**.

The encoding tree represented by FilterChooser is then converted into a
finite-state machine (FSM), represented as **byte array**. The
translation is straightforward: for each node of the tree we emit a
sequence of opcodes that check encoding bits and predicates for each
encoding. For a singleton node we also emit a terminal "decode" opcode.

The translation is done in one go, and this has negative consequences:

* We miss optimization opportunities.
* We have to use "fixups" when encoding transitions in the FSM since we
don't know the size of the data we want to jump over in advance. We have
to emit the data first and then fix up the location of the jump. This
means the fixup size has to be large enough to encode the longest jump,
so **most of the transitions are encoded inefficiently**.
* Finally, when converting the FSM into human readable form, we have to
**decode the byte array we've just emitted**. This is also done in one
go, so we **can't do any pretty printing**.

### This PR

We introduce an intermediary data structure, decoder tree, that can be
thought as **AST of the decoder program**.
This data structure is **low level** and as such allows for optimization
and analysis.
It resolves all the issues listed above. We now can:
* Emit more optimal opcode sequences.
* Compute the size of the data to be emitted in advance, avoiding
fixups.
* Do pretty printing.

Serialization is done by a new class, DecoderTableEmitter, which
converts the AST into a FSM in **textual form**, streamed right into the
output file.

### Results
* The new approach immediately resulted in 12% total table size savings
across all in-tree targets, without implementing any optimizations on
the AST. Many tables observe ~20% size reduction.
* The generated file is much more readable.
* The implementation is arguably simpler and more straightforward (the
diff is only +150~200 lines, which feels rather small for the benefits
the change gives).
2025-09-20 01:58:53 +00:00

226 lines
6.7 KiB
C++

//===----------------------------------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
#ifndef LLVM_UTILS_TABLEGEN_DECODERTREE_H
#define LLVM_UTILS_TABLEGEN_DECODERTREE_H
#include "llvm/ADT/CachedHashString.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringRef.h"
#include <map>
#include <memory>
namespace llvm {
class InstructionEncoding;
using PredicateSet = SetVector<CachedHashString>;
using DecoderSet = SetVector<CachedHashString>;
/// Context shared across decoder trees.
/// Predicates and decoders are shared across decoder trees to provide more
/// opportunities for uniqueness. If SpecializeDecodersPerBitwidth is enabled,
/// decoders are shared across all trees for a given bitwidth, else they are
/// shared across all trees. Predicates are always shared across all trees.
struct DecoderContext {
PredicateSet Predicates;
DecoderSet Decoders;
unsigned getPredicateIndex(StringRef Predicate) {
Predicates.insert(CachedHashString(Predicate));
PredicateSet::const_iterator I = find(Predicates, Predicate);
return std::distance(Predicates.begin(), I);
}
unsigned getDecoderIndex(StringRef Decoder) {
Decoders.insert(CachedHashString(Decoder));
DecoderSet::const_iterator I = find(Decoders, Decoder);
return std::distance(Decoders.begin(), I);
}
};
class DecoderTreeNode {
public:
virtual ~DecoderTreeNode();
enum KindTy {
CheckAny,
CheckAll,
CheckField,
SwitchField,
CheckPredicate,
SoftFail,
Decode,
};
KindTy getKind() const { return Kind; }
protected:
explicit DecoderTreeNode(KindTy Kind) : Kind(Kind) {}
private:
KindTy Kind;
};
/// Common base class for nodes with multiple children.
class CheckManyNode : public DecoderTreeNode {
SmallVector<std::unique_ptr<DecoderTreeNode>, 0> Children;
static const DecoderTreeNode *
mapElement(decltype(Children)::const_reference Element) {
return Element.get();
}
protected:
explicit CheckManyNode(KindTy Kind) : DecoderTreeNode(Kind) {}
public:
void addChild(std::unique_ptr<DecoderTreeNode> Child) {
Children.push_back(std::move(Child));
}
using child_iterator = mapped_iterator<decltype(Children)::const_iterator,
decltype(&mapElement)>;
child_iterator child_begin() const {
return child_iterator(Children.begin(), mapElement);
}
child_iterator child_end() const {
return child_iterator(Children.end(), mapElement);
}
iterator_range<child_iterator> children() const {
return make_range(child_begin(), child_end());
}
};
/// Executes child nodes one by one until one of them succeeds or all fail.
/// The node fails if all child nodes fail. It never succeeds, because if a
/// child node succeeds, it does not return.
class CheckAnyNode : public CheckManyNode {
public:
CheckAnyNode() : CheckManyNode(CheckAny) {}
};
/// Executes child nodes one by one until one of them fails all all succeed.
/// The node fails if any of the child nodes fails.
class CheckAllNode : public CheckManyNode {
public:
CheckAllNode() : CheckManyNode(CheckAll) {}
};
/// Checks the value of encoding bits in the specified range.
class CheckFieldNode : public DecoderTreeNode {
unsigned StartBit;
unsigned NumBits;
uint64_t Value;
public:
CheckFieldNode(unsigned StartBit, unsigned NumBits, uint64_t Value)
: DecoderTreeNode(CheckField), StartBit(StartBit), NumBits(NumBits),
Value(Value) {}
unsigned getStartBit() const { return StartBit; }
unsigned getNumBits() const { return NumBits; }
uint64_t getValue() const { return Value; }
};
/// Switch based on the value of encoding bits in the specified range.
/// If the value of the bits in the range doesn't match any of the cases,
/// the node fails. This is semantically equivalent to CheckAny node where
/// every child is a CheckField node, but is faster.
class SwitchFieldNode : public DecoderTreeNode {
unsigned StartBit;
unsigned NumBits;
std::map<uint64_t, std::unique_ptr<DecoderTreeNode>> Cases;
static std::pair<uint64_t, const DecoderTreeNode *>
mapElement(decltype(Cases)::const_reference Element) {
return std::pair(Element.first, Element.second.get());
}
public:
SwitchFieldNode(unsigned StartBit, unsigned NumBits)
: DecoderTreeNode(SwitchField), StartBit(StartBit), NumBits(NumBits) {}
void addCase(uint64_t Value, std::unique_ptr<DecoderTreeNode> N) {
Cases.try_emplace(Value, std::move(N));
}
unsigned getStartBit() const { return StartBit; }
unsigned getNumBits() const { return NumBits; }
using case_iterator =
mapped_iterator<decltype(Cases)::const_iterator, decltype(&mapElement)>;
case_iterator case_begin() const {
return case_iterator(Cases.begin(), mapElement);
}
case_iterator case_end() const {
return case_iterator(Cases.end(), mapElement);
}
iterator_range<case_iterator> cases() const {
return make_range(case_begin(), case_end());
}
};
/// Checks that the instruction to be decoded has its predicates satisfied.
class CheckPredicateNode : public DecoderTreeNode {
unsigned PredicateIndex;
public:
explicit CheckPredicateNode(unsigned PredicateIndex)
: DecoderTreeNode(CheckPredicate), PredicateIndex(PredicateIndex) {}
unsigned getPredicateIndex() const { return PredicateIndex; }
};
/// Checks if the encoding bits are correct w.r.t. SoftFail semantics.
/// This is the only node that can never fail.
class SoftFailNode : public DecoderTreeNode {
uint64_t PositiveMask, NegativeMask;
public:
SoftFailNode(uint64_t PositiveMask, uint64_t NegativeMask)
: DecoderTreeNode(SoftFail), PositiveMask(PositiveMask),
NegativeMask(NegativeMask) {}
uint64_t getPositiveMask() const { return PositiveMask; }
uint64_t getNegativeMask() const { return NegativeMask; }
};
/// Attempts to decode the specified encoding as the specified instruction.
class DecodeNode : public DecoderTreeNode {
StringRef EncodingName;
unsigned InstOpcode;
unsigned DecoderIndex;
public:
DecodeNode(StringRef EncodingName, unsigned InstOpcode, unsigned DecoderIndex)
: DecoderTreeNode(Decode), EncodingName(EncodingName),
InstOpcode(InstOpcode), DecoderIndex(DecoderIndex) {}
StringRef getEncodingName() const { return EncodingName; }
unsigned getInstOpcode() const { return InstOpcode; }
unsigned getDecoderIndex() const { return DecoderIndex; }
};
} // namespace llvm
#endif // LLVM_UTILS_TABLEGEN_DECODERTREE_H