llvm-project/lldb/tools/debugserver/source/JSONGenerator.h
Jason Molenda f098aa3b5b [lldb][Darwin] debugserver expedite new binary info, lldb use (#192754)
When lldb stops at the "new binaries loaded" internal breakpoint, it
must read the list of addresses of the new binaries out of process
memory, then send a jGetLoadedDynamicLibrariesInfos packet to
debugserver to get the filepath, uuid, and addresses of where all the
segments are loaded in memory.

It's possible for debugserver to find the "new binaries loaded" function
address in the inferior itself, recognize when it has stopped at a
breakpoint there, and expedite some/all of the information lldb is going
to ask for in the stop info packet that we send to lldb. This greatly
improves stops where a large batch of binaries is loaded at once, but it
targets the single-binary-loaded `dlopen()` case even more, since that
can be quite expensive when many binaries are loaded one by one.

This PR reduces the packet traffic for new-binary-load notifications by:

1. When debugserver sees a thread that has hit a breakpoint and the pc
matches the new-binaries-loaded function address, it reads the list of
binaries that have just been added and includes them in the stop info
packet (or the jThreadsInfo packet) under the `added-binaries` key. The
value is a list (array) of binary addresses.

2. If the number of binaries is small (today: one), debugserver may
collect the full information that jGetLoadedDynamicLibrariesInfos would
send back about it, and also expedite that in the stop info packet (or
jThreadsInfo) under the `detailed-binaries-info` key. This value is a
JSON string, and the stop info packet is a semicolon-separated series of
key-value pairs, so it must be asciihex encoded, just like the
`jstopinfo` key. In the jThreadsInfo packet, the JSON for the binary
information is included in the response as-is, as the key's value (a
dictionary).

3. If the remote stub doesn't provide these new keys, lldb will use the
same process as before. However, in
DynamicLoaderMacOS::NotifyBreakpointHit I was reading the load addresses
out of memory individually, with each binary having a 24-byte entry.
lldb's memory cache turned each 8-byte read into a 512-byte read, but
when 1000 binaries were being loaded at process launch time, that was
24,000 bytes of VM that we would read in 512-byte batches. This patch
changes that to read the entire VM range that we will be accessing in
one large memory read (as large as the remote gdb RSP stub will
support), dramatically reducing packet traffic in that case.

4. debugserver needs to read the "new binaries loaded" function pointer
out of the "dyld_all_image_infos" structure in the inferior. On arm64e
processes this is a signed function pointer, so debugserver must strip
off the signing bits before comparing it to the pc. I hoisted the strip
function out of DNBArchImplArm64 into DNBFixAddress(). The only
complicated bit here is in DNBProcessAddrSize(), for when an arm64e
debugserver is debugging an arm64_32 process on a watch. That is not a
common combination (mostly we will have arm64e debugservers debugging
arm64 processes, or arm64_32 debugservers debugging arm64_32 processes),
but it is supported.

5. As a very minor enhancement, debugserver now includes a new key,
`sizeof_mh_and_loadcmds`, in the full binary information that
jGetLoadedDynamicLibrariesInfos returns. When lldb needs to read a
binary out of memory, it must read the Mach-O header and load commands,
but it doesn't know their combined size up front. As a result, we do
one read of the Mach-O header and then a second read of the header plus
load commands. I'm not using this information in lldb yet, but I would
like to, to improve that.
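The arithmetic in item 3, and the shape of a single bulk read, can be sketched as follows. This is a minimal illustration, not lldb's DynamicLoaderMacOS code. `NumReads` and `ParseLoadAddresses` are hypothetical helpers. The sketch also assumes that each 24-byte entry stores the binary's load address in its first 8 bytes, little-endian; that assumption is stated in the comments as well.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption for this sketch: each entry is 24 bytes (three 8-byte fields)
// and the binary's load address is the first field, stored little-endian.
constexpr size_t kEntrySize = 24;

// Packets needed to cover `total_bytes` if each read transfers at most
// `bytes_per_read` bytes (alignment is ignored).
size_t NumReads(size_t total_bytes, size_t bytes_per_read) {
  assert(bytes_per_read > 0);
  return (total_bytes + bytes_per_read - 1) / bytes_per_read;
}

// Decode the load address of each of `count` consecutive entries from one
// buffer that was filled by a single bulk memory read.
std::vector<uint64_t> ParseLoadAddresses(const std::vector<uint8_t> &buf,
                                         size_t count) {
  std::vector<uint64_t> addrs;
  if (buf.size() < count * kEntrySize)
    return addrs; // short read: let the caller fall back
  addrs.reserve(count);
  for (size_t i = 0; i < count; ++i) {
    const uint8_t *entry = buf.data() + i * kEntrySize;
    uint64_t value = 0;
    for (int b = 7; b >= 0; --b) // little-endian: most significant byte last
      value = (value << 8) | entry[b];
    addrs.push_back(value);
  }
  return addrs;
}
```

With 1000 entries (24,000 bytes) and 512-byte cache reads, `NumReads` gives 47 packets, ignoring alignment. A single read that is large enough for the whole range gives 1.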
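The pointer stripping in item 4 can be illustrated with a small sketch. This is not DNBFixAddress() itself. `StripPointerAuth` and `FixAddressForProcess` are hypothetical, and their inputs are assumptions: the number of addressable bits, and the process's pointer size as something like DNBProcessAddrSize() might report it. The sketch follows the usual AArch64 convention that bits above the virtual-address width carry the signature, and that bit 55 selects between user addresses (zero-extended) and kernel addresses (sign-extended).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not debugserver's DNBFixAddress()): strip the
// pointer-authentication signature from an arm64e pointer. Assumption:
// `addressable_bits` is the number of bits used for virtual addresses.
uint64_t StripPointerAuth(uint64_t addr, unsigned addressable_bits) {
  assert(addressable_bits > 0 && addressable_bits < 64);
  // Every bit at or above `addressable_bits` is signature/metadata.
  const uint64_t high_mask = ~((uint64_t{1} << addressable_bits) - 1);
  // Bit 55 selects the half of the address space: kernel (high-half)
  // pointers are sign-extended, user pointers are zero-extended.
  if (addr & (uint64_t{1} << 55))
    return addr | high_mask;
  return addr & ~high_mask;
}

// Hypothetical wrapper: `process_addr_size` is the inferior's pointer size
// in bytes. An arm64_32 process has 4-byte pointers, so only the low 32
// bits are kept, even when the debugserver itself is arm64e.
uint64_t FixAddressForProcess(uint64_t addr, unsigned process_addr_size,
                              unsigned addressable_bits) {
  if (process_addr_size == 4)
    return addr & 0xffffffffULL;
  return StripPointerAuth(addr, addressable_bits);
}
```

For example, with 47 addressable bits the signed pointer `0x002a00018b4c1234` strips to `0x000000018b4c1234`.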
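For item 5, the value reported as `sizeof_mh_and_loadcmds` is presumably the Mach-O header size plus the header's `sizeofcmds` field. The sketch below computes that from the first bytes of an image, so that one read of this size covers every load command. The constants restate values from `<mach-o/loader.h>` so the code builds on any host. `SizeofHeaderAndLoadCmds` is a hypothetical name. Returning 0 for an unrecognized header is a choice made by this sketch, not a description of debugserver's behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Values restated from <mach-o/loader.h> so the sketch builds on any host.
constexpr uint32_t kMHMagic = 0xfeedface;   // MH_MAGIC (32-bit header)
constexpr uint32_t kMHMagic64 = 0xfeedfacf; // MH_MAGIC_64
constexpr size_t kMachHeaderSize = 28;      // sizeof(struct mach_header)
constexpr size_t kMachHeader64Size = 32;    // sizeof(struct mach_header_64)
// magic, cputype, cpusubtype, filetype, ncmds precede sizeofcmds in both
// layouts, so it sits at byte offset 5 * 4 = 20.
constexpr size_t kSizeofcmdsOffset = 20;

// Hypothetical helper: given the first bytes of an in-memory Mach-O image,
// return header size + sizeofcmds, i.e. how many bytes one read needs to
// cover the header and every load command. Returns 0 if the bytes are too
// short or are not a host-endian Mach-O header (byte-swapped magics are
// not handled in this sketch).
uint64_t SizeofHeaderAndLoadCmds(const uint8_t *bytes, size_t len) {
  assert(bytes != nullptr || len == 0);
  if (len < kMachHeaderSize)
    return 0;
  uint32_t magic = 0;
  uint32_t sizeofcmds = 0;
  std::memcpy(&magic, bytes, sizeof(magic));
  std::memcpy(&sizeofcmds, bytes + kSizeofcmdsOffset, sizeof(sizeofcmds));
  if (magic == kMHMagic64)
    return kMachHeader64Size + uint64_t{sizeofcmds};
  if (magic == kMHMagic)
    return kMachHeaderSize + uint64_t{sizeofcmds};
  return 0;
}
```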

At an implementation-detail level, ProcessGDBRemote collects these two
new pieces of data from the stop packet / jThreadsInfo and passes them
to the method that creates a new ThreadGDBRemote. I added two methods to
the Thread base class to retrieve the information. DynamicLoaderMacOS
will try to read the data from the thread that hit the "new binaries
loaded" breakpoint. If the number of entries matches the number expected
by the register value, it uses them; otherwise it falls back to fetching
them the traditional way. On an old debugserver that doesn't support
these new expedited fields, DynamicLoaderMacOS will get back a
zero-length list of binary addresses and a null StructuredData
dictionary for the detailed image information, and behave as it always
has. I tested this patch both with and without the debugserver changes.
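The fallback rule in the paragraph above can be sketched like this. `GetNewBinaryAddresses` and its `read_from_memory` callback are hypothetical stand-ins for the DynamicLoaderMacOS logic, not its real interface. The point of the sketch is that an older debugserver, which sends no `added-binaries` list, takes the fallback path automatically, because an empty list never matches a non-zero expected count.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for the DynamicLoaderMacOS decision: `expedited`
// is what the stop packet carried (empty from an older debugserver),
// `expected_count` is the number of new binaries the notification
// reported, and `read_from_memory` is the traditional slow path.
std::vector<uint64_t> GetNewBinaryAddresses(
    const std::vector<uint64_t> &expedited, uint64_t expected_count,
    const std::function<std::vector<uint64_t>(uint64_t)> &read_from_memory) {
  // Trust the expedited list only if it is complete.
  if (expedited.size() == expected_count)
    return expedited;
  assert(read_from_memory && "fallback reader required");
  return read_from_memory(expected_count);
}
```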

Testing is clearly the big question mark here: I added none. While
writing these patches, I had some bugs and the lldb testsuite on macOS
was very good at finding them, simply with our normal process launching
and dlopen'ing in our existing API tests. I could imagine a test that
would capture the packet log and try to ensure that the expedited
information is being used by lldb and we are not re-fetching the
information, though.

rdar://175033129

---------

Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
Co-authored-by: Felipe de Azevedo Piovezan <piovezan.fpi@gmail.com>
2026-04-24 13:18:40 -07:00


//===-- JSONGenerator.h ----------------------------------------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
#ifndef LLDB_TOOLS_DEBUGSERVER_SOURCE_JSONGENERATOR_H
#define LLDB_TOOLS_DEBUGSERVER_SOURCE_JSONGENERATOR_H

#include <cstdint>
#include <iomanip>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

/// \class JSONGenerator JSONGenerator.h
/// A class which can construct structured data for the sole purpose
/// of printing it in JSON format.
///
/// A stripped-down version of lldb's StructuredData objects, which are much
/// more general purpose. This variant is intended only for assembling
/// information and printing it as a JSON string.
class JSONGenerator {
public:
  class Object;
  class Array;
  class Integer;
  class Float;
  class Boolean;
  class String;
  class Dictionary;
  class Generic;

  typedef std::shared_ptr<Object> ObjectSP;
  typedef std::shared_ptr<Array> ArraySP;
  typedef std::shared_ptr<Integer> IntegerSP;
  typedef std::shared_ptr<Float> FloatSP;
  typedef std::shared_ptr<Boolean> BooleanSP;
  typedef std::shared_ptr<String> StringSP;
  typedef std::shared_ptr<Dictionary> DictionarySP;
  typedef std::shared_ptr<Generic> GenericSP;

  enum class Type {
    eTypeInvalid = -1,
    eTypeNull = 0,
    eTypeGeneric,
    eTypeArray,
    eTypeInteger,
    eTypeFloat,
    eTypeBoolean,
    eTypeString,
    eTypeDictionary
  };

  class Object : public std::enable_shared_from_this<Object> {
  public:
    Object(Type t = Type::eTypeInvalid) : m_type(t) {}

    virtual ~Object() {}

    virtual bool IsValid() const { return true; }

    virtual void Clear() { m_type = Type::eTypeInvalid; }

    Type GetType() const { return m_type; }

    void SetType(Type t) { m_type = t; }

    Array *GetAsArray() {
      if (m_type == Type::eTypeArray)
        return (Array *)this;
      return nullptr;
    }

    Dictionary *GetAsDictionary() {
      if (m_type == Type::eTypeDictionary)
        return (Dictionary *)this;
      return nullptr;
    }

    Integer *GetAsInteger() {
      if (m_type == Type::eTypeInteger)
        return (Integer *)this;
      return nullptr;
    }

    Float *GetAsFloat() {
      if (m_type == Type::eTypeFloat)
        return (Float *)this;
      return nullptr;
    }

    Boolean *GetAsBoolean() {
      if (m_type == Type::eTypeBoolean)
        return (Boolean *)this;
      return nullptr;
    }

    String *GetAsString() {
      if (m_type == Type::eTypeString)
        return (String *)this;
      return nullptr;
    }

    Generic *GetAsGeneric() {
      if (m_type == Type::eTypeGeneric)
        return (Generic *)this;
      return nullptr;
    }

    virtual void Dump(std::ostream &s) const = 0;

    virtual void DumpBinaryEscaped(std::ostream &s) const = 0;

  private:
    Type m_type;
  };

  class Array : public Object {
  public:
    Array() : Object(Type::eTypeArray) {}

    virtual ~Array() {}

    void AddItem(ObjectSP item) { m_items.push_back(item); }

    void AddIntegerItem(uint64_t value) {
      AddItem(ObjectSP(new Integer(value)));
    }

    void Dump(std::ostream &s) const override {
      s << "[";
      const size_t arrsize = m_items.size();
      for (size_t i = 0; i < arrsize; ++i) {
        m_items[i]->Dump(s);
        if (i + 1 < arrsize)
          s << ",";
      }
      s << "]";
    }

    void DumpBinaryEscaped(std::ostream &s) const override {
      s << "[";
      const size_t arrsize = m_items.size();
      for (size_t i = 0; i < arrsize; ++i) {
        m_items[i]->DumpBinaryEscaped(s);
        if (i + 1 < arrsize)
          s << ",";
      }
      s << "]";
    }

  protected:
    typedef std::vector<ObjectSP> collection;
    collection m_items;
  };

  class Integer : public Object {
  public:
    Integer(uint64_t value = 0) : Object(Type::eTypeInteger), m_value(value) {}

    virtual ~Integer() {}

    void SetValue(uint64_t value) { m_value = value; }

    uint64_t GetValue() const { return m_value; }

    void Dump(std::ostream &s) const override { s << m_value; }

    void DumpBinaryEscaped(std::ostream &s) const override { Dump(s); }

  protected:
    uint64_t m_value;
  };

  class Float : public Object {
  public:
    Float(double d = 0.0) : Object(Type::eTypeFloat), m_value(d) {}

    virtual ~Float() {}

    void SetValue(double value) { m_value = value; }

    void Dump(std::ostream &s) const override { s << m_value; }

    void DumpBinaryEscaped(std::ostream &s) const override { Dump(s); }

  protected:
    double m_value;
  };

  class Boolean : public Object {
  public:
    Boolean(bool b = false) : Object(Type::eTypeBoolean), m_value(b) {}

    virtual ~Boolean() {}

    void SetValue(bool value) { m_value = value; }

    void Dump(std::ostream &s) const override {
      if (m_value)
        s << "true";
      else
        s << "false";
    }

    void DumpBinaryEscaped(std::ostream &s) const override { Dump(s); }

  protected:
    bool m_value;
  };

  class String : public Object {
  public:
    String() : Object(Type::eTypeString), m_value() {}

    String(const std::string &s) : Object(Type::eTypeString), m_value(s) {}

    // Take ownership of a temporary string instead of copying it.
    String(std::string &&s)
        : Object(Type::eTypeString), m_value(std::move(s)) {}

    void SetValue(const std::string &string) { m_value = string; }

    void Dump(std::ostream &s) const override {
      s << '"';
      const size_t strsize = m_value.size();
      for (size_t i = 0; i < strsize; ++i) {
        char ch = m_value[i];
        // JSON requires both '"' and '\' to be backslash-escaped.
        if (ch == '"' || ch == '\\')
          s << '\\';
        s << ch;
      }
      s << '"';
    }

    void DumpBinaryEscaped(std::ostream &s) const override {
      s << '"';
      const size_t strsize = m_value.size();
      for (size_t i = 0; i < strsize; ++i) {
        char ch = m_value[i];
        // JSON requires both '"' and '\' to be backslash-escaped.
        if (ch == '"' || ch == '\\')
          s << '\\';
        // gdb remote serial protocol binary escaping
        if (ch == '#' || ch == '$' || ch == '}' || ch == '*') {
          s << '}'; // 0x7d next character is escaped
          s << static_cast<char>(ch ^ 0x20);
        } else {
          s << ch;
        }
      }
      s << '"';
    }

  protected:
    std::string m_value;
  };

  class Dictionary : public Object {
  public:
    Dictionary() : Object(Type::eTypeDictionary), m_dict() {}

    virtual ~Dictionary() {}

    void AddItem(std::string key, ObjectSP value) {
      m_dict.push_back(Pair(std::move(key), std::move(value)));
    }

    void AddIntegerItem(std::string key, uint64_t value) {
      AddItem(key, ObjectSP(new Integer(value)));
    }

    void AddFloatItem(std::string key, double value) {
      AddItem(key, ObjectSP(new Float(value)));
    }

    void AddStringItem(std::string key, std::string value) {
      AddItem(key, ObjectSP(new String(std::move(value))));
    }

    void AddBytesAsHexASCIIString(std::string key, const uint8_t *src,
                                  size_t src_len) {
      if (src && src_len) {
        std::ostringstream strm;
        for (size_t i = 0; i < src_len; i++)
          strm << std::setfill('0') << std::hex << std::right << std::setw(2)
               << ((uint32_t)(src[i]));
        AddItem(key, ObjectSP(new String(strm.str())));
      } else {
        AddItem(key, ObjectSP(new String()));
      }
    }

    void AddBooleanItem(std::string key, bool value) {
      AddItem(key, ObjectSP(new Boolean(value)));
    }

    // Linear search; returns an empty ObjectSP if `key` is not present.
    ObjectSP GetValueForKey(const std::string &key) const {
      for (const auto &kv : m_dict)
        if (kv.first == key)
          return kv.second;
      return {};
    }

    void Dump(std::ostream &s) const override {
      bool have_printed_one_elem = false;
      s << "{";
      for (collection::const_iterator iter = m_dict.begin();
           iter != m_dict.end(); ++iter) {
        if (!have_printed_one_elem) {
          have_printed_one_elem = true;
        } else {
          s << ",";
        }
        s << "\"" << iter->first.c_str() << "\":";
        iter->second->Dump(s);
      }
      s << "}";
    }

    void DumpBinaryEscaped(std::ostream &s) const override {
      bool have_printed_one_elem = false;
      s << "{";
      for (collection::const_iterator iter = m_dict.begin();
           iter != m_dict.end(); ++iter) {
        if (!have_printed_one_elem) {
          have_printed_one_elem = true;
        } else {
          s << ",";
        }
        s << "\"" << binary_encode_string(iter->first) << "\":";
        iter->second->DumpBinaryEscaped(s);
      }
      // The closing '}' must itself be escaped for the gdb remote serial
      // protocol: emit the escape byte '}' (0x7d) followed by '}' ^ 0x20,
      // which is ']'.
      s << "}";
      s << static_cast<char>('}' ^ 0x20);
    }

  protected:
    // Apply gdb remote serial protocol binary escaping to `s`.
    std::string binary_encode_string(const std::string &s) const {
      std::string output;
      const size_t s_size = s.size();
      const char *s_chars = s.c_str();
      for (size_t i = 0; i < s_size; i++) {
        unsigned char ch = *(s_chars + i);
        if (ch == '#' || ch == '$' || ch == '}' || ch == '*') {
          output.push_back('}'); // 0x7d
          output.push_back(ch ^ 0x20);
        } else {
          output.push_back(ch);
        }
      }
      return output;
    }

    // Keep the dictionary as a vector so that it doesn't reorder itself
    // when dumped. Lookups by key (GetValueForKey) are linear scans, which
    // is fine as long as the dictionaries stay small.
    typedef std::pair<std::string, ObjectSP> Pair;
    typedef std::vector<Pair> collection;
    collection m_dict;
  };

  class Null : public Object {
  public:
    Null() : Object(Type::eTypeNull) {}

    virtual ~Null() {}

    bool IsValid() const override { return false; }

    void Dump(std::ostream &s) const override { s << "null"; }

    void DumpBinaryEscaped(std::ostream &s) const override { Dump(s); }

  protected:
  };

  class Generic : public Object {
  public:
    explicit Generic(void *object = nullptr)
        : Object(Type::eTypeGeneric), m_object(object) {}

    void SetValue(void *value) { m_object = value; }

    void *GetValue() const { return m_object; }

    bool IsValid() const override { return m_object != nullptr; }

    void Dump(std::ostream &s) const override;

    void DumpBinaryEscaped(std::ostream &s) const override;

  private:
    void *m_object;
  };
}; // class JSONGenerator
#endif // LLDB_TOOLS_DEBUGSERVER_SOURCE_JSONGENERATOR_H
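The two transport encodings this header relies on can be checked in isolation. The sketch below restates the gdb remote serial protocol binary-escaping rule used by the `DumpBinaryEscaped()` methods, including the reason a dictionary's closing brace is emitted as `}]`. It also shows the lowercase asciihex form that `AddBytesAsHexASCIIString()` produces. The commit message says the `detailed-binaries-info` stop-reply value is asciihex encoded, but it does not say whether that value uses this exact helper. The function names below are hypothetical standalone helpers, not part of debugserver.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// gdb remote serial protocol binary escaping, the rule the
// DumpBinaryEscaped() methods apply: '#', '$', '}' and '*' are sent as
// '}' (0x7d) followed by the original byte XOR 0x20.
std::string RSPBinaryEscape(const std::string &in) {
  std::string out;
  for (char ch : in) {
    if (ch == '#' || ch == '$' || ch == '}' || ch == '*') {
      out.push_back('}');
      out.push_back(static_cast<char>(ch ^ 0x20));
    } else {
      out.push_back(ch);
    }
  }
  return out;
}

// Receiver side: drop each '}' escape byte and XOR the next byte with 0x20.
std::string RSPBinaryUnescape(const std::string &in) {
  std::string out;
  for (size_t i = 0; i < in.size(); ++i) {
    if (in[i] == '}' && i + 1 < in.size())
      out.push_back(static_cast<char>(in[++i] ^ 0x20));
    else
      out.push_back(in[i]);
  }
  return out;
}

// Lowercase asciihex, two digits per byte, matching what
// AddBytesAsHexASCIIString() writes via std::hex and std::setw(2).
std::string AsciiHexEncode(const std::string &in) {
  static const char kDigits[] = "0123456789abcdef";
  std::string out;
  out.reserve(in.size() * 2);
  for (unsigned char ch : in) {
    out.push_back(kDigits[ch >> 4]);
    out.push_back(kDigits[ch & 0xf]);
  }
  return out;
}
```

Escaping `}` itself keeps the scheme reversible. The receiver treats every `}` as an escape prefix, so a literal `}` must never appear unescaped. For that reason, `Dictionary::DumpBinaryEscaped()` ends with `}` followed by `]`.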