Leandro Lacerda 34028294e4 [Offload] Add support for measuring elapsed time between events (#186856)
This patch adds `olGetEventElapsedTime` to the new LLVM Offload API, as
requested in
[#185728](https://github.com/llvm/llvm-project/issues/185728), and adds
the corresponding support in `plugins-nextgen`.

A main motivation for this change is to make it possible to measure the
elapsed time of work submitted to a queue, especially kernel launches.
This is relevant to the intended use of the new Offload API for
microbenchmarking GPU libc math functions.

### Summary

The new API returns the elapsed time, in milliseconds, between two
events on the same device.

To support the common pattern `create start event → enqueue kernel →
create end event → sync end event → get elapsed time`, `olCreateEvent`
now always asks the device interface to create and record a backend
event. Backends that materialize real event state thereby give the event
concrete state that can be timestamped for elapsed-time measurement. On
backends that do not materialize backend event state, `EventInfo` may
still remain null, and existing event operations continue to treat such
events as trivially complete.

Previously, an event created on an empty queue could be represented only
as a logical event. That representation was sufficient for sync and
completion queries, but not for elapsed-time measurement, because there
was no backend event state to timestamp. The new behavior keeps the
existing completion semantics (the event completes once all work
enqueued before it completes) while also allowing backends with timing
support to attach real event state.
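The completion rule for events without backend state can be sketched in C++ as follows. `SketchEvent`, `isEventComplete`, and the `QueryBackend` hook are hypothetical stand-ins, not the real `plugins-nextgen` types. The only behavior taken from this description is that an event whose `EventInfo` is null is treated as trivially complete. That an event with backend state defers to a backend query is an assumption of the sketch.

```cpp
#include <cassert>
#include <functional>

// Hypothetical, simplified model of an Offload event; not the real plugin type.
struct SketchEvent {
  // Backend event state; stays null on backends that do not materialize it.
  void *EventInfo = nullptr;
  // Hypothetical backend completion query, used only when EventInfo is set.
  std::function<bool(void *)> QueryBackend;
};

// Events without backend state are treated as trivially complete; events
// with backend state (by assumption) ask the backend.
inline bool isEventComplete(const SketchEvent &Event) {
  if (!Event.EventInfo)
    return true;
  assert(Event.QueryBackend && "backend state without a query hook");
  return Event.QueryBackend(Event.EventInfo);
}
```

A logical event (null `EventInfo`) never blocks a sync or completion query. This is why it was enough before this change but gives a timing query nothing to read.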

### Changes in `plugins-nextgen`

#### Common interface

Add elapsed-time support to the common device and plugin interfaces:

* `GenericPluginTy::get_event_elapsed_time`
* `GenericDeviceTy::getEventElapsedTime`
* `GenericDeviceTy::getEventElapsedTimeImpl`
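The three names suggest the usual `plugins-nextgen` layering:

* A plugin-level entry point forwards to the device.
* The device method delegates to a backend-specific `...Impl` hook.

The C++ sketch below models that split under simplifying assumptions. The `Sketch*` types, `sketchPluginGetEventElapsedTime`, and `FixedDurationDevice` are hypothetical. Events are opaque `void *` handles, and `bool` (true for success) stands in for the plugin's real error type. How the real plugin entry point identifies the device is not modeled; the device is simply passed in.

```cpp
#include <cassert>

// Hypothetical simplification of GenericDeviceTy: a public entry point that
// delegates to a backend-specific hook. 'true' means success.
class SketchGenericDevice {
public:
  virtual ~SketchGenericDevice() = default;

  // Counterpart of GenericDeviceTy::getEventElapsedTime. Logic shared by
  // all backends could run here before delegating.
  bool getEventElapsedTime(void *StartEvent, void *EndEvent,
                           float *ElapsedTime) {
    return getEventElapsedTimeImpl(StartEvent, EndEvent, ElapsedTime);
  }

protected:
  // Counterpart of GenericDeviceTy::getEventElapsedTimeImpl, which each
  // backend (AMDGPU, CUDA, host, Level Zero) provides.
  virtual bool getEventElapsedTimeImpl(void *StartEvent, void *EndEvent,
                                       float *ElapsedTime) = 0;
};

// Counterpart of GenericPluginTy::get_event_elapsed_time: forward the call
// to the device (here the device is passed in directly).
inline bool sketchPluginGetEventElapsedTime(SketchGenericDevice &Device,
                                            void *StartEvent, void *EndEvent,
                                            float *ElapsedTime) {
  return Device.getEventElapsedTime(StartEvent, EndEvent, ElapsedTime);
}

// Fake backend used only to exercise the sketch: reports a fixed duration.
class FixedDurationDevice : public SketchGenericDevice {
protected:
  bool getEventElapsedTimeImpl(void *, void *, float *ElapsedTime) override {
    assert(ElapsedTime && "fake backend expects an output pointer");
    *ElapsedTime = 2.5f;
    return true;
  }
};
```

Keeping the public method non-virtual gives every backend the same entry point. Each plugin then only supplies the part that actually touches its runtime.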

#### AMDGPU

* Add the required ROCr declarations and wrappers.
* Enable queue profiling at queue creation time.
* Record events by enqueuing a real barrier marker packet on the stream.
* Retain the timing signal needed to query the recorded marker later.
* Implement `getEventElapsedTimeImpl` using
`hsa_amd_profiling_get_dispatch_time`, converting the result to
milliseconds with `HSA_SYSTEM_INFO_TIMESTAMP_FREQUENCY`.

This follows the ROCm/HIP approach of enabling queue profiling at HSA
queue creation time, while keeping the AMDGPU queue path simpler than
the lazy-enable alternative discussed during review.
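The millisecond conversion in the AMDGPU list is plain arithmetic. `HSA_SYSTEM_INFO_TIMESTAMP_FREQUENCY` reports the system timestamp rate in Hz (ticks per second), so:

* a tick difference divided by that rate gives seconds;
* multiplying by 1000 gives milliseconds.

A C++ sketch follows. `ticksToMilliseconds` is a hypothetical helper. Which of the two timestamps that `hsa_amd_profiling_get_dispatch_time` reports for each marker the plugin subtracts is left to the caller here and is not taken from this patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: convert two timestamps in the HSA system clock domain
// into elapsed milliseconds. FrequencyHz is assumed to be the value queried
// for HSA_SYSTEM_INFO_TIMESTAMP_FREQUENCY (ticks per second).
inline float ticksToMilliseconds(std::uint64_t StartTicks,
                                 std::uint64_t EndTicks,
                                 std::uint64_t FrequencyHz) {
  assert(FrequencyHz != 0 && "timestamp frequency must be non-zero");
  assert(EndTicks >= StartTicks && "end timestamp precedes start timestamp");
  // Subtract in integer ticks first so large absolute timestamps do not lose
  // precision, then scale: ticks / (ticks/s) = s, and s * 1000 = ms.
  const double Ticks = static_cast<double>(EndTicks - StartTicks);
  return static_cast<float>(Ticks * 1000.0 / static_cast<double>(FrequencyHz));
}
```

For example, with a 100 MHz timestamp clock, a difference of 25,000,000 ticks corresponds to 250 ms.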

#### CUDA

* Add the required CUDA driver declarations and wrappers.
* Implement `getEventElapsedTimeImpl` with `cuEventElapsedTime`.

#### Host

* Add `getEventElapsedTimeImpl`, which writes `0.0f` through the output
pointer when it is non-null and returns success.

Reason: the host plugin does not materialize backend event state and
already treats event operations as trivially successful. Returning
`0.0f` preserves that model without introducing a new failure mode.
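The host behavior described above amounts to a few lines. In this hedged C++ sketch, `hostGetEventElapsedTime` is a hypothetical name, and `bool` (true for success) stands in for the plugin's actual error type. The event handles are ignored because the host plugin has no backend event state to read.

```cpp
#include <cassert>

// Hypothetical sketch of the host plugin's getEventElapsedTimeImpl behavior.
inline bool hostGetEventElapsedTime(void * /*StartEvent*/, void * /*EndEvent*/,
                                    float *ElapsedTime) {
  // Host events carry no backend state, so there is nothing to time:
  // report zero elapsed time when an output pointer is given.
  if (ElapsedTime)
    *ElapsedTime = 0.0f;
  // Never fail, matching the host plugin's trivially successful event model.
  return true;
}
```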

#### Level Zero

* Add `getEventElapsedTimeImpl`, but leave it unimplemented.

Reason: the Level Zero plugin currently does not provide standalone
backend event support for this event model. For example, `waitEventImpl`
/ `syncEventImpl` are still unimplemented there.

---------

Signed-off-by: Leandro Augusto Lacerda Campos <leandrolcampos@yahoo.com.br>
Signed-off-by: Leandro A. Lacerda Campos <leandrolcampos@yahoo.com.br>
2026-04-01 14:13:44 -05:00


//===-- Event.td - Event definitions for Offload -----------*- tablegen -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file contains Offload API definitions related to the event handle
//
//===----------------------------------------------------------------------===//
def olCreateEvent : Function {
let desc = "Enqueue an event to `Queue` and return it.";
let details = [
"This event can be used with `olSyncEvent`, `olWaitEvents`, and `olGetEventElapsedTime`.",
"It will be complete once all enqueued work prior to the `olCreateEvent` call is complete.",
];
let params = [
Param<"ol_queue_handle_t", "Queue", "queue to create the event for", PARAM_IN>,
Param<"ol_event_handle_t*", "Event", "output pointer for the created event", PARAM_OUT>
];
let returns = [];
}
def olDestroyEvent : Function {
let desc = "Destroy the event and free all underlying resources.";
let details = [];
let params = [
Param<"ol_event_handle_t", "Event", "handle of the event", PARAM_IN>
];
let returns = [];
}
def olSyncEvent : Function {
let desc = "Block the calling thread until the event is complete.";
let details = [];
let params = [
Param<"ol_event_handle_t", "Event", "handle of the event", PARAM_IN>
];
let returns = [];
}
def olGetEventElapsedTime : Function {
let desc = "Get the elapsed time in milliseconds between two events.";
let details = [
"The elapsed time is returned in milliseconds.",
"The queues associated with `StartEvent` and `EndEvent` must belong to the same device."
];
let params = [
Param<"ol_event_handle_t", "StartEvent", "handle of the start event", PARAM_IN>,
Param<"ol_event_handle_t", "EndEvent", "handle of the end event", PARAM_IN>,
Param<"float*", "ElapsedTime", "output pointer for the elapsed time in milliseconds", PARAM_OUT>
];
let returns = [];
}
def ol_event_info_t : Enum {
let desc = "Supported event info.";
let is_typed = 1;
let etors = [
TaggedEtor<"QUEUE", "ol_queue_handle_t", "The handle of the queue associated with the event.">,
TaggedEtor<"IS_COMPLETE", "bool", "True if and only if the event is complete.">,
];
}
def olGetEventInfo : Function {
let desc = "Queries the given property of the event.";
let details = [
"`olGetEventInfoSize` can be used to query the storage size "
"required for the given query."
];
let params = [
Param<"ol_event_handle_t", "Event", "handle of the event", PARAM_IN>,
Param<"ol_event_info_t", "PropName", "type of the info to retrieve", PARAM_IN>,
Param<"size_t", "PropSize", "the number of bytes pointed to by PropValue.", PARAM_IN>,
TypeTaggedParam<"void*", "PropValue", "array of bytes holding the info. "
"If PropSize is less than the real number of bytes needed to return the info "
"then the OL_ERRC_INVALID_SIZE error is returned and PropValue is not used.", PARAM_OUT,
TypeInfo<"PropName", "PropSize">>
];
let returns = [
Return<"OL_ERRC_INVALID_SIZE", [
"`PropSize == 0`",
"If `PropSize` is less than the real number of bytes needed to return the info."
]>,
Return<"OL_ERRC_INVALID_EVENT">
];
}
def olGetEventInfoSize : Function {
let desc = "Returns the storage size of the given event query.";
let details = [];
let params = [
Param<"ol_event_handle_t", "Event", "handle of the event", PARAM_IN>,
Param<"ol_event_info_t", "PropName", "type of the info to query", PARAM_IN>,
Param<"size_t*", "PropSizeRet", "pointer to the number of bytes required to store the query", PARAM_OUT>
];
let returns = [
Return<"OL_ERRC_INVALID_EVENT">
];
}