.. _libc_gpu_testing:


=========================
Testing the GPU C library
=========================

.. note::
   Running GPU tests with high parallelism is likely to cause spurious failures,
   out-of-resource errors, or indefinite hangs. Limiting the number of threads
   used while testing with ``LIBC_GPU_TEST_JOBS=<N>`` is highly recommended.

Testing infrastructure
======================

The LLVM C library supports different kinds of :ref:`tests <build_and_test>`
depending on the build configuration. The GPU target is considered a full build
and therefore provides all of its own utilities to build and run the generated
tests. Currently the GPU supports two kinds of tests.

#. **Hermetic tests** - These are unit tests built with a test suite similar to
   Google's ``gtest`` infrastructure. These use the same infrastructure as unit
   tests except that the entire environment is self-hosted. This allows us to
   run them on the GPU using our custom utilities. These are used to test the
   majority of functional implementations.

#. **Integration tests** - These are lightweight tests that simply call a
   ``main`` function and check whether it returns non-zero. These are primarily
   used to test interfaces that are sensitive to threading.

The GPU uses the same testing infrastructure as the other supported ``libc``
targets. We do this by treating the GPU as a standard hosted environment capable
of launching a ``main`` function. Effectively, this means building our own
startup libraries and loader.

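As a rough illustration of the convention that a test reports failure through
its return value, the sketch below models a tiny self-hosted test runner that
includes no host headers. The names ``TestCase``, ``run_all_tests``, and the two
sample checks are hypothetical and are not part of the ``libc`` test framework.

.. code-block:: c++

  // Hypothetical, freestanding sketch: no headers are included, mirroring a
  // self-hosted environment. None of these names come from the libc test suite.
  struct TestCase {
    const char *name;
    bool (*run)();
  };

  // Two sample checks standing in for real test bodies.
  static bool test_addition() { return 2 + 2 == 4; }
  static bool test_shift() { return (1 << 3) == 8; }

  static const TestCase TESTS[] = {
      {"addition", test_addition},
      {"shift", test_shift},
  };

  // Returns the number of failing tests; zero means every test passed.
  int run_all_tests() {
    int failures = 0;
    for (const TestCase &test : TESTS)
      if (!test.run())
        ++failures;
    return failures;
  }

An integration-style ``main`` could simply ``return run_all_tests();`` so that
any failing check produces a non-zero result.
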
Testing utilities
=================

We provide two utilities to execute arbitrary programs on the GPU: the
``loader`` and the ``start`` object.

Startup object
--------------

This object mimics the standard object used by existing C library
implementations. Its job is to perform the necessary setup prior to calling the
``main`` function. In the GPU case, this means exporting GPU kernels that will
perform the necessary operations. Here we use ``_begin`` and ``_end`` to handle
calling global constructors and destructors while ``_start`` begins the standard
execution. The following code block shows the implementation for AMDGPU
architectures.

.. code-block:: c++

  // Registers the global destructors to run at exit, then runs the global
  // constructors.
  extern "C" [[gnu::visibility("protected"), clang::amdgpu_kernel]] void
  _begin(int argc, char **argv, char **env) {
    LIBC_NAMESPACE::atexit(&LIBC_NAMESPACE::call_fini_array_callbacks);
    LIBC_NAMESPACE::call_init_array_callbacks(argc, argv, env);
  }

  // Calls ``main`` and atomically ORs its return value into ``ret``.
  extern "C" [[gnu::visibility("protected"), clang::amdgpu_kernel]] void
  _start(int argc, char **argv, char **envp, int *ret) {
    __atomic_fetch_or(ret, main(argc, argv, envp), __ATOMIC_RELAXED);
  }

  // Exits with the given value, which runs the handlers registered in _begin.
  extern "C" [[gnu::visibility("protected"), clang::amdgpu_kernel]] void
  _end(int retval) {
    LIBC_NAMESPACE::exit(retval);
  }

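The ``_start`` kernel combines the result of every thread with an atomic OR, so
a single failing thread is enough to make the whole run report failure. The
host-side sketch below models that behaviour, with an ordinary loop standing in
for GPU threads. ``EntryPoint``, ``launch``, and the two sample entry points are
hypothetical names used only for illustration.

.. code-block:: c++

  #include <atomic>

  // Hypothetical stand-in for the per-thread ``main`` of a test.
  using EntryPoint = int (*)(int thread_id);

  // Models `_start`: each "thread" ORs its return value into a shared result.
  // A plain loop replaces the GPU threads; this is an assumption of the sketch.
  int launch(int num_threads, EntryPoint entry) {
    std::atomic<int> ret{0};
    for (int id = 0; id < num_threads; ++id)
      ret.fetch_or(entry(id), std::memory_order_relaxed);
    return ret.load(std::memory_order_relaxed);
  }

  // Sample entry points: one always succeeds, one fails only on thread 2.
  int all_pass(int) { return 0; }
  int thread_two_fails(int id) { return id == 2 ? 1 : 0; }

With these definitions, ``launch(4, thread_two_fails)`` yields a non-zero value
even though three of the four simulated threads succeeded.
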
Loader runtime
--------------

The startup object provides a GPU executable with callable kernels for the
respective runtime. We can then define a minimal runtime that will launch these
kernels on the given device. Currently we provide the ``amdhsa-loader`` and
``nvptx-loader`` targeting the AMD HSA runtime and CUDA driver runtime
respectively. By default these will launch with a single thread on the GPU.

.. code-block:: sh

  $> clang++ crt1.o test.cpp --target=amdgcn-amd-amdhsa -mcpu=native -flto
  $> amdhsa-loader --threads 1 --blocks 1 ./a.out
  Test Passed!

The loader utility will forward any arguments passed after the executable image
to the program on the GPU, as well as any environment variables that are set.
The number of threads and blocks can be controlled with ``--threads`` and
``--blocks``. These also accept additional ``x``, ``y``, ``z`` variants for
multidimensional grids.

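Because arguments and environment variables are forwarded, a test program can
inspect them like any hosted program. The sketch below shows one way to do so.
``has_argument`` and ``has_env`` are hypothetical helpers, and in a real test
they would be called from ``main(int argc, char **argv, char **envp)``.

.. code-block:: c++

  #include <cstring>

  // Hypothetical helper: true if `wanted` appears among argv[1..argc-1].
  bool has_argument(int argc, char **argv, const char *wanted) {
    for (int i = 1; i < argc; ++i)
      if (std::strcmp(argv[i], wanted) == 0)
        return true;
    return false;
  }

  // Hypothetical helper: true if the null-terminated `envp` array contains an
  // entry of the form `name=...`.
  bool has_env(char **envp, const char *name) {
    const std::size_t len = std::strlen(name);
    for (char **entry = envp; entry && *entry; ++entry)
      if (std::strncmp(*entry, name, len) == 0 && (*entry)[len] == '=')
        return true;
    return false;
  }

Anything placed after the executable image on the loader command line would be
visible to ``has_argument``, and variables set in the launching environment
would be visible to ``has_env``.
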
Running tests
=============

Tests will only be built and run if a GPU target architecture is set and the
corresponding loader utility was built. These can be overridden with the
``LIBC_GPU_TEST_ARCHITECTURE`` and ``CMAKE_CROSSCOMPILING_EMULATOR`` :ref:`CMake
options <gpu_cmake_options>`. Once built, they can be run like any other tests.
The CMake target depends on how the library was built.

#. **Cross build** - If the C library was built using ``LLVM_ENABLE_PROJECTS``
   or a runtimes cross build, then the standard targets will be present in the
   base CMake build directory.

   #. All tests - You can run all supported tests with the command:

      .. code-block:: sh

        $> ninja check-libc

   #. Hermetic tests - You can run hermetic tests with the command:

      .. code-block:: sh

        $> ninja libc-hermetic-tests

   #. Integration tests - You can run integration tests with the command:

      .. code-block:: sh

        $> ninja libc-integration-tests

#. **Runtimes build** - If the library was built using ``LLVM_ENABLE_RUNTIMES``
   then the actual ``libc`` build will be in a separate directory.

   #. All tests - You can run all supported tests with the command:

      .. code-block:: sh

        $> ninja check-libc-amdgcn-amd-amdhsa
        $> ninja check-libc-nvptx64-nvidia-cuda

   #. Specific tests - You can use the same targets as above by entering the
      runtimes build directory.

      .. code-block:: sh

        $> ninja -C runtimes/runtimes-amdgcn-amd-amdhsa-bins check-libc
        $> ninja -C runtimes/runtimes-nvptx64-nvidia-cuda-bins check-libc
        $> cd runtimes/runtimes-amdgcn-amd-amdhsa-bins && ninja check-libc
        $> cd runtimes/runtimes-nvptx64-nvidia-cuda-bins && ninja check-libc

Tests can also be built and run manually using the respective loader utility.