==========
KernelInfo
==========

.. contents::
  :local:

Introduction
============

This LLVM IR pass reports various statistics for code compiled for GPUs. The
goal of these statistics is to help identify bad code patterns and ways to
mitigate them. The pass operates at the LLVM IR level so that it can, in
theory, support any LLVM-based compiler for programming languages supporting
GPUs.

By default, the pass runs at the end of LTO, and options like
``-Rpass=kernel-info`` enable its remarks. Example ``opt`` and ``clang``
command lines appear in the next section.

Remarks include summary statistics (e.g., total size of static allocas) and
individual occurrences (e.g., source location of each alloca). Examples of the
output appear in tests in ``llvm/test/Analysis/KernelInfo``.

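
Because the remarks are emitted as compiler diagnostics, they can be mixed in
with warnings and other output. The following is a minimal sketch of a
hypothetical helper, ``only_remarks`` (not part of LLVM), that runs any command
and keeps only the lines containing ``remark:``. It assumes the tool writes its
diagnostics to standard error, as ``clang`` and ``opt`` normally do.

.. code-block:: shell

  # Hypothetical helper, not part of LLVM: run the given command and print
  # only the lines of its combined stdout/stderr that contain "remark:".
  only_remarks() {
    "$@" 2>&1 | grep -F 'remark:'
  }

Prefixing a compiler command line with ``only_remarks`` should then print only
the remark lines. The exit status is that of ``grep``, not of the wrapped
command.
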
Example Command Lines
=====================

To analyze a C program as it appears to an LLVM GPU backend at the end of LTO:

.. code-block:: shell

  $ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
      -Rpass=kernel-info

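
The command above assumes a ``test.c`` that contains offloaded code. As a
minimal sketch, the following writes a hypothetical ``test.c`` with a single
OpenMP ``target`` region. The file contents are an illustration only and are not
taken from the LLVM tests. Whether the local array survives optimization as an
alloca depends on the compiler and optimization level.

.. code-block:: shell

  # Write a small, hypothetical OpenMP offload program to test.c.
  # printf is used instead of a here-document so the snippet is
  # insensitive to indentation when copied.
  printf '%s\n' \
    '#include <stdio.h>' \
    '' \
    'int main(void) {' \
    '  int sum = 0;' \
    '  #pragma omp target map(tofrom: sum)' \
    '  {' \
    '    int buf[4] = {1, 2, 3, 4}; /* local array in the target region */' \
    '    for (int i = 0; i < 4; ++i)' \
    '      sum += buf[i];' \
    '  }' \
    '  printf("sum = %d\n", sum);' \
    '  return 0;' \
    '}' \
    > test.c
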
To analyze specified LLVM IR, perhaps previously generated by something like
``clang -save-temps -g -fopenmp --offload-arch=native test.c``:

.. code-block:: shell

  $ opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \
      -pass-remarks=kernel-info -passes=kernel-info

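
The bitcode file name above depends on the offload target. As a rough sketch,
assuming ``-save-temps`` names device bitcode
``<stem>-openmp-<triple>-<arch>.bc`` (a pattern inferred only from the example
above, not a documented guarantee), a hypothetical helper such as
``list_device_bc`` can list candidate files to pass to ``opt``:

.. code-block:: shell

  # Hypothetical helper: list device bitcode files in a directory (default: .).
  # Assumes names of the form <stem>-openmp-<triple>-<arch>.bc.
  list_device_bc() {
    dir=${1:-.}
    for f in "$dir"/*-openmp-*.bc; do
      # Skip the unexpanded pattern when nothing matches.
      [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
  }
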
When specifying an LLVM pass pipeline on the command line, ``kernel-info`` still
runs at the end of LTO by default. ``-no-kernel-info-end-lto`` disables that
behavior so you can position ``kernel-info`` explicitly:

.. code-block:: shell

  $ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
      -Rpass=kernel-info \
      -Xoffload-linker --lto-newpm-passes='lto<O2>'

  $ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
      -Rpass=kernel-info -mllvm -no-kernel-info-end-lto \
      -Xoffload-linker --lto-newpm-passes='module(kernel-info),lto<O2>'

  $ opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \
      -pass-remarks=kernel-info \
      -passes='lto<O2>'

  $ opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \
      -pass-remarks=kernel-info -no-kernel-info-end-lto \
      -passes='module(kernel-info),lto<O2>'

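
The two ``opt`` invocations above differ only in whether
``-no-kernel-info-end-lto`` is passed and in the pipeline string. The sketch
below makes that difference explicit with a hypothetical helper,
``kernel_info_opt_args`` (not part of LLVM), that prints the argument list for
either mode. The mode names ``end-lto`` and ``explicit`` are invented for this
illustration.

.. code-block:: shell

  # Hypothetical helper: print the opt arguments for one of the two modes.
  #   end-lto  : rely on the default placement at the end of LTO
  #   explicit : disable the default and run kernel-info before lto<O2>
  kernel_info_opt_args() {
    mode=$1
    bc=$2
    common="-disable-output $bc -pass-remarks=kernel-info"
    case "$mode" in
      end-lto)
        printf '%s\n' "$common -passes=lto<O2>"
        ;;
      explicit)
        printf '%s\n' \
          "$common -no-kernel-info-end-lto -passes=module(kernel-info),lto<O2>"
        ;;
      *)
        echo "usage: kernel_info_opt_args end-lto|explicit FILE.bc" >&2
        return 1
        ;;
    esac
  }

It could be used as
``opt $(kernel_info_opt_args explicit test-openmp-nvptx64-nvidia-cuda-sm_70.bc)``.
This relies on shell word splitting, so the file name must not contain
whitespace.
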