From a1955678d5ca45f385d47862d41bdeb853bd6b12 Mon Sep 17 00:00:00 2001
From: Istvan Kiss
Date: Sat, 25 May 2024 17:09:34 +0200
Subject: [PATCH] WIP

---
 .wordlist.txt             | 1 +
 docs/how-to/debugging.rst | 6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/.wordlist.txt b/.wordlist.txt
index 1238e5dfa0..117a7ce1db 100644
--- a/.wordlist.txt
+++ b/.wordlist.txt
@@ -37,6 +37,7 @@ libstdc
 linearizing
 LOC
 LUID
+ltrace
 Malloc
 malloc
 multicore
diff --git a/docs/how-to/debugging.rst b/docs/how-to/debugging.rst
index 8635b290f1..3b08dc9dc3 100644
--- a/docs/how-to/debugging.rst
+++ b/docs/how-to/debugging.rst
@@ -94,10 +94,10 @@ Debugging
 You can use ROCgdb for debugging and profiling.
 
 ROCgdb is the ROCm source-level debugger for Linux and is based on GNU Project debugger (GDB).
-the GNU source-level debugger, equivalent of cuda-gdb, can be used with debugger frontends, such as eclipse, vscode, or gdb-dashboard.
+the GNU source-level debugger, equivalent of cuda-gdb, can be used with debugger frontends, such as Eclipse, Visual Studio Code, or GDB dashboard.
 For details, see (https://github.com/ROCm/ROCgdb).
 
-Below is a sample how to use ROCgdb run and debug HIP application, rocgdb is installed with ROCM package in the folder /opt/rocm/bin.
+Below is a sample how to use ROCgdb run and debug HIP application, ROCgdb is installed with ROCM package in the folder /opt/rocm/bin.
 
 .. code-block:: console
 
@@ -379,7 +379,7 @@ General debugging tips
 This ``gdb`` command does not use an equal (=) sign.
 
 * The GDB backtrace shows a path in the runtime. This is because a fault is caught by the runtime, but it is generated by an asynchronous command running on the GPU.
-* To determine the true location of a fault, you can force the kernels to run synchronously by setting the environment variables ``AMD_SERIALIZE_KERNEL=3`` and ``AMD_SERIALIZE_COPY=3``. This forces HIP runtime to wait for the kernel to finish running before retuning. If the fault occurs when a kernel is running, you can see the code that launched the kernel inside the backtrace. The thread that's causing the issue is typically the one inside ``libhsa-runtime64.so``.
+* To determine the true location of a fault, you can force the kernels to run synchronously by setting the environment variables ``AMD_SERIALIZE_KERNEL=3`` and ``AMD_SERIALIZE_COPY=3``. This forces HIP runtime to wait for the kernel to finish running before returning. If the fault occurs when a kernel is running, you can see the code that launched the kernel inside the backtrace. The thread that's causing the issue is typically the one inside ``libhsa-runtime64.so``.
 * VM faults inside kernels can be caused by:
 
 * Incorrect code (e.g., a for loop that extends past array boundaries)