diff --git a/.wordlist.txt b/.wordlist.txt index 839f2192f7..45af247c0d 100644 --- a/.wordlist.txt +++ b/.wordlist.txt @@ -31,6 +31,7 @@ frontends gedit GPGPU hardcoded +HC HIP's hipcc hipexamine diff --git a/docs/how-to/debugging.rst b/docs/how-to/debugging.rst index 73257d3a67..c90f7ec7d8 100644 --- a/docs/how-to/debugging.rst +++ b/docs/how-to/debugging.rst @@ -23,7 +23,7 @@ can use ltrace to visualize the runtime behavior of the entire ROCm software sta Here's a simple command-line example that uses ltrace to trace HIP APIs and output: - .. code-block:: console +.. code-block:: console $ ltrace -C -e "hip*" ./hipGetChanDesc hipGetChanDesc->hipCreateChannelDesc(0x7ffdc4b66860, 32, 0, 0) = 0x7ffdc4b66860 @@ -36,7 +36,7 @@ Here's a simple command-line example that uses ltrace to trace HIP APIs and outp Here's another example that uses ltrace to trace hsa APIs and output: - .. code-block:: console +.. code-block:: console $ ltrace -C -e "hsa*" ./hipGetChanDesc libamdhip64.so.4->hsa_init(0, 0x7fff325a69d0, 0x9c80e0, 0 @@ -99,7 +99,7 @@ For details, see (https://github.com/ROCm/ROCgdb). Below is a sample how to use ROCgdb run and debug HIP application, ROCgdb is installed with ROCM package in the folder /opt/rocm/bin. - .. code-block:: console +.. code-block:: console $ export PATH=$PATH:/opt/rocm/bin $ rocgdb ./hipTexObjPitch @@ -132,7 +132,7 @@ Debugging HIP applications The following Linux example shows how to get useful information from the debugger while running a simple memory copy test, which caused a segmentation fault issue. - .. code-block:: console +.. code-block:: console test: simpleTest2 numElements=4194304 sizeElements=4194304 bytes Segmentation fault (core dumped) @@ -231,13 +231,13 @@ For systems with multiple devices, you can choose to make only certain device(s) ``HIP_VISIBLE_DEVICES`` (or ``CUDA_VISIBLE_DEVICES`` on an NVIDIA platform). Once enabled, HIP can only view devices that have indices present in the sequence. For example: - .. code-block:: console +.. code-block:: console $ HIP_VISIBLE_DEVICES=0,1 Or in the application: - .. code-block:: cpp +.. code-block:: cpp if (totalDeviceNum > 2) { setenv("HIP_VISIBLE_DEVICES", "0,1,2", 1); @@ -375,7 +375,7 @@ General debugging tips * ``gdb --args`` can be used to pass the executable and arguments to ``gdb``. * You can set environment variables (``set env``) from within GDB on Linux: - .. code-block:: bash + .. code-block:: bash (gdb) set env AMD_SERIALIZE_KERNEL 3 diff --git a/docs/how-to/hip_porting_guide.md b/docs/how-to/hip_porting_guide.md index 12432925f2..1a51339b66 100644 --- a/docs/how-to/hip_porting_guide.md +++ b/docs/how-to/hip_porting_guide.md @@ -38,7 +38,7 @@ info: TOTAL-converted 89 CUDA->HIP refs( dev:3 mem:32 kern:2 builtin:37 math:0 s hipexamine-perl scans each code file (cpp, c, h, hpp, etc.) found in the specified directory: -* Files with no CUDA code (ie `kmeans.h`) print one line summary just listing the source file name. +* Files with no CUDA code (`kmeans.h`) print one line summary just listing the source file name. * Files with CUDA code print a summary of what was found - for example the `kmeans_cuda_kernel.cu` file: ```shell @@ -318,14 +318,14 @@ Applications with these interfaces should use the default libstdc++ linking. Applications which are compiled entirely with hipcc, and which benefit from advanced C++ features not supported in libstdc++, and which do not require portability to NVCC, may choose to use libc++. -### HIP Headers (hip_runtime.h, hip_runtime_api.h) +### HIP Headers (`hip_runtime.h`, `hip_runtime_api.h`) -The hip_runtime.h and hip_runtime_api.h files define the types, functions and enumerations needed to compile a HIP program: +The `hip_runtime.h` and `hip_runtime_api.h` files define the types, functions and enumerations needed to compile a HIP program: -* `hip_runtime_api.h`: defines all the HIP runtime APIs (e.g., `hipMalloc`) and the types required to call them. A source file that is only calling HIP APIs but neither defines nor launches any kernels can include `hip_runtime_api.h`. hip_runtime_api.h uses no custom hc language features and can be compiled using a standard C++ compiler. +* `hip_runtime_api.h`: defines all the HIP runtime APIs (e.g., `hipMalloc`) and the types required to call them. A source file that is only calling HIP APIs but neither defines nor launches any kernels can include `hip_runtime_api.h`. `hip_runtime_api.h` uses no custom Heterogeneous Compute (HC) language features and can be compiled using a standard C++ compiler. * `hip_runtime.h`: included in `hip_runtime_api.h`. It additionally provides the types and defines required to create and launch kernels. hip_runtime.h can be compiled using a standard C++ compiler but will expose a subset of the available functions. -CUDA has slightly different contents for these two files. In some cases you may need to convert hipified code to include the richer hip_runtime.h instead of hip_runtime_api.h. +CUDA has slightly different contents for these two files. In some cases you may need to convert hipified code to include the richer `hip_runtime.h` instead of `hip_runtime_api.h`. ### Using a Standard C++ Compiler diff --git a/docs/install/build.rst b/docs/install/build.rst index 73ba893dbd..a4785e9230 100644 --- a/docs/install/build.rst +++ b/docs/install/build.rst @@ -11,13 +11,13 @@ Before building and running HIP, make sure drivers and prebuilt packages are ins You also need to install Python 3, which includes the ``CppHeaderParser`` package. Install Python 3 using the following command: - .. code-block:: shell +.. code-block:: shell apt-get install python3 Check and install ``CppHeaderParser`` package using the command: - .. code-block:: shell +.. code-block:: shell pip3 install CppHeaderParser @@ -29,7 +29,7 @@ Building the HIP runtime Set the repository branch using the variable: ``ROCM_BRANCH``. For example, for ROCm 6.1, use: - .. code-block:: shell +.. code-block:: shell export ROCM_BRANCH=rocm-6.1.x @@ -49,7 +49,7 @@ Set the repository branch using the variable: ``ROCM_BRANCH``. For example, for ``hipother`` provides files required to support the HIP back-end implementation on some non-AMD platforms, like NVIDIA. - .. code-block:: shell + .. code-block:: shell git clone -b "$ROCM_BRANCH" https://github.com/ROCm/clr.git git clone -b "$ROCM_BRANCH" https://github.com/ROCm/hip.git @@ -66,7 +66,7 @@ Set the repository branch using the variable: ``ROCM_BRANCH``. For example, for #. Set the environment variables. - .. code-block:: shell + .. code-block:: shell export CLR_DIR="$(readlink -f clr)" export HIP_DIR="$(readlink -f hip)" @@ -74,7 +74,7 @@ Set the repository branch using the variable: ``ROCM_BRANCH``. For example, for #. Build HIP. - .. code-block:: shell + .. code-block:: shell cd "$CLR_DIR" mkdir -p build; cd build @@ -116,7 +116,7 @@ Set the repository branch using the variable: ``ROCM_BRANCH``. For example, for Usage: - .. code-block:: shell + .. code-block:: shell `hip_prof_gen.py [-v] []` @@ -131,7 +131,7 @@ Set the repository branch using the variable: ``ROCM_BRANCH``. For example, for Example usage: - .. code-block:: shell + .. code-block:: shell hip_prof_gen.py -v -p -t --priv /include/hip/hip_runtime_api.h \ /src /include/hip/amd_detail/hip_prof_str.h \ @@ -142,7 +142,7 @@ Set the repository branch using the variable: ``ROCM_BRANCH``. For example, for #. Get the HIP source code. - .. code-block:: shell + .. code-block:: shell git clone -b "$ROCM_BRANCH" https://github.com/ROCm/clr.git git clone -b "$ROCM_BRANCH" https://github.com/ROCm/hip.git @@ -150,7 +150,7 @@ Set the repository branch using the variable: ``ROCM_BRANCH``. For example, for #. Set the environment variables. - .. code-block:: shell + .. code-block:: shell export CLR_DIR="$(readlink -f clr)" export HIP_DIR="$(readlink -f hip)" @@ -158,7 +158,7 @@ Set the repository branch using the variable: ``ROCM_BRANCH``. For example, for #. Build HIP. - .. code-block:: shell + .. code-block:: shell cd "$CLR_DIR" mkdir -p build; cd build @@ -180,13 +180,13 @@ Build HIP tests * Get HIP tests source code. - .. code-block:: shell + .. code-block:: shell git clone -b "$ROCM_BRANCH" https://github.com/ROCm/hip-tests.git * Build HIP tests from source. - .. code-block:: shell + .. code-block:: shell export HIPTESTS_DIR="$(readlink -f hip-tests)" cd "$HIPTESTS_DIR" @@ -199,14 +199,14 @@ Build HIP tests To run any single catch test, use this example: - .. code-block:: shell + .. code-block:: shell cd $HIPTESTS_DIR/build/catch_tests/unit/texture ./TextureTest * Build a HIP Catch2 standalone test. - .. code-block:: shell + .. code-block:: shell cd "$HIPTESTS_DIR" hipcc $HIPTESTS_DIR/catch/unit/memory/hipPointerGetAttributes.cc \ diff --git a/docs/install/install.rst b/docs/install/install.rst index 9aa60bb36e..d88ba6596c 100644 --- a/docs/install/install.rst +++ b/docs/install/install.rst @@ -48,7 +48,7 @@ Installation #. Install the NVIDIA driver. - .. code-block:: shell + .. code-block:: shell sudo apt-get install ubuntu-drivers-common && sudo ubuntu-drivers autoinstall sudo reboot @@ -59,7 +59,7 @@ Installation #. Install the ``hip-runtime-nvidia`` and ``hip-dev`` packages. This installs the CUDA SDK and HIP porting layer. - .. code-block:: shell + .. code-block:: shell apt-get install hip-runtime-nvidia hip-dev @@ -74,6 +74,6 @@ Verify your installation Run ``hipconfig`` in your installation path. - .. code-block:: shell +.. code-block:: shell /opt/rocm/bin/hipconfig --full diff --git a/docs/reference/kernel_language.rst b/docs/reference/kernel_language.rst index faef2aee2e..0a73b147e8 100644 --- a/docs/reference/kernel_language.rst +++ b/docs/reference/kernel_language.rst @@ -102,7 +102,7 @@ When using ``hipLaunchKernelGGL``, your first five parameters must be: You can include your kernel arguments after these parameters. - .. code-block:: cpp +.. code-block:: cpp // Example hipLaunchKernelGGL pseudocode: __global__ MyKernel(hipLaunchParm lp, float *A, float *B, float *C, size_t N) @@ -128,7 +128,7 @@ parameters. Kernel launch example ========================================================== - .. code-block:: cpp +.. code-block:: cpp // Example showing device function, __device__ __host__ // <- compile for both device and host @@ -279,7 +279,7 @@ dimensions. The dim3 constructor accepts between zero and three arguments. By default, it initializes unspecified dimensions to 1. - .. code-block:: cpp +.. code-block:: cpp typedef struct dim3 { uint32_t x; @@ -1450,7 +1450,7 @@ To read a high-resolution timer from the device, HIP provides the following buil * Returning the incremental counter value for every clock cycle on a device: - .. code-block:: cpp + .. code-block:: cpp clock_t clock() long long int clock64() @@ -1459,14 +1459,14 @@ To read a high-resolution timer from the device, HIP provides the following buil * Returning the wall clock count at a constant frequency on the device: - .. code-block:: cpp + .. code-block:: cpp long long int wall_clock64() This can be queried using the HIP API with the ``hipDeviceAttributeWallClockRate`` attribute of the device in HIP application code. For example: - .. code-block:: cpp + .. code-block:: cpp int wallClkRate = 0; //in kilohertz HIPCHECK(hipDeviceGetAttribute(&wallClkRate, hipDeviceAttributeWallClockRate, deviceId)); @@ -1809,7 +1809,7 @@ portable code to query the warp size. To get the default warp size of a GPU device, use ``hipGetDeviceProperties`` in you host functions. - .. code-block:: cpp +.. code-block:: cpp cudaDeviceProp props; cudaGetDeviceProperties(&props, deviceID); @@ -1835,7 +1835,7 @@ the correct type for the mask. Warp vote and ballot functions ------------------------------------------------------------------------------------------------------------- - .. code-block:: cpp +.. code-block:: cpp int __all(int predicate) int __any(int predicate) @@ -1883,7 +1883,7 @@ undefined. Warp match functions ------------------------------------------------------------------------------------------------------------- - .. code-block:: cpp +.. code-block:: cpp unsigned long long __match_any(T value) unsigned long long __match_all(T value, int *pred) @@ -1915,7 +1915,7 @@ Warp shuffle functions The default width is ``warpSize`` (see :ref:`warp-cross-lane`). Half-float shuffles are not supported. - .. code-block:: cpp +.. code-block:: cpp int __shfl (T var, int srcLane, int width=warpSize); int __shfl_up (T var, unsigned int delta, int width=warpSize); @@ -2103,7 +2103,8 @@ Assert The assert function is supported in HIP. Assert function is used for debugging purpose, when the input expression equals to zero, the execution will be stopped. - .. code-block:: cpp + +.. code-block:: cpp void assert(int input) @@ -2127,7 +2128,7 @@ This function produces a similar effect of using ``asm("trap")`` in the CUDA cod ``printf`` function is supported in HIP. The following is a simple example to print information in the kernel. - .. code-block:: cpp +.. code-block:: cpp #include @@ -2151,7 +2152,7 @@ GPU multiprocessors have a fixed pool of resources (primarily registers and shar ``__launch_bounds__`` allows the application to provide usage hints that influence the resources (primarily registers) used by the generated code. It is a function attribute that must be attached to a __global__ function: - .. code-block:: cpp +.. code-block:: cpp __global__ void __launch_bounds__(MAX_THREADS_PER_BLOCK, MIN_WARPS_PER_EXECUTION_UNIT) MyKernel(hipGridLaunch lp, ...) @@ -2189,13 +2190,13 @@ Porting from CUDA ``__launch_bounds`` CUDA defines a ``__launch_bounds`` which is also designed to control occupancy: - .. code-block:: cpp +.. code-block:: cpp __launch_bounds(MAX_THREADS_PER_BLOCK, MIN_BLOCKS_PER_MULTIPROCESSOR) - The second parameter ``__launch_bounds`` parameters must be converted to the format used __hip_launch_bounds, which uses warps and execution-units rather than blocks and multi-processors (this conversion is performed automatically by HIPIFY tools). - .. code-block:: cpp +.. code-block:: cpp MIN_WARPS_PER_EXECUTION_UNIT = (MIN_BLOCKS_PER_MULTIPROCESSOR * MAX_THREADS_PER_BLOCK) / 32 @@ -2252,17 +2253,17 @@ Pragma Unroll Unroll with a bounds that is known at compile-time is supported. For example: - .. code-block:: cpp +.. code-block:: cpp #pragma unroll 16 /* hint to compiler to unroll next loop by 16 */ for (int i=0; i<16; i++) ... - .. code-block:: cpp +.. code-block:: cpp #pragma unroll 1 /* tell compiler to never unroll the loop */ for (int i=0; i<16; i++) ... - .. code-block:: cpp +.. code-block:: cpp #pragma unroll /* hint to compiler to completely unroll next loop. */ for (int i=0; i<16; i++) ... @@ -2272,7 +2273,7 @@ In-Line Assembly GCN ISA In-line assembly, is supported. For example: - .. code-block:: cpp +.. code-block:: cpp asm volatile ("v_mac_f32_e32 %0, %2, %3" : "=v" (out[i]) : "0"(out[i]), "v" (a), "v" (in[i])); @@ -2296,7 +2297,7 @@ Kernel Compilation hipcc now supports compiling C++/HIP kernels to binary code objects. The file format for binary is ``.co`` which means Code Object. The following command builds the code object using ``hipcc``. - .. code-block:: bash +.. code-block:: bash hipcc --genco --offload-arch=[TARGET GPU] [INPUT FILE] -o [OUTPUT FILE] diff --git a/docs/understand/programming_model.rst b/docs/understand/programming_model.rst index e0fbc2056d..d3c09b9676 100644 --- a/docs/understand/programming_model.rst +++ b/docs/understand/programming_model.rst @@ -84,7 +84,7 @@ identical instructions over the available SIMD engines. Consider the following kernel: - .. code-block:: cpp +.. code-block:: cpp __global__ void k(float4* a, const float4* b) {