Commit 6599c6d (WIP)

neon60 committed May 25, 2024 (1 parent 14c144d)

Showing 1 changed file with 12 additions and 12 deletions: docs/how-to/hip_porting_guide.md

* Starting the port on a CUDA machine is often the easiest approach, since you can incrementally port pieces of the code to HIP while leaving the rest in CUDA. (Recall that on CUDA machines HIP is just a thin layer over CUDA, so the two code types can interoperate on nvcc platforms.) Also, the HIP port can be compared with the original CUDA code for function and performance.
* Once the CUDA code is ported to HIP and is running on the CUDA machine, compile the HIP code using the HIP compiler on an AMD machine.
* HIP ports can replace CUDA versions: HIP can deliver the same performance as a native CUDA implementation, with the benefit of portability to both Nvidia and AMD architectures as well as a path to future C++ standard support. You can handle platform-specific features through conditional compilation or by adding them to the open-source HIP infrastructure.
* HIP ports can replace CUDA versions: HIP can deliver the same performance as a native CUDA implementation, with the benefit of portability to both NVIDIA and AMD architectures as well as a path to future C++ standard support. You can handle platform-specific features through conditional compilation or by adding them to the open-source HIP infrastructure.
* Use **[hipconvertinplace-perl.sh](https://github.com/ROCm/HIPIFY/blob/amd-staging/bin/hipconvertinplace-perl.sh)** to hipify all code files in the CUDA source directory.

### Scanning existing CUDA code to scope the porting effort
## memcpyToSymbol
HIP support for `hipMemcpyToSymbol` is complete. This feature allows a kernel
to define a device-side data symbol which can be accessed on the host side. The symbol can be in `__constant__` or device space.
Note that the symbol name needs to be encased in the `HIP_SYMBOL` macro, as shown in the code example below. This also applies to `hipMemcpyFromSymbol`, `hipGetSymbolAddress`, and `hipGetSymbolSize`.
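For example, a minimal sketch of the pattern (the symbol name `Value` and the lack of error checking are purely illustrative):

```cpp
#include <hip/hip_runtime.h>

__device__ int Value;  // device-side data symbol, accessible from the host via the symbol APIs

int main() {
    int host = 42;
    // The symbol must be wrapped in HIP_SYMBOL for the copy to work on both platforms:
    hipMemcpyToSymbol(HIP_SYMBOL(Value), &host, sizeof(host));

    int back = 0;
    hipMemcpyFromSymbol(&back, HIP_SYMBOL(Value), sizeof(back));
    return 0;
}
```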
## CU_POINTER_ATTRIBUTE_MEMORY_TYPE
To query a pointer's memory type in HIP/HIP-Clang, use the `hipPointerGetAttributes` API. The first parameter of the API is a `hipPointerAttribute_t`, whose `type` member indicates whether the input pointer was allocated on the device or the host.
For example:
```cpp
hipPointerAttribute_t attr;
hipPointerGetAttributes(&attr, ptrHost); /*attr.type will have value as hipMemoryTypeHost*/
```
Please note, `hipMemoryType` enum values are different from `cudaMemoryType` enum values.
For example, on the AMD platform, `hipMemoryType` is defined in `hip_runtime_api.h`:
```cpp
typedef enum hipMemoryType {
    hipMemoryTypeHost,    ///< Memory is physically located on host
    hipMemoryTypeDevice,  ///< Memory is physically located on device
    hipMemoryTypeArray,   ///< Array memory, physically located on device
    hipMemoryTypeUnified  ///< Not used currently
} hipMemoryType;
```

while on the NVIDIA platform, `cudaMemoryType` is defined in `driver_types.h`:

```cpp
enum cudaMemoryType
{
    cudaMemoryTypeUnregistered = 0, ///< Unregistered memory
    cudaMemoryTypeHost = 1,         ///< Host memory
    cudaMemoryTypeDevice = 2,       ///< Device memory
    cudaMemoryTypeManaged = 3       ///< Managed memory
};
```
In this case, the memory type translation for `hipPointerGetAttributes` needs to be handled properly on the NVIDIA platform to get the correct memory type in CUDA; this is done in the file `nvidia_hip_runtime_api.h`.
So in any HIP application which uses HIP APIs involving memory types, developers should use `#ifdef` in order to assign the correct enum values depending on the NVIDIA or AMD platform.
As an example, please see the code from [hipMemcpyParam2D.cc](https://github.com/ROCm/hip-tests/tree/develop/catch/unit/memory/hipMemcpyParam2D.cc).

With the `#ifdef` condition, HIP APIs work as expected on both AMD and NVIDIA platforms.
Note, `cudaMemoryTypeUnregistered` is currently not supported in the `hipMemoryType` enum, in order to maintain backward compatibility of HIP functionality.
## threadfence_system
HIP/HIP-Clang does not provide this functionality. As a workaround, users can set the environment variable `HSA_DISABLE_CACHE=1` to disable the GPU L2 data cache.
### Textures and Cache Control
Compute programs sometimes use textures either to access dedicated texture caches or to use the texture-sampling hardware for interpolation and clamping. The former approach uses simple point samplers with linear interpolation, essentially only reading a single point. The latter approach uses the sampler hardware to interpolate and combine multiple samples. AMD hardware, as well as recent competing hardware, has a unified texture/L1 cache, so it no longer has a dedicated texture cache. But the nvcc path often caches global loads in the L2 cache, and some programs may benefit from explicit control of the L1 cache contents. We recommend the `__ldg` instruction for this purpose.
AMD compilers currently load all data into both the L1 and L2 caches, so `__ldg` is treated as a no-op.
We recommend the following for functional portability:
* For programs that use textures only to benefit from improved caching, use the `__ldg` instruction
* Programs that use texture object and texture reference APIs work well on HIP
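A sketch of the first recommendation (the kernel and all names are illustrative):

```cpp
__global__ void scale(const float* __restrict__ in, float* out, int n, float s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // __ldg routes the read through the read-only (texture) cache path on
        // NVIDIA hardware; on AMD it is treated as a no-op, so the same code
        // stays functionally portable.
        out[i] = s * __ldg(&in[i]);
    }
}
```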
## More Tips
