Commit

Add asynchronous execution documentation page
neon60 committed Jan 14, 2025
1 parent 4bb7864 commit eb68b2e
Showing 7 changed files with 822 additions and 5 deletions.
3 changes: 2 additions & 1 deletion .wordlist.txt
Original file line number Diff line number Diff line change
@@ -7,7 +7,7 @@ APUs
 AQL
 AXPY
 asm
-Asynchrony
+asynchrony
 backtrace
 Bitcode
 bitcode
@@ -124,6 +124,7 @@ overindexing
 oversubscription
 overutilized
 parallelizable
+parallelized
 pixelated
 pragmas
 preallocated

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions docs/how-to/hip_runtime_api.rst
@@ -40,6 +40,7 @@ Here are the various HIP Runtime API high level functions:
 * :doc:`./hip_runtime_api/initialization`
 * :doc:`./hip_runtime_api/memory_management`
 * :doc:`./hip_runtime_api/error_handling`
+* :doc:`./hip_runtime_api/asynchronous`
 * :doc:`./hip_runtime_api/cooperative_groups`
 * :doc:`./hip_runtime_api/hipgraph`
 * :doc:`./hip_runtime_api/call_stack`
534 changes: 534 additions & 0 deletions docs/how-to/hip_runtime_api/asynchronous.rst

Large diffs are not rendered by default.

8 changes: 6 additions & 2 deletions docs/how-to/performance_guidelines.rst
@@ -3,6 +3,8 @@
 developers optimize the performance of HIP-capable GPU architectures.
 :keywords: AMD, ROCm, HIP, CUDA, performance, guidelines

+.. _how_to_performance_guidelines:
+
 *******************************************************************************
 Performance guidelines
 *******************************************************************************
@@ -32,12 +34,14 @@ reveal and efficiently provide as much parallelism as possible. The parallelism
 can be performed at the application level, device level, and multiprocessor
 level.

+.. _application_parallel_execution:
+
 Application level
 --------------------------------------------------------------------------------

 To enable parallel execution of the application across the host and devices, use
-asynchronous calls and streams. Assign workloads based on efficiency: serial to
-the host or parallel to the devices.
+:ref:`asynchronous calls and streams <asynchronous_how-to>`. Assign workloads
+based on efficiency: serial to the host or parallel to the devices.

 For parallel workloads, when threads belonging to the same block need to
 synchronize to share data, use :cpp:func:`__syncthreads()` (see:
5 changes: 3 additions & 2 deletions docs/sphinx/_toc.yml.in
@@ -49,9 +49,10 @@ subtrees:
 - file: how-to/hip_runtime_api/memory_management/virtual_memory
 - file: how-to/hip_runtime_api/memory_management/stream_ordered_allocator
 - file: how-to/hip_runtime_api/error_handling
-- file: how-to/hip_runtime_api/cooperative_groups
-- file: how-to/hip_runtime_api/hipgraph
 - file: how-to/hip_runtime_api/call_stack
+- file: how-to/hip_runtime_api/asynchronous
+- file: how-to/hip_runtime_api/hipgraph
+- file: how-to/hip_runtime_api/cooperative_groups
 - file: how-to/hip_runtime_api/multi_device
 - file: how-to/hip_runtime_api/opengl_interop
 - file: how-to/hip_runtime_api/external_interop

0 comments on commit eb68b2e
