Commit
WIP
neon60 committed May 28, 2024
1 parent 4cd64cc commit 4b94e62
Showing 3 changed files with 14 additions and 5 deletions.
9 changes: 9 additions & 0 deletions .wordlist.txt
@@ -3,6 +3,7 @@ ALUs
AmgX
APU
AQL
+AXPY
Asynchrony
backtrace
Bitcode
@@ -23,6 +24,8 @@ EIGEN
EIGEN's
enqueue
enqueues
+entrypoint
+entrypoints
enum
embeded
extern
@@ -40,6 +43,7 @@ hipother
HIPRTC
hcBLAS
icc
+IILE
inplace
Interoperation
interoperate
@@ -67,6 +71,8 @@ NDRange
nonnegative
Numa
Nsight
+overindex
+overindexing
oversubscription
preconditioners
prefetched
@@ -80,13 +86,16 @@ ROCm's
rocTX
RTC
RTTI
+SAXPY
scalarizing
sceneries
shaders
SIMT
+SPMV
structs
SYCL
syntaxes
+tradeoffs
typedefs
WinGDB
zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
2 changes: 1 addition & 1 deletion docs/tutorials/reduction.rst
@@ -948,7 +948,7 @@ which all Multi Processors can access and is also on-chip memory.
Processor for longer than necessary.

Without launching a second kernel, have the last block collect the results of
-all other blocks from GDS (either implicitly exploiting the sceduling behavior
+all other blocks from GDS (either implicitly exploiting the scheduling behavior
or relying on Global Wave Sync, yet another AMD-specific feature) to merge them
for a final tree-like reduction.

8 changes: 4 additions & 4 deletions docs/tutorials/saxpy.rst
@@ -26,7 +26,7 @@ Heterogenous Programming

Heterogenous programming and offloading APIs are often mentioned together.
Heterogenous programming deals with devices of varying capabilities at once
-while the term offloading focuses on the "remote" and asnychronous aspect of
+while the term offloading focuses on the "remote" and asynchronous aspect of
the computation. HIP encompasses both: it exposes GPGPU (General Purpose GPU)
programming much like ordinary host-side CPU programming and let's us move data
to and from device as need be.
@@ -71,7 +71,7 @@ work, then issue:
git clone https://github.com/amd/rocm-examples.git
-Inside the repo, you should find ``HIP-Basic\saxpy\main.hip`` which is a
+Inside the repository, you should find ``HIP-Basic\saxpy\main.hip``, which is a
sufficiently simple implementation of SAXPY. It was already mentioned
that HIP code will mostly deal with where and when data has to be and
how devices will transform it. The very first HIP calls deal with
@@ -120,8 +120,8 @@ First let's discuss the signature of the offloaded function:
entrypoint to a device program, such that it can be launched from the host.
- The function does not return anything, because there is no trivial way to
construct a return channel of a parallel invocation. Device-side entrypoints
-may not return a value, their results should be communicated using out
-params.
+may not return a value, their results should be communicated using output
+parameters.
- Device-side functions are typically called compute kernels, or just kernels
for short. This is to distinguish them from non-graphics-related graphics
shaders, or just shaders for short.
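
The signature rules in the hunk above (no return value, results communicated through output parameters) can be illustrated with a host-only sketch. This is plain C++ rather than HIP: it does not use ``__global__`` or any HIP runtime call, and the names ``saxpy_kernel_emulated`` and ``launch_saxpy_emulated`` are hypothetical, chosen for illustration and not taken from the tutorial or from rocm-examples. A serial loop stands in for the parallel launch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for a device entry point. Like a real
// kernel, it returns void and writes its result through the output
// parameter d_y. The parameter tid plays the role of the thread index,
// which a real kernel would compute from built-in variables.
void saxpy_kernel_emulated(std::size_t tid, float a, const float* d_x,
                           float* d_y, std::size_t size)
{
    if (tid < size) // guard against overindexing
    {
        d_y[tid] = a * d_x[tid] + d_y[tid];
    }
}

// Hypothetical "launch": invokes the function once per logical thread id.
// A real launch would run these invocations in parallel on the device.
std::vector<float> launch_saxpy_emulated(float a, const std::vector<float>& x,
                                         std::vector<float> y)
{
    assert(x.size() == y.size());
    for (std::size_t tid = 0; tid < y.size(); ++tid)
    {
        saxpy_kernel_emulated(tid, a, x.data(), y.data(), y.size());
    }
    return y; // the updated copy of y carries the result back to the caller
}
```

Calling ``launch_saxpy_emulated(2.0f, x, y)`` returns a vector whose element ``i`` equals ``2 * x[i] + y[i]``. Each invocation writes only its own element, which is why no return channel is needed.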
