
Add AMDGPU extension #470

Merged: 4 commits into FluxML:master, Feb 13, 2023
Conversation

@pxl-th (Member) commented Feb 3, 2023

This PR adds an AMDGPU extension, similar to #445.
It relies on JuliaGPU/AMDGPU.jl#370 and we'd need to tag a new version of AMDGPU.jl for compatibility, so I'll mark it as WIP for now.

For now it has the following layers, which use MIOpen:

  • Convolutions.
  • Mean/max pooling.
  • Softmax.
  • Activations.

Things like pooling require passing the workspace buffer that was used in the forward pass to the backward pass.
The same goes for batch normalization, which allows passing the computed mini-batch mean & inverse variance to the backward pass to speed up the computation
(although batch norm is not defined in this package).

Therefore I've defined rrules for these operations as follows.
I decided this is much simpler than introducing some global caching for workspaces.

function rrule(::typeof(maxpool), x) # high-level description
    y, workspace = MIOpen.maxpool(x)
    pullback(dy) = NoTangent(), MIOpen.∇maxpool(dy, y, x, workspace)  # NoTangent() for maxpool itself
    return y, pullback
end
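For illustration, a minimal usage sketch (CPU arrays and made-up sizes, not code from this PR; with a ROCArray the MIOpen path above would be taken instead). Zygote picks up such an rrule, so the workspace captured by the pullback closure in the forward pass is reused in the backward pass without any global cache:

using NNlib, Zygote

x = rand(Float32, 16, 16, 3, 4)                        # W × H × C × N
y, back = Zygote.pullback(x -> maxpool(x, (2, 2)), x)  # forward pass builds the closure
dx, = back(ones(Float32, size(y)))                     # backward pass reuses the captured state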

Questions

  1. Do we need to register and maintain an NNlibROC.jl package for compatibility with older Julia versions?
    Julia 1.8 does not work properly with AMDGPU, and the MIOpen artifact is only available for Julia 1.9,
    so older versions would need to use a system installation of the ROCm stack.
    Maybe we leave it only as an extension for AMDGPU?

  2. MIOpen supports only cross-correlation as convolution (flipkernel(conv_dims) is true).
    If a user wants a regular convolution, we can either:

  • detect that at the Flux level and transpose the kernel at creation time,
  • or transpose it right before the call to the MIOpen functions (see the sketch below).

For now I just throw an error.
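For reference, a minimal sketch of the second option (the helper name is made up, and a 2-D kernel stored as W × H × C_in × C_out is assumed). Cross-correlation with a spatially reversed kernel is equivalent to a true convolution:

# Hypothetical helper: flip the kernel along its spatial dimensions only.
flip_spatial(w::AbstractArray{<:Any,4}) = reverse(w; dims=(1, 2))

# Inside the conv wrapper (pseudocode):
#   NNlib.flipkernel(cdims) || (w = flip_spatial(w))   # user asked for a true convolution
#   ... call the MIOpen convolution with the flipped `w` ...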

PR Checklist

  • Tests are added
  • Documentation, if applicable
  • Update AMDGPU compat.

@ToucheSir (Member) commented:

I like this workspace model far more than global caching, so it seems reasonable. Do the backwards functions ever overwrite the workspace? If so, defensive copying may be required.
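If they did, a hypothetical fix would be to take a copy on each backward call, at the cost of one extra allocation per pullback; a sketch against the maxpool rrule above:

pullback(dy) = NoTangent(), MIOpen.∇maxpool(dy, y, x, copy(workspace))  # defensive copy per call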

  1. Do we need to register and maintain NNlibROC.jl package for compatibility with older Julia versions?
    Julia 1.8 does not work properly with AMDGPU, and MIOpen artifact is only available for Julia 1.9.
    So older versions would need to use system installation of ROCm stack.
    Maybe we leave it only as an extension for AMDGPU?

If it's OK with you, I'd like to avoid maintaining a separate package. Anyone who wants AMDGPU support can just use 1.9.

2. MIOpen supports only cross-correlation as convolution (flipkernel(conv_dims) is true).
If a user wants regular convolution, we can either:

Is this still the case when considering the row-major to column-major memory layout interop? I would've assumed it works like the CUDNN integration, where we don't need to transpose for "normal" convs.

@pxl-th (Member, Author) commented Feb 3, 2023

Do the backwards functions ever overwrite the workspace? If so, defensive copying may be required.

In the case where a pullback is called more than once?
I need to check if they modify it.
UPD: no, they do not modify it.

If it's OK with you, I'd like to avoid maintaining a separate package.

Yeah, I didn't want to maintain it either :)

I would've assumed it works like the CUDNN integration where we don't need to transpose for "normal" convs.

You mean that for CUDNN we specify regular or cross-correlation convolution?
If so, then MIOpen only has cross-correlation for convolution.

@ToucheSir (Member) commented:

Good to know. I think we'll have to support transposing eventually since Flux.Conv relies on it, but erroring for now is fine.

I also forgot to mention that normalization can wait for #452. I had NNlibROC (and NNlibCUDA) explicitly in mind while developing that PR.

@pxl-th marked this pull request as ready for review February 6, 2023 16:32
@pxl-th (Member, Author) commented Feb 6, 2023

Not sure how to proceed with AMDGPU in Project.toml.
Should it be in the [extras] section?
Julia 1.6 CI fails because AMDGPU.jl cannot be installed: there are no artifacts for it, they are only available for 1.7+.
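For context, a minimal sketch of the 1.9-style declaration (assumed names and bounds, not the exact file from this PR). The [weakdeps] and [extensions] sections are ignored by older Julia versions, but anything listed under [extras] and the test target still has to be installable there:

# Hypothetical excerpt of NNlib's Project.toml; UUID taken from AMDGPU.jl, compat bound illustrative.
[weakdeps]
AMDGPU = "21141c5a-9bdb-4563-92ae-f87d6854732e"

[extensions]
NNlibAMDGPUExt = "AMDGPU"

[compat]
AMDGPU = "0.4"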

@CarloLucibello (Member) commented:

We can just support 1.7+, or even 1.9+

@pxl-th (Member, Author) commented Feb 6, 2023

I just realized that MIOpen for Julia 1.7 might not even support certain functions that I'm using. We can go with 1.9 only, since it supports most of the devices (if not all) that 1.7 supports anyway.

Is it ok then for 1.6 CI to fail for now?

@ToucheSir (Member) commented:

As-is it seems like we wouldn't be able to install NNlib on 1.6? Is there something we can do to fix that?

@pxl-th (Member, Author) commented Feb 7, 2023

As-is it seems like we wouldn't be able to install NNlib on 1.6? Is there something we can do to fix that?

We can remove AMDGPU from the test targets and add it manually when we want to run tests.
Or we can have a test/Project.toml if we just want to be able to install on 1.6.

(accidentally closed the PR)

@pxl-th closed this Feb 7, 2023
@pxl-th reopened this Feb 7, 2023
@pxl-th (Member, Author) commented Feb 7, 2023


Removed AMDGPU.jl from targets.

When running the Buildkite job for AMDGPU, we populate the 'test' target with AMDGPU before running the tests.
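Roughly, the GPU job amounts to something like this (a hedged approximation with assumed commands; the actual pipeline populates the 'test' target rather than the main environment):

using Pkg
Pkg.add("AMDGPU")    # make AMDGPU available to the test environment on the GPU runner
Pkg.test("NNlib")    # then run the full test suite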
(A review comment on .buildkite/pipeline.yml was marked outdated and resolved.)
@ToucheSir (Member) left a comment:


Got confirmation that tests pass locally and the CI failure may have been a machine hiccup. Let's merge and see if Buildkite has recovered.

@ToucheSir merged commit 1b24915 into FluxML:master Feb 13, 2023
@pxl-th deleted the amdgpu-extension branch February 13, 2023 09:44