
Add some initial code. #1

Open · wants to merge 3 commits into main

Conversation

chriselrod (Collaborator)

This is a WIP. Just wanted to have something up in case anyone wanted to look.

mcabbott (Member) commented Aug 8, 2021

How is it envisioned that this interacts with NNlib?

One version would look like this, but perhaps you have something else in mind:

  • NNlib defines a new function dense(f, W, X, b) and the gradient rules for this. It could just call * but can take a few steps through dense! etc. to provide places to hook in. Likewise conv_bias_act. (A rough sketch of this follows the list.)
  • Flux calls those functions, instead of writing things like σ.(... .+ b) directly.
  • CUDA obviously handles mul! and broadcasting; NNlibCUDA overloads only the in-place functions conv!(y::CuArray, ...) etc. Flux's Conv cannot be used with CUDA without this.
  • NNlibCPU wants to hook in slightly earlier, to fuse dense! instead. There should be a dense∇weight! for it, too?
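
A rough sketch of what that could look like (the names and signatures here are my illustration of the first bullet, not NNlib's actual API):

```julia
# Hypothetical names, sketching the shape of such an API: an out-of-place
# `dense` that does the matmul, plus an in-place `dense!` that fuses the
# bias + activation broadcast and gives backends something to overload.
function dense(f::F, W::AbstractMatrix, X::AbstractVecOrMat, b) where {F}
    Y = W * X              # plain matmul; CUDA etc. already dispatch on this
    return dense!(f, Y, b) # fused bias + activation, overwriting Y
end

dense!(f::F, Y::AbstractVecOrMat, b) where {F}         = Y .= f.(Y .+ b)
dense!(f::F, Y::AbstractVecOrMat, ::Nothing) where {F} = Y .= f.(Y)
```

Flux's Dense would then call dense(σ, W, x, b) instead of writing σ.(W*x .+ b) itself, and a backend package could overload dense! (or, as in the last bullet, hook in a step earlier).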

chriselrod (Collaborator, Author)

@DhairyaLGandhi

My plan here was to commit type piracy and add methods to existing functions.

However, I think adding dense(f, W, X, b) (and the associated ChainRules) would be a good idea.

> NNlibCPU wants to hook in slightly earlier, to fuse dense! instead. There should be a dense∇weight! for it, too?

What do you mean, in particular regarding fusion with conv?

mcabbott (Member) commented Aug 9, 2021

Maybe clearer with code: I pushed the dense_bias_act(f, w, x, b) I was fiddling with, which is more or less what dense(f, W, X, b) could piratically extend.

As that works now, it only defines a gradient for bias_act!, which takes Wx and does σ.(Wx .+ b), since that's where the fusion could save memory. But I think you want to do one more layer of fusion here and fold the matmul into the same function, in which case it perhaps needs to change to match, rather than relying on the existing gradient for *.
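
For concreteness, a rough sketch of that current arrangement (the names are illustrative, specialized to relu, and out of place for simplicity; the real thing handles general σ and works in place). The rule covers only the bias + activation broadcast, so the matmul's gradient still comes from the existing generic rule for `*`:

```julia
using ChainRulesCore

# Illustrative stand-in for the fused broadcast: y = relu.(Wx .+ b),
# where Wx = W * X was computed outside this function.
bias_act_relu(Wx::AbstractMatrix, b::AbstractVector) = max.(Wx .+ b, 0)

function ChainRulesCore.rrule(::typeof(bias_act_relu), Wx::AbstractMatrix, b::AbstractVector)
    z = Wx .+ b
    y = max.(z, 0)
    function bias_act_relu_pullback(ȳ)
        dWx = unthunk(ȳ) .* (z .> 0)   # relu′ fused into the same broadcast
        return (NoTangent(), dWx, vec(sum(dWx; dims=2)))
    end
    return y, bias_act_relu_pullback
end
```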

I see you fuse out[m,n] = f(out[m,n] + b[m]) for the forward pass; what are your thoughts on the gradient of f? In my PR it gets computed once and then used for Δx, Δw, Δb. Computing this is the copy I thought you could avoid by overwriting the forward pass, until I remembered that jacobian is going to do several backward passes off one forward pass.
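
And a sketch of the more fused variant under discussion, where the matmul is folded into the same function so its gradient no longer comes from `*`'s rule, and the activation derivative is computed once and reused for ΔW, ΔX, and Δb (again relu-only and out of place, purely for illustration):

```julia
using ChainRulesCore

# Forward fuses matmul + bias + activation; the pullback computes the
# activation derivative once and reuses it for all three input gradients.
dense_fused(W::AbstractMatrix, X::AbstractMatrix, b::AbstractVector) =
    max.(W * X .+ b, 0)

function ChainRulesCore.rrule(::typeof(dense_fused), W, X, b)
    z = W * X .+ b
    y = max.(z, 0)
    function dense_fused_pullback(ȳ)
        Δ  = unthunk(ȳ) .* (z .> 0)   # relu′ evaluated once...
        ΔW = Δ * X'                   # ...then reused for ΔW,
        ΔX = W' * Δ                   # ΔX,
        Δb = vec(sum(Δ; dims=2))      # and Δb
        return (NoTangent(), ΔW, ΔX, Δb)
    end
    return y, dense_fused_pullback
end
```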

Convolutional stuff is more WIP. There is a fused conv_bias_act!, but only for relu & identity, so it sometimes needs to call bias_act! in addition. Do you plan a fused convlayer!(f, out, ...) here too?

chriselrod (Collaborator, Author)

> As that works now, it only defines a gradient for bias_act!, which takes Wx and does σ.(Wx .+ b), since that's where the fusion could save memory. But I think you want to do one more layer of fusion here and fold the matmul into the same function, in which case it perhaps needs to change to match, rather than relying on the existing gradient for *.

I agree. You noted that on the GPU, tanh is as fast as identity. Maybe tanh_faster isn't there yet on the CPU, but relu almost certainly is, so I think it's worth fusing, especially when multithreading.

For data locality, I think a simple pattern would be for all NNlibCPU functions to thread only across the batch dimension by default, splitting the data into per-thread chunks the same way everywhere. The idea is that this minimizes movement of data between cores.

Because some applications may have smaller batch sizes than the number of available cores, we probably want to add an option to instead break up a problem among threads in another way. That probably won't be able to yield as good locality.
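
A minimal sketch of that default batch-only threading, assuming a hypothetical serial per-chunk kernel passed in as kernel!:

```julia
# Split the batch (last) dimension into one contiguous chunk per thread, so
# each core keeps touching the same columns of X and Y across calls.
function thread_over_batch!(kernel!, Y::AbstractMatrix, X::AbstractMatrix, args...)
    n  = size(X, 2)                      # batch size
    nt = min(Threads.nthreads(), n)
    Threads.@threads for t in 1:nt
        cols = (1 + div((t - 1) * n, nt)):div(t * n, nt)   # contiguous per-thread chunk
        kernel!(view(Y, :, cols), view(X, :, cols), args...)
    end
    return Y
end
```

When the batch is smaller than nthreads(), the fallback split mentioned above would have to divide the work along other dimensions instead.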

mcabbott (Member) commented Aug 9, 2021

Sounds like we're on the same page, good.

Re tanh: even if the computation dominates in the forward pass, the reverse is much cheaper. You'd have to do it in each of the 3 different operations, probably? It would, I think, save you one more allocation too.
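
A quick note on why the reverse is so cheap: tanh's derivative can be written from the saved forward output, tanh′(x) = 1 − tanh(x)² = 1 − y², so the pullback needs no tanh evaluations at all:

```julia
# tanh′(x) = 1 - tanh(x)^2, so the pullback only needs the saved output y:
tanh_pullback(ȳ, y) = ȳ .* (1 .- y .^ 2)
```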
