Use LogExpFunctions for losses #1866
base: master
Conversation
@test xlogy(2, 3) ≈ 2.0 * log(3.0)
@inferred xlogy(2, 3)
@inferred xlogy(0, 1)
end
Do these tests pass?
I tested locally before removing them, and they do. https://github.com/JuliaStats/LogExpFunctions.jl/blob/master/test/basicfuns.jl also looks like a strict superset of the Flux tests.
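For anyone who wants to double-check, here is a minimal sketch of running the removed assertions against the LogExpFunctions implementation (assuming LogExpFunctions is installed; the testset name is mine):

using Test
using LogExpFunctions: xlogy

# The same assertions Flux used to carry, pointed at LogExpFunctions.xlogy.
@testset "xlogy parity" begin
    @test xlogy(2, 3) ≈ 2.0 * log(3.0)
    @inferred xlogy(2, 3)  # inference should hold for Int inputs
    @inferred xlogy(0, 1)  # including the x == 0 case
end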
function xlogy(x, y)
    result = x * log(y)
    ifelse(iszero(x), zero(result), result)
end
One question is whether there are any performance differences, and whether we care. IIRC the replacements have if/else instead of ifelse, but perhaps the compiler sorts it out?
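For concreteness, a branchy counterpart to the ifelse version above might look like the sketch below (an illustration, not LogExpFunctions' actual source; the name xlogy_branchy is mine):

# Branchy style: when x == 0 the log(y) is skipped entirely, at the cost of
# control flow; log(one(y)) keeps the zero's type consistent with the other arm.
function xlogy_branchy(x, y)
    iszero(x) && return zero(x * log(one(y)))
    return x * log(y)
end

On CPU the compiler can often turn such a branch into a select anyway; on GPU the usual worry is warp divergence, which the benchmarks further down suggest isn't an issue here.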
I found JuliaStats/LogExpFunctions.jl#26. GPU is the big question mark, but if #1791 is any indication there may not be a difference there either.
Looks pretty comparable:
using Flux.Losses: xlogx as f_xlogx, xlogy as f_xlogy
using LogExpFunctions: xlogx as l_xlogx, xlogy as l_xlogy
using BenchmarkTools, CUDA
x, y, out = ntuple(_ -> rand(Float32, 100_000), 3);
cx, cy, cout = ntuple(_ -> CUDA.rand(Float32, 100_000), 3);
julia> @btime $out .= f_xlogx.($x);
580.412 μs (0 allocations: 0 bytes)
julia> @btime $out .= l_xlogx.($x);
580.883 μs (0 allocations: 0 bytes)
julia> @btime $out .= f_xlogy.($x, $y);
622.826 μs (0 allocations: 0 bytes)
julia> @btime $out .= l_xlogy.($x, $y);
657.381 μs (0 allocations: 0 bytes)
julia> @btime CUDA.@sync $cout .= f_xlogx.($cx);
5.896 μs (7 allocations: 480 bytes)
julia> @btime CUDA.@sync $cout .= l_xlogx.($cx);
5.832 μs (7 allocations: 480 bytes)
julia> @btime CUDA.@sync $cout .= f_xlogy.($cx, $cy);
7.555 μs (23 allocations: 1.61 KiB)
julia> @btime CUDA.@sync $cout .= l_xlogy.($cx, $cy);
7.114 μs (23 allocations: 1.61 KiB)
I did a couple of runs and there was a not insignificant amount of variability, but at least the relative times aren't too far off.
I get similar numbers.
Codecov Report
@@            Coverage Diff             @@
##           master    #1866      +/-   ##
==========================================
- Coverage   84.50%   84.47%   -0.03%
==========================================
  Files          21       21
  Lines        1484     1475       -9
==========================================
- Hits         1254     1246       -8
+ Misses        230      229       -1
Continue to review full report at Codecov.
I'm not strongly opposed, but I wonder what we gain here; the functions are so simple. Surely one could write a very fast version in a few lines? Flux currently depends indirectly on this package (and its dependencies), but would ultimately like not to depend on Zygote, ForwardDiff, et al. I lean towards a few lines of duplication being better than a tight web of every package depending on every other one.
What about gradients here? Not that we were testing this before.
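For the record, a sketch of what such a test could look like, checking Zygote against the analytic derivatives (the testset is hypothetical, and it assumes Zygote can differentiate LogExpFunctions' scalar functions):

using Test, Zygote
using LogExpFunctions: xlogy

@testset "xlogy gradients" begin
    x, y = 2.0, 3.0
    gx, gy = Zygote.gradient(xlogy, x, y)
    @test gx ≈ log(y)  # d(x*log(y))/dx = log(y)
    @test gy ≈ x / y   # d(x*log(y))/dy = x/y
end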
I think the main benefit of external packages over code duplication is maintainability and testing. If it is a relatively lightweight dep and we use it for only a few functions, then I see no harm in having it. It'd be a different story if we took on a dep that provides many functions in Flux, making them intricately woven.
Right, I just feel that maintenance and testing of a few lines here might be less hassle than maintenance of lines in Project.toml, breaking changes downstream, etc. If we want it never to change, then never changing it seems simpler than hoping some other package won't. I agree this is subjective, though. Besides such tradeoffs, the broadcasting adjoint deserves a look.
I suppose I hold the opposite perspective. There is a not insignificant amount of, for lack of a better term, derelict code kicking around in FluxML packages. Think of a lot of the utility functions here (some of which are not tested 🙈) and certain adjoints in Zygote.

For this particular case, I don't think the extra dep is a problem for a few reasons. Firstly, if LogExpFunctions breaks then Flux is going to feel it either way, since it's on the critical path of some direct dependencies. Secondly, about broadcasting: I had a look back through the blame, and the adjoint in question was added two years ago. Nowadays, every AD we care about has a fast path for broadcasting.
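As a quick sanity check of that claim, one can differentiate a broadcasted call with no Flux-specific adjoint in sight (a sketch assuming Zygote; the shift on y just keeps inputs away from zero):

using Zygote
using LogExpFunctions: xlogy

x, y = rand(Float32, 1000), rand(Float32, 1000) .+ 0.5f0
# Goes through Zygote's generic broadcast machinery plus the scalar rule;
# no hand-written broadcast adjoint is involved.
gx, gy = Zygote.gradient((a, b) -> sum(xlogy.(a, b)), x, y)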
I'm still in favor overall.
This package is already in the dep tree through multiple paths (Zygote -> ForwardDiff, StatsBase, etc.), so we might as well make use of it.
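For anyone curious, those paths can be printed directly (a sketch; Pkg.why needs Julia 1.9 or newer):

using Pkg
# Prints every dependency chain from the active project to LogExpFunctions.
Pkg.why("LogExpFunctions")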
PR Checklist
- [ ] Tests are ~~added~~ removed ;)
- [ ] Documentation, if applicable