Adjustments for Finch.jl#562 (#57)
Conversation
Overall I'm happy with the change; I would only like to investigate a bit further. To repeat one of my comments: I ran SDDMM with the copy constructor and this is the generated code:

Executing:
:(function var"##compute#316"(prgm)
begin
V = ((((((((((((((((((((((((((((prgm.children[1]).children[2]).children[2]).children[3]).children[1]).children[1]).children[1]).children[2]).children[1]).children[2]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[2]).children[1]).children[2]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[2]).tns.val::Tensor{DenseLevel{Int64, SparseListLevel{Int64, Vector{Int64}, Vector{Int64}, ElementLevel{0.0, Float64, Int64, Vector{Float64}}}}}
V_2 = ((((((((((((((((((((((((((((prgm.children[1]).children[2]).children[2]).children[3]).children[1]).children[1]).children[1]).children[2]).children[1]).children[2]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[2]).children[1]).children[3]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[2]).tns.val::Tensor{DenseLevel{Int64, DenseLevel{Int64, ElementLevel{0.0, Float64, Int64, PyArray{Float64, 1, true, true, Float64}}}}}
V_3 = (((((((((((((((((((((prgm.children[1]).children[2]).children[2]).children[3]).children[1]).children[1]).children[1]).children[2]).children[1]).children[3]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[2]).tns.val::Tensor{DenseLevel{Int64, DenseLevel{Int64, ElementLevel{0.0, Float64, Int64, PyArray{Float64, 1, true, true, Float64}}}}}
A0 = V::Tensor{DenseLevel{Int64, SparseListLevel{Int64, Vector{Int64}, Vector{Int64}, ElementLevel{0.0, Float64, Int64, Vector{Float64}}}}}
A2 = V_2::Tensor{DenseLevel{Int64, DenseLevel{Int64, ElementLevel{0.0, Float64, Int64, PyArray{Float64, 1, true, true, Float64}}}}}
A5 = V_3::Tensor{DenseLevel{Int64, DenseLevel{Int64, ElementLevel{0.0, Float64, Int64, PyArray{Float64, 1, true, true, Float64}}}}}
A8 = Tensor(Dense(SparseDict(Element{0.0, Float64}())))::Tensor{DenseLevel{Int64, SparseLevel{Int64, Finch.DictTable{Int64, Int64, Vector{Int64}, Vector{Int64}, Vector{Int64}, Dict{Tuple{Int64, Int64}, Int64}}, ElementLevel{0.0, Float64, Int64, Vector{Float64}}}}}
@finch mode = :fast begin
    A8 .= 0.0
    for i49 = _
        for i47 = _
            for i48 = _
                A8[i48, i47] << + >>= (*)((*)(A0[i48, i47], A2[i47, i49]), A5[i48, i49])
            end
        end
    end
    return A8
end
return (swizzle(A8, 2, 1),)
end
end)

And for the no-copy constructor (which differs only in the storage of A0):

Executing:
:(function var"##compute#308"(prgm)
begin
V = ((((((((((((((((((((((((((((prgm.children[1]).children[2]).children[2]).children[3]).children[1]).children[1]).children[1]).children[2]).children[1]).children[2]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[2]).children[1]).children[2]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[2]).tns.val::Tensor{DenseLevel{Int64, SparseListLevel{Int64, PlusOneVector{Int32}, PlusOneVector{Int32}, ElementLevel{0.0, Float64, Int64, PyArray{Float64, 1, true, true, Float64}}}}}
V_2 = ((((((((((((((((((((((((((((prgm.children[1]).children[2]).children[2]).children[3]).children[1]).children[1]).children[1]).children[2]).children[1]).children[2]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[2]).children[1]).children[3]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[2]).tns.val::Tensor{DenseLevel{Int64, DenseLevel{Int64, ElementLevel{0.0, Float64, Int64, PyArray{Float64, 1, true, true, Float64}}}}}
V_3 = (((((((((((((((((((((prgm.children[1]).children[2]).children[2]).children[3]).children[1]).children[1]).children[1]).children[2]).children[1]).children[3]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[2]).tns.val::Tensor{DenseLevel{Int64, DenseLevel{Int64, ElementLevel{0.0, Float64, Int64, PyArray{Float64, 1, true, true, Float64}}}}}
A0 = V::Tensor{DenseLevel{Int64, SparseListLevel{Int64, PlusOneVector{Int32}, PlusOneVector{Int32}, ElementLevel{0.0, Float64, Int64, PyArray{Float64, 1, true, true, Float64}}}}}
A2 = V_2::Tensor{DenseLevel{Int64, DenseLevel{Int64, ElementLevel{0.0, Float64, Int64, PyArray{Float64, 1, true, true, Float64}}}}}
A5 = V_3::Tensor{DenseLevel{Int64, DenseLevel{Int64, ElementLevel{0.0, Float64, Int64, PyArray{Float64, 1, true, true, Float64}}}}}
A8 = Tensor(Dense(SparseDict(Element{0.0, Float64}())))::Tensor{DenseLevel{Int64, SparseLevel{Int64, Finch.DictTable{Int64, Int64, Vector{Int64}, Vector{Int64}, Vector{Int64}, Dict{Tuple{Int64, Int64}, Int64}}, ElementLevel{0.0, Float64, Int64, Vector{Float64}}}}}
@finch mode = :fast begin
    A8 .= 0.0
    for i49 = _
        for i47 = _
            for i48 = _
                A8[i48, i47] << + >>= (*)((*)(A0[i48, i47], A2[i47, i49]), A5[i48, i49])
            end
        end
    end
    return A8
end
return (swizzle(A8, 2, 1),)
end
end)
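The two generated programs are otherwise identical; they differ only in A0's storage. The copy path carries Vector{Int64} pointer/index buffers and a Vector{Float64} value buffer, while the zero-copy path carries PlusOneVector{Int32} pointer/index buffers and a PyArray{Float64} value buffer over the original SciPy arrays. As a rough illustration of where per-access cost can come from, here is a standalone micro-benchmark sketch; the OneBasedView wrapper is a made-up stand-in, not Finch's actual PlusOneVector, and it assumes BenchmarkTools is available.

using BenchmarkTools

# Hypothetical stand-in for a zero-copy, 0-based Int32 index buffer that is
# converted to 1-based on every access (NOT Finch's actual PlusOneVector).
struct OneBasedView{T} <: AbstractVector{T}
    data::Vector{T}
end
Base.size(v::OneBasedView) = size(v.data)
Base.IndexStyle(::Type{<:OneBasedView}) = IndexLinear()
Base.@propagate_inbounds Base.getindex(v::OneBasedView, i::Int) = v.data[i] + one(eltype(v.data))

# Gather values through an index vector, as the sparse inner loop does.
function gather(idx, vals)
    s = 0.0
    @inbounds for i in eachindex(idx)
        s += vals[idx[i]]
    end
    return s
end

idx0    = rand(Int32(0):Int32(999), 10^7)    # 0-based indices, SciPy style
wrapped = OneBasedView(idx0)                 # zero-copy style: add 1 on every read
copied  = Vector{Int64}(idx0) .+ 1           # copy style: convert once, up front
vals    = rand(1000)

@btime gather($wrapped, $vals)
@btime gather($copied, $vals)

Whether the conversion itself or inference/dispatch overhead dominates in the real kernels would need profiling; this only isolates the per-read cost.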
Force-pushed from d5a2f95 to 0604844.
Ouch -- that's a big perf hit. I would wait to merge this as …
@hameerabbasi It turns out that …
I created an issue for it: finch-tensor/Finch.jl#565
@hameerabbasi Would you have a spare moment to review it?
It's fairly minimal code -- there isn't anything to review besides a dependency version bump. But if that dependency introduces a major regression, then we should wait for that regression to be fixed before bumping the dependency.
AFAIU it's not necessarily a regression, as … My current opinion is that it's better to merge it and provide zero-copy/…
A regression doesn't need to be a bug; it can be anything which practically renders the new version inferior to the old one. If such a situation arises, I'd re-evaluate after each release whether the newest version is better for my use case than whatever version I'm using now, and then upgrade.
Then I think I can open a PR which drops PlusOneVector from finch-tensor, acknowledging that it's not used internally. Then this PR won't be a regression.
I think PlusOneVector introduces a type instability which we can fix pretty easily.
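For context, a common way a wrapper vector like this becomes type-unstable is an abstractly typed field, and @code_warntype / Test.@inferred make that visible. The sketch below is a generic illustration of that failure mode and of how to check for it; the LooseWrap/TightWrap types are hypothetical and are not a diagnosis of the actual PlusOneVector code.

using Test, InteractiveUtils

struct LooseWrap
    data::AbstractVector{<:Integer}   # abstract field: element type unknown to the compiler
end
struct TightWrap{V<:AbstractVector{<:Integer}}
    data::V                           # concrete type parameter: element type is known
end
plusone(v, i) = v.data[i] + 1

loose = LooseWrap(Int32[0, 1, 2])
tight = TightWrap(Int32[0, 1, 2])

@inferred plusone(tight, 1)       # passes: return type inferred as Int64
@code_warntype plusone(loose, 1)  # body shows abstract types in the lowered code
# @inferred plusone(loose, 1)     # would throw: return type can't be inferred concretely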
But I think we should merge this change because it strictly makes finch-tensor more robust. This is a bug fix, and we need to fix PlusOneVector in any case.
Also, changes to the heuristic scheduler should be evaluated on mean improvement across all benchmarks. There will certainly be performance regressions on individual benchmarks as we develop it, but we need to make these changes to eventually reach better performance across the variety of kernels.
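A sketch of that kind of aggregation (made-up timings; the geometric mean is the usual choice for averaging speedup ratios, though "mean" above could also be read as arithmetic):

using Statistics

# Hypothetical per-benchmark timings (seconds) before and after a scheduler change.
old_times = [1.2, 0.8, 3.5, 0.05]
new_times = [0.6, 1.6, 3.3, 0.05]

speedups = old_times ./ new_times        # >1 means the new scheduler is faster
geomean  = exp(mean(log.(speedups)))     # geometric mean: a 2x win offsets a 2x loss

println("per-benchmark speedups: ", round.(speedups; digits = 2))
println("geometric mean speedup: ", round(geomean; digits = 2))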
There's a one-line fix for the performance bug in finch-tensor/Finch.jl#567, with a 100x perf improvement.
I bumped the version in there to 0.6.30.
Hi @willow-ahrens,
Here's a PR with adjustments required by finch-tensor/Finch.jl#562.
One thing I'm concerned about is reduced performance of code that uses PlusOneVector (zero-copy SciPy constructors), which is much slower than its copy counterpart. For example, in SDDMM, switching from a copy CSR matrix to a no-copy one introduces a 10x slowdown.
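For reference, the traces in the review comment above boil down to an SDDMM kernel, A8[i, j] = A0[i, j] * sum over l of A2[j, l] * A5[i, l]. A standalone Finch.jl version with made-up shapes and random inputs (it mirrors the generated code rather than the Python-side finch-tensor API; the sizes, fsprand density, and output format are assumptions) might look like:

using Finch

m, n, r = 1000, 800, 16
A0 = Tensor(Dense(SparseList(Element(0.0))), fsprand(m, n, 0.01))  # sparse sampling matrix
A2 = Tensor(Dense(Dense(Element(0.0))), rand(n, r))                # dense factor
A5 = Tensor(Dense(Dense(Element(0.0))), rand(m, r))                # dense factor
A8 = Tensor(Dense(SparseDict(Element(0.0))))                       # output, same format as in the trace

@finch mode = :fast begin
    A8 .= 0.0
    for l = _
        for j = _
            for i = _
                A8[i, j] += A0[i, j] * A2[j, l] * A5[i, l]
            end
        end
    end
end

Timing this kernel directly on copied buffers versus the zero-copy PlusOneVector/PyArray buffers would isolate whether the 10x gap comes from the input storage or from elsewhere in the pipeline.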