-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new generic matrix operators #137
Conversation
I was considering making the transpose on the left- and right-hand size a boolean parameter or even a boolean template, but I wasn't sure... 🤔 |
@niermann999 I think the generic plugin (#131) would actually be super useful here; do you think we can incorporate this after merging the generic plugin? |
Yes, sure. It should be possible to use this with any linear algebra backend, also the vectorized ones (I did this in the last PR with the determinant and inverse now, until I had time to look into optimizations). Unfortunately, the full implementation of the generic plugin kind of happened on the way, so it is three PRs down the pipeline... |
@niermann999 do you have some ideas on how I would now move this to your new generic algebra plugin? |
Can you also try to add this operation to the matrix benchmarks? |
I am onboard with the direction but could you make this operators more CPU-vectorization friendly? I am not sure if the same technique used here can be applied |
Hi Beomki, thanks for the comment but I am not sure which of the functions you mean; where do you see any particular vectorization unfriendliness? |
@stephenswat Sure - Following is the part of for (size_type k = 0; k < N; ++k) {
T += element_getter()(A, i, k) * element_getter()(B, k, j);
} element getter is taking A(i,k). As [A(i,0), A(i,1), ..., A(i, N-1)] are not adjacent to each other, taking these values is not vectorization friendly. But we can make this fix in one of next PRs |
376d545
to
ef228a1
Compare
@niermann999 this is now ready for review. |
ef228a1
to
e486e2a
Compare
c37e64c
to
800ed7b
Compare
Ah yes, more obscure MSVC clownery. 🤡 |
800ed7b
to
05fd7a7
Compare
Our current approach for e.g. matrix multiplication reads very naturally, e.g. `A = B*C` models $A = BC$, but this has a problem on GPU code. Indeed, if these matrices have size $N \times N$, then we first concretize an $N \times N$ matrix which we then have to copy element-by-element into $A$. This means that we need to keep that many registers live, which is quite large for e.g. $7 \times 7$ free matrices (which thus occupy 49 registers). This problem can be alleviated by implementing optimized operators. More precisely, this PR implements the following new methods: * `set_product(A, B, C)` computes $A = BC$ without intermediate values. * `set_product_left_transpose(A, B, C)` computes $A = B^TC$ without intermediate values. * `set_product_right_transpose(A, B, C)` computes $A = BC^T` without intermediate values. * `set_inplace_product_right(A, B)` computes $A = AB$ in place. * `set_inplace_product_left(A, B)` computes $A = BA$ in place. * `set_inplace_product_right_transpose(A, B)` computes $A = AB^T$ in place. * `set_inplace_product_left_transpose(A, B)` computes $A = B^TA$ in place.
Quality Gate failedFailed conditions |
@niermann999 finally passes the CI; what do you think? |
Our current approach for e.g. matrix multiplication reads very naturally, e.g.$A = BC$ , but this has a problem on GPU code. Indeed, if these matrices have size $N \times N$ , then we first concretize an $N \times N$ matrix which we then have to copy element-by-element into $A$ . This means that we need to keep that many registers live, which is quite large for e.g. $7 \times 7$ free matrices (which thus occupy 49 registers).
A = B*C
modelsThis problem can be alleviated by implementing optimized operators. More precisely, this PR implements the following new methods:
set_product(A, B, C)
computesset_product_left_transpose(A, B, C)
computesset_product_right_transpose(A, B, C)
computesset_inplace_product_right(A, B)
computesset_inplace_product_left(A, B)
computesset_inplace_product_right_transpose(A, B)
computesset_inplace_product_left_transpose(A, B)
computesIncludes tests; depends on #134.