Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new generic matrix operators #137

Merged
merged 1 commit into from
Dec 4, 2024

Conversation

stephenswat
Copy link
Member

@stephenswat stephenswat commented Nov 19, 2024

Our current approach for e.g. matrix multiplication reads very naturally, e.g. A = B*C models $A = BC$, but this has a problem on GPU code. Indeed, if these matrices have size $N \times N$, then we first concretize an $N \times N$ matrix which we then have to copy element-by-element into $A$. This means that we need to keep that many registers live, which is quite large for e.g. $7 \times 7$ free matrices (which thus occupy 49 registers).

This problem can be alleviated by implementing optimized operators. More precisely, this PR implements the following new methods:

  • set_product(A, B, C) computes $A = BC$ without intermediate values.
  • set_product_left_transpose(A, B, C) computes $A = B^TC$ without intermediate values.
  • set_product_right_transpose(A, B, C) computes $A = BC^T$ without intermediate values.
  • set_inplace_product_right(A, B) computes $A = AB$ in place.
  • set_inplace_product_left(A, B) computes $A = BA$ in place.
  • set_inplace_product_right_transpose(A, B) computes $A = AB^T$ in place.
  • set_inplace_product_left_transpose(A, B) computes $A = B^TA$ in place.

Includes tests; depends on #134.

@stephenswat stephenswat added the enhancement New feature or request label Nov 19, 2024
@stephenswat
Copy link
Member Author

stephenswat commented Nov 19, 2024

I was considering making the transpose on the left- and right-hand size a boolean parameter or even a boolean template, but I wasn't sure... 🤔

@stephenswat
Copy link
Member Author

@niermann999 I think the generic plugin (#131) would actually be super useful here; do you think we can incorporate this after merging the generic plugin?

@niermann999
Copy link
Contributor

@niermann999 I think the generic plugin (#131) would actually be super useful here; do you think we can incorporate this after merging the generic plugin?

Yes, sure. It should be possible to use this with any linear algebra backend, also the vectorized ones (I did this in the last PR with the determinant and inverse now, until I had time to look into optimizations). Unfortunately, the full implementation of the generic plugin kind of happened on the way, so it is three PRs down the pipeline...

@stephenswat
Copy link
Member Author

@niermann999 do you have some ideas on how I would now move this to your new generic algebra plugin?

@niermann999
Copy link
Contributor

Can you also try to add this operation to the matrix benchmarks?

@beomki-yeo
Copy link
Contributor

beomki-yeo commented Nov 22, 2024

I am onboard with the direction but could you make this operators more CPU-vectorization friendly?

I am not sure if the same technique used here can be applied

@stephenswat
Copy link
Member Author

I am onboard with the direction but could you make this operators more CPU-vectorization friendly?

Hi Beomki, thanks for the comment but I am not sure which of the functions you mean; where do you see any particular vectorization unfriendliness?

@beomki-yeo
Copy link
Contributor

@stephenswat Sure - Following is the part of set_product

        for (size_type k = 0; k < N; ++k) {
          T += element_getter()(A, i, k) * element_getter()(B, k, j);
        }

element getter is taking A(i,k). As [A(i,0), A(i,1), ..., A(i, N-1)] are not adjacent to each other, taking these values is not vectorization friendly.

But we can make this fix in one of next PRs

@stephenswat
Copy link
Member Author

@niermann999 this is now ready for review.

common/include/algebra/concepts.hpp Outdated Show resolved Hide resolved
tests/common/test_host_basics.hpp Outdated Show resolved Hide resolved
@stephenswat stephenswat force-pushed the feat/more_mat_ops branch 5 times, most recently from c37e64c to 800ed7b Compare December 3, 2024 16:47
@stephenswat
Copy link
Member Author

Ah yes, more obscure MSVC clownery. 🤡

Our current approach for e.g. matrix multiplication reads very
naturally, e.g. `A = B*C` models $A = BC$, but this has a problem on GPU
code. Indeed, if these matrices have size $N \times N$, then we first
concretize an $N \times N$ matrix which we then have to copy
element-by-element into $A$. This means that we need to keep that many
registers live, which is quite large for e.g. $7 \times 7$ free
matrices (which thus occupy 49 registers).

This problem can be alleviated by implementing optimized operators. More
precisely, this PR implements the following new methods:

* `set_product(A, B, C)` computes $A = BC$ without intermediate values.
* `set_product_left_transpose(A, B, C)` computes $A = B^TC$ without
  intermediate values.
* `set_product_right_transpose(A, B, C)` computes $A = BC^T` without
  intermediate values.
* `set_inplace_product_right(A, B)` computes $A = AB$ in place.
* `set_inplace_product_left(A, B)` computes $A = BA$ in place.
* `set_inplace_product_right_transpose(A, B)` computes $A = AB^T$ in
  place.
* `set_inplace_product_left_transpose(A, B)` computes $A = B^TA$ in
  place.
Copy link

sonarqubecloud bot commented Dec 4, 2024

Quality Gate Failed Quality Gate failed

Failed conditions
21.1% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

@stephenswat
Copy link
Member Author

@niermann999 finally passes the CI; what do you think?

@stephenswat stephenswat merged commit d167b11 into acts-project:main Dec 4, 2024
27 of 28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants