Implement batched serial gbtrf #2489

Open
wants to merge 5 commits into
base: develop

yasahi-hpc
Contributor

This PR implements the batched serial gbtrf function.

The following files are added:

  1. KokkosBatched_Gbtrf_Serial_Impl.hpp: Internal interfaces
  2. KokkosBatched_Gbtrf_Serial_Internal.hpp: Implementation details
  3. KokkosBatched_Gbtrf.hpp: APIs
  4. Test_Batched_SerialGbtrf.hpp: Unit tests for gbtrf

Detailed description

gbtrf computes an LU factorization of a real general M-by-N band matrix A using partial pivoting with row interchanges.
The arguments are as follows, with view shapes given in parentheses.

  • A: (batch_count, ldab, n)
    On entry, the M-by-N matrix A to be factored, in band storage. On exit, the factors L and U from the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 0 to KL+KU,
    and the multipliers used during the factorization are stored in rows KL+KU+1 to 2*KL+KU.
  • IPIV: (batch_count, min(m, n))
    The pivot indices; for 0 <= i < min(M,N), row i of the matrix was interchanged with row IPIV(i).
  • kl: The number of subdiagonals within the band of A. Requires kl >= 0.
  • ku: The number of superdiagonals within the band of A. Requires ku >= 0.
  • m: The number of rows of the matrix A (optional).
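
To make the band storage concrete, the sketch below packs a dense matrix into the (ldab, n) layout described above. It assumes the LAPACK gbtrf convention AB(kl + ku + i - j, j) = A(i, j) with 0-based indices and ldab = 2*kl + ku + 1. The helper name dense_to_band and the row-major std::vector storage are hypothetical and not part of this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of this PR): packs a dense row-major m-by-n
// matrix into band storage with ldab = 2*kl + ku + 1 rows and n columns.
// Assumed convention (LAPACK gbtrf, 0-based):
//   AB(kl + ku + i - j, j) = A(i, j)  for  max(0, j - ku) <= i <= min(m - 1, j + kl).
// The top kl rows stay zero; they hold fill-in created by pivoting.
std::vector<double> dense_to_band(const std::vector<double>& a, int m, int n,
                                  int kl, int ku) {
  assert(m >= 0 && n >= 0 && kl >= 0 && ku >= 0);
  assert(static_cast<int>(a.size()) == m * n);
  const int ldab = 2 * kl + ku + 1;
  std::vector<double> ab(static_cast<std::size_t>(ldab) * n, 0.0);
  for (int j = 0; j < n; ++j) {
    const int i_begin = std::max(0, j - ku);      // first row inside the band
    const int i_end   = std::min(m - 1, j + kl);  // last row inside the band
    for (int i = i_begin; i <= i_end; ++i) {
      ab[(kl + ku + i - j) * n + j] = a[i * n + j];
    }
  }
  return ab;
}
```

With this convention the main diagonal of A lands in row kl + ku of AB, superdiagonals sit above it, and subdiagonals sit below it.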

The batch can be parallelized as shown below. Because the parallelization is over the batch direction, this is efficient only when
A is stored in LayoutLeft on GPUs and in LayoutRight on CPUs.

Kokkos::parallel_for("gbtrf",
    Kokkos::RangePolicy<execution_space>(0, batch_count),
    KOKKOS_LAMBDA(const int k) {
        // Each iteration factorizes one matrix of the batch in place.
        auto aa = Kokkos::subview(m_a, k, Kokkos::ALL(), Kokkos::ALL());
        auto ipiv = Kokkos::subview(m_ipiv, k, Kokkos::ALL());

        KokkosBatched::SerialGbtrf<AlgoTagType>::invoke(aa, ipiv, kl, ku);
    });
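
The layout remark can be illustrated with plain index arithmetic. The sketch below computes the linear offset of element (k, i, j) in a (batch_count, ldab, n) array for a left (first index contiguous) and a right (last index contiguous) layout, mirroring how Kokkos::LayoutLeft and Kokkos::LayoutRight order an unpadded view. The function names are hypothetical and padding is ignored.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: offset of (k, i, j) when the leftmost (batch) index is
// contiguous, as in an unpadded LayoutLeft view.
std::size_t offset_left(std::size_t k, std::size_t i, std::size_t j,
                        std::size_t batch_count, std::size_t ldab, std::size_t n) {
  assert(k < batch_count && i < ldab && j < n);
  return k + batch_count * (i + ldab * j);
}

// Hypothetical helper: offset of (k, i, j) when the rightmost (column) index is
// contiguous, as in an unpadded LayoutRight view.
std::size_t offset_right(std::size_t k, std::size_t i, std::size_t j,
                         std::size_t batch_count, std::size_t ldab, std::size_t n) {
  assert(k < batch_count && i < ldab && j < n);
  return j + n * (i + ldab * k);
}
```

In the left layout, consecutive batch indices k are one element apart, so neighboring GPU threads touch neighboring addresses. In the right layout, consecutive k are ldab * n elements apart, so each CPU thread works on its own contiguous matrix.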

Tests

  1. Build a random band matrix from a random matrix A and copy it to LU. Store A in band storage AB and factorize it with gbtrf. Then convert AB back into full storage and extract L and U. As a reference, factorize LU with getrf and extract the reference L and U. Finally, confirm that both pairs of L and U agree.
  2. A small analytical test: choose A as follows and confirm that the factorized band matrix LUB and the pivots piv match the expected values.
A = [[ 1, -3, -2,  0],
     [-1,  1, -3, -2],
     [ 2, -1,  1, -3],
     [ 0,  2, -1,  1]]
LUB = [[ 0,    0,    0,    0  ],
       [ 0,    0,    0,   -3  ],
       [ 0,    0,    1,    1.5],
       [ 0,   -1,   -2.5, -3.2],
       [ 2,   -2.5, -3,    5.4],
       [-0.5, -0.2,  1,    0  ],
       [ 0.5, -0.8,  0,    0  ]]
piv = [2, 2, 2, 3]
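
The expected values of the analytical test can be reproduced by hand or with a small reference routine. The sketch below is not the PR's implementation; it is a hypothetical dense reference (reference_gbtrf and BandLU are made-up names) that assumes a square matrix, kl = ku = 2 for the example above, and the packing LUB(kl + ku + i - j, j). Row swaps are applied only to columns j and later, so multipliers computed in earlier columns stay where they were stored.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

struct BandLU {
  std::vector<double> ab;  // (2*kl + ku + 1) x n, row-major
  std::vector<int> ipiv;   // n pivot indices, 0-based
};

// Hypothetical reference (not the PR's code): LU with partial pivoting on a
// dense row-major n-by-n band matrix, packed into band storage afterwards.
BandLU reference_gbtrf(std::vector<double> a, int n, int kl, int ku) {
  assert(static_cast<int>(a.size()) == n * n);
  BandLU out;
  out.ipiv.resize(n);
  for (int j = 0; j < n; ++j) {
    const int last = std::min(n - 1, j + kl);  // last row inside the band
    int p = j;                                 // pivot: first largest |a(i, j)|
    for (int i = j + 1; i <= last; ++i)
      if (std::abs(a[i * n + j]) > std::abs(a[p * n + j])) p = i;
    out.ipiv[j] = p;
    assert(a[p * n + j] != 0.0);  // this sketch does not handle singular input
    if (p != j)  // swap only columns >= j; stored multipliers are untouched
      for (int c = j; c < n; ++c) std::swap(a[j * n + c], a[p * n + c]);
    for (int i = j + 1; i <= last; ++i) {
      a[i * n + j] /= a[j * n + j];  // multiplier, stored in place
      for (int c = j + 1; c < n; ++c)
        a[i * n + c] -= a[i * n + j] * a[j * n + c];
    }
  }
  const int ldab = 2 * kl + ku + 1;
  out.ab.assign(static_cast<std::size_t>(ldab) * n, 0.0);
  for (int j = 0; j < n; ++j)
    for (int i = std::max(0, j - kl - ku); i <= std::min(n - 1, j + kl); ++i)
      out.ab[(kl + ku + i - j) * n + j] = a[i * n + j];
  return out;
}
```

Calling reference_gbtrf with the matrix A above, n = 4, and kl = ku = 2 gives the LUB and piv values listed in the test, up to floating-point rounding.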

Yuuichi Asahi added 5 commits January 28, 2025 03:21
@cwpearson cwpearson added the AT2-CI-APPROVAL Approve CI to run at SNL label Jan 28, 2025