This PR implements the `gbtrf` function.

The following files are added:
- `KokkosBatched_Gbtrf_Serial_Impl.hpp`: Internal interfaces
- `KokkosBatched_Gbtrf_Serial_Internal.hpp`: Implementation details
- `KokkosBatched_Gbtrf.hpp`: APIs
- `Test_Batched_SerialGbtrf.hpp`: Unit tests for that

Detailed description
It computes an LU factorization of a real general M-by-N band matrix `A` using partial pivoting with row interchanges.

Here, the matrix has the following shape.
- `A`: `(batch_count, ldab, n)` On entry, the M-by-N matrix A to be factored, in band storage. On exit, the factors `L` and `U` from the factorization, where `U` is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 0 to KL+KU, and the multipliers used during the factorization are stored in rows KL+KU+1 to 2*KL+KU.
- `IPIV`: `(batch_count, min(m, n))` The pivot indices; for `0 <= i < min(M,N)`, row `i` of the matrix was interchanged with row `IPIV(i)`.
- `kl`: The number of subdiagonals within the band of A. kl >= 0
- `ku`: The number of superdiagonals within the band of A. ku >= 0
- `m`: The number of rows of the matrix A. (optional)

Parallelization is done over the batch direction. This is efficient only when `A` is given in `LayoutLeft` for GPUs and `LayoutRight` for CPUs.

Tests
1. Start from a matrix `A` and copy it to `LU`. Represent `A` in band storage `AB` and factorize it with `gbtrf`. Then, convert `AB` back into full storage `A` and extract `L` and `U`. Make a reference with `getrf` to get the reference `L` and `U` from the `LU` matrix. Finally, confirm that `L` and `U` are the same.
2. Choose a specific `A` to confirm that `LUB` is factorized as expected.
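For the first test above, the conversion between full storage `A` and band storage `AB` can be sketched in plain C++. This is a minimal illustration, not the code in this PR: it assumes the LAPACK band convention (0-based, `AB(kl + ku + i - j, j) = A(i, j)` with `ldab = 2*kl + ku + 1`), and the `Matrix` struct and the helpers `full_to_band` and `band_to_full` are hypothetical names that use `std::vector` instead of Kokkos views.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense column-major matrix used only for this sketch
// (the PR itself operates on Kokkos views).
struct Matrix {
  int rows, cols;
  std::vector<double> data;
  Matrix(int r, int c)
      : rows(r), cols(c), data(static_cast<std::size_t>(r) * c, 0.0) {}
  double& operator()(int i, int j) {
    return data[static_cast<std::size_t>(j) * rows + i];
  }
  double operator()(int i, int j) const {
    return data[static_cast<std::size_t>(j) * rows + i];
  }
};

// Pack the band of A (kl subdiagonals, ku superdiagonals) into AB with
// ldab = 2*kl + ku + 1 rows. Entry A(i, j) is stored at AB(kl + ku + i - j, j).
// The top kl rows stay zero; they are workspace for the fill-in that
// partial pivoting can create in U.
Matrix full_to_band(const Matrix& A, int kl, int ku) {
  assert(kl >= 0 && ku >= 0);
  Matrix AB(2 * kl + ku + 1, A.cols);
  for (int j = 0; j < A.cols; ++j) {
    const int i_begin = std::max(0, j - ku);
    const int i_end = std::min(A.rows - 1, j + kl);
    for (int i = i_begin; i <= i_end; ++i) {
      AB(kl + ku + i - j, j) = A(i, j);
    }
  }
  return AB;
}

// Inverse mapping: expand every row of AB back into an m-by-n full matrix.
// Row r of column j corresponds to full-matrix row i = r - (kl + ku) + j.
Matrix band_to_full(const Matrix& AB, int m, int kl, int ku) {
  assert(AB.rows == 2 * kl + ku + 1);
  Matrix A(m, AB.cols);
  for (int j = 0; j < AB.cols; ++j) {
    for (int r = 0; r < AB.rows; ++r) {
      const int i = r - (kl + ku) + j;
      if (i >= 0 && i < m) A(i, j) = AB(r, j);
    }
  }
  return A;
}
```

Before factorization, `band_to_full(full_to_band(A, kl, ku), m, kl, ku)` reproduces any `A` whose nonzeros lie inside the band. Because `band_to_full` reads all `2*kl + ku + 1` rows, it also picks up the extra `kl` superdiagonals that `U` may occupy after factorization.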