Venado optimizations #755

mewall · 2025-01-14T16:54:26Z

Modify bml_transpose() fortran API to match the C API
o Add bml_transpose_new()
o Change the tests to use the new API
Add methods to get the pointer for MAGMA arrays, for use in Fortran OpenACC and OpenMP offload
o Write fortran wrapper for existing bml_get_data_ptr_dense()
o Add new bml_get_ld_dense()
Add bml_set_N_dense() to change the size of a bml array that's already been allocated
o This avoids unnecessary allocations and leads to substantial speedups
o Unsafe method that's exposed in fortran for dense matrices only

o Back out bml_get_ptr_dense() from bml_getters o Write fortran wrapper for existing bml_get_data_ptr_dense() method o Write new bml_get_ld_dense() to enable magma matrix pointer use

nicolasbock

LGTM

scripts/build_bml_cray.sh

jeanlucf22 · 2025-01-14T17:59:54Z

src/C-interface/dense/bml_setters_dense.c

+#include "magma_v2.h"
+#endif
+
+void bml_set_N_dense(


I don't see a use case for setting N after construction. Is it to avoid constructing a new matrix?

@jeanlucf22 Yes. This was addressed in the commit comment but the bullets needed fixing to clarify (now corrected). This is to avoid repeated memory allocations which became a bottleneck in MD simulations.

Looks dangerous to me: you modify N without modifying anything else will lead to matrices in inconsistent state!

As I mention in the comments, it is unsafe. It's possible to change N so that the matrix will exceed its original allocation. It's exposed only as bml_set_N_dense(), for experts. It works. Can you explain how the matrix will be inconsistent? It's needed for an application, we need to figure out how to make this method available in some form.

If needed we can protect against exceeding initial size by adding an extra variable to the struct keeping track of the size of the originally allocated matrix. But it's less intrusive to leave the struct alone...

I believe the performance issue comes from the allocation of the struct members domain and domain2:

bml/src/C-interface/dense/bml_types_dense.h

Line 30 in ef2ee26

bml_domain_t *domain;

I looked into it at some point, but the fix I tried was breaking qmd-progress.
These two struct members could be static or initialized as needed.
Increasing N would obviously lead to a memory allocation too small to hold an NxN matrix.

What is the use case? Is N decreasing and you want to reuse the allocation? Whatever it is, I think looking into domain and domain2 is a safer and better alternative.

The issue is needing to use a MAGMA working array of differing size on different MD iterations, where max N is known. The problem is that creating a new array each time is slow, due to the GPU allocation. The solution is to allocate using max N once and to resize the array using bml_set_N_dense() as needed. How can I use domain and domain2 to do this? I'm not familiar with those.

I understand now.

I was not suggesting to use domain and domain2. I was just saying their allocation may be the main culprit when it comes to allocation time for a dense matrix. Maybe an issue to deal with another time.

Another suggestion: having a function resizeNoAlloc(int n) that would just change N if n<=N, otherwise would change N and reallocate memory? Having an extra struct member keeping track of allocated memory size would be good in that case.

scripts/build_chicoma_hackathon.sh

mewall · 2025-01-28T00:41:44Z

Yes, I think such a function could be good. The dense case is essentially done, even for magma build, we’d just need to add the allocated size to the struct and add a check to make it work. Need to figure out the other matrix types, if they’ll be supported. Meanwhile, what do you think about merging the current function? Get Outlook for iOS<https://aka.ms/o0ukef>

________________________________ From: Jean-Luc Fattebert ***@***.***> Sent: Monday, January 27, 2025 4:21:48 PM To: lanl/bml ***@***.***> Cc: Wall, Michael E ***@***.***>; Author ***@***.***> Subject: [EXTERNAL] Re: [lanl/bml] Venado optimizations (PR #755) @jeanlucf22 commented on this pull request.

________________________________ In src/C-interface/dense/bml_setters_dense.c<https://urldefense.com/v3/__https://github.com/lanl/bml/pull/755*discussion_r1931312603__;Iw!!Bt8fGhp8LhKGRg!FbQ06GCuZLlZD10ZOodErg20Q_XH3WAFVKL0R0Oc8HqI9Aed_3wlkpMOHRdkO-20l936fvvr2iZGDdyW8vBVVBy3$>:

@@ -3,6 +3,22 @@

#include "bml_setters_dense.h" #include "bml_types_dense.h" +#ifdef BML_USE_MAGMA +#include "magma_v2.h" +#endif + +void bml_set_N_dense( I understand now. I was not suggesting to use domain and domain2. I was just saying their allocation may be the main culprit when it comes to allocation time for a dense matrix. Maybe an issue to deal with another time. Another suggestion: having a function resizeNoAlloc(int n) that would just change N if n<=N, otherwise would change N and reallocate memory? Having an extra struct member keeping track of allocated memory size would be good in that case. — Reply to this email directly, view it on GitHub<https://urldefense.com/v3/__https://github.com/lanl/bml/pull/755*discussion_r1931312603__;Iw!!Bt8fGhp8LhKGRg!FbQ06GCuZLlZD10ZOodErg20Q_XH3WAFVKL0R0Oc8HqI9Aed_3wlkpMOHRdkO-20l936fvvr2iZGDdyW8vBVVBy3$>, or unsubscribe<https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AA67VEIOEH32HB66PF4BJQ32M25QZAVCNFSM6AAAAABVFKYFIOVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDKNZWG42TCNBWGM__;!!Bt8fGhp8LhKGRg!FbQ06GCuZLlZD10ZOodErg20Q_XH3WAFVKL0R0Oc8HqI9Aed_3wlkpMOHRdkO-20l936fvvr2iZGDdyW8iWVuCBR$>. You are receiving this because you authored the thread.Message ID: ***@***.***>

jeanlucf22 · 2025-01-29T19:24:40Z

Yes, I think such a function could be good. The dense case is essentially done, even for magma build, we’d just need to add the allocated size to the struct and add a check to make it work. Need to figure out the other matrix types, if they’ll be supported. Meanwhile, what do you think about merging the current function? Get Outlook for iOShttps://aka.ms/o0ukef
________________________________ From: Jean-Luc Fattebert @.> Sent: Monday, January 27, 2025 4:21:48 PM To: lanl/bml @.> Cc: Wall, Michael E @.>; Author @.> Subject: [EXTERNAL] Re: [lanl/bml] Venado optimizations (PR #755) @jeanlucf22 commented on this pull request.
________________________________ In src/C-interface/dense/bml_setters_dense.chttps://urldefense.com/v3/__https://github.com/lanl/bml/pull/755*discussion_r1931312603__;Iw!!Bt8fGhp8LhKGRg!FbQ06GCuZLlZD10ZOodErg20Q_XH3WAFVKL0R0Oc8HqI9Aed_3wlkpMOHRdkO-20l936fvvr2iZGDdyW8vBVVBy3$:
@@ -3,6 +3,22 @@
#include "bml_setters_dense.h" #include "bml_types_dense.h" +#ifdef BML_USE_MAGMA +#include "magma_v2.h" +#endif + +void bml_set_N_dense( I understand now. I was not suggesting to use domain and domain2. I was just saying their allocation may be the main culprit when it comes to allocation time for a dense matrix. Maybe an issue to deal with another time. Another suggestion: having a function resizeNoAlloc(int n) that would just change N if n<=N, otherwise would change N and reallocate memory? Having an extra struct member keeping track of allocated memory size would be good in that case. — Reply to this email directly, view it on GitHubhttps://urldefense.com/v3/__https://github.com/lanl/bml/pull/755*discussion_r1931312603__;Iw!!Bt8fGhp8LhKGRg!FbQ06GCuZLlZD10ZOodErg20Q_XH3WAFVKL0R0Oc8HqI9Aed_3wlkpMOHRdkO-20l936fvvr2iZGDdyW8vBVVBy3$, or unsubscribehttps://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AA67VEIOEH32HB66PF4BJQ32M25QZAVCNFSM6AAAAABVFKYFIOVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDKNZWG42TCNBWGM__;!!Bt8fGhp8LhKGRg!FbQ06GCuZLlZD10ZOodErg20Q_XH3WAFVKL0R0Oc8HqI9Aed_3wlkpMOHRdkO-20l936fvvr2iZGDdyW8iWVuCBR$. You are receiving this because you authored the thread.Message ID: @.***>

I don't think it is a good idea to merge as is. That's what branches are for if you need it as is right away.

jmohdyusof · 2025-01-29T20:06:41Z

FWIW, for the ellpack format, you can probably make this work simply by making nnz(i) = 0 for i > n. Obviously M is only an upper bound already, and the loops would continue to run over all N rows, unless you decided to introduce another variable to truncate them.

mewall · 2025-01-29T21:31:29Z

Can you please provide a list of requirements for merging this into master?

…

________________________________ From: Jean-Luc Fattebert ***@***.***> Sent: Wednesday, January 29, 2025 12:25 PM To: lanl/bml ***@***.***> Cc: Wall, Michael E ***@***.***>; Author ***@***.***> Subject: [EXTERNAL] Re: [lanl/bml] Venado optimizations (PR #755) Yes, I think such a function could be good. The dense case is essentially done, even for magma build, we’d just need to add the allocated size to the struct and add a check to make it work. Need to figure out the other matrix types, if they’ll be supported. Meanwhile, what do you think about merging the current function? Get Outlook for iOShttps://aka.ms/o0ukef<https://urldefense.com/v3/__https://aka.ms/o0ukef__;!!Bt8fGhp8LhKGRg!Cjdib0KzKOuP85xu8ga136DukYOoPPKVgcctLCsr3CufV8WwR9W6zwe798bqE7qu0gpHiural0miy5b47KSGeMqG$>

________________________________ From: Jean-Luc Fattebert @.> Sent: Monday, January 27, 2025 4:21:48 PM To: lanl/bml @.> Cc: Wall, Michael E @.>; Author @.> Subject: [EXTERNAL] Re: [lanl/bml] Venado optimizations (PR #755<https://urldefense.com/v3/__https://github.com/lanl/bml/pull/755__;!!Bt8fGhp8LhKGRg!Cjdib0KzKOuP85xu8ga136DukYOoPPKVgcctLCsr3CufV8WwR9W6zwe798bqE7qu0gpHiural0miy5b47HtHaUCk$>) @jeanlucf22<https://urldefense.com/v3/__https://github.com/jeanlucf22__;!!Bt8fGhp8LhKGRg!Cjdib0KzKOuP85xu8ga136DukYOoPPKVgcctLCsr3CufV8WwR9W6zwe798bqE7qu0gpHiural0miy5b47HKzc6_0$> commented on this pull request.

________________________________ In src/C-interface/dense/bml_setters_dense.chttps://urldefense.com/v3/__#755*discussion_r1931312603__;Iw!!Bt8fGhp8LhKGRg!FbQ06GCuZLlZD10ZOodErg20Q_XH3WAFVKL0R0Oc8HqI9Aed_3wlkpMOHRdkO-20l936fvvr2iZGDdyW8vBVVBy3$: @@ -3,6 +3,22 @@ #include "bml_setters_dense.h" #include "bml_types_dense.h" +#ifdef BML_USE_MAGMA +#include "magma_v2.h" +#endif + +void bml_set_N_dense( I understand now. I was not suggesting to use domain and domain2. I was just saying their allocation may be the main culprit when it comes to allocation time for a dense matrix. Maybe an issue to deal with another time. Another suggestion: having a function resizeNoAlloc(int n) that would just change N if n<=N, otherwise would change N and reallocate memory? Having an extra struct member keeping track of allocated memory size would be good in that case. — Reply to this email directly, view it on GitHubhttps://urldefense.com/v3/__#755*discussion_r1931312603__;Iw!!Bt8fGhp8LhKGRg!FbQ06GCuZLlZD10ZOodErg20Q_XH3WAFVKL0R0Oc8HqI9Aed_3wlkpMOHRdkO-20l936fvvr2iZGDdyW8vBVVBy3$, or unsubscribehttps://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AA67VEIOEH32HB66PF4BJQ32M25QZAVCNFSM6AAAAABVFKYFIOVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDKNZWG42TCNBWGM__;!!Bt8fGhp8LhKGRg!FbQ06GCuZLlZD10ZOodErg20Q_XH3WAFVKL0R0Oc8HqI9Aed_3wlkpMOHRdkO-20l936fvvr2iZGDdyW8iWVuCBR$. You are receiving this because you authored the thread.Message ID: @.***> I don't think it is a good idea to merge as is. That's what branches are for if you need it as is right away. — Reply to this email directly, view it on GitHub<https://urldefense.com/v3/__https://github.com/lanl/bml/pull/755*issuecomment-2622639216__;Iw!!Bt8fGhp8LhKGRg!Cjdib0KzKOuP85xu8ga136DukYOoPPKVgcctLCsr3CufV8WwR9W6zwe798bqE7qu0gpHiural0miy5b47BQYtol7$>, or unsubscribe<https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AA67VEK7MTI6M5EYAESKTYT2NETI7AVCNFSM6AAAAABVFKYFIOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMMRSGYZTSMRRGY__;!!Bt8fGhp8LhKGRg!Cjdib0KzKOuP85xu8ga136DukYOoPPKVgcctLCsr3CufV8WwR9W6zwe798bqE7qu0gpHiural0miy5b47E1cI_bk$>. You are receiving this because you authored the thread.Message ID: ***@***.***>

mewall · 2025-01-29T21:47:30Z

I'll take a look, I think for ellpack it's possible that the bml_set_N() method could work as well, for resizing N in an existing matrix buffer. There could also be a bml_set_M() method for resizing M. I think the expectation should be that the data are lost upon resizing, and that there's no zeroing of elements, as the purpose is to make it fast by avoiding memory access.

…

________________________________ From: Jamal Mohd-Yusof ***@***.***> Sent: Wednesday, January 29, 2025 1:07 PM To: lanl/bml ***@***.***> Cc: Wall, Michael E ***@***.***>; Author ***@***.***> Subject: [EXTERNAL] Re: [lanl/bml] Venado optimizations (PR #755) FWIW, for the ellpack format, you can probably make this work simply by making nnz(i) = 0 for i > n. Obviously M is only an upper bound already, and the loops would continue to run over all N rows, unless you decided to introduce another variable to truncate them. — Reply to this email directly, view it on GitHub<https://urldefense.com/v3/__https://github.com/lanl/bml/pull/755*issuecomment-2622730507__;Iw!!Bt8fGhp8LhKGRg!B99uRRq12z8IMCZGBgVUiQyDYd8GtmmNzTTbeYAvf0MbSkB75_z_8zxznk5Z9VDMvL1v_vGSlk4g39_XhAKe3P87$>, or unsubscribe<https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AA67VEJ5TEFKXLLPCS5VK432NEYGRAVCNFSM6AAAAABVFKYFIOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMMRSG4ZTANJQG4__;!!Bt8fGhp8LhKGRg!B99uRRq12z8IMCZGBgVUiQyDYd8GtmmNzTTbeYAvf0MbSkB75_z_8zxznk5Z9VDMvL1v_vGSlk4g39_XhChlp8l8$>. You are receiving this because you authored the thread.Message ID: ***@***.***>

jeanlucf22 · 2025-01-30T03:02:01Z

Can you please provide a list of requirements for merging this into master?
…
________________________________ From: Jean-Luc Fattebert @.> Sent: Wednesday, January 29, 2025 12:25 PM To: lanl/bml @.> Cc: Wall, Michael E @.>; Author @.> Subject: [EXTERNAL] Re: [lanl/bml] Venado optimizations (PR #755) Yes, I think such a function could be good. The dense case is essentially done, even for magma build, we’d just need to add the allocated size to the struct and add a check to make it work. Need to figure out the other matrix types, if they’ll be supported. Meanwhile, what do you think about merging the current function? Get Outlook for iOShttps://aka.ms/o0ukefhttps://urldefense.com/v3/__https://aka.ms/o0ukef__;!!Bt8fGhp8LhKGRg!Cjdib0KzKOuP85xu8ga136DukYOoPPKVgcctLCsr3CufV8WwR9W6zwe798bqE7qu0gpHiural0miy5b47KSGeMqG$
________________________________ From: Jean-Luc Fattebert @.> Sent: Monday, January 27, 2025 4:21:48 PM To: lanl/bml @.> Cc: Wall, Michael E @.>; Author @.> Subject: [EXTERNAL] Re: [lanl/bml] Venado optimizations (PR #755 https://urldefense.com/v3/__https://github.com/lanl/bml/pull/755__;!!Bt8fGhp8LhKGRg!Cjdib0KzKOuP85xu8ga136DukYOoPPKVgcctLCsr3CufV8WwR9W6zwe798bqE7qu0gpHiural0miy5b47HtHaUCk$) @jeanlucf22 https://urldefense.com/v3/__https://github.com/jeanlucf22__;!!Bt8fGhp8LhKGRg!Cjdib0KzKOuP85xu8ga136DukYOoPPKVgcctLCsr3CufV8WwR9W6zwe798bqE7qu0gpHiural0miy5b47HKzc6_0$ commented on this pull request.
________________________________ In src/C-interface/dense/bml_setters_dense.chttps://urldefense.com/v3/#755*discussion_r1931312603;Iw!!Bt8fGhp8LhKGRg!FbQ06GCuZLlZD10ZOodErg20Q_XH3WAFVKL0R0Oc8HqI9Aed_3wlkpMOHRdkO-20l936fvvr2iZGDdyW8vBVVBy3$: @@ -3,6 +3,22 @@ #include "bml_setters_dense.h" #include "bml_types_dense.h" +#ifdef BML_USE_MAGMA +#include "magma_v2.h" +#endif + +void bml_set_N_dense( I understand now. I was not suggesting to use domain and domain2. I was just saying their allocation may be the main culprit when it comes to allocation time for a dense matrix. Maybe an issue to deal with another time. Another suggestion: having a function resizeNoAlloc(int n) that would just change N if n<=N, otherwise would change N and reallocate memory? Having an extra struct member keeping track of allocated memory size would be good in that case. — Reply to this email directly, view it on GitHubhttps://urldefense.com/v3/#755*discussion_r1931312603;Iw!!Bt8fGhp8LhKGRg!FbQ06GCuZLlZD10ZOodErg20Q_XH3WAFVKL0R0Oc8HqI9Aed_3wlkpMOHRdkO-20l936fvvr2iZGDdyW8vBVVBy3$, or unsubscribehttps://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AA67VEIOEH32HB66PF4BJQ32M25QZAVCNFSM6AAAAABVFKYFIOVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDKNZWG42TCNBWGM__;!!Bt8fGhp8LhKGRg!FbQ06GCuZLlZD10ZOodErg20Q_XH3WAFVKL0R0Oc8HqI9Aed_3wlkpMOHRdkO-20l936fvvr2iZGDdyW8iWVuCBR$. You are receiving this because you authored the thread.Message ID: @.> I don't think it is a good idea to merge as is. That's what branches are for if you need it as is right away. — Reply to this email directly, view it on GitHubhttps://urldefense.com/v3/__https://github.com/lanl/bml/pull/755*issuecomment-2622639216__;Iw!!Bt8fGhp8LhKGRg!Cjdib0KzKOuP85xu8ga136DukYOoPPKVgcctLCsr3CufV8WwR9W6zwe798bqE7qu0gpHiural0miy5b47BQYtol7$, or unsubscribehttps://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AA67VEK7MTI6M5EYAESKTYT2NETI7AVCNFSM6AAAAABVFKYFIOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMMRSGYZTSMRRGY__;!!Bt8fGhp8LhKGRg!Cjdib0KzKOuP85xu8ga136DukYOoPPKVgcctLCsr3CufV8WwR9W6zwe798bqE7qu0gpHiural0miy5b47E1cI_bk$. You are receiving this because you authored the thread.Message ID: @.>

Have the function keep the matrix in a consistent state: memory allocated <= N*ld

mewall · 2025-01-30T16:22:09Z

Have the function keep the matrix in a consistent state: memory allocated <= N*ld

OK here's what I propose

A add a new variable "num_elems_allocated" to the following struct that holds the total number of elements in the dense matrix

bml/src/C-interface/dense/bml_types_dense.h

Line 15 in ef2ee26

struct bml_matrix_dense_t

If you can foresee any consequences for this modification elsewhere in the code, please let me know

Leave bml_set_N_dense() as a function to change N.

Fail using LOG_ERROR()

I will be unable to merge our PROGRESS optimizations into master without merging this into master, so please, if there is anything else that will hold this up, let me know now.

jeanlucf22 · 2025-01-30T16:47:36Z

Have the function keep the matrix in a consistent state: memory allocated <= N*ld

OK here's what I propose

A add a new variable "num_elems_allocated" to the following struct that holds the total number of elements in the dense matrix

bml/src/C-interface/dense/bml_types_dense.h

Line 15 in ef2ee26

struct bml_matrix_dense_t

If you can foresee any consequences for this modification elsewhere in the code, please let me know

Leave bml_set_N_dense() as a function to change N.

Fail using LOG_ERROR()

I will be unable to merge our PROGRESS optimizations into master without merging this into master, so please, if there is anything else that will hold this up, let me know now.

Personally, I'd prefer another name for the function, such as "resize" which indicates the new matrix is actually in a working state, just in a different size. I would also prefer to reallocate when N larger, instead of failing.

Any other opinion?

jmohdyusof · 2025-01-30T16:59:10Z

My 2c:
num_elements_allocated is somewhat confusing, is that NxN or nxn? It sounds like NxN, but your description sounds like the latter (which is more like n_in_use)

Is the proposal that resize is an asymmetric function that reallocates if N_new > N_old but just reuses the old data allocation if N_new <= N_old?

mewall · 2025-01-30T17:13:12Z

My 2c: num_elements_allocated is somewhat confusing, is that NxN or nxn? It sounds like NxN, but your description sounds like the latter (which is more like n_in_use)

It's NxN, or Nxld, depending on CPU vs MAGMA. I'm not sure what lower case n refers to though...is there another complexity I'm not seeing?

Is the proposal that resize is an asymmetric function that reallocated in N_new > N_old but just reuses the old data allocation if N_new <= N_old?

That's my understanding. My preference would be to just do the thing asked for and fail otherwise. If there are two different things done under the hood depending on how it's called that creates more complexity when optimizing at the application level.

FWIW this is why I would prefer to merge what I've committed already: it does what it says, works, is completely isolated in the code, and nobody but an expert will use it. Now we're into different territory...

mewall · 2025-01-30T17:15:22Z

Have the function keep the matrix in a consistent state: memory allocated <= N*ld

OK here's what I propose
A add a new variable "num_elems_allocated" to the following struct that holds the total number of elements in the dense matrix

bml/src/C-interface/dense/bml_types_dense.h

Line 15 in ef2ee26

struct bml_matrix_dense_t

If you can foresee any consequences for this modification elsewhere in the code, please let me know
Leave bml_set_N_dense() as a function to change N.
Fail using LOG_ERROR()
I will be unable to merge our PROGRESS optimizations into master without merging this into master, so please, if there is anything else that will hold this up, let me know now.

Personally, I'd prefer another name for the function, such as "resize" which indicates the new matrix is actually in a working state, just in a different size. I would also prefer to reallocate when N larger, instead of failing.

Any other opinion?

I like bml_set_N_dense() because that's what it does, and would prefer that the behavior is consistent from call to call...but I will do whatever is needed to get this merged!

jmohdyusof · 2025-01-30T17:40:34Z

In my comment above, n refers to the number of rows actually used (of the N allocated).
The confusion (to me) arises because in the past N==n, so things like set_N_dense are ambiguous unless we differentiate between them. Does set_N_dense currently always reallocate even if N_new <= N_old?

I have no particular opinion on this particular merge, just trying to avoid coming back to a code with confusing (to me) nomenclature.

mewall · 2025-01-30T17:58:52Z

In my comment above, n refers to the number of rows actually used (of the N allocated). The confusion (to me) arises because in the past N==n, so things like set_N_dense are ambiguous unless we differentiate between them. Does set_N_dense currently always reallocate even if N_new <= N_old?

I have no particular opinion on this particular merge, just trying to avoid coming back to a code with confusing (to me) nomenclature.

There is no current bml_set_N, bml_set_N_dense it's a new addition. There's a bml_get_N() in the introspection code, I just reasoned it was natural to use bml_set_N for the setting. I think it's an OK choice as the variable is N in the struct. I'm not aware of a way to select a fewer number of rows/columns to use in a currently allocated matrix, but I might have missed something when I poked around looking for a way to do what I needed to.

jmohdyusof · 2025-01-30T18:06:57Z

I generally agree with the set/get duality for naming.
So in the case of reusing an allocation, you just set N_new < N_old but retain the same pointer to allocated memory, and keep track of N_original somewhere to make sure N_new <= N_original, otherwise reallocate?

mewall · 2025-01-30T18:17:33Z

I generally agree with the set/get duality for naming. So in the case of reusing an allocation, you just set N_new < N_old but retain the same pointer to allocated memory, and keep track of N_original somewhere to make sure N_new <= N_original, otherwise reallocate?

That's right. The burden is currently on the programmer to ensure the method isn't called when N_new > N_original. Nothing in BML helps with that, AFAIK it would require keeping track of N_original in the struct (the call can happen many times with different values of N_new, so just comparing to the current N doesn't work).

jmohdyusof · 2025-01-30T18:19:59Z

In that case I agree that adding N_original (or N_allocated if you prefer) to the struct is the correct way to go.

mewall · 2025-01-30T18:26:09Z

In that case I agree that adding N_original (or N_allocated if you prefer) to the struct is the correct way to go.

I'm OK with using N_original (linear size) instead of num_elems_allocated (total size of the array). That should work for either the CPU of MAGMA code path.

@jeanlucf22?

jmohdyusof · 2025-01-30T18:29:18Z

I definitely prefer N specification over num_elements.

jeanlucf22 · 2025-01-30T19:34:23Z

In that case I agree that adding N_original (or N_allocated if you prefer) to the struct is the correct way to go.

I'm OK with using N_original (linear size) instead of num_elems_allocated (total size of the array). That should work for either the CPU of MAGMA code path.

@jeanlucf22?

Fine with N_allocated.

mewall · 2025-01-30T19:38:22Z

In that case I agree that adding N_original (or N_allocated if you prefer) to the struct is the correct way to go.

I'm OK with using N_original (linear size) instead of num_elems_allocated (total size of the array). That should work for either the CPU of MAGMA code path.
@jeanlucf22?

Fine with N_allocated.

OK, it sounds like we have a consensus around N_allocated. I'll make the change and push a revision.

mewall added 10 commits January 14, 2025 09:30

Add build script for venado hackathon

e827603

Add preliminary method to set matrix size

096d3fe

Add build scripts for hackathon

eac6576

Add bml_transpose_inplace Fortran subroutine

e7a8fc0

Modify Fortran bml_transpose API to match the C interface

b1dcbb1

Modify tests to use new transpose API

7ded60f

Venado build modifications

87d02dd

Move build scripts to scripts/ dir

d766123

Expose dense matrix pointer using bml_get_ptr_dense

84be6e9

New introspection methods

559bc8e

o Back out bml_get_ptr_dense() from bml_getters o Write fortran wrapper for existing bml_get_data_ptr_dense() method o Write new bml_get_ld_dense() to enable magma matrix pointer use

mewall requested review from nicolasbock, cnegre, suemni, jeanlucf22, tokshgithub and jmohdyusof as code owners January 14, 2025 16:54

suemni approved these changes Jan 14, 2025

View reviewed changes

nicolasbock approved these changes Jan 14, 2025

View reviewed changes

nicolasbock enabled auto-merge January 14, 2025 18:45

jeanlucf22 requested changes Jan 14, 2025

View reviewed changes

Remove LANL-specific build scripts

fc03d39

jeanlucf22 reviewed Jan 27, 2025

View reviewed changes

scripts/build_chicoma_hackathon.sh Outdated Show resolved Hide resolved

Remove an additional LANL specific build script

0fc96d2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Venado optimizations #755

Venado optimizations #755

mewall commented Jan 14, 2025 •

edited

Loading

nicolasbock left a comment

jeanlucf22 Jan 14, 2025

mewall Jan 27, 2025

jeanlucf22 Jan 27, 2025

mewall Jan 27, 2025

mewall Jan 27, 2025

jeanlucf22 Jan 27, 2025

mewall Jan 27, 2025

jeanlucf22 Jan 27, 2025

mewall commented Jan 28, 2025 via email

jeanlucf22 commented Jan 29, 2025

jmohdyusof commented Jan 29, 2025

mewall commented Jan 29, 2025 via email

mewall commented Jan 29, 2025 via email

jeanlucf22 commented Jan 30, 2025

mewall commented Jan 30, 2025

jeanlucf22 commented Jan 30, 2025

jmohdyusof commented Jan 30, 2025 •

edited

Loading

mewall commented Jan 30, 2025

mewall commented Jan 30, 2025

jmohdyusof commented Jan 30, 2025

mewall commented Jan 30, 2025

jmohdyusof commented Jan 30, 2025

mewall commented Jan 30, 2025

jmohdyusof commented Jan 30, 2025

mewall commented Jan 30, 2025

jmohdyusof commented Jan 30, 2025

jeanlucf22 commented Jan 30, 2025

mewall commented Jan 30, 2025

Venado optimizations #755

Are you sure you want to change the base?

Venado optimizations #755

Conversation

mewall commented Jan 14, 2025 • edited Loading

nicolasbock left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mewall commented Jan 28, 2025 via email

jeanlucf22 commented Jan 29, 2025

jmohdyusof commented Jan 29, 2025

mewall commented Jan 29, 2025 via email

mewall commented Jan 29, 2025 via email

jeanlucf22 commented Jan 30, 2025

mewall commented Jan 30, 2025

jeanlucf22 commented Jan 30, 2025

jmohdyusof commented Jan 30, 2025 • edited Loading

mewall commented Jan 30, 2025

mewall commented Jan 30, 2025

jmohdyusof commented Jan 30, 2025

mewall commented Jan 30, 2025

jmohdyusof commented Jan 30, 2025

mewall commented Jan 30, 2025

jmohdyusof commented Jan 30, 2025

mewall commented Jan 30, 2025

jmohdyusof commented Jan 30, 2025

jeanlucf22 commented Jan 30, 2025

mewall commented Jan 30, 2025

mewall commented Jan 14, 2025 •

edited

Loading

jmohdyusof commented Jan 30, 2025 •

edited

Loading