BENCH: profile memleaks asv #821
base: main
Conversation
```python
def peakmem_log_get_record(self, module):
    for i in range(100000):
        log = cffi_backend.log_open(self.ior_log_path)
        rec = cffi_backend.log_get_record(log, module)
```
Since `log_get_record()` automatically dispatches to `_log_get_heatmap_record()` when the `HEATMAP` module is requested, I think we'll just automatically cover our bases for the memory leaks when a log file with `HEATMAP` is used here.
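As a rough illustration of that dispatch (a minimal sketch only; the real routing lives in `darshan.backend.cffi_backend` and handles more module-specific cases, and the `log_get_generic_record` fallback named here is an assumption about the non-heatmap path):

```python
# illustrative sketch of the dispatch described above, not the exact
# cffi_backend source
def log_get_record(log, mod_name):
    if mod_name == "HEATMAP":
        # HEATMAP records have their own layout, so they get a
        # dedicated reader
        return _log_get_heatmap_record(log)
    # assumed generic path used by the other modules
    return log_get_generic_record(log, mod_name)
```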
```python
def peakmem_log_get_mounts(self, module):
    for i in range(100000):
        log = cffi_backend.log_open(self.ior_log_path)
        rec = cffi_backend.log_get_mounts(log)
        cffi_backend.log_close(log)
```
The magnitude of the mount leak is so massive we could probably cut this down by 10x so we don't hit peak memory values above the capacity of an average laptop. The current ratios are `0.00` since they span orders-of-magnitude improvements.
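To make the `0.00` concrete: `asv continuous` reports the ratio as `after / before`, so a representative mount-leak data point from the results below rounds to zero at two decimals:

```python
# after/before ratio for a representative peakmem_log_get_mounts result
before = 23.7e9   # ~23.7G peak RSS with the leak
after = 75.4e6    # ~75.4M peak RSS after the fixes
print(f"{after / before:.2f}")  # -> 0.00, below asv's two-decimal display
```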
Was working on the diff below the fold when I ran into gh-829. One other thing worth noting: because the Python build system doesn't drive the building of the C code, the benchmarks are stuck running against a single version of the compiled C code. I think I've brought this up before... but that's a bigger problem for another discussion, I think.

```diff
diff --git a/darshan-util/pydarshan/benchmarks/benchmarks/cffi_backend.py b/darshan-util/pydarshan/benchmarks/benchmarks/cffi_backend.py
index e4c074b2..539e87b5 100644
--- a/darshan-util/pydarshan/benchmarks/benchmarks/cffi_backend.py
+++ b/darshan-util/pydarshan/benchmarks/benchmarks/cffi_backend.py
@@ -1,3 +1,4 @@
+import darshan
 from darshan.backend import cffi_backend
 from darshan.log_utils import get_log_path
@@ -27,7 +28,15 @@ class RetrieveLogData:
     def peakmem_log_get_mounts(self, module):
-        for i in range(100000):
+        for i in range(10000):
             log = cffi_backend.log_open(self.ior_log_path)
             rec = cffi_backend.log_get_mounts(log)
             cffi_backend.log_close(log)
+
+
+    def peakmem_log_get_name_recs(self, module):
+        for i in range(10000):
+            with darshan.DarshanReport(self.ior_log_path,
+                                       read_all=True,
+                                       lookup_name_records=True) as report:
+                pass
```
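For readers outside the repo, a minimal sketch of the benchmark class shape this diff targets — the parameter list is taken from the results in the next comment, while the `setup` body and log name are assumptions about the existing suite:

```python
import darshan
from darshan.backend import cffi_backend
from darshan.log_utils import get_log_path


class RetrieveLogData:
    # asv runs each peakmem_* method once per parameter value and
    # records the peak resident memory of the process
    params = ["POSIX", "MPI-IO", "STDIO", "H5F", "H5D",
              "LUSTRE", "DXT_POSIX", "DXT_MPIIO"]
    param_names = ["module"]

    def setup(self, module):
        # assumed: resolve a packaged IOR sample log to reopen in a loop
        self.ior_log_path = get_log_path("ior_hdf5_example.darshan")
```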
* add some `asv` peak memory benchmarks as regression guards against darshan-hpcgh-811; the only thing missing (IMO) here to completely close darshan-hpcgh-817 is a version of the log file with `HEATMAP` in it (see the diff comments for why I don't have this for the `asv` suite at the moment)
* although this won't automatically run in CI because it takes too long and too much memory, it still seems to do an excellent job of being sensitive to the patched leaks over the recent history of our project:
  - `asv continuous -e -b "peakmem.*" b6c8599 main`

```
       before           after         ratio
     [b6c8599]       [3fb2c36]
     <treddy_memleak_cffi_only_pt2~1>       <main>
-         239M             76M     0.32  cffi_backend.RetrieveLogData.peakmem_log_get_record('LUSTRE')
-         241M             76M     0.32  cffi_backend.RetrieveLogData.peakmem_log_get_record('H5F')
-         257M           75.8M     0.30  cffi_backend.RetrieveLogData.peakmem_log_get_record('STDIO')
-         289M             76M     0.26  cffi_backend.RetrieveLogData.peakmem_log_get_record('MPI-IO')
-         303M             76M     0.25  cffi_backend.RetrieveLogData.peakmem_log_get_record('POSIX')
-         304M           75.4M     0.25  cffi_backend.RetrieveLogData.peakmem_log_get_record('DXT_POSIX')
-         304M           75.3M     0.25  cffi_backend.RetrieveLogData.peakmem_log_get_record('DXT_MPIIO')
-         324M           75.8M     0.23  cffi_backend.RetrieveLogData.peakmem_log_get_record('H5D')
-        23.7G           75.4M     0.00  cffi_backend.RetrieveLogData.peakmem_log_get_mounts('DXT_POSIX')
-        23.7G           75.4M     0.00  cffi_backend.RetrieveLogData.peakmem_log_get_mounts('H5F')
-        23.7G           75.4M     0.00  cffi_backend.RetrieveLogData.peakmem_log_get_mounts('H5D')
-        23.7G           75.4M     0.00  cffi_backend.RetrieveLogData.peakmem_log_get_mounts('STDIO')
-        23.7G           75.4M     0.00  cffi_backend.RetrieveLogData.peakmem_log_get_mounts('MPI-IO')
-        23.7G           75.4M     0.00  cffi_backend.RetrieveLogData.peakmem_log_get_mounts('DXT_MPIIO')
-        23.7G           75.4M     0.00  cffi_backend.RetrieveLogData.peakmem_log_get_mounts('LUSTRE')
-        23.7G           75.4M     0.00  cffi_backend.RetrieveLogData.peakmem_log_get_mounts('POSIX')

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.
```
* more reasonable memory footprint for `peakmem_log_get_mounts()`
* add regression guard for darshan-hpcgh-824
(branch updated: `4c0f192` to `efd6123`)
With the latest revisions pushed in, I think I've dealt with my own concerns at least; the extra regression case is nicely covered and we now stay within a ~laptop memory footprint:
We may want to open an issue to remind us to add a log with runtime `HEATMAP` data.
Looks great, thanks @tylerjereddy. Should we add a test case for the `_log_get_heatmap_record()` path as well?
So I guess that means, for the results you shared, we are only considering leaks that can be resolved at the Python level, and we don't have an easy way to do before/after comparisons with C library changes? I agree that's not ideal, as these leaks were addressed with both C- and Python-level changes. Feel free to defer that to another issue, particularly if you think there are mechanisms available to help with that.
I think I got that covered, no?

The … Assuming that would be super annoying to do, we could have a …
Oh yeah, that's right. Forgot that call was already embedded in the path for other things, so it should be covered.

I see, yeah, that does sound tricky (at least to me). I don't think it's particularly high priority either, so I'm fine with just punting for now.
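For the record, one shape the mechanism discussed above could take — purely hypothetical, with the `make` invocations being assumptions about darshan-util's autotools build rather than anything wired into `asv` today:

```python
import subprocess

def rebuild_darshan_util(src_dir="darshan-util"):
    # hypothetical hook: rebuild the compiled C library before
    # benchmarking a commit, so each asv comparison point loads
    # matching native code instead of a single stale build
    subprocess.run(["make", "clean"], cwd=src_dir, check=True)
    subprocess.run(["make"], cwd=src_dir, check=True)
```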