BENCH: profile memleaks asv #821

Open · tylerjereddy wants to merge 2 commits into main from treddy_peakmem_bench_leaks
Conversation

tylerjereddy (Collaborator) commented:

  • add some `asv` peak memory benchmarks as regression guards against the memory leak in PyDarshan (#811); the only thing missing here (IMO) to completely close #817 (TST, BENCH: memory profiling/testing) is a version of the log file with `HEATMAP` in it (see the diff comments for why I don't have this for the `asv` suite at the moment)

  • although this won't automatically run in CI because it takes too long and too much memory, it still seems to do an excellent job of being sensitive to the patched leaks over the recent history of our project:

  • `asv continuous -e -b "peakmem.*" b6c8599ee8 main`
```
       before           after         ratio
     [b6c8599e]       [3fb2c368]
     <treddy_memleak_cffi_only_pt2~1>       <main>
-            239M              76M     0.32  cffi_backend.RetrieveLogData.peakmem_log_get_record('LUSTRE')
-            241M              76M     0.32  cffi_backend.RetrieveLogData.peakmem_log_get_record('H5F')
-            257M            75.8M     0.30  cffi_backend.RetrieveLogData.peakmem_log_get_record('STDIO')
-            289M              76M     0.26  cffi_backend.RetrieveLogData.peakmem_log_get_record('MPI-IO')
-            303M              76M     0.25  cffi_backend.RetrieveLogData.peakmem_log_get_record('POSIX')
-            304M            75.4M     0.25  cffi_backend.RetrieveLogData.peakmem_log_get_record('DXT_POSIX')
-            304M            75.3M     0.25  cffi_backend.RetrieveLogData.peakmem_log_get_record('DXT_MPIIO')
-            324M            75.8M     0.23  cffi_backend.RetrieveLogData.peakmem_log_get_record('H5D')
-           23.7G            75.4M     0.00  cffi_backend.RetrieveLogData.peakmem_log_get_mounts('DXT_POSIX')
-           23.7G            75.4M     0.00  cffi_backend.RetrieveLogData.peakmem_log_get_mounts('H5F')
-           23.7G            75.4M     0.00  cffi_backend.RetrieveLogData.peakmem_log_get_mounts('H5D')
-           23.7G            75.4M     0.00  cffi_backend.RetrieveLogData.peakmem_log_get_mounts('STDIO')
-           23.7G            75.4M     0.00  cffi_backend.RetrieveLogData.peakmem_log_get_mounts('MPI-IO')
-           23.7G            75.4M     0.00  cffi_backend.RetrieveLogData.peakmem_log_get_mounts('DXT_MPIIO')
-           23.7G            75.4M     0.00  cffi_backend.RetrieveLogData.peakmem_log_get_mounts('LUSTRE')
-           23.7G            75.4M     0.00  cffi_backend.RetrieveLogData.peakmem_log_get_mounts('POSIX')

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.
```

```python
def peakmem_log_get_record(self, module):
    for i in range(100000):
        log = cffi_backend.log_open(self.ior_log_path)
        rec = cffi_backend.log_get_record(log, module)
```
tylerjereddy (Collaborator, Author) commented on the snippet above:

Since log_get_record() automatically dispatches to _log_get_heatmap_record() when the HEATMAP module is requested, I think we'll automatically cover our bases for those memory leaks once a log file containing HEATMAP records is used here.
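For reference, once a HEATMAP-bearing log is added to the benchmark data, a guard could look roughly like the sketch below; the `heatmap_log_path` attribute and the benchmark name are assumptions for illustration and are not part of this PR.

```python
# Hypothetical sketch only: intended as a method on RetrieveLogData, using
# cffi_backend as imported in the existing benchmark module. It relies on a
# HEATMAP-bearing log being added in the future; heatmap_log_path is an
# assumed attribute that setup() would populate.
def peakmem_log_get_record_heatmap(self):
    for i in range(100000):
        log = cffi_backend.log_open(self.heatmap_log_path)
        # requesting the HEATMAP module dispatches internally to
        # _log_get_heatmap_record()
        rec = cffi_backend.log_get_record(log, "HEATMAP")
        cffi_backend.log_close(log)
```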



```python
def peakmem_log_get_mounts(self, module):
    for i in range(100000):
```
tylerjereddy (Collaborator, Author) commented on the snippet above:

The magnitude of the mount leak is so massive that we could probably cut the iteration count by 10x so we don't hit peak memory values above the capacity of an average laptop. The current ratios display as 0.00 because the improvement spans several orders of magnitude.

@tylerjereddy (Collaborator, Author):

I suppose we might as well augment this PR with the reproducer from gh-824 to enforce the additional leak fix in gh-825.

@tylerjereddy (Collaborator, Author):

I was working on the diff below the fold when I ran into gh-829. One other thing worth noting is that, because the Python build system doesn't drive the building of the C code, the benchmarks are stuck running on a single version of the compiled C code. I think I've brought this up before, but that's a bigger problem for another discussion.

```diff
diff --git a/darshan-util/pydarshan/benchmarks/benchmarks/cffi_backend.py b/darshan-util/pydarshan/benchmarks/benchmarks/cffi_backend.py
index e4c074b2..539e87b5 100644
--- a/darshan-util/pydarshan/benchmarks/benchmarks/cffi_backend.py
+++ b/darshan-util/pydarshan/benchmarks/benchmarks/cffi_backend.py
@@ -1,3 +1,4 @@
+import darshan
 from darshan.backend import cffi_backend
 from darshan.log_utils import get_log_path
 
@@ -27,7 +28,15 @@ class RetrieveLogData:
 
 
     def peakmem_log_get_mounts(self, module):
-        for i in range(100000):
+        for i in range(10000):
             log = cffi_backend.log_open(self.ior_log_path)
             rec = cffi_backend.log_get_mounts(log)
             cffi_backend.log_close(log)
+
+
+    def peakmem_log_get_name_recs(self, module):
+        for i in range(10000):
+            with darshan.DarshanReport(self.ior_log_path,
+                                       read_all=True,
+                                       lookup_name_records=True) as report:
+                pass
```
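
For orientation, here is a rough sketch of how the surrounding asv benchmark class is presumably laid out; the `params` list mirrors the parameter values in the asv output above, but the log filename and `setup()` details are assumptions rather than a copy of the actual file.

```python
# Assumed shape of the RetrieveLogData benchmark class; asv calls setup()
# before each peakmem_* method and reports the process's peak resident
# memory for that call.
from darshan.backend import cffi_backend
from darshan.log_utils import get_log_path


class RetrieveLogData:
    # parameter values mirror those shown in the asv output above
    params = ["LUSTRE", "H5F", "STDIO", "MPI-IO", "POSIX",
              "DXT_POSIX", "DXT_MPIIO", "H5D"]
    param_names = ["module"]

    def setup(self, module):
        # "ior_hdf5_example.darshan" is a placeholder name; the real suite
        # resolves its log via get_log_path()
        self.ior_log_path = get_log_path("ior_hdf5_example.darshan")

    def peakmem_log_get_record(self, module):
        for i in range(100000):
            log = cffi_backend.log_open(self.ior_log_path)
            rec = cffi_backend.log_get_record(log, module)
            cffi_backend.log_close(log)
```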

@tylerjereddy force-pushed the treddy_peakmem_bench_leaks branch from 4c0f192 to efd6123 on October 19, 2022, 15:28. The follow-up commits:

* more reasonable memory footprint for `peakmem_log_get_mounts()`
* add regression guard for darshan-hpcgh-824
@tylerjereddy (Collaborator, Author):

With the latest revisions pushed, I think I've dealt with my own concerns at least; the extra regression case is nicely covered, and we now stay within roughly a laptop's memory footprint:

```
       before           after         ratio
     [b6c8599e]       [fb6bb275]
     <treddy_memleak_cffi_only_pt2~1>       <main>    
-            240M            76.2M     0.32  cffi_backend.RetrieveLogData.peakmem_log_get_record('LUSTRE')
-            241M            76.3M     0.32  cffi_backend.RetrieveLogData.peakmem_log_get_record('H5F')
-            257M            76.2M     0.30  cffi_backend.RetrieveLogData.peakmem_log_get_record('STDIO')
-            289M            76.1M     0.26  cffi_backend.RetrieveLogData.peakmem_log_get_record('MPI-IO')
-            304M            76.2M     0.25  cffi_backend.RetrieveLogData.peakmem_log_get_record('POSIX')
-            304M            75.5M     0.25  cffi_backend.RetrieveLogData.peakmem_log_get_record('DXT_MPIIO')
-            304M            75.3M     0.25  cffi_backend.RetrieveLogData.peakmem_log_get_record('DXT_POSIX')
-            324M            76.2M     0.24  cffi_backend.RetrieveLogData.peakmem_log_get_record('H5D')
-           2.57G             102M     0.04  cffi_backend.RetrieveLogData.peakmem_log_get_name_recs('DXT_MPIIO')
-           2.57G             102M     0.04  cffi_backend.RetrieveLogData.peakmem_log_get_name_recs('DXT_POSIX')
-           2.57G             102M     0.04  cffi_backend.RetrieveLogData.peakmem_log_get_name_recs('H5D')
-           2.57G             102M     0.04  cffi_backend.RetrieveLogData.peakmem_log_get_name_recs('H5F')
-           2.57G             102M     0.04  cffi_backend.RetrieveLogData.peakmem_log_get_name_recs('POSIX')
-           2.57G             102M     0.04  cffi_backend.RetrieveLogData.peakmem_log_get_name_recs('STDIO')
-           2.57G             102M     0.04  cffi_backend.RetrieveLogData.peakmem_log_get_name_recs('MPI-IO')
-           2.57G             102M     0.04  cffi_backend.RetrieveLogData.peakmem_log_get_name_recs('LUSTRE')
-           2.44G            75.6M     0.03  cffi_backend.RetrieveLogData.peakmem_log_get_mounts('DXT_POSIX')
-           2.44G            75.6M     0.03  cffi_backend.RetrieveLogData.peakmem_log_get_mounts('MPI-IO')
-           2.44G            75.6M     0.03  cffi_backend.RetrieveLogData.peakmem_log_get_mounts('H5F')
-           2.44G            75.6M     0.03  cffi_backend.RetrieveLogData.peakmem_log_get_mounts('LUSTRE')
-           2.44G            75.6M     0.03  cffi_backend.RetrieveLogData.peakmem_log_get_mounts('DXT_MPIIO')
-           2.44G            75.6M     0.03  cffi_backend.RetrieveLogData.peakmem_log_get_mounts('STDIO')
-           2.44G            75.6M     0.03  cffi_backend.RetrieveLogData.peakmem_log_get_mounts('H5D')
-           2.44G            75.5M     0.03  cffi_backend.RetrieveLogData.peakmem_log_get_mounts('POSIX')

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.
```

@tylerjereddy (Collaborator, Author):

We may want to open an issue to remind us to add a log with runtime HEATMAP available for this benchmarking in the future, if/when it becomes easier to consume the logs repo into the (memory) benchmark suite.

@shanedsnyder (Contributor):

Looks great, thanks @tylerjereddy.

Should we add a test case for darshan_log_get_modules(), too? I think that's the only one of the 4 originally reported leaks not yet accounted for.

> One other thing worth noting is that, because the Python build system doesn't drive the building of the C code, the benchmarks are stuck running on a single version of the compiled C code. I think I've brought this up before, but that's a bigger problem for another discussion.

So, I guess that means the results you shared only consider leaks that can be resolved at the Python level, and that we don't have an easy way to do before/after comparisons with C library changes? I agree that's not ideal, as these leaks were addressed with both C- and Python-level changes. Feel free to defer that to another issue, particularly if you think there are some mechanisms available to help with that.

@tylerjereddy (Collaborator, Author):

> Should we add a test case for darshan_log_get_modules()?

I think I have that covered, no? log_get_record() calls log_get_generic_record(), which calls log_get_modules(), which calls libdutil.darshan_log_get_modules().
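
(For reference, if we ever did want a dedicated guard anyway, a minimal sketch could look like the following; it assumes `cffi_backend.log_get_modules()` takes an open log handle, as the call chain above implies, and it is not part of this PR.)

```python
# Hypothetical dedicated benchmark, not included in this PR: intended as a
# method on RetrieveLogData, with cffi_backend imported as in the existing
# benchmark module. log_get_modules is already exercised indirectly through
# peakmem_log_get_record().
def peakmem_log_get_modules(self, module):
    for i in range(10000):
        log = cffi_backend.log_open(self.ior_log_path)
        mods = cffi_backend.log_get_modules(log)
        cffi_backend.log_close(log)
```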

> Feel free to defer that to another issue, particularly if you think there are some mechanisms available to help with that.

The asv community is collecting feature requests; we could add one at airspeed-velocity/asv#1219. The current assumption there is that you're dealing with a Python project, so you'd need something like pydarshan as a separate project that, e.g., vendors or submodules the darshan-util C code.

Assuming that would be super annoying to do, we could instead have a pip-friendly build system that builds darshan-util and links it in, so that pydarshan becomes more Python-ecosystem friendly. But I've been hesitant to do that so far, since we'd end up with a second, nested build system, which feels like it would get confusing quickly.

@shanedsnyder (Contributor):

> Should we add a test case for darshan_log_get_modules()?

> I think I have that covered, no? log_get_record() calls log_get_generic_record(), which calls log_get_modules(), which calls libdutil.darshan_log_get_modules().

Oh yeah, that's right. I forgot that call was already embedded in the path for other things, so it should be covered.

> Feel free to defer that to another issue, particularly if you think there are some mechanisms available to help with that.

> The asv community is collecting feature requests; we could add one at airspeed-velocity/asv#1219. The current assumption there is that you're dealing with a Python project, so you'd need something like pydarshan as a separate project that, e.g., vendors or submodules the darshan-util C code.

> Assuming that would be super annoying to do, we could instead have a pip-friendly build system that builds darshan-util and links it in, so that pydarshan becomes more Python-ecosystem friendly. But I've been hesitant to do that so far, since we'd end up with a second, nested build system, which feels like it would get confusing quickly.

I see, yeah, that does sound tricky (at least to me). I don't think it's particularly high priority, either, so I'm fine with just punting for now.

Successfully merging this pull request may close: TST, BENCH: memory profiling/testing (#817)