Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: memleak part 3 (CFFI only) #819

Merged

Conversation

tylerjereddy
Copy link
Collaborator

  • this should be reviewed after BUG: memleak fix CFFI only pt1 #816, and fixes
    the residual memory leak there so that we get
    a flat memory profile with repeated calls
    to log_get_generic_record(); after rebasing, this PR
    should simplify to a few CFFI backend changes to free
    buf[0] in a few places and make explicit copies for NumPy
    where it is pointing to the dropped memory buffer

  • note that RUNTIME heatmap doesn't seem to like
    the extra free(), it causes some testing issues,
    so I've excluded that here--hopefully that module
    just doesn't have the same problem..

Gradual mem spike -> flat profile:

image

@tylerjereddy tylerjereddy added bug Something isn't working pydarshan labels Sep 22, 2022
@shanedsnyder
Copy link
Contributor

note that RUNTIME heatmap doesn't seem to like
the extra free(), it causes some testing issues,
so I've excluded that here--hopefully that module
just doesn't have the same problem..

I'll see if I can figure out what's going on here, that seems like it could be a problem

* this should be reviewed after darshan-hpcgh-816, and fixes
the residual memory leak there so that we get
a flat memory profile with repeated calls
to `log_get_generic_record()`

* note that `RUNTIME` heatmap doesn't seem to like
the extra `free()`, it causes some testing issues,
so I've excluded that here--hopefully that module
just doesn't have the same problem..
* drop some spurious `free()` usages for cases where
`darshan_log_get_record()` hasn't `malloc`'d anything
@tylerjereddy tylerjereddy force-pushed the treddy_issue_811_part3_cffi_only branch from af603fb to 3f0647d Compare September 23, 2022 17:47
@tylerjereddy
Copy link
Collaborator Author

Ok, I think I addressed the comments so far. I also rebased to vastly simplify the displayed diff.

* `free()` memory that is allocated by darshan-util library for
  heatmap records
* use `np.copy()` to make a copy of heatmap read/write arrays
  to avoid them from being freed out from under heatmap records
@shanedsnyder
Copy link
Contributor

note that RUNTIME heatmap doesn't seem to like
the extra free(), it causes some testing issues,
so I've excluded that here--hopefully that module
just doesn't have the same problem..

I'll see if I can figure out what's going on here, that seems like it could be a problem

I think I fixed the memory leak and the test errors via b21d644, which looked like they were ultimately caused by incorrect data being stored in heatmap records when we add in the necessary free() statement to avoid mem leaks. Looks like we just needed a np.copy() as in other places (e.g., DXT segment extraction) so record data isn't freed out from under heatmap records.

@tylerjereddy if it looks good to you, I think we can go ahead and merge. I'm not sure if we had a good reproducer for the heatmap module-specific memory leak, but might be good to confirm this is working as expected in terms of leaking if we do.

@tylerjereddy
Copy link
Collaborator Author

tylerjereddy commented Sep 27, 2022

@shanedsnyder Probably not HEATMAP reproducer yet, but maybe we can include that separately as part of #817. I'd probably just lift the reproducers from gh-811, and supplement for any modules not covered there, if needed. I guess that means putting stuff in the asv suite, which takes a bit more testing.

Once this is merged (it looks sensible to me as well) maybe we should ping in gh-811 for user confirmation of the fix if they are comfortable recompiling from main.

@shanedsnyder shanedsnyder merged commit 3fb2c36 into darshan-hpc:main Sep 27, 2022
@tylerjereddy tylerjereddy deleted the treddy_issue_811_part3_cffi_only branch September 27, 2022 17:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working pydarshan
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants