Heatmap time skew problem #955
Comments
Thanks @Nafi3! Confirmed that I can reproduce as well. I'd like to look at the raw data with
Ok, the problem with the heatmap data is this: the heatmap skew parsing workaround that we added in 3.4.4 only normalizes the number of bins in each heatmap record if they are off by exactly 1. This was intentional; I had hoped that the Darshan shutdown would be synchronized well enough that there would be minimal skew. The first log in the zip file has some ranks with 8 more heatmap bins than others: some ranks have up to 106 bins while others have only 98. The bin width is 0.1 seconds in this log, meaning that some ranks shut down the heatmap module 0.8 seconds later than others. I'm going to look at the code a little more and think about this; I'm not sure it is a good idea to try to normalize these logs if the skew is arbitrarily big. (For reference for anyone following this issue: the root cause of the skew when logs are generated has already been fixed in #942 and released in Darshan 3.4.2; the question here is whether we can repair data in logs that triggered this issue previously.)
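For anyone who wants to check the arithmetic above, here is a minimal sketch in Python. It is not Darshan's parsing code: the function names `bin_skew` and `within_tolerance` are hypothetical, and the one-bin tolerance simply mirrors the "off by exactly 1" rule described in the comment.

```python
def bin_skew(bin_counts, bin_width_s):
    """Return (extra_bins, skew_seconds) between the longest and shortest records.

    bin_counts  -- number of heatmap bins recorded by each rank (hypothetical input)
    bin_width_s -- width of one heatmap bin, in seconds
    """
    extra_bins = max(bin_counts) - min(bin_counts)
    return extra_bins, extra_bins * bin_width_s


def within_tolerance(bin_counts, max_extra_bins=1):
    """True if the spread in bin counts is small enough to normalize.

    Assumption: a spread of at most one bin is treated as repairable,
    following the behaviour described in this thread.
    """
    return max(bin_counts) - min(bin_counts) <= max_extra_bins


# Numbers taken from the log discussed above: 98 to 106 bins, 0.1 s per bin.
extra, skew = bin_skew([98, 106, 101], 0.1)
print(f"{extra} extra bins, about {skew:.1f} s of shutdown skew")
print("repairable:", within_tolerance([98, 106, 101]))
```

With these inputs the spread is 8 bins, or roughly 0.8 seconds, which is well outside a one-bin tolerance.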
After some offline discussion, we've decided it's best not to try to normalize logs that are this skewed. The separate issue with the Lustre module data is being tracked in #956.
Hi! Following issue #941 and fix #945, I have encountered some logs that utils is still unable to parse.