PyDarshan job summary cleanup #910

shanedsnyder · 2023-02-28T16:05:42Z

From an in-person review of the current Darshan job summary reports at Argonne, it would be nice to try to address as much of this as we can before doing a release.

First some high-level things up for discussion:

drop blue I/O performance estimate text box, and instead add a per-module table detailing key details of the module (including total files/datasets(H5D)/variables(PNETCDF_VAR) accessed, total bytes read/written, and I/O performance estimate)
- this way, each module starts off with a high-level summary table that describes key details of I/O, and gets rid of awkward text object we are using now
disable DXT heatmap plot generation by default, print warning if DXT is available but HEATMAP isn't, and add CLI option to force DXT usage
- for some logs, it's probably just too expensive to generate DXT heatmaps by default, so this protects against that
- this would require us to either push a little on CLI stuff in WIP, ENH: rich-click CLI #809 or use something simple temporarily?
after dust settles, potentially reorganize plots/tables in per-module sections?
- nothing concrete to suggest just yet (Tyler -0.5 on this)

Smaller things:

tylerjereddy · 2023-02-28T16:56:36Z

There is some duplication with i.e., gh-804 here as well.

shanedsnyder · 2023-02-28T20:19:58Z

I embarrassingly can't figure this one out:

Erroneous "type" row in File Count Summary table

Seems like it ought to be a one-liner to move the "type" index up to a column like other values, but I can't quite seem to pull it off.

Most of the other issues I've already addressed in #907

tylerjereddy · 2023-02-28T20:53:01Z

It wasn't an error--it was an intentional index label. You can remove it, but then you'll need to shim the test suite as well (which is pretty good to do even for aesthetic changes anyway). This is pretty straightforward, but it wasn't a bug (it was intentional) so I'd prefer not using this approach:

--- a/darshan-util/pydarshan/darshan/lib/accum.py
+++ b/darshan-util/pydarshan/darshan/lib/accum.py
@@ -71,7 +71,6 @@ def log_file_count_summary_table(derived_metrics,
                              "read/write files":3}
     df = pd.DataFrame.from_dict(darshan_file_category, orient="index")
     df.rename(columns={0:"index"}, inplace=True)
-    df.index.rename('type', inplace=True)
     df["number of files"] = np.zeros(4, dtype=int)
     df["avg. size"] = np.zeros(4, dtype=str)
     df["max size"] = np.zeros(4, dtype=str)
diff --git a/darshan-util/pydarshan/darshan/tests/test_lib_accum.py b/darshan-util/pydarshan/darshan/tests/test_lib_accum.py
index 080264d1..3d23fb76 100644
--- a/darshan-util/pydarshan/darshan/tests/test_lib_accum.py
+++ b/darshan-util/pydarshan/darshan/tests/test_lib_accum.py
@@ -203,7 +203,6 @@ def test_file_count_summary_table(log_name,
                          "read-only files",
                          "write-only files",
                          "read/write files"]
-    expected_df.index.rename('type', inplace=True)
 
     log_path = get_log_path(log_name)
     with darshan.DarshanReport(log_path, read_all=True) as report:

I'd prefer if you didn't mutate the data structures for the sole purpose of removing a row in the report, and instead do something like this (a small test may be warranted) so that end users get what they want from to_html but no need to inconvenience developers who like index labels:

--- a/darshan-util/pydarshan/darshan/lib/accum.py
+++ b/darshan-util/pydarshan/darshan/lib/accum.py
@@ -100,5 +100,6 @@ def log_file_count_summary_table(derived_metrics,
     df.drop(columns="index", inplace=True)
     ret = plot_common_access_table.DarshanReportTable(df,
                                                       col_space=200,
-                                                      justify="center")
+                                                      justify="center",
+                                                      index_names=False)
     return ret

shanedsnyder · 2023-05-31T20:58:40Z

Last bits of this closed via #927.

shanedsnyder closed this as completed Feb 28, 2023

shanedsnyder reopened this Feb 28, 2023

shanedsnyder mentioned this issue Feb 28, 2023

PyDarshan job summary cleanup fixes #911

Merged

shanedsnyder mentioned this issue Mar 2, 2023

BUG: PNETCDF can have negative opcount frequencies #904

Open

shanedsnyder mentioned this issue Apr 4, 2023

ENH: human readable values in "data access by filesystem" figure #607

Closed

tylerjereddy mentioned this issue May 15, 2023

ENH: make DXT-based heatmap generation opt-in for summary tool #932

Merged

shanedsnyder added this to the pydarshan-job-summary-v2 milestone May 31, 2023

shanedsnyder closed this as completed May 31, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PyDarshan job summary cleanup #910

PyDarshan job summary cleanup #910

shanedsnyder commented Feb 28, 2023 •

edited

Loading

tylerjereddy commented Feb 28, 2023

shanedsnyder commented Feb 28, 2023

tylerjereddy commented Feb 28, 2023

shanedsnyder commented May 31, 2023

PyDarshan job summary cleanup #910

PyDarshan job summary cleanup #910

Comments

shanedsnyder commented Feb 28, 2023 • edited Loading

tylerjereddy commented Feb 28, 2023

shanedsnyder commented Feb 28, 2023

tylerjereddy commented Feb 28, 2023

shanedsnyder commented May 31, 2023

shanedsnyder commented Feb 28, 2023 •

edited

Loading