Payu runs input checksums on every run when submitting with -n N #526

Whyborn · 2024-10-02T04:52:43Z

Payu re-submissions in a -n N run job trigger re-generating the input manifest. For small jobs, this becomes a significant portion of run time (maybe this is only relevant for staged_cable jobs?). I don't think there's any reason to recompute the input manifest for subsequent runs.

aidanheerdegen · 2024-10-02T07:02:50Z

> Payu re-submissions in a -n N run job trigger re-generating the input manifest.

payu only checks the binhash hasn't changed. This should be a fast check. How many input files are there?

> For small jobs, this becomes a significant portion of run time (maybe this is only relevant for staged_cable jobs?). I don't think there's any reason to recompute the input manifest for subsequent runs.

The point of the manifests is to record everything that goes into a run. Are you adding files to the manifest that aren't actually used? Typically directories were specified in the input section in config.yaml because it was easy and compact, but it's also kinda lazy and not specific. Consequently we've moved to explicitly specifying each input file:

https://github.com/ACCESS-NRI/access-om2-configs/blob/release-025deg_jra55_ryf/config.yaml#L40-L52

This has the benefit of being much more specific about what the model needs to run. Any changes to specific input files are also more "atomic" and are reflected directly in the config.yaml. It also means we're calculating hashes only for the files that are used in the simulation.

There are exceptions though, e.g. JRA-55 RYF forcing data has a heap of files, so we use a directory

https://github.com/ACCESS-NRI/access-om2-configs/blob/release-025deg_jra55_ryf/config.yaml#L33

and even more for the IAF version

https://github.com/ACCESS-NRI/access-om2-configs/blob/release-025deg_jra55_iaf/config.yaml#L33-L43

Whyborn · 2024-10-08T03:16:07Z

At least one of the configurations we want to support has ~1000 input files (met forcing files, which are for some reason split into single-year chunks). It might well be that the original dataset is not split like this, but the user who pulled it originally did it to make it easier to write the I/O handler.

I like the move to explicitly specifying the input files. Does it support glob strings?

aidanheerdegen · 2024-10-08T05:09:08Z

> At least one of the configurations we want to support has ~1000 input files (met forcing files, which are for some reason split into single-year chunks). It might well be that the original dataset is not split like this, but the user who pulled it originally did it to make it easier to write the I/O handler.

You have the option of using more CPUs. It is an embarrassingly parallel problem, so it will scale with the number of CPUs.
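As a rough illustration of why this scales, the sketch below hashes a list of files concurrently. It is not payu or yamanifest code: `hash_file`, `hash_many`, and the `workers` parameter are hypothetical names, and a thread pool is only a stand-in for however the manifest code actually distributes the work.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def hash_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fd:
        for chunk in iter(lambda: fd.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_many(paths, workers=4):
    """Hash each file independently; no file depends on another's result."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each path with its hash.
        return dict(zip(paths, pool.map(hash_file, paths)))
```

Because each file is hashed on its own, adding workers should reduce wall time until the filesystem becomes the bottleneck.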

We could create a version of binhash that reads in less of the header:

https://github.com/ACCESS-NRI/yamanifest/blob/7f9aaaddc2d31ebe1cd1b9d92ab2df349e35ba82/yamanifest/hashing.py#L86-L87

This would need some manual testing beforehand to check whether it is worth the bother. I doubt it would make much difference, since I think file operations like opening and closing have a big overhead.
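For anyone wanting to try that manual test, here is a minimal sketch of a cheap change-detection hash that reads only a leading portion of each file and mixes in the size and modification time. `header_hash` and its `nbytes` parameter are hypothetical and this is not the yamanifest implementation; it only illustrates the knob that would be tuned.

```python
import hashlib
import os


def header_hash(path, nbytes=1024 * 1024):
    """Hash the first `nbytes` of a file plus its size and mtime.

    Hypothetical helper for experimenting, not the yamanifest binhash.
    """
    digest = hashlib.md5()
    with open(path, "rb") as fd:
        digest.update(fd.read(nbytes))
    stat = os.stat(path)
    # Size and mtime catch changes that fall outside the bytes read.
    digest.update(f"{stat.st_size}:{stat.st_mtime_ns}".encode())
    return digest.hexdigest()
```

Timing this over the ~1000 files with a few different `nbytes` values would show whether the bytes read or the per-file open and close dominates.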

> I like the move to explicitly specifying the input files. Does it support glob strings?

Not currently.

Originally it was just directories, but this logic branch was added to support specific filepaths:

https://github.com/payu-org/payu/blob/master/payu/models/model.py#L277-L285

(Note that it is slightly weird: it builds a mock iterator so that it can reuse the main code loop below.)

I don't think it would be difficult to invert the logic: test for a directory, otherwise assume a glob, and populate a list of files rather than a single file.
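A minimal sketch of that inverted logic, assuming a standalone helper rather than the actual code in `model.py`; `expand_input` is a hypothetical name and the real change would need to fit the existing loop:

```python
import glob
import os


def expand_input(entry):
    """Return the list of paths an input entry refers to.

    Hypothetical helper: a directory yields its direct children,
    anything else is treated as a glob pattern.
    """
    if os.path.isdir(entry):
        return sorted(
            os.path.join(entry, name) for name in os.listdir(entry)
        )
    # A plain file path is a glob with no wildcards, so it matches itself.
    return sorted(glob.glob(entry))
```

One consequence of this approach is that a path that does not exist silently expands to an empty list, so a real implementation would probably want to warn or fail in that case.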

If you think that is useful functionality, it is probably best to create a specific issue for it and link back to this one.

In the meantime you could emulate this functionality with symbolic links: create some directories that group your inputs in some way and make symbolic links in the sub-dirs. That way you can select out just the inputs you need.
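One way to set up such a directory, sketched in Python. `link_inputs` is a hypothetical helper, and the pattern and directory names you pass are up to you:

```python
import glob
import os


def link_inputs(pattern, link_dir):
    """Create a symlink in `link_dir` for each file matching `pattern`."""
    os.makedirs(link_dir, exist_ok=True)
    links = []
    for target in sorted(glob.glob(pattern)):
        link = os.path.join(link_dir, os.path.basename(target))
        # Skip links that already exist so the function can be re-run.
        if not os.path.lexists(link):
            os.symlink(os.path.abspath(target), link)
        links.append(link)
    return links
```

The resulting `link_dir` can then be listed in the input section of config.yaml in place of the full directory.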

Whyborn added the feature label Oct 2, 2024
