Add HPSS support to fs copy action and related util tasks #678

Open
5 tasks
maddenp-noaa opened this issue Dec 12, 2024 · 0 comments
Description

Support hsi:// and htar:// schemes in the fs copy mode/action, and in uwtools.utils.tasks.

Semantics are as follows:

  • /dst/b.dat: hsi://src/file.dat: Call hsi to fetch the file ./src/file.dat and save it as /dst/b.dat.
  • /dst/<files>: !glob hsi://src/file.*: Call hsi to fetch all files from ./src/ that match glob pattern file.*, placing them with their original names in /dst/. The !regex tag can be used instead of !glob.
  • /dst/b.dat: htar://src/archive.tar?a.dat: Call htar to extract file a.dat from archive ./src/archive.tar and save as /dst/b.dat.
  • /dst/<files>: htar://src/archive.tar?a.dat&b.dat: Call htar to extract files a.dat and b.dat from ./src/archive.tar and save with their original names in /dst/.
  • /dst/<files>: !glob htar://src/archive.tar?*.dat: Call htar to extract all .dat files from ./src/archive.tar, saving them with their original names in /dst/.
  • /dst/<files>: !glob htar://src/*.tar?*.dat: Call htar to extract all .dat files from all .tar files in ./src/, saving all files with their original names in /dst/.

The !regex tag could be used in place of !glob for finer control.
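The URL forms listed above could be split into their parts along the following lines. This is only a minimal sketch: the names HpssSource and parse_hpss_url are hypothetical and not part of uwtools, and the sketch assumes that the text after ? in an htar:// URL is an &-separated list of archive members (or a pattern), as the examples suggest.

```python
from dataclasses import dataclass
from typing import Tuple

SCHEMES = ("hsi", "htar")


@dataclass(frozen=True)
class HpssSource:
    """Hypothetical container for a parsed hsi:// or htar:// source."""

    scheme: str  # "hsi" or "htar"
    path: str  # HPSS file, archive, or pattern, e.g. "src/archive.tar"
    members: Tuple[str, ...]  # archive members or patterns (htar only)


def parse_hpss_url(url: str) -> HpssSource:
    """Split an hsi:// or htar:// URL into scheme, path, and members."""
    scheme, sep, rest = url.partition("://")
    if not sep or scheme not in SCHEMES:
        raise ValueError(f"Unsupported HPSS URL: {url}")
    if scheme == "hsi":
        # An hsi URL names a file or pattern directly, so nothing is split off.
        path, members = rest, ()
    else:
        # Assumption: everything after the first "?" lists members, joined by "&".
        path, _, query = rest.partition("?")
        members = tuple(m for m in query.split("&") if m)
    if not path:
        raise ValueError(f"No path in HPSS URL: {url}")
    return HpssSource(scheme, path, members)
```

For example, parse_hpss_url("htar://src/archive.tar?a.dat&b.dat") yields the archive path src/archive.tar with members a.dat and b.dat. Whether the members are treated as literal names or as patterns would depend on the !glob or !regex tag applied to the value.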

Use uwtools.utils.tasks.executable() to ensure that hsi or htar, as appropriate, is available. It is the caller's responsibility to make sure that these utilities are on their PATH.
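As an illustration of the availability check, the sketch below uses the standard library's shutil.which() as a stand-in. It does not reproduce uwtools.utils.tasks.executable(), whose signature is not given here, and the names TOOL_FOR_SCHEME, missing_tools, and require_tool_for are hypothetical.

```python
import shutil
from typing import Iterable, List

# Assumption: each URL scheme is served by the utility of the same name.
TOOL_FOR_SCHEME = {"hsi": "hsi", "htar": "htar"}


def missing_tools(tools: Iterable[str]) -> List[str]:
    """Return the subset of the named tools not found on the caller's PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def require_tool_for(scheme: str) -> str:
    """Return the tool name for a scheme, or raise if it is not on PATH."""
    tool = TOOL_FOR_SCHEME[scheme]
    if missing_tools([tool]):
        raise RuntimeError(f"{tool} not found on PATH")
    return tool
```

Checking once per scheme before any copy is attempted lets a missing utility be reported up front, rather than partway through a set of transfers.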

Consider whether to extend filecopy() and existing() in uwtools.utils.tasks to support hsi:// and htar:// URLs. For simple cases these might be handy, but for wildcard support it may be unacceptably inefficient to call hsi ls or htar -tf repeatedly to establish the existence of each source file, when a single call could be made and its results reused for the whole set of source files. Similarly, where glob patterns are used, a single hsi cp command with a wildcard, or a single htar command with a wildcard, could be more efficient, possibly with a -T option to specify multiple threads (Hera provides an htar alias specifying -T 4, so that might be a useful value). In particular, if files are on tape and need to be staged to disk, a wildcard command might stage all files at once and so be much faster than a series of individual requests, each of which might have to wait for one file to be staged and transferred before the next begins.
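The efficiency argument can be made concrete with a rough sketch: obtain the member listing once, select from it in memory, and issue one extraction command for the whole set. The function names here are hypothetical. The listing is passed in as a plain list of names, because the output format of htar -tf is not described in this issue. The -T option comes from the text above, while the -xf flags are assumed to follow the usual tar-style htar usage and should be checked against the local htar documentation.

```python
import re
from fnmatch import fnmatchcase
from typing import List, Optional, Sequence


def select_members(listing: Sequence[str], pattern: str, regex: bool = False) -> List[str]:
    """Filter one cached member listing instead of querying HPSS per file."""
    if regex:
        # Assumption: a regex must match the whole member name.
        rx = re.compile(pattern)
        return [m for m in listing if rx.fullmatch(m)]
    # Note: fnmatch-style "*" also matches "/" in member names.
    return [m for m in listing if fnmatchcase(m, pattern)]


def htar_extract_argv(
    archive: str, members: Sequence[str], threads: Optional[int] = None
) -> List[str]:
    """Build a single htar command line covering all requested members."""
    argv = ["htar"]
    if threads is not None:
        argv += ["-T", str(threads)]  # e.g. 4, as in the Hera alias
    argv += ["-xf", archive, *members]  # assumed tar-style extract flags
    return argv
```

With a listing of a.dat, b.dat, and notes.txt, select_members(listing, "*.dat") keeps the two .dat files, and htar_extract_argv("src/archive.tar", ["a.dat", "b.dat"], threads=4) describes one command for both, so any staging from tape can happen in a single request.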

AC

  • Functionality is implemented.
  • Appropriate (possibly debug-level) messages are logged.
  • Unit tests are added.
  • CLI documentation is updated with examples.
  • Jupyter fs notebook is updated.
@maddenp-noaa maddenp-noaa self-assigned this Dec 12, 2024
@maddenp-noaa maddenp-noaa converted this from a draft issue Dec 12, 2024