CLI: Add verdi process dump and the ProcessDumper (#6276)
This commit adds functionality to write all files involved in the
execution of a workflow to disk. This is achieved via the new
`ProcessDumper` class, which exposes the top-level `dump` method, while
`verdi process dump` provides a wrapper for access via the CLI.

The options for the dumping are set when instantiating the
`ProcessDumper` class. These are the `-o/--overwrite` option; the
`--io-dump-paths` option, which can be used to provide custom
subdirectories for the folders created for each `CalculationNode` (the
dumped data being the `CalculationNode` repository, its `retrieved`
outputs, as well as the linked node inputs and outputs); the `-f/--flat`
option, which disables the creation of these subdirectories and thus
writes all files of each workflow step into a flat hierarchy; and the
`--include-inputs/--exclude-inputs` (`--include-outputs/--exclude-outputs`)
options to enable/disable the dumping of linked inputs (outputs) for each
`CalculationNode`. In addition, a `README` is created in the parent
dumping directory, as well as `.aiida_node_metadata.yaml` files with the
`Node`, `User`, and `Computer` information in the subdirectories created
for each `ProcessNode`.
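
To make the resulting per-calculation layout concrete, here is a minimal
Python sketch of where one `CalculationNode`'s data would end up depending
on the flags. The helper `calculation_layout` is hypothetical and not part
of aiida-core. It assumes the default subdirectory names `inputs`,
`outputs`, `node_inputs`, and `node_outputs` from the documentation added
in this commit, assumes that the node repository maps to `inputs` and the
`retrieved` outputs to `outputs`, and uses the same defaults as the
`verdi process dump` options (linked inputs included, linked outputs
excluded).

```python
from pathlib import Path


# Hypothetical helper (not part of aiida-core): returns the directory each
# category of dumped data for one CalculationNode would be written to.
def calculation_layout(
    step_dir: Path,
    flat: bool = False,
    include_inputs: bool = True,
    include_outputs: bool = False,
) -> dict[str, Path]:
    """Map each category of dumped data to its target directory."""
    # Assumed mapping: repository files -> `inputs`, retrieved files -> `outputs`.
    layout = {'repository': 'inputs', 'retrieved': 'outputs'}
    # Linked input/output nodes are only dumped when the respective flag is set.
    if include_inputs:
        layout['node_inputs'] = 'node_inputs'
    if include_outputs:
        layout['node_outputs'] = 'node_outputs'
    # With flat=True no subdirectories are created: everything goes into step_dir.
    return {key: step_dir if flat else step_dir / sub for key, sub in layout.items()}


nested = calculation_layout(Path('02-ArithmeticAddCalculation'))
print(nested['repository'])  # 02-ArithmeticAddCalculation/inputs (POSIX separator)
```

Passing `flat=True` collapses every entry onto the step directory itself,
which is the trade-off the `-f/--flat` option makes: fewer folders, at the
cost of mixing inputs and outputs in one place.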

Meaningful testing of this feature required nested workchains with
considerable file I/O, so the `generate_calculation_node` fixture of
`conftest.py` had to be extended. Moreover, the
`generate_calculation_node_add` and `generate_workchain_multiply_add`
fixtures, which actually run the `ArithmeticAddCalculation` and
`MultiplyAddWorkChain`, were added. In the future, these could be used
to reduce code duplication where the same objects are constructed
manually in other parts of the test suite (although manually
constructing the `ProcessNode`s still has to be benchmarked against
running the `Process`). Lastly, the `generate_calculation_node_io` and
`generate_workchain_node_io` fixtures were added in `test_processes.py`;
they create the `CalculationNode`s and `WorkflowNode`s used in the tests
of the dumping functionality.

Co-Authored-By: Junfeng Qiao <[email protected]>
qiaojunfeng authored May 27, 2024
1 parent fc2a84d commit c1cc2b0
Showing 11 changed files with 1,222 additions and 7 deletions.
64 changes: 64 additions & 0 deletions docs/source/howto/data.rst
@@ -78,6 +78,70 @@ Ways to find and retrieve data that have previously been imported are described
If none of the currently available data types, as listed by ``verdi plugin list``, seem to fit your needs, you can also create your own custom type.
For details refer to the next section :ref:`"How to add support for custom data types"<topics:data_types:plugin>`.

.. _how-to:data:dump:

Dumping data to disk
--------------------

.. versionadded:: 2.6

It is now possible to dump your executed workflows to disk in a hierarchical directory tree structure. This can be
particularly useful if one is not yet familiar with the ``QueryBuilder`` or wants to quickly explore input/output files
using existing shell scripts or common terminal utilities, such as ``grep``. The dumping can be achieved with the command:

.. code-block:: shell

    verdi process dump <pk>

For our beloved ``MultiplyAddWorkChain``, we obtain the following:

.. code-block:: shell

    $ verdi process dump <pk> -p dump-multiply_add
    Success: Raw files for WorkChainNode <pk> dumped into folder `dump-multiply_add`.

.. code-block:: shell

    $ tree -a dump-multiply_add
    dump-multiply_add
    ├── README.md
    ├── .aiida_node_metadata.yaml
    ├── 01-multiply
    │   ├── .aiida_node_metadata.yaml
    │   └── inputs
    │       └── source_file
    └── 02-ArithmeticAddCalculation
        ├── .aiida_node_metadata.yaml
        ├── inputs
        │   ├── .aiida
        │   │   ├── calcinfo.json
        │   │   └── job_tmpl.json
        │   ├── _aiidasubmit.sh
        │   └── aiida.in
        └── outputs
            ├── _scheduler-stderr.txt
            ├── _scheduler-stdout.txt
            └── aiida.out

The ``README.md`` file provides a description of the directory structure, as well as useful information about the
top-level process. Further, numbered subdirectories are created for each step of the workflow, resulting in the
``01-multiply`` and ``02-ArithmeticAddCalculation`` folders. The raw calculation input and output files ``aiida.in`` and
``aiida.out`` of the ``ArithmeticAddCalculation`` are placed in ``inputs`` and ``outputs``, respectively. In addition,
these directories contain the submission script ``_aiidasubmit.sh``, as well as the scheduler stdout and stderr,
``_scheduler-stdout.txt`` and ``_scheduler-stderr.txt``, respectively. Lastly, the source code of the ``multiply``
``calcfunction``, which represents the first step of the workflow, is contained in ``source_file``.
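
The numbering of the step directories can be sketched as follows: child processes are sorted by creation time (as
stated in the ``verdi process dump`` docstring) and prefixed with a two-digit, 1-based counter. ``Child`` and
``step_directory_names`` are hypothetical stand-ins rather than aiida-core API, and taking the suffix from a plain label
string is an assumption about how names such as ``multiply`` or ``ArithmeticAddCalculation`` are chosen.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Child:
    """Hypothetical stand-in for a child ProcessNode: only a label and a creation time."""

    label: str
    ctime: datetime


def step_directory_names(children: list[Child]) -> list[str]:
    """Return numbered directory names such as `01-multiply`, ordered by creation time."""
    ordered = sorted(children, key=lambda child: child.ctime)
    # Two-digit, 1-based counter followed by the child's label.
    return [f'{index:02d}-{child.label}' for index, child in enumerate(ordered, start=1)]


children = [
    Child('ArithmeticAddCalculation', datetime(2024, 5, 27, 12, 1)),
    Child('multiply', datetime(2024, 5, 27, 12, 0)),
]
print(step_directory_names(children))  # ['01-multiply', '02-ArithmeticAddCalculation']
```

Sorting by creation time rather than by label is what lets the directory tree mirror the order in which the workflow
actually ran its steps.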

A closer look at the directory also reveals the hidden ``.aiida_node_metadata.yaml`` files, which are created for every
``ProcessNode`` and contain additional information about the ``Node``, the ``User``, and the ``Computer``, as well as
the ``.aiida`` subdirectory with machine-readable AiiDA-internal data in JSON format.
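
Because the dump is a plain directory tree, it can also be explored from Python instead of shell utilities. The
following sketch collects the hidden metadata files with ``pathlib.Path.rglob``; the helper ``find_node_metadata`` is
hypothetical and only relies on the file name shown in the tree above.

```python
from pathlib import Path


def find_node_metadata(dump_dir: Path) -> list[Path]:
    """Return every `.aiida_node_metadata.yaml` file below `dump_dir`, sorted by path."""
    # rglob searches all subdirectories recursively; a literal file name also
    # matches hidden (dot-prefixed) files.
    return sorted(dump_dir.rglob('.aiida_node_metadata.yaml'))


# Example: print the metadata files of a dump in the current working directory.
for metadata_file in find_node_metadata(Path('dump-multiply_add')):
    print(metadata_file)
```

For the ``MultiplyAddWorkChain`` dump above, this would list three files: one at the top level and one in each step
directory. Each file can then be parsed with any YAML library for further inspection.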

Since child processes are explored recursively, arbitrarily complex, nested workflows can be dumped. As already seen
above, the ``-p`` flag allows specifying a custom dumping path. If none is provided, it is automatically generated from
the ``process_label`` (or ``process_type``) and the ``pk``. In addition, the command provides the ``-o`` flag to
overwrite existing directories, the ``-f`` flag to dump all files for each ``CalculationNode`` of the workflow in a flat
directory structure, and the ``--include-inputs/--exclude-inputs`` (``--include-outputs/--exclude-outputs``) flags to
control whether the linked node inputs (outputs) of each ``CalculationNode`` are also dumped into ``node_inputs``
(``node_outputs``) subdirectories; inputs are included and outputs excluded by default. For a full list of available
options, call :code:`verdi process dump --help`.
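
The path handling described above can be sketched as follows. ``prepare_dump_path`` is a hypothetical helper, not the
``ProcessDumper`` implementation, and the exact default name format ``dump-<label>-<pk>`` is an assumption based on the
description (process label, falling back to the process type, plus the pk). The ``FileExistsError`` mirrors the error
that ``verdi process dump`` reports when the directory exists and ``-o`` is not given.

```python
import shutil
from pathlib import Path
from typing import Optional


def prepare_dump_path(
    path: Optional[Path],
    process_label: Optional[str],
    process_type: str,
    pk: int,
    overwrite: bool = False,
) -> Path:
    """Resolve the top-level dump directory and create it (hypothetical sketch)."""
    if path is None:
        # Assumed naming scheme: process label (falling back to the process type) plus the pk.
        path = Path(f'dump-{process_label or process_type}-{pk}')
    if path.exists():
        if not overwrite:
            # Same condition that `verdi process dump` turns into an error message.
            raise FileExistsError(f'{path} already exists and overwrite is False.')
        # Overwriting: remove the existing directory (or stray file) before recreating it.
        if path.is_dir():
            shutil.rmtree(path)
        else:
            path.unlink()
    path.mkdir(parents=True)
    return path
```

Raising instead of silently merging into an existing directory avoids mixing files from two different dumps.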

.. _how-to:data:import:provenance:

1 change: 1 addition & 0 deletions docs/source/reference/command_line.rst
@@ -367,6 +367,7 @@ Below is a list with all available subcommands.
Commands:
call-root Show root process of the call stack for the given processes.
dump Dump process input and output files to disk.
kill Kill running processes.
list Show a list of running or terminated processes.
pause Pause running processes.
84 changes: 84 additions & 0 deletions src/aiida/cmdline/commands/cmd_process.py
@@ -481,3 +481,87 @@ def process_repair(manager, broker, dry_run):
if pid not in set_process_tasks:
process_controller.continue_process(pid)
echo.echo_report(f'Revived process `{pid}`')


@verdi_process.command('dump')
@arguments.PROCESS()
@options.PATH()
@options.OVERWRITE()
@click.option(
'--include-inputs/--exclude-inputs',
default=True,
show_default=True,
help='Include the linked input nodes of the `CalculationNode`(s).',
)
@click.option(
'--include-outputs/--exclude-outputs',
default=False,
show_default=True,
help='Include the linked output nodes of the `CalculationNode`(s).',
)
@click.option(
'--include-attributes/--exclude-attributes',
default=True,
show_default=True,
help='Include attributes in the `.aiida_node_metadata.yaml` written for every `ProcessNode`.',
)
@click.option(
'--include-extras/--exclude-extras',
default=True,
show_default=True,
help='Include extras in the `.aiida_node_metadata.yaml` written for every `ProcessNode`.',
)
@click.option(
'-f',
'--flat',
is_flag=True,
default=False,
help='Dump files in a flat directory for every step of the workflow.',
)
def process_dump(
process,
path,
overwrite,
include_inputs,
include_outputs,
include_attributes,
include_extras,
flat,
) -> None:
    """Dump process input and output files to disk.

    Child calculations/workflows (also called `CalcJob`s/`CalcFunction`s and `WorkChain`s/`WorkFunction`s in AiiDA
    jargon) run by the parent workflow are contained in the directory tree as sub-folders and are sorted by their
    creation time. The directory tree thus mirrors the logical execution of the workflow, which can also be queried by
    running `verdi process status <pk>` on the command line.

    By default, input and output files of each calculation can be found in the corresponding "inputs" and
    "outputs" directories (the former also contains the hidden ".aiida" folder with machine-readable job execution
    settings). Additional input and output files (depending on the type of calculation) are placed in the "node_inputs"
    and "node_outputs" directories, respectively.

    Lastly, every folder also contains a hidden, human-readable `.aiida_node_metadata.yaml` file with the relevant AiiDA
    node data for further inspection.
    """

    from aiida.tools.dumping.processes import ProcessDumper

    process_dumper = ProcessDumper(
        include_inputs=include_inputs,
        include_outputs=include_outputs,
        include_attributes=include_attributes,
        include_extras=include_extras,
        overwrite=overwrite,
        flat=flat,
    )

    try:
        dump_path = process_dumper.dump(process_node=process, output_path=path)
    except FileExistsError:
        echo.echo_critical(
            'Dumping directory exists and overwrite is False. Set overwrite to True, or delete directory manually.'
        )
    except Exception as e:
        echo.echo_critical(f'Unexpected error while dumping {process.__class__.__name__} <{process.pk}>:\n ({e!s}).')

    echo.echo_success(f'Raw files for {process.__class__.__name__} <{process.pk}> dumped into folder `{dump_path}`.')
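
The command above relies on `click` boolean flag pairs such as `--include-inputs/--exclude-inputs`. To show how those
flags combine without requiring `click` or an AiiDA profile, the following sketch swaps in the standard-library
`argparse`: each pair becomes two flags that write to the same destination, with the defaults taken from the options in
this commit. `build_parser` is hypothetical, and treating the process argument as an integer PK is a simplification (the
real `PROCESS` argument is resolved by AiiDA).

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """argparse stand-in for the click options of `verdi process dump` (not how verdi is built)."""
    parser = argparse.ArgumentParser(prog='dump')
    parser.add_argument('process', type=int, help='PK of the process to dump (simplified).')
    parser.add_argument('-p', '--path', type=Path, default=None)
    parser.add_argument('-o', '--overwrite', action='store_true')
    parser.add_argument('-f', '--flat', action='store_true')
    # click's `--include-x/--exclude-x` pairs become two flags sharing one destination.
    for name in ('inputs', 'outputs', 'attributes', 'extras'):
        dest = f'include_{name}'
        parser.add_argument(f'--include-{name}', dest=dest, action='store_true')
        parser.add_argument(f'--exclude-{name}', dest=dest, action='store_false')
    # Defaults as in the click options: everything included except linked outputs.
    parser.set_defaults(include_inputs=True, include_outputs=False, include_attributes=True, include_extras=True)
    return parser


args = build_parser().parse_args(['42', '--exclude-inputs', '--include-outputs', '-f'])
print(args.include_inputs, args.include_outputs, args.flat)  # False True True
```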
21 changes: 21 additions & 0 deletions src/aiida/cmdline/params/options/main.py
@@ -8,6 +8,8 @@
###########################################################################
"""Module with pre-defined reusable commandline options that can be used as `click` decorators."""

import pathlib

import click

from aiida.brokers.rabbitmq.defaults import BROKER_DEFAULTS
@@ -77,6 +79,8 @@
'OLDER_THAN',
'ORDER_BY',
'ORDER_DIRECTION',
'OVERWRITE',
'PATH',
'PAST_DAYS',
'PAUSED',
'PORT',
@@ -743,3 +747,20 @@ def set_log_level(_ctx, _param, value):
is_flag=True,
help='Print the full traceback in case an exception is raised.',
)

PATH = OverridableOption(
'-p',
'--path',
type=click.Path(path_type=pathlib.Path),
show_default=False,
help='Base path for operations that write to disk.',
)

OVERWRITE = OverridableOption(
'--overwrite',
'-o',
is_flag=True,
default=False,
show_default=True,
help='Overwrite file/directory if writing to disk.',
)
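
`PATH` and `OVERWRITE` are defined once as `OverridableOption` instances and attached to commands by calling them, as in
`@options.PATH()` and `@options.OVERWRITE()` above. The sketch below illustrates that reuse-with-overrides pattern
without `click`: `ReusableOption` is a hypothetical stand-in that only stores the parameter declarations and keyword
arguments and merges call-site overrides into a copy. It is not the aiida-core class, which presumably passes the merged
arguments on to `click.option`.

```python
from typing import Any


class ReusableOption:
    """Hypothetical click-free stand-in for a reusable, overridable option definition."""

    def __init__(self, *param_decls: str, **defaults: Any) -> None:
        self.param_decls = param_decls
        self.defaults = defaults

    def __call__(self, **overrides: Any) -> tuple[tuple[str, ...], dict[str, Any]]:
        # Merge call-site overrides into a copy so the shared definition stays unchanged.
        merged = {**self.defaults, **overrides}
        return self.param_decls, merged


OVERWRITE = ReusableOption(
    '--overwrite', '-o', is_flag=True, default=False, help='Overwrite file/directory if writing to disk.'
)

decls, kwargs = OVERWRITE(help='Replace an existing dump directory.')
print(decls, kwargs['help'])  # ('--overwrite', '-o') Replace an existing dump directory.
```

Keeping the shared definition immutable is what makes it safe to reuse one option object across many commands.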
4 changes: 2 additions & 2 deletions src/aiida/engine/daemon/execmanager.py
@@ -25,7 +25,7 @@

from aiida.common import AIIDA_LOGGER, exceptions
from aiida.common.datastructures import CalcInfo, FileCopyOperation
-from aiida.common.folders import SandboxFolder
+from aiida.common.folders import Folder, SandboxFolder
from aiida.common.links import LinkType
from aiida.engine.processes.exit_code import ExitCode
from aiida.manage.configuration import get_config_option
@@ -66,7 +66,7 @@ def upload_calculation(
node: CalcJobNode,
transport: Transport,
calc_info: CalcInfo,
-    folder: SandboxFolder,
+    folder: Folder,
inputs: Optional[MappingType[str, Any]] = None,
dry_run: bool = False,
) -> RemoteData | None:
1 change: 1 addition & 0 deletions src/aiida/tools/__init__.py
@@ -24,6 +24,7 @@

from .calculations import *
from .data import *
from .dumping import *
from .graph import *
from .groups import *
from .visualization import *
11 changes: 11 additions & 0 deletions src/aiida/tools/dumping/__init__.py
@@ -0,0 +1,11 @@
###########################################################################
# Copyright (c), The AiiDA team. All rights reserved. #
# This file is part of the AiiDA code. #
# #
# The code is hosted on GitHub at https://github.com/aiidateam/aiida-core #
# For further information on the license, see the LICENSE.txt file #
# For further information please visit http://www.aiida.net #
###########################################################################
"""Modules related to the dumping of AiiDA data."""

__all__ = ('processes',)
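
Here `__all__` names a submodule rather than a function or class, so a star import of the `dumping` package (as in
`from .dumping import *` in `src/aiida/tools/__init__.py`) imports the `processes` submodule as a side effect. The sketch
below demonstrates this standard Python behaviour with a throwaway package; the package name `demo_dumping` and the helper
`star_import_names` are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def star_import_names(init_all: tuple[str, ...]) -> set[str]:
    """Create a temporary package with the given __all__ and return what `import *` binds."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / 'demo_dumping'
        pkg.mkdir()
        (pkg / '__init__.py').write_text(f'__all__ = {init_all!r}\n')
        (pkg / 'processes.py').write_text('class ProcessDumper:\n    pass\n')
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            namespace: dict = {}
            # A star import at module level; names listed in __all__ are bound,
            # importing submodules that are not yet attributes of the package.
            exec('from demo_dumping import *', namespace)
        finally:
            sys.path.remove(tmp)
            # Drop the cached modules so the helper can be called again.
            for name in list(sys.modules):
                if name == 'demo_dumping' or name.startswith('demo_dumping.'):
                    del sys.modules[name]
    namespace.pop('__builtins__', None)
    return set(namespace)


print(star_import_names(('processes',)))  # {'processes'}
```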