Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Refactoring to include outputs parsing #80

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

gbrunin
Copy link

@gbrunin gbrunin commented Sep 9, 2024

I have changed the structure of the code to better reflect the distinction between input and output parsers. Concerning the inputs, all the tests that were already there still pass, it's only a matter of adapting the namespace. For the outputs, I created objects such as PwOutput, that inherit from an abstract BaseOutput, that for now can be instantiated with a from_dir method (a from_files method could/should be added as well). A user with a job that ran in a given directory could get the outputs easily using this classmethod. In these from_dir methods, specific XML and/or standard output Parsers would be used to get the results. Each Parser would parse a single file, and the logic of parsing and extracting the outputs from the different codes would have to be implemented in each from_dir method. For instance, a NebOutput.from_dir would parse the standard output of the global computation and probably the standard outputs and/or XML files for each image. The extracted outputs would be stored as a simple dictionary and these objects would not rely on any external package.
Then, in qe_tools.extractors, ASE and pymatgen objects could be constructed (e.g., ase.Atoms/pymatgen.Structure, band structures,...), allowing each to be optional dependencies.

This is only the base logic of the new structure and many things remain to be implemented. The idea now is to see what breaks with this in aiida-quantumespresso and how the parsing could be moved from there to here. Then, more will be added depending on the needs.

@mbercx mbercx force-pushed the develop branch 3 times, most recently from 6d45efa to 35a667f Compare November 21, 2024 12:41
mbercx and others added 12 commits November 21, 2024 14:09
…s. Made many functions as public since they could be useful to users and outside of their context.
…onversions should take place. In all other files, ase and pymatgen should not be used.
…ibility regarding the number of output files for each type of computation, e.g. NEB.
Several small changes are made to the `PwOutput` class:

1. Use `pathlib` instead of `os.path` for path definitions.
2. `.from_dir` method: Only look for files from the specified directory using list
   comprehension.
3. Adapt approach for identifying the `stdout` file to only use the first 5 lines to
   look for the `Program PWSCF` string.
4. Remove the try-except loop that simply catches a general exception and raises a
   `ValueError` instead. This would obscure the actual error without an immediate
   benefit.
5. Define the `d_out_xxx` variables as empty dictionaries before parsing, and redefine
   them in case the data is parsed successfully.
6. Allow the user to pass a `Path` instead of a string to specify the directory to
   parse from.
Currently the `PwStdoutParser` nor its parent class `BaseOutputFileParser` parse any
of the Quantum ESPRESSO `stdout`. Here we add the `BaseStdoutParser` class that has
some very minimal parsing features for parsing the code name and version, as well as the
walltime of the calculation. A utility function is also added to convert the Quantum
ESPRESSO walltime output into seconds.

The constructor of the `BaseOutputFileParser` is also adapted to remove the default
to `None`. Since there is currently no way of providing the output content after
construction, it should be provided at that time for the class to be useful.
The first schema used for the XML output of Quantum ESPRESSO `pw.x` was still missing.
Here it is added, along with a set of tests for the XML parsing of default runs.
@mbercx
Copy link
Member

mbercx commented Nov 21, 2024

@gbrunin thanks again for the hard work in restructuring the package. I've added some basic stdout parsing, tests etc, and fixed up the package to run the CI properly on the currently active Python versions (3.9 - 3.13).

Now I'm adding some basic documentation, also on the design decisions. We should consider these carefully, particularly the UX, since it's hard to change these afterwards. I don't care too much about breaking backwards compatibility if we can converge on a more user-friendly tool.

@mbercx
Copy link
Member

mbercx commented Nov 21, 2024

Also quick shoutout to @elinscott and his package for QE inputs:

https://github.com/elinscott/qe_input_prototype

He's already indicated he'd be fine with us integrating that work into qe-tools, so we should consider it in our design discussions.

@mbercx mbercx mentioned this pull request Nov 21, 2024
@mbercx
Copy link
Member

mbercx commented Nov 21, 2024

@gbrunin I've also added some basic documentation and design notes. I'm still not quite sure about some of the APIs. I do like the general design now though, I'm just also wondering if:

  1. We should also add one class that can deal with both inputs and outputs. IIRC some of the output parsing requires, or can be significantly facilitated by having access to the inputs. Since we already use the "parser" nomenclature for the output parsers, I might call this e.g. PwCalc, and then it contains both the inputs and outputs.
  2. In time, we want a PwInput class, that allows for both generating an input file and parsing an existing one. I'm not sure if in this case we need to have the Parser class intermediary.
  3. We should already work a bit on separating the raw parsed data of the XML and actual useful outputs. The raw data is not really that user-friendly, clearly.

I've also made a little image to represent the design:

image

gbrunin and others added 2 commits November 22, 2024 11:01
…ec to utils. Renamed BaseStdoutParser.parse_stdout_base to parse for compatibility with parent abstract class.
This currently serves no real purpose, as the executable can be easily
identified from the class itself. In case we do find a use case later,
it can always be added again easily.
Comment on lines 26 to 30
'numpy',
'scipy',
'packaging',
'xmlschema',
]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are all these necessary ? I think it would be best to restrict to the strict minimum. I would say that for parsing or generating input even numpy should not be needed (let alone scipy ?). I'd be in favor of removing them but I don't know where/why they are used. For the packaging dependency, it has itself no dependency so I think it's probably fine to keep it even though it's probably possible to get around without it.. For xmlschema, I would also say it seems fine as xmlschema has just one dependency itself, elementpath, which itself has no dependency. Any thoughts on numpy/scipy removal ?

Copy link

@davidwaroquiers davidwaroquiers Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just looked, scipy is used only to get the norm of a few vectors in one file. I think this is definitely something that should then probably be removed from the dependencies (also from the fact that if numpy is kept as a required dependency, the norm can also be obtained by numpy, but for numpy, my comment below holds too :)).
As for numpy, it's used in a few more places, but 1/ for very simple maths, vector operations and 2/ for some conversion functions. For the latter point, maybe the "lowest-level" qe tools could be installed without it (the "client" code, be it aiida, pymatgen or else) has to follow the exact API that the lowest level functions require. And numpy could be an optional dependency for "activating" helper functions if the user wants it. Any thoughts ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants