Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ProjwfcParser: refactor code and temp retrieve PDOS files #749

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

mbercx
Copy link
Member

@mbercx mbercx commented Oct 3, 2021

Massive refactoring of the ProjwfcParser to make the code less entangled
and more readable.

TODO:

@mbercx mbercx added this to the v4.0.0 milestone Oct 3, 2021
@mbercx mbercx added the pr/blocked PR is blocked by another PR that should be merged first label Oct 3, 2021
@mbercx mbercx force-pushed the fix/748/clean-projwfc-parser branch from 4040d49 to 581bd57 Compare October 3, 2021 16:28
@mbercx
Copy link
Member Author

mbercx commented Oct 3, 2021

Note: the Python 3.6 tests fail because numpy.ArrayLike requires a numpy version that no longer supports Python3.6. However, since the EOL of Python 3.6 is in a couple months and the PR is meant to go into v4.0.0, I think we can just leave it as is and drop Python 3.6 support before we merge it.

More importantly, the tests succeed for 581bd57 and Python >= 3.7, without touching the tests. I've also run both the old parser and refactored one for the 4 different spin settings, and compared the content of the orbitals and PDOS arrays. So I'm quite confident that the parsing is unaltered.

@mbercx mbercx added the pr/on-hold PR should not be merged label Oct 3, 2021
@mbercx
Copy link
Member Author

mbercx commented Oct 3, 2021

@sphuber this PR is both blocked by #747 and on hold because it contains backwards incompatible changes:

  • Drop support for pre-v6.5 QE versions.
  • Only temporarily retrieve the PDOS files to lighten the repository.

The `generate_calc_job_node` fixture has a `retrieve_temporary` input that allows the
user to specify a list of files in the fixture test folder that should be copied to
the temporary retrieved folder (as defined by the `_retrieve_temporary_list` class
attribute in the corresponding calculation job). Here we extend this functionality to
allow the user to specify glob patterns in the `retrieve_temporary` input.

Second, we also adapt the code to use the `pathlib` module for handling paths. While
doing so, we also revert a change in 0874d95 that
introduced a try-except to handle the case where the XML is not in the fixture test
folder. This is no longer necessary, because the `glob` call will only return files that
are actually present in the fixture test folder.
@mbercx mbercx force-pushed the fix/748/clean-projwfc-parser branch from 521e34b to d7d344a Compare January 24, 2025 07:39
@mbercx mbercx removed pr/on-hold PR should not be merged pr/blocked PR is blocked by another PR that should be merged first labels Jan 24, 2025
@mbercx
Copy link
Member Author

mbercx commented Jan 24, 2025

@Minotakm @t-reents @AndresOrtegaGuerrero I still have to add the Lowdin parsing, but other than that the PR is ready for review. A line-by-line code review might be quite a bit of work, I'm mainly asking you to test the new parser class and see if it still has any issues. E.g. I think you mentioned the parsing for some cases (spinorbit?) was incorrect.

@Minotakm
Copy link
Collaborator

Thanks @mbercx I will give it a spin and see how it goes.

The `ProjwfcParser` relied on passing a single `out_info_dict` dictionary between
several functions to store the parsed information. This made the parser code so
entangled that it was almost unreadable and difficult to trace down bugs.

Here we refactor the class to properly isolate the various parsing functions as class
methods, and make the code more readable in general. The parent class is also changed to
the `BaseParser` one, so it can rely on the basic stdout parsing this class provides.

Another substantial change is that the `.pdos` files, which can be many and are parsed
completely, are now only temporarily retrieved instead of stored in the AiiDA
repository in the `retrieved` folder.
@mbercx mbercx force-pushed the fix/748/clean-projwfc-parser branch from d7d344a to bec5f23 Compare January 24, 2025 07:58
Copy link

@t-reents t-reents left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mbercx, I went through it and it looks nice! Definitely easier to understand what is going on, compared to the last version.

As you mentioned, there was still an issue, namely the proper parsing of the total DOS #1041. I added the fix as suggestions (let me know if I should open a PR to your branch instead) and quickly tested them. @AndresOrtegaGuerrero, please have a look as well.

src/aiida_quantumespresso/parsers/projwfc.py Outdated Show resolved Hide resolved
src/aiida_quantumespresso/parsers/projwfc.py Outdated Show resolved Hide resolved
src/aiida_quantumespresso/parsers/projwfc.py Outdated Show resolved Hide resolved
src/aiida_quantumespresso/parsers/projwfc.py Outdated Show resolved Hide resolved
src/aiida_quantumespresso/parsers/projwfc.py Outdated Show resolved Hide resolved
@mbercx
Copy link
Member Author

mbercx commented Jan 24, 2025

@t-reents thanks a lot for the review! Feel free to add a commit here with your changes. If possible it would be great if:

  • We add a commit that tests this issue, probably via a pytest data regression. This will adapt the YAML files corresponding to that regression test. The tests will still pass after the file is updated, but hopefully it's clear that the DOS parsing is incorrect.
  • We add a second commit that fixes this parsing problem, clearly showing the change in the data regression YAML files.

This PR will then have 4 commits to merge in the end. I think that is the most clear for future developers.

@mbercx
Copy link
Member Author

mbercx commented Jan 27, 2025

Thanks @mbercx I will give it a spin and see how it goes.

Haha, just noticed the beautiful pun here. Stellar work @Minotakm 😂

@AndresOrtegaGuerrero
Copy link
Collaborator

thank you @mbercx I will test it once the new commits are merged then :)

Currently, the `ProjwfcParser` does not correctly parse the total DOS data in the case
of spin-polarised calculations. Only the first column is parsed, which corresponds to
spin-up density only. This issue passed the tests unnoticed since the regression tests
only check the attributes of the `Dos` output, not the actual content.

Here we adapt the `ProjwfcParser` to treat the spin-polarised case `nspin = 2` case
correctly, constructing an `XyData` node that has two `y` arrays, one for each spin.
The tests for the spin-polarised case are also adapted to add the `XyData` contents
to the regression files.
@mbercx mbercx force-pushed the fix/748/clean-projwfc-parser branch from 55e1253 to 121d1bb Compare January 27, 2025 11:30
energy, dos, pdos_array = self._parse_pdos_files(retrieved_temporary_folder, nspin, spinorbit, logs)
energy, dos_node, pdos_array = self._parse_pdos_files(retrieved_temporary_folder, nspin, spinorbit, logs)

self.out('Dos', dos_node)
Copy link
Member Author

@mbercx mbercx Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be inclined to change this output name to dos (all lowercase), but that is technically backwards-incompatible. However, for the spin-polarised case, by fixing the bug we are also adding an extra array to the XyData output, so we are already breaking backwards compatibility in a sense? Of course changing the output link name would also break backwards compatibility for the non-spin-polarised case.

Thoughts @t-reents @sphuber @Minotakm @AndresOrtegaGuerrero?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much is the impact on previous calculations , not much right ?

('states/eV', 'states/eV')
)
else:
dos_node.set_y(pdostot_array[:, 1], 'Dos', 'states/eV')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment as above: I'd use lowercase dos for the array name here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants