`ProjwfcParser`: refactor code and temp retrieve PDOS files #749

mbercx · 2021-10-03T15:59:45Z

Massive refactoring of the ProjwfcParser to make the code less entangled
and more readable.

TODO:

Integrate changed from Add raw parser of Lowdin charges from projwfc output file #425 in this PR and make sure @chrisjsewell gets his attribution.

mbercx · 2021-10-03T16:41:21Z

Note: the Python 3.6 tests fail because numpy.ArrayLike requires a numpy version that no longer supports Python3.6. However, since the EOL of Python 3.6 is in a couple months and the PR is meant to go into v4.0.0, I think we can just leave it as is and drop Python 3.6 support before we merge it.

More importantly, the tests succeed for 581bd57 and Python >= 3.7, without touching the tests. I've also run both the old parser and refactored one for the 4 different spin settings, and compared the content of the orbitals and PDOS arrays. So I'm quite confident that the parsing is unaltered.

mbercx · 2021-10-03T16:56:18Z

@sphuber this PR is both blocked by #747 and on hold because it contains backwards incompatible changes:

Drop support for pre-v6.5 QE versions.
Only temporarily retrieve the PDOS files to lighten the repository.

The `generate_calc_job_node` fixture has a `retrieve_temporary` input that allows the user to specify a list of files in the fixture test folder that should be copied to the temporary retrieved folder (as defined by the `_retrieve_temporary_list` class attribute in the corresponding calculation job). Here we extend this functionality to allow the user to specify glob patterns in the `retrieve_temporary` input. Second, we also adapt the code to use the `pathlib` module for handling paths. While doing so, we also revert a change in 0874d95 that introduced a try-except to handle the case where the XML is not in the fixture test folder. This is no longer necessary, because the `glob` call will only return files that are actually present in the fixture test folder.

mbercx · 2025-01-24T07:43:07Z

@Minotakm @t-reents @AndresOrtegaGuerrero I still have to add the Lowdin parsing, but other than that the PR is ready for review. A line-by-line code review might be quite a bit of work, I'm mainly asking you to test the new parser class and see if it still has any issues. E.g. I think you mentioned the parsing for some cases (spinorbit?) was incorrect.

Minotakm · 2025-01-24T07:44:33Z

Thanks @mbercx I will give it a spin and see how it goes.

The `ProjwfcParser` relied on passing a single `out_info_dict` dictionary between several functions to store the parsed information. This made the parser code so entangled that it was almost unreadable and difficult to trace down bugs. Here we refactor the class to properly isolate the various parsing functions as class methods, and make the code more readable in general. The parent class is also changed to the `BaseParser` one, so it can rely on the basic stdout parsing this class provides. Another substantial change is that the `.pdos` files, which can be many and are parsed completely, are now only temporarily retrieved instead of stored in the AiiDA repository in the `retrieved` folder.

t-reents

Thanks @mbercx, I went through it and it looks nice! Definitely easier to understand what is going on, compared to the last version.

As you mentioned, there was still an issue, namely the proper parsing of the total DOS #1041. I added the fix as suggestions (let me know if I should open a PR to your branch instead) and quickly tested them. @AndresOrtegaGuerrero, please have a look as well.

src/aiida_quantumespresso/parsers/projwfc.py

mbercx · 2025-01-24T10:43:29Z

@t-reents thanks a lot for the review! Feel free to add a commit here with your changes. If possible it would be great if:

We add a commit that tests this issue, probably via a pytest data regression. This will adapt the YAML files corresponding to that regression test. The tests will still pass after the file is updated, but hopefully it's clear that the DOS parsing is incorrect.
We add a second commit that fixes this parsing problem, clearly showing the change in the data regression YAML files.

This PR will then have 4 commits to merge in the end. I think that is the most clear for future developers.

mbercx · 2025-01-27T09:07:28Z

Thanks @mbercx I will give it a spin and see how it goes.

Haha, just noticed the beautiful pun here. Stellar work @Minotakm 😂

AndresOrtegaGuerrero · 2025-01-27T10:00:54Z

thank you @mbercx I will test it once the new commits are merged then :)

Currently, the `ProjwfcParser` does not correctly parse the total DOS data in the case of spin-polarised calculations. Only the first column is parsed, which corresponds to spin-up density only. This issue passed the tests unnoticed since the regression tests only check the attributes of the `Dos` output, not the actual content. Here we adapt the `ProjwfcParser` to treat the spin-polarised case `nspin = 2` case correctly, constructing an `XyData` node that has two `y` arrays, one for each spin. The tests for the spin-polarised case are also adapted to add the `XyData` contents to the regression files.

mbercx · 2025-01-27T11:34:44Z

src/aiida_quantumespresso/parsers/projwfc.py

-        energy, dos, pdos_array = self._parse_pdos_files(retrieved_temporary_folder, nspin, spinorbit, logs)
+        energy, dos_node, pdos_array = self._parse_pdos_files(retrieved_temporary_folder, nspin, spinorbit, logs)
+
+        self.out('Dos', dos_node)


I'd be inclined to change this output name to dos (all lowercase), but that is technically backwards-incompatible. However, for the spin-polarised case, by fixing the bug we are also adding an extra array to the XyData output, so we are already breaking backwards compatibility in a sense? Of course changing the output link name would also break backwards compatibility for the non-spin-polarised case.

Thoughts @t-reents @sphuber @Minotakm @AndresOrtegaGuerrero?

How much is the impact on previous calculations , not much right ?

mbercx · 2025-01-27T11:35:12Z

src/aiida_quantumespresso/parsers/projwfc.py

+                ('states/eV', 'states/eV')
+            )
+        else:
+            dos_node.set_y(pdostot_array[:, 1], 'Dos', 'states/eV')


Similar comment as above: I'd use lowercase dos for the array name here.

mbercx added this to the v4.0.0 milestone Oct 3, 2021

mbercx added the pr/blocked PR is blocked by another PR that should be merged first label Oct 3, 2021

mbercx force-pushed the fix/748/clean-projwfc-parser branch from 4040d49 to 581bd57 Compare October 3, 2021 16:28

mbercx added the pr/on-hold PR should not be merged label Oct 3, 2021

mbercx mentioned this pull request Oct 3, 2021

projwfc.x: parse from XML instead of parent calc #747

Merged

This was referenced Jan 28, 2022

♻️ REFACTOR: move basic parsing into BaseParser class #756

Merged

Add raw parser of Lowdin charges from projwfc output file #425

Closed

mbercx modified the milestones: v4.0.0, v5.0.0 May 22, 2023

mbercx mentioned this pull request Dec 23, 2024

ProjwfcCalculation Doesnt doesnt parse properly pdos_tot (Bug) #1041

Open

mbercx mentioned this pull request Jan 22, 2025

🚀 Release v4.8.0 #1063

Merged

mbercx force-pushed the fix/748/clean-projwfc-parser branch 2 times, most recently from 7edefbe to 521e34b Compare January 23, 2025 21:33

mbercx force-pushed the fix/748/clean-projwfc-parser branch from 521e34b to d7d344a Compare January 24, 2025 07:39

mbercx removed pr/on-hold PR should not be merged pr/blocked PR is blocked by another PR that should be merged first labels Jan 24, 2025

mbercx requested review from Minotakm, AndresOrtegaGuerrero and t-reents January 24, 2025 07:39

mbercx force-pushed the fix/748/clean-projwfc-parser branch from d7d344a to bec5f23 Compare January 24, 2025 07:58

t-reents suggested changes Jan 24, 2025

View reviewed changes

mbercx force-pushed the fix/748/clean-projwfc-parser branch from 55e1253 to 121d1bb Compare January 27, 2025 11:30

mbercx commented Jan 27, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`ProjwfcParser`: refactor code and temp retrieve PDOS files #749

`ProjwfcParser`: refactor code and temp retrieve PDOS files #749

mbercx commented Oct 3, 2021 •

edited

Loading

mbercx commented Oct 3, 2021

mbercx commented Oct 3, 2021

mbercx commented Jan 24, 2025

Minotakm commented Jan 24, 2025

t-reents left a comment

mbercx commented Jan 24, 2025

mbercx commented Jan 27, 2025

AndresOrtegaGuerrero commented Jan 27, 2025

mbercx Jan 27, 2025 •

edited

Loading

AndresOrtegaGuerrero Jan 28, 2025

mbercx Jan 27, 2025

ProjwfcParser: refactor code and temp retrieve PDOS files #749

Are you sure you want to change the base?

ProjwfcParser: refactor code and temp retrieve PDOS files #749

Conversation

mbercx commented Oct 3, 2021 • edited Loading

mbercx commented Oct 3, 2021

mbercx commented Oct 3, 2021

mbercx commented Jan 24, 2025

Minotakm commented Jan 24, 2025

t-reents left a comment

Choose a reason for hiding this comment

mbercx commented Jan 24, 2025

mbercx commented Jan 27, 2025

AndresOrtegaGuerrero commented Jan 27, 2025

mbercx Jan 27, 2025 • edited Loading

Choose a reason for hiding this comment

AndresOrtegaGuerrero Jan 28, 2025

Choose a reason for hiding this comment

mbercx Jan 27, 2025

Choose a reason for hiding this comment

`ProjwfcParser`: refactor code and temp retrieve PDOS files #749

`ProjwfcParser`: refactor code and temp retrieve PDOS files #749

mbercx commented Oct 3, 2021 •

edited

Loading

mbercx Jan 27, 2025 •

edited

Loading