Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parsing of Mulliken charges from SIESTA stdout #691

Merged
merged 16 commits into from
Oct 24, 2024

Conversation

ahkole
Copy link
Contributor

@ahkole ahkole commented Feb 23, 2024

  • Added tests for new/changed functions?
  • Ran isort . and black . [24.2.0] at top-level
  • Documentation for functionality in docs/
  • Changes documented in CHANGELOG.md

I was missing the parsing of the Mulliken charges from SIESTA stdout so I started with an implementation of the parsing. Right now it only parses the total charges (so no orbital decomposition) but that is already very useful for me.

The parsing seems to work for the cases that I have tested but I still have to add some more robust tests (and do the other stuff from the todo list above). Therefore, I marked the PR as a draft. Any feedback about whether the implementation is going in the right direction or whether certain things should be handled differently would be much appreciated.

@zerothi
Copy link
Owner

zerothi commented Feb 26, 2024

I think it looks very good.

Only thing would be to streamline the headers against voronoi and hirshfeld, and also make it work for simpler spin-systems.

@ahkole
Copy link
Contributor Author

ahkole commented Feb 28, 2024

Thanks for having a look!

Only thing would be to streamline the headers against voronoi and hirshfeld

Do you mean that the headers should be the same for the three different types of charges?

and also make it work for simpler spin-systems.

Good point, I'll look into that and update the implementation.

@zerothi
Copy link
Owner

zerothi commented Feb 28, 2024

Thanks for having a look!

Thanks for extending sisl!

Only thing would be to streamline the headers against voronoi and hirshfeld

Do you mean that the headers should be the same for the three different types of charges?

Yes, in the returned dataframe it would be ideal if they were the same. This should just be checked.

and also make it work for simpler spin-systems.

Good point, I'll look into that and update the implementation.

If this proves too hard, just start with this one, finish it up, then the other thing can always be added.

@zerothi
Copy link
Owner

zerothi commented Mar 1, 2024

Once you are done with this, I would like to revisit it to make #695 functional for read_charge

@zerothi zerothi added this to the v0.15 milestone Mar 6, 2024
@ahkole
Copy link
Contributor Author

ahkole commented Mar 7, 2024

Yes, in the returned dataframe it would be ideal if they were the same. This should just be checked.

I just noticed that one of the fields in the dataframe for Voronoi and Hirshfeld is dq but for the Mulliken charges SIESTA doesn't output dq only the total charge of an atom so we cannot parse this directly for Mulliken. Do you still want it to be included in the dataframe for Mulliken (which would probably require parsing the atomic charge for each species from somewhere else to compute the dq)?

@ahkole
Copy link
Contributor Author

ahkole commented Mar 7, 2024

I also looked at the way the parsing of the Hirshfeld and Voronoi charges is currently being tested for inspiration. It reads an output that contains Voronoi and Hirshfeld charges from the sisl-files repository. These files unfortunately do not contain any Mulliken charges. Do you think it would be better to modify those files to also contain Mulliken charges so that all the charges are contained in a single file? Or should I add additional files with only Mulliken charges for testing?

@zerothi
Copy link
Owner

zerothi commented Mar 7, 2024

I see.
Then keep q as a field, we can always later allow an atoms argument to calculate dq.

And as for test, anything you prefer would be fine with me. :)

@ahkole
Copy link
Contributor Author

ahkole commented Mar 7, 2024

Then keep q as a field

I used e as a field for the charge of an atom because that was also what seems to be used for Voronoi and Hirshfeld.

I looked into the docs folder but everything for stdoutSileSiesta seems to be automatically generated from the docstring. Is this correct? And is it therefore sufficient if I have updated the docstring to reflect that parsing of total Mulliken charges is now available?

I also started looking into making it work for simpler spin cases (non-polarized and colinear) but that looks more difficult than I thought because apparently the complete layout of the Mulliken charges changes for those spin types :'-) I will try to add to add some tests and look into the parsing of those cases tomorrow.

Copy link

codecov bot commented Mar 7, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.92%. Comparing base (af3ff99) to head (236c33f).
Report is 40 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #691      +/-   ##
==========================================
+ Coverage   86.68%   86.92%   +0.23%     
==========================================
  Files         402      403       +1     
  Lines       51883    52509     +626     
==========================================
+ Hits        44976    45642     +666     
+ Misses       6907     6867      -40     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@zerothi
Copy link
Owner

zerothi commented Mar 7, 2024

I used e as a field for the charge of an atom because that was also what seems to be used for Voronoi and Hirshfeld.

Good, I'll open an issue to try and change this (or the other).

I looked into the docs folder but everything for stdoutSileSiesta seems to be automatically generated from the docstring. Is this correct? And is it therefore sufficient if I have updated the docstring to reflect that parsing of total Mulliken charges is now available?

Yes, in terms of the documentation, you don't need to do anything else :)

I also started looking into making it work for simpler spin cases (non-polarized and colinear) but that looks more difficult than I thought because apparently the complete layout of the Mulliken charges changes for those spin types :'-) I will try to add to add some tests and look into the parsing of those cases tomorrow.

Ok, thanks, yes, the Mulliken is pretty difficult to parse, as it has many options. It should probably be refactored in Siesta, because it really is a night-mare to parse ;)

@ahkole
Copy link
Contributor Author

ahkole commented Mar 8, 2024

I tried adding the parsing for simpler spin types. The parsing did get more complicated and I hope the code is not too convoluted now.

I didn't get around to adding the tests yet but I'll try to get to that next week.

atom_idx = []
header = ["e"]
_parse_spin_pol()
elif ret[2] == IDX_POL:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if it makes sense to use self.info.spin == Spin.POLARIZED instead? The self.info.spin should be populated well before any mulliken charges is created.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I didn't know that. If that works that would certainly be preferable. I'll give it a try.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure self.info.spin gets populated? I tried this but I get,

AttributeError: stdoutSileSiesta.info.spin does not exist, did you mistype?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added an additional spin property to the InfoAttr dict at the start of stdOutSileSiesta and now it works (see commit fece834). Was this sort of what you had in mind?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ha, sorry, I had added that commit just after you branched off, see here

def _parse_spin(attr, match):

So basically we did the same thing, yes, it was exactly how I imagined it. If you could revert your commit, then rebase, or ask me to rebase :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted my last commit to remove any changes I made to add the self.info.spin property and then rebased on the latest version of main. Everything should be okay again now :)

I still have to add the tests to finalize this PR but I'll do my very best to allocate some time to do this next week.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds really great!

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case you are not aware, files should go in the https://github.com/zerothi/sisl-files repository, and then the submodule files should be bumped, if you need help, let me know. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get around to this unfortunately and I'll be away next week so I won't be able to finalize this until the first week of April. I hope this is okay.

@ahkole ahkole force-pushed the stdout-siesta-read-mulliken branch from fece834 to d6f6ca9 Compare March 15, 2024 18:09
@zerothi
Copy link
Owner

zerothi commented Mar 22, 2024

no worries, let me know when/if you need feedback! :)

@zerothi
Copy link
Owner

zerothi commented Apr 10, 2024

Hi @ahkole sorry to ping, do you need any help here?

@ahkole
Copy link
Contributor Author

ahkole commented Apr 10, 2024

Hi @ahkole sorry to ping, do you need any help here?

Hi, no need to apologize for pinging. I agree that this should be wrapped up and it was starting to be overshadowed by other projects. I am now trying to add the test cases. I have forked the sisl-files repo and started adding the writing of the Mulliken charges to the test files that were already there for the charges. I have one question about this and am running into a small issue:

  1. The current output files for the charge parsing tests were run with some version of the master branch that I don't know how to identify exactly, in the top there is this string for the version label: master-post-4.1-266. Is it okay if I use the latest release version from the GitLab siesta-5.0.0-beta1 to generate output files for the tests? Or should I use a version closer to the latest stable release 4.1.5 for generating these files?
  2. I'm trying to run the test suite that's already there to verify that that is working but I'm getting some errors. I installed the dependencies that were listed in the test section in the pyproject.toml and am trying to run the tests by running pytest src/sisl/io/siesta/tests/test_stdout_charges.py from the root of the source tree but then I always get the following error:
ImportError while loading conftest '/home/arnold/Documents/Code/SIESTA/ahkole-sisl/src/sisl/conftest.py'.
src/sisl/__init__.py:103: in <module>
    import sisl.utils as utils
src/sisl/utils/__init__.py:32: in <module>
    from . import mathematics as math
src/sisl/utils/mathematics.py:24: in <module>
    from sisl._indices import indices_le
E   ModuleNotFoundError: No module named 'sisl._indices'

Do you know what's causing this and how to fix it?

@ahkole
Copy link
Contributor Author

ahkole commented Apr 10, 2024

I may have already figured out what's causing this issue. This sisl._indices module is maybe something that has to be generated/compiled by the install process. If I first install the repo and try to run pytest in the install directory I no longer get this error. However, all the tests that were already there do fail though. Maybe because it cannot find the output files it needs to parse? What is the proper way of running the sisl test suite?

@zerothi
Copy link
Owner

zerothi commented Apr 11, 2024

1. The current output files for the charge parsing tests were run with some version of the master branch that I don't know how to identify exactly, in the top there is this string for the version label: `master-post-4.1-266`. Is it okay if I use the latest release version from the GitLab `siesta-5.0.0-beta1` to generate output files for the tests? Or should I use a version closer to the latest stable release `4.1.5` for generating these files?

I don't mind either :)
Could you add all necessary files for the run, then I can adapt later (see e.g. the discussion here: zerothi/sisl-files#14)

2. I'm trying to run the test suite that's already there to verify that that is working but I'm getting some errors. I installed the dependencies that were listed in the `test` section in the `pyproject.toml` and am trying to run the tests by running `pytest src/sisl/io/siesta/tests/test_stdout_charges.py` from the root of the source tree but then I always get the following error:

Yeah, you can't do that, if you wan't to run a specific test, you should do something like this:

pytest --pyargs sisl -k stdout_charges

I may have already figured out what's causing this issue. This sisl._indices module is maybe something that has to be generated/compiled by the install process. If I first install the repo and try to run pytest in the install directory I no longer get this error. However, all the tests that were already there do fail though. Maybe because it cannot find the output files it needs to parse? What is the proper way of running the sisl test suite?
Yes, the test files should be there before it can run the test-suite.

@ahkole
Copy link
Contributor Author

ahkole commented Apr 12, 2024

pytest --pyargs sisl -k stdout_charges

It required some trial and error on how to point the test suite to the correct files (luckily the error messages usefully referred to the use of a SISL_FILES_TESTS environment variable) but I can now successfully run the tests (and they pass :) ). I will now add some new tests.

Could you add all necessary files for the run, then I can adapt later (see e.g. the discussion here: zerothi/sisl-files#14)

I read the discussion but am not 100% sure I understand what the structure should be. Am I understanding correctly that I should reshape the folder https://github.com/zerothi/sisl-files/tree/main/tests/sisl/io/siesta/outs that contains the current files that are used by the stdout_charges tests such that each output file is in a separate directory (because it was from a separate run)? And that I should add all my new files in separate directories as well? Should each of these folders contain their own README.md or can there be a single README.md in a top folder since all these output files are related (they are for the charge parsing tests)? Finally, I only need to include all files necessary to run the python tests right? So not the fdf files used to generate the output files?

@zerothi
Copy link
Owner

zerothi commented Apr 12, 2024

pytest --pyargs sisl -k stdout_charges

It required some trial and error on how to point the test suite to the correct files (luckily the error messages usefully referred to the use of a SISL_FILES_TESTS environment variable) but I can now successfully run the tests (and they pass :) ). I will now add some new tests.

Could you add all necessary files for the run, then I can adapt later (see e.g. the discussion here: zerothi/sisl-files#14)

I read the discussion but am not 100% sure I understand what the structure should be. Am I understanding correctly that I should reshape the folder https://github.com/zerothi/sisl-files/tree/main/tests/sisl/io/siesta/outs that contains the current files that are used by the stdout_charges tests such that each output file is in a separate directory (because it was from a separate run)? And that I should add all my new files in separate directories as well? Should each of these folders contain their own README.md or can there be a single README.md in a top folder since all these output files are related (they are for the charge parsing tests)? Finally, I only need to include all files necessary to run the python tests right? So not the fdf files used to generate the output files?

No, you don't need to reshape the folders (currently).
My intention is that any new files should be in separate directories. The current files does not contain the mulliken charges, so you would have to create new ones anyway. Whether or not they will contain all charges is up to you.
So

  1. Create a new folder that contains all input and output files for a given structure, following the issue
  2. That folder should be self-contained and should not contain information from other systems.
  3. The idea is that the repo should be fully self-contained. So everything should be added. Input files. And my current way of thinking for siesta is to use psml files from PseudoDojo, the scalar relativistic standard precision V5 (I am currently working on re-ordering files according to these files.

Is this better? Let me know! 😄

@ahkole
Copy link
Contributor Author

ahkole commented Apr 12, 2024

Is this better? Let me know! 😄

I think so. I'll just give it a try and then let you review.

@ahkole
Copy link
Contributor Author

ahkole commented Apr 16, 2024

I've made a first try of adding files for the Mulliken tests, see zerothi/sisl-files#15 . Is this what close to what you wanted? Or are there now too many files in a single folder and should each file be moved to its own subfolder?

@zerothi
Copy link
Owner

zerothi commented Apr 17, 2024

Yes, sub-folder would be nice! :)

@zerothi
Copy link
Owner

zerothi commented May 21, 2024

Sorry to ping you again, I think your addition would be nice to have in a new release, what needs to be done here?

@ahkole
Copy link
Contributor Author

ahkole commented May 21, 2024

Hi, yes sorry that this is still open. It would also be nice for myself to finally wrap this up. The files for the tests have been merged into sisl-files so the only thing that still needs to happen is to add tests of the Mulliken charge parsing that use these files. I think the easiest would be to try to mimic the testing that is done now for the Hirshfeld and Voronoi parsing?

@zerothi
Copy link
Owner

zerothi commented Jun 19, 2024

Hi, yes sorry that this is still open. It would also be nice for myself to finally wrap this up. The files for the tests have been merged into sisl-files so the only thing that still needs to happen is to add tests of the Mulliken charge parsing that use these files. I think the easiest would be to try to mimic the testing that is done now for the Hirshfeld and Voronoi parsing?

Sorry, I had missed this. It isn't too important, I am right now reworking the directory structure, so if you retain this in a single directory, it would be ideal, then I will re-arrange accordingly, sorry for the delayed response.

@ahkole
Copy link
Contributor Author

ahkole commented Jun 24, 2024

Sorry, I had missed this. It isn't too important, I am right now reworking the directory structure, so if you retain this in a single directory, it would be ideal, then I will re-arrange accordingly, sorry for the delayed response.

I'm not entirely sure I understand what you mean but I'll describe how I'm doing it right now:

In sisl-files I added a new directory with files that contain Mulliken moments. The old outs folder with the files that are used by the testing of the Voronoi and Hirshfeld charges is still there as well.

In sisl I copied the test_stdout_charges.py file to test_stdout_charges_mulliken.py and started modifying the tests in the new file to test the Mulliken charging. It is possible to merge test_stdout_charges_mulliken.py into test_stdout_charges.py, then the Voronoi and Hirshfeld tests should start using the files in the new folder I added in sisl-files instead of the old folder.

I am running into some issues with the last test with Mulliken moments at every SCF step. The tests fail for that case so I have to figure out why, I'll look at that in the coming days.

@zerothi
Copy link
Owner

zerothi commented Jun 25, 2024

....
Sounds perfect, great!

I am running into some issues with the last test with Mulliken moments at every SCF step. The tests fail for that case so I have to figure out why, I'll look at that in the coming days.

Ok, let me know if you need assistance!

@ahkole
Copy link
Contributor Author

ahkole commented Sep 27, 2024

Long, long, long overdue but I have finally taken the time to finalize this PR. I fixed the issue that was causing some of the tests to crash (turns out it wasn't such a smart idea to keep parsing as long as you find new species for NC/SOC if you have more than 1 MD step). I have also synced everything with upstream/main and merged my tests into the new structure of the test_stdout_charges.py. I like the improvements to that test file and the sisl-files directory/name structure btw! Adding the tests was now much simpler. Note btw that due to the implementation of the Mulliken charges in SIESTA you can only have MD charges and not SCF or final charges, this is reflected in which tests are used for the Mulliken charges.

Can you check if everything is okay now to be merged?

@ahkole ahkole marked this pull request as ready for review September 27, 2024 16:42
nspecies is what is currently used in Atom, so better
stick with that name.

Signed-off-by: Nick Papior <[email protected]>
Signed-off-by: Nick Papior <[email protected]>
@zerothi
Copy link
Owner

zerothi commented Oct 24, 2024

And a long overdue merge, I finalized a few things... Once CI passes, I'll merge!

Thanks!

@zerothi zerothi merged commit d9aacc0 into zerothi:main Oct 24, 2024
17 checks passed
@zerothi
Copy link
Owner

zerothi commented Oct 24, 2024

Thanks @ahkole!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants