Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add bonds for structure type entries #465

Draft
wants to merge 16 commits into
base: develop
Choose a base branch
from

Conversation

merkys
Copy link
Member

@merkys merkys commented Jun 6, 2023

In issue #426 I proposed adding more chemical properties to OPTIMADE structures. This PR implements my suggestion on representation of chemical connectivity between pairs of sites in OPTIMADE:

{
    "bonds": [ {"sites": [1, 2]} ]
}

I intentionally omit the bond types as this might be difficult to agree upon, whereas having just the connectivity is already beneficial.

Pinging people who have expressed their interest for comments: @eimrek @BobHanson @Austin243

Edit: I have introduced means to express connections with translation equivalents of the sites in sites.

@merkys merkys requested review from vaitkus, ml-evs and JPBergsma June 6, 2023 11:46
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
@merkys
Copy link
Member Author

merkys commented Jun 7, 2023

It has been pointed out that bonds might cross the unit cell and this should as well be reflected. I will update the PR to include this piece of information.

Edit: Added a way to describe translations.

optimade.rst Outdated Show resolved Hide resolved
vaitkus
vaitkus previously approved these changes Jun 7, 2023
Co-authored-by: Antanas Vaitkus <[email protected]>
@merkys merkys requested a review from vaitkus June 8, 2023 07:15
@merkys
Copy link
Member Author

merkys commented Jun 8, 2023

Pinging @d-beltran for comments on how this proposal suits macromolecules.

@merkys
Copy link
Member Author

merkys commented Jun 8, 2023

Pinging @utf who participated in the discussions.

Copy link
Member

@ml-evs ml-evs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @merkys, here's my initial comments. I wonder if we also want this field to capture "non-chemical" connectivity, but I can't think of a good use case

optimade.rst Outdated
- **Examples**:

- :val:`[ {"sites": [1, 2]} ]`: a structure with a bond between sites 1 and 2.
- :val:`[ {"sites": [1, 1], "translations": [ [0, 0, 0], [0, 0, 1] ]} ]`: a 1D polymer.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current scheme, I think this is the representation of example of primitive NaCl:

Suggested change
- :val:`[ {"sites": [1, 1], "translations": [ [0, 0, 0], [0, 0, 1] ]} ]`: a 1D polymer.
- :val:```
[ {"sites": [0, 1], "translations": [ [0, 0, 0], [-1, -1, 0] ]},
{"sites": [0, 1], "translations": [ [0, 0, 0], [-1, 0, -1] ]},
{"sites": [0, 1], "translations": [ [0, 0, 0], [0, -1, -1] ]},
{"sites": [0, 1], "translations": [ [0, 0, 0], [1, 1, 0] ]}
{"sites": [0, 1], "translations": [ [0, 0, 0], [1, 0, 1] ]}
{"sites": [0, 1], "translations": [ [0, 0, 0], [0, 1, 1] ]}
]
```

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This property could potentially be used to describe crystal topology which may involve various types of contact distances, but that seems to be a bit a beast of its own (at least according to work done in the TOPO_CIF dictionary [1]). I would probably limit it to chemical bonding for now, especially since currently we purposely are trying to avoid specifying the bond type/order.

[1] https://github.com/COMCIFS/TopoCif/blob/main/dictionary/Topology_0.9.5.dic

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vaitkus Right, there have already been discussions in the workshop about having different networks for the same structure entry, but this seems to be difficult to accommodate at the moment.

optimade.rst Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
Co-authored-by: Antanas Vaitkus <[email protected]>

- *sites*: a non-decreasing list of 0-based indexes of the two sites that form a chemical bond.

- If translations are needed by at least one of the sites of a bond, the following key SHOULD be used:
Copy link
Contributor

@vaitkus vaitkus Jun 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the translations field is not mandatory in general, maybe we should require it if at least one of the sites is translated?

That is, maybe we should change SHOULD to MUST?

Edit: changed the field name from sites to translations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you are talking about translations, as sites is mandatory?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@merkys Yes, I actually meant translations (fixed).

@d-beltran
Copy link

d-beltran commented Jun 8, 2023

Works for me :)

I intentionally omit the bond types as this might be difficult to agree upon

The type of bond is not specified in most topology formats in our field but I think this is very inconvenient. So happy to know this is provisional and the type will be specified in a future.

@merkys
Copy link
Member Author

merkys commented Jun 8, 2023

@eimrek suggested leaving translation vector only for one of the sites as at least one site will stay in the unit cell, or can be translated there. How about this:

{
    "sites": [63, 64],
    "translation_site": 1, // the second of the two sites is translated
    "translation_vector": [0, 0, 1]
}

The translation_site is required as any of the two sites could be translated, and the order of indexes has to be respected in sites. Or is it better to have a more structured way: "translation": { "site": 1, "vector": [0, 0, 1] }?

@vaitkus
Copy link
Contributor

vaitkus commented Jun 8, 2023

@merkys I would leave at least the option to provide both of the translations. In the COD there are multiple non-polymeric molecules that spans several unit cells, so both translation vectors will be needed to correctly represent the complete molecule. I attach an example of such molecule to this comment (1540421.cif.txt, remove the txt extension before viewing), but @sauliusg could probably provide an even more extreme example (I seem to recall a molecule that spans 5 unit cells).

@sauliusg sauliusg self-requested a review June 9, 2023 07:35
@rartino
Copy link
Contributor

rartino commented Jun 9, 2023

Workshop: We are happy to merge an explicit bond strucutre datastructure, but we must consider the exact format so the types of queries that one wants to do can be performed (with the present filter language, preferably). Inheriting the current CIF framework for this should be seriously considered.

@merkys
Copy link
Member Author

merkys commented Jun 12, 2023

@merkys I would leave at least the option to provide both of the translations. In the COD there are multiple non-polymeric molecules that spans several unit cells, so both translation vectors will be needed to correctly represent the complete molecule. I attach an example of such molecule to this comment (1540421.cif.txt, remove the txt extension before viewing), but @sauliusg could probably provide an even more extreme example (I seem to recall a molecule that spans 5 unit cells).

It is always possible to back-translate one of the sites into the primary unit cell without losing the connectivity information. Having both non-zero translation vectors is a matter of convenience, I think, or does this retain some more information?

@merkys
Copy link
Member Author

merkys commented Jun 12, 2023

Workshop: We are happy to merge an explicit bond strucutre datastructure, but we must consider the exact format so the types of queries that one wants to do can be performed (with the present filter language, preferably). Inheriting the current CIF framework for this should be seriously considered.

Unless we introduce some data redundancy here, the most powerful queries would be the ones based on correlated arrays (a.k.a. zips), as OPTIMADE does not support anything more intricate than that. To make bonds more friendly for zips, it should be defined as an array correlated with species_at_sites, possibly listing all neighbours of each site (full symmetric matrix). But again, species_at_sites is just an array of labels, thus one could not reliably query for, for example, structures where O atom has three bonds or more.

@vaitkus
Copy link
Contributor

vaitkus commented Jun 12, 2023

It is always possible to back-translate one of the sites into the primary unit cell without losing the connectivity information. Having both non-zero translation vectors is a matter of convenience, I think, or does this retain some more information?

Well, if you back-translate sites into the primary unit cell you lose some information and end up with a set of disjointed bonded fragments. You could, of course, translate these fragments from the primary unit cell back into their proper place, however, it is not ye obvious to me that this is a straightforward task. What is the drawback of allowing to specify both sites?

@rartino rartino mentioned this pull request Jan 10, 2024
@ml-evs ml-evs added status/has-concrete-suggestion This issue has one or more concrete suggestions spelled out that can be brought up for consensus. PR/requires-discussion labels Mar 22, 2024
Copy link
Contributor

@sauliusg sauliusg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we are thinking about implementation of fractional coordinates in the future, in such structures only sites within an asymmetric unit will be given. Maybe we also need a key symmetry_operation in addition to the key translation?

- **Type**: list of dictionary with keys:

- :property:`sites`: a list of integers (REQUIRED)
- :property:`translations`: a list of list of integers (OPTIONAL)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably specify very clearly that translations must be applied to the coordinates of atoms as they are currently given, and not to, e.g., coordinates reduced to the [0;1) unit cell.

@rartino rartino marked this pull request as draft December 13, 2024 16:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR/requires-discussion status/has-concrete-suggestion This issue has one or more concrete suggestions spelled out that can be brought up for consensus.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants