Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Annotations #25

Merged
merged 19 commits into from
Nov 22, 2024
Merged

Annotations #25

merged 19 commits into from
Nov 22, 2024

Conversation

fgrunewald
Copy link
Collaborator

Martini CG mappings are a curious thing; More complex molecules frequently have atoms that are ignored in the mapping (i.e. zero weight) or reweighted. This PR implements a syntax for this. Optionally, nodes can be given weight using[node;weight], where weight may be a float. Hydrogens need not be given as we figure them out later, but canonically speaking, it would be better. Annotation of the weight may occur before or after the chirality operator. A default weight of 1 is given to all nodes. Hydrogen atoms unless explicit get the same weight as the node they are attached to.

For example, hexadecane but we map only the central two carbons: {[#C1]|4}.{#C1=[C;0]CC[C;0]}

@pckroon
Copy link
Collaborator

pckroon commented Oct 15, 2024

Don't hate it.
We could also use the 'class' attribute for this (which already exists in smiles). The downsides of that are that it's only integers, and the default is 0; the good news here is that pysmiles doesn't set a class for atoms that don't specify one.
I'm fine with using ;<float> instead as well, but I strongly suggest they come all the way at the end of an atom specification (and only there).
As for implicit hydrogens, I see 2 options: either a) implicit hydrogens get the same weight as their parent; or b) the weight of the parent is split/distributed over the hydrogens.
a) gets you more CoG-like behaviour, but CH3 suddenly has more weight (=4) than CH2 (=3), even though you didn't say that.
b) means that CH3 (=0.25 per atom) and CH2 (=0.33 per atom) both have the same weight, but it's less CoG-like

I think a) is the better option, because CoG is a sane default. Needs to be documented though.

Copy link
Collaborator

@pckroon pckroon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks fine, just a small question

cgsmiles/pysmiles_utils.py Outdated Show resolved Hide resolved
@fgrunewald
Copy link
Collaborator Author

@pckroon I prefer to strip them from the smiles string, because in that fashion CGSmiles and SMILES fragments are treated the same. The question is if one would want to allow general annotation for CGSmiles as well. Something like [#TC5;label]. I might add that in a seperate PR.

@pckroon
Copy link
Collaborator

pckroon commented Nov 4, 2024

[#TC5;w=0.5;name=a;attr=42] you mean? Kind of neat idea.

@fgrunewald
Copy link
Collaborator Author

@pckroon yes that's the idea pretty much. In Martini there are so many meta infos, which one might want to add. But again I think a seperate PR would be better

@pckroon
Copy link
Collaborator

pckroon commented Nov 4, 2024

Well, yes and no. You/we need to decide on the syntax here and now to see how the weights should get implemented: [C;w=0.4] or [C;0.4]. If we decide on the latter, there's the question how the 2 functions are going to interact

@fgrunewald
Copy link
Collaborator Author

ahh yes that is true of course

@fgrunewald
Copy link
Collaborator Author

let me sit on this until tomorrow

@fgrunewald
Copy link
Collaborator Author

fgrunewald commented Nov 5, 2024

@pckroon here my thoughts:

What I like:

  • the syntax makes it easy to store attributes like names; bead types which are often used complementary
  • there is no prior assumption to the meaning of a field
  • it would allow for a module dealing with these annotations using the same code base / logic

What I don't like:

  • another series of equal signs
  • the annotations become more verbose
  • it is not clear how attributes propagate to hydrogen atoms; although the sane default is copying the attribute I guess

@ricalessandri what are your thoughts?

@pckroon
Copy link
Collaborator

pckroon commented Nov 5, 2024

Agnosticism is a big plus when programming, especially when it comes to syntax design (IMNSHO). It makes it more likely you get it right the first time.

another series of equal signs

You need some sort of separator between labels and values, so if you want bead annotations this is unavoidable

the annotations become more verbose

This is indeed a downside. You could argue that, since weights are going to be the most used, they can go without label.
Second thought: treat them like args and kwargs: you can first put positional annotations, then keyword annotations. The application determines the semantics. Use https://docs.python.org/3/library/inspect.html#introspecting-callables-with-the-signature-object

it is not clear how attributes propagate to hydrogen atoms; although the sane default is copying the attribute I guess

Sounds reasonable. If you want to annotate hydrogens, make them explicit.

@ricalessandri
Copy link

My 2 cents: This seems neat and useful. About the verbosity concern - since weight=1 will likely be the implicit default, verbosity only matters when you need different weight values. And for weights (probably the most common case), [C;w=0.4] isn't much longer than [C;0.4]. The more explicit syntax actually makes the cgsmiles more human-readable, so I'd lean towards [C;w=0.4]. Though I won't oppose the idea of having the weights working (also) without the "w" label: so, if one omits the w=, 0.4 is still read as a weight.

@fgrunewald
Copy link
Collaborator Author

fgrunewald commented Nov 8, 2024

As decided by popular vote here it is:

We now allow general annotations of the type [keyword=value,keyword=value]. In addition to making the notation more concise, we have positional arguments defined by a dialect. These positional arguments also set default values for example charge: 0 and weight: 1. They also overtake the charge argument. Thus charged beads have to be written as [#PO4;1.0] or [#PO4;c=1.0].

For the CG part, we adopt the CGSmiles base dialect with charges and weights as positional arguments. Therefore, one can write [#PO4;1.0;0.5] corresponding to charge1 and weight of 0.5. For a neutral residue [#SP4;0;0.5] or [#SP4;w=0.5].

The notation also allows for example to write:

[#ION;1.0;type=Na;cg=1;resname=Na] if one wanted that.

We also support the annotations for the all-atom part, but we should not advertise it. At the moment one should only specify the weight. Charges should still be specified as [Na+] for example. In the long run, when pysmiles gets a minimal parser one can change that and align the annotations.

So bottom line:

  • in OpenSmiles fragments we support weights in addition to the regular SMILES stuff
  • in the CG part we support charges and weights as well as any keyword argument

New dialects can be created using the create_dialect function and making a partial(_parse_node, dialect_signature=YOUR_DIALECT). However, at the moment there is no sleek way to put this into the parsing pipline. @pckroon any thoughts where to inject it.

cgsmiles/pysmiles_utils.py Outdated Show resolved Hide resolved
@fgrunewald fgrunewald changed the title Weights Annotations Nov 8, 2024
cgsmiles/pysmiles_utils.py Outdated Show resolved Hide resolved
cgsmiles/dialects.py Outdated Show resolved Hide resolved
cgsmiles/dialects.py Outdated Show resolved Hide resolved
cgsmiles/dialects.py Outdated Show resolved Hide resolved
cgsmiles/dialects.py Outdated Show resolved Hide resolved
cgsmiles/dialects.py Outdated Show resolved Hide resolved
cgsmiles/dialects.py Outdated Show resolved Hide resolved
cgsmiles/dialects.py Outdated Show resolved Hide resolved
@fgrunewald fgrunewald requested a review from pckroon November 18, 2024 11:20
Copy link
Collaborator

@pckroon pckroon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small code cleaning opportunity, looks good otherwise!

cgsmiles/dialects.py Outdated Show resolved Hide resolved
Co-authored-by: Peter C Kroon <[email protected]>
@fgrunewald fgrunewald merged commit 59d225c into master Nov 22, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants