Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[5] Writer #4

Merged
merged 24 commits into from
Oct 8, 2024
Merged

[5] Writer #4

merged 24 commits into from
Oct 8, 2024

Conversation

fgrunewald
Copy link
Collaborator

@fgrunewald fgrunewald commented Apr 17, 2024

This is a first draft of the writer for the CGSmiles.

To Do:

  • make squash operator work
  • make self-cycles work
  • have option to compress string; currently [#PEO]|3 -> [#PEO][#PEO][#PEO]
  • fix pysmiles hyrdorgen writing; the pysmiles writer transform for example [$]cc[$] -> [$][cH2][cH2][$], which technically speaking is correct and works but looks ugly.

@fgrunewald fgrunewald changed the title Writer [5] Writer Apr 27, 2024
@fgrunewald fgrunewald mentioned this pull request May 2, 2024
Copy link
Collaborator

@pckroon pckroon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you rebase this on master, or at least merge master? I'll go over it again then.
Untill then, some small notes.

cgsmiles/graph_utils.py Outdated Show resolved Hide resolved
cgsmiles/graph_utils.py Outdated Show resolved Hide resolved
cgsmiles/graph_utils.py Outdated Show resolved Hide resolved
cgsmiles/pysmiles_utils.py Outdated Show resolved Hide resolved
cgsmiles/resolve.py Outdated Show resolved Hide resolved
cgsmiles/write_cgsmiles.py Outdated Show resolved Hide resolved
cgsmiles/write_cgsmiles.py Outdated Show resolved Hide resolved
cgsmiles/write_cgsmiles.py Outdated Show resolved Hide resolved
cgsmiles/write_cgsmiles.py Outdated Show resolved Hide resolved
cgsmiles/write_cgsmiles.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@pckroon pckroon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the specific remarks, I would also like some more comments in add_bond_descrp

cgsmiles/graph_utils.py Outdated Show resolved Hide resolved
cgsmiles/write_cgsmiles.py Outdated Show resolved Hide resolved
cgsmiles/write_cgsmiles.py Outdated Show resolved Hide resolved
cgsmiles/write_cgsmiles.py Outdated Show resolved Hide resolved
cgsmiles/write_cgsmiles.py Outdated Show resolved Hide resolved
cgsmiles/write_cgsmiles.py Outdated Show resolved Hide resolved
cgsmiles/write_cgsmiles.py Outdated Show resolved Hide resolved
@fgrunewald
Copy link
Collaborator Author

@pckroon I think the writer is rather complete now; I noticed that we probably should have a reader that returns the highest resolution molecule and fragment dicts. It is wrapped into the MoleculeResolver, but from a data science point of view, it could be handy to write and write the same stuff. I think it should be a separate PR however.

@fgrunewald fgrunewald requested a review from pckroon October 2, 2024 16:05
Copy link
Collaborator

@pckroon pckroon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! I do think I found a few small issues besides some minor formatting though. This also means those cases need to get tested.

I noticed that we probably should have a reader that returns the highest resolution molecule and fragment dicts. It is wrapped into the MoleculeResolver, but from a data science point of view, it could be handy to write and write the same stuff. I think it should be a separate PR however.

Yes and yes. Reading and writing the same thing is super useful, also from a testing point of view.

cgsmiles/read_fragments.py Outdated Show resolved Hide resolved
cgsmiles/tests/test_write_cgsmiles.py Outdated Show resolved Hide resolved
Comment on lines +48 to +50
if order_symb != '-':
bond_str = order_symb
bond_str += "["+str(bonding_descrpt[:-1])+"]"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to elide aromatic bonds between aromatic atoms?
I prefer string formatting over concatenation personally:

Suggested change
if order_symb != '-':
bond_str = order_symb
bond_str += "["+str(bonding_descrpt[:-1])+"]"
if order_symb != '-':
bond_str += order_symb
bond_str += "[{}]".format(bonding_descrpt[:-1])

def write_graph(molecule, smiles_format=False, default_element='*'):
"""
Creates a CGsmiles string describing `molecule`.
`molecule` should be a single connected component.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meh. Not too much work to find all connected components and call a (this) function on those is it? And then just join them with a .

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😆 this one comes from the pysmiles parser I copied. It still has to be a connected component to be valid. That's simply a CGSmiles requirement. You can have edges with zero bond order though. But they have to be edges.

cgsmiles/write_cgsmiles.py Outdated Show resolved Hide resolved
cgsmiles/write_cgsmiles.py Outdated Show resolved Hide resolved
cgsmiles/write_cgsmiles.py Outdated Show resolved Hide resolved
cgsmiles/write_cgsmiles.py Show resolved Hide resolved
Comment on lines 207 to 212
fragment_str = ""
for fragname, frag_graph in fragment_dict.items():
fragment_str += f"#{fragname}="
# format graph depending on resolution
fragment_str += write_graph(frag_graph, smiles_format=all_atom) + ","
fragment_str = "{" + fragment_str[:-1] + "}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make a list of fragment_str's for the separate fragments, then ','.join them

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't matter does it?

cgsmiles/write_cgsmiles.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@pckroon pckroon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed the wrong button, changes are needed.

cgsmiles/tests/test_write_cgsmiles.py Outdated Show resolved Hide resolved
Comment on lines +68 to +71
# we cannot be sure that the atomnames are the same because they
# will depend on the order
nx.set_node_attributes(frag_dict_out[fragname], None, "atomname")
nx.set_node_attributes(frag_dict[fragname], None, "atomname")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we assume anything? For example that the first (two?) character should be the same?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is just the element plus the atom index. The element is already part of the testing. Therefore, I think it is not required.

@fgrunewald fgrunewald merged commit 3465bea into master Oct 8, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants