Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bigsmiles #21

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Bigsmiles #21

wants to merge 4 commits into from

Conversation

fgrunewald
Copy link
Collaborator

@fgrunewald fgrunewald commented Sep 18, 2024

@pckroon small additional feature; read bigsmiles and make a cgsmiles string out of it. This small utlity does not cover all the BigSmiles syntax but a sufficently good amount. This is useful for ML operations as well as setting up itp files from BigSmiles.

The idea is simple: each stochastic object becomes a node in a hyper resolution graph with the fragments being the all-atom blocks. It does not support nesting, because I don't care enough to make this fully recursive. But this parser is good enough for the 5000 bigsmiles strings in the block copolymer database.

I'm thinking if it is worth to do the other way around as well. But I think not every cgsmiles string can be converted to bigsmiles.

@fgrunewald fgrunewald requested a review from pckroon September 18, 2024 16:10
Copy link
Collaborator

@pckroon pckroon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very pleased that you preempted by question about making it fully recursive, and I'm very happy you put some warnings in.
I don't think it makes sense to try and translate cgsmiles to bigsmiles, since cgsmiles can do things bigsmiles cannot (right?! What's the point otherwise)

Nice work. Still missing some docstrings though.
Conceptually, how do you feel about parsing the bigsmiles to a cgsmiles-like object, and feeding that to the writer?

@fgrunewald
Copy link
Collaborator Author

@pckroon Agreed that cgsmiles to bigsmiles seems strange and not sure the use case anyway

that was my original idea but then it means having to interpret the bigsmiles thing making it a graph to make it a cgsmiles string. I think that would introduce a lot of code, or required an adopted cgsmiles parser that also handles bigsmiles or am I missing your point?

By the way I'm still thinking about adding one more level or resolution though:

Consider a bigsmiles like this, where A,B,C,D are four different residues but for sake of brevity I do not write them out.

T1{[#A][#B]}{[#C][#D]}T2

This bigsmiles describes a random block of AB followed by a random block of CD. At the moment the meta graph is just A, B, C, D which is fine. But you could also argue to make each stochastic object a node and blocks within that object another level node so that you get a tree with two resolutions:

[#T1][#st_obj1][#st_obj2][#T2].{st_obj1=[#A][#B],st_obj2=[#C][#D],#T1=[#T1],#T2=[#T2]}.{#A=[...],#B=[...],#C=[...]}

For taking this string and making it an actual polymer it does not matter, because you'd just take the last resolution anyways and then define the appropriate probabilities to connect. But for ML applications it could be interesting when one does message passing on meta nodes. Not sure it is worth the effort? What do you think

@pckroon
Copy link
Collaborator

pckroon commented Sep 24, 2024

Parsing bigsmiles to an internal CGsmiles object (networkx graph) does indeed require a separate parser, but does offer advantages in that you (well, someone) can then use the cgsmiles code. In addition, you can then use the cgsmiles writer, which makes testing easier.

As for the added resolution layer, the more complete the parser the happier I get (since I'm a completionist); whether it's worth the effort I don't know, you'd need to ask the ML people.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants