Bigsmiles #21

fgrunewald · 2024-09-18T16:09:47Z

@pckroon small additional feature; read bigsmiles and make a cgsmiles string out of it. This small utlity does not cover all the BigSmiles syntax but a sufficently good amount. This is useful for ML operations as well as setting up itp files from BigSmiles.

The idea is simple: each stochastic object becomes a node in a hyper resolution graph with the fragments being the all-atom blocks. It does not support nesting, because I don't care enough to make this fully recursive. But this parser is good enough for the 5000 bigsmiles strings in the block copolymer database.

I'm thinking if it is worth to do the other way around as well. But I think not every cgsmiles string can be converted to bigsmiles.

pckroon

I'm very pleased that you preempted by question about making it fully recursive, and I'm very happy you put some warnings in.
I don't think it makes sense to try and translate cgsmiles to bigsmiles, since cgsmiles can do things bigsmiles cannot (right?! What's the point otherwise)

Nice work. Still missing some docstrings though.
Conceptually, how do you feel about parsing the bigsmiles to a cgsmiles-like object, and feeding that to the writer?

fgrunewald · 2024-09-20T14:31:58Z

@pckroon Agreed that cgsmiles to bigsmiles seems strange and not sure the use case anyway

that was my original idea but then it means having to interpret the bigsmiles thing making it a graph to make it a cgsmiles string. I think that would introduce a lot of code, or required an adopted cgsmiles parser that also handles bigsmiles or am I missing your point?

By the way I'm still thinking about adding one more level or resolution though:

Consider a bigsmiles like this, where A,B,C,D are four different residues but for sake of brevity I do not write them out.

T1{[#A][#B]}{[#C][#D]}T2

This bigsmiles describes a random block of AB followed by a random block of CD. At the moment the meta graph is just A, B, C, D which is fine. But you could also argue to make each stochastic object a node and blocks within that object another level node so that you get a tree with two resolutions:

[#T1][#st_obj1][#st_obj2][#T2].{st_obj1=[#A][#B],st_obj2=[#C][#D],#T1=[#T1],#T2=[#T2]}.{#A=[...],#B=[...],#C=[...]}

For taking this string and making it an actual polymer it does not matter, because you'd just take the last resolution anyways and then define the appropriate probabilities to connect. But for ML applications it could be interesting when one does message passing on meta nodes. Not sure it is worth the effort? What do you think

pckroon · 2024-09-24T11:39:34Z

Parsing bigsmiles to an internal CGsmiles object (networkx graph) does indeed require a separate parser, but does offer advantages in that you (well, someone) can then use the cgsmiles code. In addition, you can then use the cgsmiles writer, which makes testing easier.

As for the added resolution layer, the more complete the parser the happier I get (since I'm a completionist); whether it's worth the effort I don't know, you'd need to ask the ML people.

fgrunewald added 4 commits September 18, 2024 16:39

fix fragment replacement and right terminus

463514e

test draft

9b6f775

Merge branch 'master' into bigsmiles

d7b0b37

rename and complete tests

6a1ad0d

fgrunewald requested a review from pckroon September 18, 2024 16:10

pckroon requested changes Sep 20, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bigsmiles #21

Bigsmiles #21

fgrunewald commented Sep 18, 2024 •

edited

Loading

pckroon left a comment

fgrunewald commented Sep 20, 2024

pckroon commented Sep 24, 2024

Bigsmiles #21

Are you sure you want to change the base?

Bigsmiles #21

Conversation

fgrunewald commented Sep 18, 2024 • edited Loading

pckroon left a comment

Choose a reason for hiding this comment

fgrunewald commented Sep 20, 2024

pckroon commented Sep 24, 2024

fgrunewald commented Sep 18, 2024 •

edited

Loading