-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bigsmiles #21
base: master
Are you sure you want to change the base?
Bigsmiles #21
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm very pleased that you preempted by question about making it fully recursive, and I'm very happy you put some warnings in.
I don't think it makes sense to try and translate cgsmiles to bigsmiles, since cgsmiles can do things bigsmiles cannot (right?! What's the point otherwise)
Nice work. Still missing some docstrings though.
Conceptually, how do you feel about parsing the bigsmiles to a cgsmiles-like object, and feeding that to the writer?
@pckroon Agreed that cgsmiles to bigsmiles seems strange and not sure the use case anyway that was my original idea but then it means having to interpret the bigsmiles thing making it a graph to make it a cgsmiles string. I think that would introduce a lot of code, or required an adopted cgsmiles parser that also handles bigsmiles or am I missing your point? By the way I'm still thinking about adding one more level or resolution though: Consider a bigsmiles like this, where A,B,C,D are four different residues but for sake of brevity I do not write them out.
This bigsmiles describes a random block of AB followed by a random block of CD. At the moment the meta graph is just A, B, C, D which is fine. But you could also argue to make each stochastic object a node and blocks within that object another level node so that you get a tree with two resolutions:
For taking this string and making it an actual polymer it does not matter, because you'd just take the last resolution anyways and then define the appropriate probabilities to connect. But for ML applications it could be interesting when one does message passing on meta nodes. Not sure it is worth the effort? What do you think |
Parsing bigsmiles to an internal CGsmiles object (networkx graph) does indeed require a separate parser, but does offer advantages in that you (well, someone) can then use the cgsmiles code. In addition, you can then use the cgsmiles writer, which makes testing easier. As for the added resolution layer, the more complete the parser the happier I get (since I'm a completionist); whether it's worth the effort I don't know, you'd need to ask the ML people. |
@pckroon small additional feature; read bigsmiles and make a cgsmiles string out of it. This small utlity does not cover all the BigSmiles syntax but a sufficently good amount. This is useful for ML operations as well as setting up itp files from BigSmiles.
The idea is simple: each stochastic object becomes a node in a hyper resolution graph with the fragments being the all-atom blocks. It does not support nesting, because I don't care enough to make this fully recursive. But this parser is good enough for the 5000 bigsmiles strings in the block copolymer database.
I'm thinking if it is worth to do the other way around as well. But I think not every cgsmiles string can be converted to bigsmiles.