Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: parameters/steps needed to build SEBS #390

Open
tylerjereddy opened this issue Oct 25, 2024 · 19 comments
Open

ENH: parameters/steps needed to build SEBS #390

tylerjereddy opened this issue Oct 25, 2024 · 19 comments

Comments

@tylerjereddy
Copy link

I see that there is some submission guidance for new building blocks and links at https://github.com/marrink-lab/polyply_1.0/wiki/Submit-polymer-parameters and https://github.com/marrink-lab/polyply_1.0/wiki/Tutorial:-writing-.ff-input-files. Nonetheless, I want to get some guidance for the task I'm being asked to deliver to get a sense for the amount of work involved, and so that I can properly identify the individual steps and what has already been done. I'm happy to contribute to the project and deal with review comments, but first I'd like to check on feasibility, etc.

The task is to build quite large polymers of SEBS (repeats of styrene, ethylene, butylene, and then styrene again, etc.). For the first pass, I can ignore branching, although we do want branching eventually. I just re-read the original Nature Comms paper and I see that styrene and ethylene are already used in published work, and some quick git grep suggests that both are available for GROMOS at least (we don't really care about forcefield, we just want the coords, but I think they are somewhat linked for polyply).

So, initial assessment might be that some work is needed for:

  • the remaining monomer unit (butylene)?
  • links that are possible between the monomers in the block copolymer
    • styrene -> ethylene
    • ethylene -> butylene
    • butylene -> styrene

Does that sound correct? Does the example PR your wiki cites (#227) cover the full spectrum of changes/additions I'd need to make for this to be possible? Or is there more that I should know about here? Laying out a set of steps could be helpful perhaps. It is probably better that I do it myself since we'll need to improve on this capability over time.

@fgrunewald
Copy link
Member

Hi @tylerjereddy,

Thanks for reaching out again. I think it should be possible to complete this project quite fast.

We have a new tool called gen_ff (#383) that is currently in beta testing. This tool allows you to automatically extract the ff-files from a small sample itp file generated with any FF-generator tool. For example, you could generate a small combination like: CH3:1 STYR:2 ETH:2 BUT:2 STYR:2 CH3:1 using acepype and antechamber. Then you give this input to gen_ff which writes the links and building blocks accordingly. Finally, you can generate the larger polymer itp file. Using that itp file and a corresponding topoloy file you can build the coordinates. So in principle this is a 3 step procedure without having to write anything manually:

  1. Generate coordinates of a small example polymer of SEBS. It should cover all required combinations.
  2. Generate itp file using LigParGen, Antechamber, ATB, or CGenFF.
  3. Call gen_ff to generate the polyply input file
  4. Call gen_params to generate parameters for the large polymer
  5. Call gen_coords to make coordinates
  6. Run small NpT to get to target volume.

The only thing is that you need to generate manually, are the coordinates of the initial small molecule. I do have a very hacky UFF polyply version that might work for this step as well.

However, if it is a really large polymer it might be better to generate the Martini coordinates first. Then run an NpT simulation to get to target volume and get a bit of relaxation. Then you backmap the entire thing to all-atom. Polyply can backmap this type of polymer directly. Thus, there is no need for backwards or backmapping files. Steps 1-6 are still needed, however, I think your polymer structure will be more relaxed.

Finally, I think it would be good to provide a bending rigidity in the coordinate generation. But we can discuss this at a later stage.

If this plan sounds reasonable to you, I can help with the individual steps as needed. Keep in mind that #383 is still in beta testing as well.

Cheers,
Fabian

@tylerjereddy
Copy link
Author

@fgrunewald At the gen_ff stage I ran into the traceback below the fold.

Traceback (most recent call last):
  File "/home/treddy/python_venvs/py_311_hemd/bin/polyply", line 292, in <module>
    main()
  File "/home/treddy/python_venvs/py_311_hemd/bin/polyply", line 288, in main
    subprogram(**args_dict)
  File "/home/treddy/python_venvs/py_311_hemd/lib/python3.11/site-packages/polyply/src/gen_ff.py", line 87, in gen_ff
    target_mol = _read_itp_file(itppath)
                 ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/treddy/python_venvs/py_311_hemd/lib/python3.11/site-packages/polyply/src/gen_ff.py", line 53, in _read_itp_file
    read_itp(lines, force_field)
  File "/home/treddy/python_venvs/py_311_hemd/lib/python3.11/site-packages/vermouth/gmx/itp_read.py", line 477, in read_itp
    return list(director.parse(iter(lines)))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/treddy/python_venvs/py_311_hemd/lib/python3.11/site-packages/vermouth/parser_utils.py", line 110, in parse
    result = self.dispatch(line)(line, lineno)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/treddy/python_venvs/py_311_hemd/lib/python3.11/site-packages/vermouth/parser_utils.py", line 235, in parse_section
    raise IOError("Can't parse line {} in section '{}' because the "
OSError: Can't parse line 5 in section '['atomtypes']' because the section is unknown

I used acpype and a small-ish SEBS SMILES representation to get to that step, some of which you can see from the incantation my code complains about below. I'm not sure if I can get away with just using SMILES input, but figured I'd try since acpype seems to support it and it would be quite convenient.

subprocess.CalledProcessError: Command '['polyply', 'gen_ff', '-i', 'smiles_molecule.acpype/smiles_molecule_GMX.itp', '-s', 'CC(c1ccccc1)CC(c1ccccc1)CCCCCCCCCC(CC)CC(CC)CC(c1ccccc1)CC(c1ccccc1)']' returned non-zero exit status 1.

I'm a bit unclear on CGsmiles vs. SMILES, although there seem to be some docstrings with examples at https://github.com/gruenewald-lab/CGsmiles (I'd probably need to read more detailed docs to really understand the augmentations allowed beyond regular SMILES; there seem to be > and < anchors and some variable dividers for setting branching probabilities or something like that..).

I'd also note that the point charge calculations from acpype/amber_linux/bin/sqm were quite slow, but fair enough for now I guess.

@tylerjereddy
Copy link
Author

The error appears to be the same if I install the latest version of https://github.com/marrink-lab/vermouth-martinize locally as well (directly from master branch).

@fgrunewald fgrunewald added the in progress we're working on your issue but it might take a while label Nov 4, 2024
@fgrunewald
Copy link
Member

@tylerjereddy yes indeed this is a known issue. To solve the problem you can provide the top file instead of the itp file. It should be something like uff_<molecule name>_GMX.top. The problem is having the atomtypes in the itp file.

@fgrunewald fgrunewald removed the in progress we're working on your issue but it might take a while label Nov 4, 2024
@tylerjereddy
Copy link
Author

tylerjereddy commented Nov 4, 2024

Here is the traceback I see if I use smiles_molecule.acpype/smiles_molecule_GMX.top instead (same SMILES string as above):

Traceback (most recent call last):
  File "/home/treddy/python_venvs/py_311_hemd/bin/polyply", line 292, in <module>
    main()
  File "/home/treddy/python_venvs/py_311_hemd/bin/polyply", line 288, in main
    subprogram(**args_dict)
  File "/home/treddy/python_venvs/py_311_hemd/lib/python3.11/site-packages/polyply/src/gen_ff.py", line 90, in gen_ff
    meta_mol = MetaMolecule.from_cgsmiles_str(force_field=force_field,
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/treddy/python_venvs/py_311_hemd/lib/python3.11/site-packages/polyply/src/meta_molecule.py", line 409, in from_cgsmiles_str
    resolver = MoleculeResolver.from_string(cgsmiles_str, last_all_atom=all_atom)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/treddy/python_venvs/py_311_hemd/lib/python3.11/site-packages/cgsmiles/resolve.py", line 434, in from_string
    molecule = read_cgsmiles(elements[0])
                             ~~~~~~~~^^^
IndexError: list index out of range

There is another top file, smiles_molecule.acpype/smiles_molecule_GMX_OPLS.top I could try--with that one I get a complaint about missing forcefield files. I could try to get those "included" if that seems more promising.

@tylerjereddy
Copy link
Author

Using the OPLS version of the GMX topology file ultimately ends up in the same traceback as above with gen_ff as well (if I patch in the missing include for OPLS FF). Problem with the cgsmiles package or the SMILES string I've provided above?

tylerjereddy added a commit to tylerjereddy/CGsmiles that referenced this issue Nov 6, 2024
* The error message when processing conventional SMILES strings
can be confusing, as observed at:
marrink-lab/polyply_1.0#390 (comment)

* For cases like that where there are no polymeric fragments (i.e., no
use of braces), this patch improves the error message (and adds a test
for it) to nudge the user to BigSMILES or CGSmiles formats.
@tylerjereddy
Copy link
Author

Ah, it looks like either BigSMILES or CGSmiles may be possible, but probably not conventional SMILES per the cross-listed PR to improve the error message.

I'll see if I can adjust my input string accordingly then.

@fgrunewald
Copy link
Member

@tylerjereddy yes, indeed it needs to be a valid CGSmiles string. The CGSmiles string defines the mapping between residues used by polyply to make the itp file and the all-atom graph. We use it to extract the building blocks and make it agnostic to the order of the atoms and their naming.

CGSmiles are different from BigSMILES as they describe transformations of resolutions. The original idea was to use BigSMILES, but they were insufficient for many cases.

As the docs are still under construction you can check this doc for a better explanation.

To give you an example: Let's say you have a PS polymer with 5 residues and a CH3 terminal group. The cgsmiles string should be:

{[#CH3][#PS]|5[#CH3]}.{#CH3=C[>][<],#PS=[<]CC[>]c1ccccc1}

The '>' and '<' are connectors that define which atoms to connect to get the all-atom graph. The first brace contains the residue graph using a syntax like SMILES.

For your case, I think the string should be something like:

{[#PS][#PS][#PE]|4[#BUT]|2[#PS]|4}.{#PS=[<]CC[>]c1ccccc1,#PE=[<]CC[>],#BUT=[<]CC[>](CC)}

Probably, you will end up with two PS terminal residues that are special, because you did not provide an end-group. We avoid having just hydrogen atoms at the end of a polymer for better balancing of charges.

@tylerjereddy
Copy link
Author

Thanks, I ran into a few more errors after switching to CGSmiles and posted an initial analysis starting at #383 (comment).

@fgrunewald
Copy link
Member

@ricalessandri could you perhaps have a look at this and help? I'm quite busy at the moment and not sure when I've time to take a more in depth look.

@tylerjereddy
Copy link
Author

tylerjereddy commented Nov 22, 2024

Ok, after the recent fixes in cgsmiles and advice at tylerjereddy/polyply_pr_383_repro#1 I'm back in business it seems, thanks! I may have more questions, but so far it is moving forward.

I'll need a way to conveniently cycle repeats of two residues at some point as well, maybe I can figure that out from the syntax (new residue composed of N repeats of the individual residues?) or I suppose even generate it programmatically with a long string.

@tylerjereddy
Copy link
Author

@fgrunewald @ricalessandri Probably getting close now, but my coordinates look a bit sparse perhaps (Hter=white; styrene=red; ethyl=orange; butyl=pink) for 10 polymers at density 0.7 g/mL?:

image

there were a few warnings like WARNING - general - Missing a link between residue 108 butylene and residue 109 ethylene.

I'll go ahead and try to build a much larger polymer system overnight, but I suspect something isn't quite right yet? The terminal H seems too far away from other molecules for example, no?

@ricalessandri
Copy link
Collaborator

The sparsity might be ok but the missing link is definitely not good. If the warnings are consistently between butylene and ethylene, that means you have a specific problem with that link while other links butylene-butylene, ethylene-ethylene (if you have those), etc. are fine. So you could look into that specific link (or if you're missing it altogether from the .ff file(s)).

@tylerjereddy
Copy link
Author

Ah good point, gen_ff is receiving SEBS but the mature polymer will have SEBEB...S so that pairwise link is probably missing for that reason.

@fgrunewald
Copy link
Member

@tylerjereddy in addition to what Riccardo said, you should always look at the polymer after an energy minization step. It is part of the protcol to get the inital coordinates.

@tylerjereddy
Copy link
Author

@fgrunewald @ricalessandri

So, after energy minimization the polymers now look far more "natural"/linearized. However, the density guarantees are clearly lost--these are huge polymer chains and they've expanded far outside the bounds of the original simulation "container."

This is good progress though thanks!

Is there any guidance for how I might "guarantee" preservation of density from what polyply gen_coords initially produced or is there no way to avoid having to specify a box size that is in the roughly appropriate range for the system? (it looks like the initial box size, which I did not specify myself, may have been too small by an order of magnitude or so?)

@ricalessandri
Copy link
Collaborator

Note that there's no such thing as being "outside" the simulation box because of periodic boundary conditions.

If you just energy-minimized the system, the simulation box won't change dimensions. So whatever target density you set through polyply gen_coords will be maintained.

@tylerjereddy
Copy link
Author

good point, maybe it just needs the right trjconv incantation to make the materials folks happy, I'll look into that

@ricalessandri
Copy link
Collaborator

gmx trjconv -f box.gro -o box_atom.gro -pbc atom -s topol.tpr will put everything "inside the box";
using -pbc whole will keep each molecule whole, so each atom may appear in a different box image.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants