Coarse-Grained SMILES (CGsmiles) A Versatile Line Notation for Molecular Representations Across Multiple Resolutions

Coarse-Grained SMILES (CGsmiles)

Overview

The CGSmiles line notation encodes molecules at arbitrary resolutions and unambiguously defines the conversion between these resolutions. For example, in coarse-grained (CG) simulations, multiple atoms are represented by one larger pseudo-atom, often called a bead. The conversion from the atomic resolution to the CG resolution can be described using the CGSmiles notation. In the Martini 3 force field, benzene is represented by three particles. The corresponding CGSmiles string is:

"{[#TC5]1[#TC5][#TC5]1}.{#TC5=[$]cc[$]}"
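A CGSmiles string is made of resolution layers, each enclosed in curly braces and separated by `.`. In the examples in this README, the first layer holds the coarsest graph and each following layer defines fragments as `#NAME=SMILES` entries separated by commas. The sketch below splits a string along exactly those delimiters in plain Python. It is a simplified illustration: `split_top_level` and `split_cgsmiles` are hypothetical helpers, not part of the cgsmiles package, and they perform no validation and ignore annotations. Use `cgsmiles.MoleculeResolver` for real parsing.

```python
def split_top_level(text, sep):
    """Split text on sep, ignoring separators nested inside (), [] or {}."""
    parts, depth, current = [], 0, []
    for char in text:
        if char in "([{":
            depth += 1
        elif char in ")]}":
            depth -= 1
        if char == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(char)
    parts.append("".join(current))
    return parts


def split_cgsmiles(cgsmiles_str):
    """Return (layers, fragments) for a CGSmiles string (hypothetical helper).

    layers: the raw content of each pair of curly braces, coarsest first.
    fragments: one dict per fragment layer, mapping fragment name -> SMILES.
    """
    layers = [layer.strip()[1:-1] for layer in split_top_level(cgsmiles_str, ".")]
    fragments = []
    for layer in layers[1:]:
        frag_dict = {}
        for entry in split_top_level(layer, ","):
            # Split only on the first "=" so double bonds in SMILES survive.
            name, smiles = entry.split("=", 1)
            frag_dict[name.lstrip("#")] = smiles
        fragments.append(frag_dict)
    return layers, fragments


layers, fragments = split_cgsmiles("{[#TC5]1[#TC5][#TC5]1}.{#TC5=[$]cc[$]}")
print(layers[0])     # [#TC5]1[#TC5][#TC5]1
print(fragments[0])  # {'TC5': '[$]cc[$]'}
```

Tracking bracket depth is what keeps a `.` or `,` inside a fragment's SMILES from being mistaken for a layer or fragment separator.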

Additionally, multiple resolutions can be layered, which allows a hierarchical description across one or more CG resolutions. In particular, large polymeric molecules become simpler to express when multiple resolutions are used. For instance, consider the copolymer styrene-maleic acid (SMA), an almost perfectly alternating polymer of maleic anhydride and styrene. In CGSmiles, we can write 100 repeat units of this polymer using three resolutions, each enclosed in curly braces:

"{[#SMA]|100}.{#SMA=[#PS][#MAH]}.{#PS=[>]CC[<]c1ccccc1,#MAH=[<]C1C(=O)OC(=O)C1[>]}"
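Because each layer refines the beads of the layer above, system sizes can be worked out level by level: 100 `SMA` beads expand into 100 `PS` and 100 `MAH` fragments, and those into atoms. The sketch below does this arithmetic, assuming the fragment definitions `PS = [>]CC[<]c1ccccc1` and `MAH = [<]C1C(=O)OC(=O)C1[>]`. Its heavy-atom counter is a deliberately simplified, hypothetical helper: it only recognizes organic-subset atoms written outside square brackets (true for both fragments) and drops every bracketed token such as `[>]` as a bonding descriptor. Hydrogens are implicit and not counted.

```python
import re

# Assumed fragment definitions for the SMA example.
FRAGMENTS = {
    "PS": "[>]CC[<]c1ccccc1",
    "MAH": "[<]C1C(=O)OC(=O)C1[>]",
}
N_REPEATS = 100  # from "[#SMA]|100"

# Organic-subset atoms only; two-letter halogens must be tried first.
ATOM_PATTERN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def count_heavy_atoms(smiles):
    """Count heavy atoms, skipping bracketed tokens such as bonding descriptors.

    Simplification: genuine bracket atoms like [NH4+] would be dropped too.
    """
    without_brackets = re.sub(r"\[[^\]]*\]", "", smiles)
    return len(ATOM_PATTERN.findall(without_brackets))


atoms_per_unit = sum(count_heavy_atoms(s) for s in FRAGMENTS.values())

print(N_REPEATS * len(FRAGMENTS))          # 200 fragments in the middle layer
print(count_heavy_atoms(FRAGMENTS["PS"]))   # 8
print(count_heavy_atoms(FRAGMENTS["MAH"]))  # 7
print(N_REPEATS * atoms_per_unit)           # 1500 heavy atoms in total
```

Each SMA repeat unit therefore contributes 8 + 7 = 15 heavy atoms, so the single token `|100` in the top layer stands for 1500 heavy atoms.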

The CGSmiles Python package is built around this notation to read, write, and further process the resulting graphs. Reading and resolving a string provides all molecule information in the form of NetworkX graphs, which makes it easy to interface with other Python libraries.

A number of other packages and libraries use CGSmiles, mostly for coarse-grained modelling with the Martini force field or for atomic-resolution molecular dynamics simulations. More information about the syntax and the different use cases can be found in this documentation. If you arrived here from one of the packages that use CGSmiles, check out the Getting Started section to learn the syntax.

Installation

The easiest way to install cgsmiles is with pip:

pip install git+https://github.com/gruenewald-lab/CGsmiles.git

In the future, we will also distribute it through the Python Package Index (PyPI), but this is currently not supported. Note that the drawing module depends on the scipy and matplotlib packages, which need to be installed before the module can be used:

pip install scipy
pip install matplotlib
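To check whether these optional dependencies are present before importing the drawing module, the standard library is enough. A minimal sketch; `missing_drawing_dependencies` is a hypothetical helper and not part of cgsmiles.

```python
import importlib.util

# Optional packages required by the drawing module, per the note above.
DRAWING_DEPENDENCIES = ("scipy", "matplotlib")


def missing_drawing_dependencies(packages=DRAWING_DEPENDENCIES):
    """Return the names of top-level packages that cannot be found."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


missing = missing_drawing_dependencies()
if missing:
    print("Install before using cgsmiles.drawing:", " ".join(missing))
else:
    print("Drawing dependencies are available.")
```

`find_spec` only locates the package without importing it, so the check is cheap and has no side effects.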

Examples

The CGSmiles Python package is designed to read and resolve CGSmiles strings into NetworkX graphs that can be used for further tasks, for example drawing the relation between two resolutions (i.e., the mapping).

Martini 3 Benzene

import cgsmiles
import matplotlib.pyplot as plt
from cgsmiles.drawing import draw_molecule

# Martini 3 Benzene
cgsmiles_str = "{[#TC5]1[#TC5][#TC5]1}.{#TC5=[$]cc[$]}"

# Resolve molecule into networkx graphs
res_graph, mol_graph = cgsmiles.MoleculeResolver.from_string(cgsmiles_str).resolve()

# Draw molecule at different resolutions
ax, pos = draw_molecule(mol_graph)

# Show the figure when running as a script
plt.show()
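Conceptually, the mapping assigns each CG bead the atoms of its fragment: each of the three `TC5` beads of Martini 3 benzene carries the two aromatic carbons of `[$]cc[$]`. The sketch below builds that bead-to-atom mapping and its inverse by hand as plain dictionaries. Two details are assumptions made for illustration only and need not match what cgsmiles produces: atom indices are assigned consecutively fragment by fragment, and each CG bond is assumed to join the last carbon of one bead to the first carbon of the next. `build_mapping` is a hypothetical helper.

```python
# Hypothetical illustration built by hand from
# "{[#TC5]1[#TC5][#TC5]1}.{#TC5=[$]cc[$]}".
bead_names = ["TC5", "TC5", "TC5"]  # the three beads of the CG ring
atoms_per_fragment = {"TC5": 2}     # "[$]cc[$]" holds two aromatic carbons


def build_mapping(beads, fragment_sizes):
    """Assign consecutive atom indices to each bead (illustrative numbering)."""
    bead_to_atoms = {}
    next_atom = 0
    for bead_index, name in enumerate(beads):
        size = fragment_sizes[name]
        bead_to_atoms[bead_index] = list(range(next_atom, next_atom + size))
        next_atom += size
    return bead_to_atoms


bead_to_atoms = build_mapping(bead_names, atoms_per_fragment)

# Inverse mapping: in this example every atom belongs to exactly one bead.
atom_to_bead = {atom: bead for bead, atoms in bead_to_atoms.items() for atom in atoms}

# Assumed pairing of [$] descriptors: the CG bond i-(i+1) joins the last carbon
# of bead i to the first carbon of bead i+1; the modulo closes the ring.
n_beads = len(bead_names)
inter_bead_bonds = [
    (bead_to_atoms[i][-1], bead_to_atoms[(i + 1) % n_beads][0]) for i in range(n_beads)
]

print(bead_to_atoms)     # {0: [0, 1], 1: [2, 3], 2: [4, 5]}
print(atom_to_bead)      # {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}
print(inter_bead_bonds)  # [(1, 2), (3, 4), (5, 0)]
```

Together with the bonds inside each fragment (0-1, 2-3, 4-5), the inter-bead bonds close the six-membered aromatic ring.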

Related Tools

  • pysmiles: Lightweight Python library for reading and writing SMILES. CGSmiles uses pysmiles internally to interpret atomic-resolution fragments.
  • polyply: Generates topology files and coordinates for molecular dynamics (MD) simulations from CGSmiles notation, producing all-atom or coarse-grained topologies and input parameters.
  • fast_forward: Forward maps molecular dynamics trajectories from a higher to a lower resolution using CGSmiles.
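Forward mapping places each bead using the coordinates of the atoms mapped to it; one common choice is the center of geometry, another is the mass-weighted center of mass. The sketch below computes a center of geometry per bead for a single frame. It is a generic illustration with toy coordinates in arbitrary units, not fast_forward's implementation or API; `forward_map` is a hypothetical helper.

```python
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def forward_map(positions: Sequence[Vec3],
                bead_to_atoms: Dict[int, List[int]]) -> Dict[int, Vec3]:
    """Place each bead at the center of geometry of its mapped atoms."""
    bead_positions = {}
    for bead, atoms in bead_to_atoms.items():
        n = len(atoms)
        bead_positions[bead] = tuple(
            sum(positions[a][dim] for a in atoms) / n for dim in range(3)
        )
    return bead_positions


# Toy frame: six atoms, mapped two per bead (illustrative values).
frame = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
         (2.0, 0.0, 0.0), (2.0, 1.0, 0.0),
         (1.0, 2.0, 0.0), (0.0, 1.0, 0.0)]
mapping = {0: [0, 1], 1: [2, 3], 2: [4, 5]}

print(forward_map(frame, mapping))
# {0: (0.5, 0.0, 0.0), 1: (2.0, 0.5, 0.0), 2: (0.5, 1.5, 0.0)}
```

Applying the same function to every frame of a trajectory yields the lower-resolution trajectory; a center-of-mass variant would weight each atom's coordinates by its mass before dividing by the total mass.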

Citation

When using cgsmiles for your publication, please:
