Merge pull request #545 from marrink-lab/documentation
Documentation
fgrunewald authored Nov 30, 2023
2 parents 899a8df + 571283a commit caa8f63
Showing 29 changed files with 949 additions and 68 deletions.
29 changes: 29 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,29 @@
# .readthedocs.yaml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Set the version of Python and other tools you might need
build:
  os: ubuntu-22.04
  tools:
    python: "3"

formats:
  - pdf

# Build documentation in the docs/ directory with Sphinx
sphinx:
  builder: html
  fail_on_warning: true
  configuration: doc/source/conf.py

# We recommend specifying your dependencies to enable reproducible builds:
# https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
python:
  install:
    - method: pip
      path: .
    - requirements: requirements-docs.txt
5 changes: 5 additions & 0 deletions README.md
@@ -71,6 +71,11 @@ The documentation of the vermouth python library will come soon.
year={2022}}
```

## Documentation

More complete documentation, including API documentation can be found at
https://vermouth-martinize.readthedocs.io/

## License

Martinize2 and vermouth are distributed under the Apache 2.0 license.
8 changes: 4 additions & 4 deletions bin/martinize2
@@ -797,22 +797,22 @@ def entry():

    if args.list_blocks:
        print("The following Blocks are known to force field {}:".format(args.from_ff))
        print(", ".join(known_force_fields[args.from_ff].blocks))
        print(", ".join(sorted(known_force_fields[args.from_ff].blocks)))
        print(
            "The following Modifications are known to force field {}:".format(
                args.from_ff
            )
        )
        print(", ".join(known_force_fields[args.from_ff].modifications))
        print(", ".join(sorted(known_force_fields[args.from_ff].modifications)))
        print()
        print("The following Blocks are known to force field {}:".format(args.to_ff))
        print(", ".join(known_force_fields[args.to_ff].blocks))
        print(", ".join(sorted(known_force_fields[args.to_ff].blocks)))
        print(
            "The following Modifications are known to force field {}:".format(
                args.to_ff
            )
        )
        print(", ".join(known_force_fields[args.to_ff].modifications))
        print(", ".join(sorted(known_force_fields[args.to_ff].modifications)))
        parser.exit()

    if args.elastic and args.govs_includes:
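
The effect of wrapping the block and modification names in `sorted()` can be illustrated with a small stand-alone sketch. It assumes the `blocks` attribute behaves like a dictionary keyed by name; the residue names and the helper `list_names` are hypothetical and not part of martinize2.

```python
def list_names(mapping):
    """Return the names in `mapping` as a comma-separated, alphabetically sorted string."""
    # Iterating a dict yields its keys; sorted() makes the order independent
    # of the order in which the entries happened to be inserted.
    return ", ".join(sorted(mapping))


# Hypothetical stand-in for a force field's `blocks` dictionary.
blocks = {"TRP": object(), "ALA": object(), "GLY": object()}

unsorted_listing = ", ".join(blocks)   # follows insertion order
sorted_listing = list_names(blocks)    # alphabetical, reproducible
```

With sorting, two runs that load the same force field in a different order print the same listing, which makes the `-list-blocks` output easier to scan and to compare.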
4 changes: 2 additions & 2 deletions doc/source/conf.py
@@ -32,8 +32,8 @@
# The full version, including alpha/beta/rc tags
release = get_distribution('vermouth').version
# The short X.Y version
version = '.'.join(release.split('.')[:2])

# version = '.'.join(release.split('.')[:2])
version = release

# -- General configuration ---------------------------------------------------
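
For reference, the difference between the previous short `version` and the new setting can be sketched as follows. The release string is a made-up example, and `short_version` is a hypothetical helper that only mirrors the expression commented out in this hunk.

```python
def short_version(release):
    """Return the X.Y part of a dotted release string."""
    return ".".join(release.split(".")[:2])


release = "0.9.6"                     # hypothetical release string
old_version = short_version(release)  # what the docs used to display
new_version = release                 # what the docs display after this change
```

After the change the rendered documentation shows the full release string rather than only its first two components.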

11 changes: 4 additions & 7 deletions doc/source/data.rst
@@ -63,7 +63,7 @@ method. The approach where links only affect the parameters where they depend on
the local structure makes it easier to reason about how the final topology is
constructed, and the performance is better.

Besides nodes, edges and interactions links also describe non-edges, patterns
Besides nodes, edges, and interactions, links also describe non-edges, patterns
and removed interactions. Non-edges and patterns are used when matching the link
to a molecule. Where there is a non-edge in the link there cannot be an edge in
the molecule, and the atoms involved do not need to be present in the molecule.
@@ -83,7 +83,9 @@ atoms/particles that should already be described by the block and atoms that are
only described by the modification.

A modification can add or remove nodes, change node attributes, and add, change,
or remove interactions; much like a `Link`_.
or remove interactions; much like a `Link`_. Note that a modification *must* always
add at least one node. Otherwise there will be no unidentified nodes to be picked
up by the processor.

Modifications can be defined through :ref:`.ff files <file_formats:.ff file format>`.
See also: :ref:`Identify modifications <martinize2_workflow:Identify modifications>`.
@@ -101,11 +103,6 @@ Note that this is only a subset of a force field in the MD sense: a VerMoUTH
non-bonded parameters (only the particle types are included), or functional
forms.

The ``universal`` force field deserves special mention. If not overridden with
the ``-from`` flag this force field is used. This force field does not define
any MD parameters, but this is fine. Instead, this force field defines only atom
names and the associated connections.

Mapping
-------
A :class:`~vermouth.map_parser.Mapping` describes how molecular fragments can
4 changes: 2 additions & 2 deletions doc/source/file_formats.rst
@@ -1,8 +1,8 @@
File formats
============
VerMoUTH introduces two new file formats. The ``.ff`` format for defining
:ref:`blocks <data:block>`, :ref:`links <data:link>` and :
ref:`modifications <data:modification>`. Note that you can also define blocks
:ref:`blocks <data:block>`, :ref:`links <data:link>` and
:ref:`modifications <data:modification>`. Note that you can also define blocks
(and basic links) with Gromacs ``.itp`` and ``.rtp`` files. The ``.mapping``
format can be used to define :ref:`mappings <data:mapping>`. Mappings that don't
cross residue boundaries can also be defined using ``.map`` files.
19 changes: 8 additions & 11 deletions doc/source/general_overview.rst
@@ -26,26 +26,22 @@ pip.
pip install vermouth
The behavior of the ``pip`` command can vary depending on the specificity of your
The behavior of the ``pip`` command can vary depending on the specifics of your
python installation. See the `documentation on installing a python package
<https://packaging.python.org/tutorials/installing-packages/#installing-packages>`_
to learn more.

Vermouth has `SciPy <https://scipy.org>`_ as *optional* dependency. If available
it will be used to accelerate the distance calculations when `making bonds
<martinize2_workflow:Make bonds>`_

Quickstart
----------
The CLI of martinize2 is very similar to that of martinize1, and can often be
The CLI of martinize2 is very similar to that of [martinize1]_, and can often be
used as a drop-in replacement. For example:

.. code-block:: bash

    martinize2 -f lysozyme.pdb -x cg_protein.pdb -o topol.top
        -ff martini3001 -dssp -elastic

This will read an atomistic ``lysozyme.pdb`` and produce a Martini3_ compatible
This will read an atomistic ``lysozyme.pdb`` and produce a [Martini3]_ compatible
structure and topology at ``cg_protein.pdb`` and ``topol.top`` respectively. It
will use the program [DSSP]_ to determine the proteins secondary structure (which
influences the topology), and produce an elastic network. See ``martinize2 -h``
@@ -89,7 +85,8 @@ Kroon, P.C. (2020). Martinize 2 -- VerMoUTH. *Aggregate, automate, assemble* (pp

References
----------
.. [Martini3] P.C.T. Souza, R. Alessandri, J. Barnoud, S. Thallmair, I. Faustino, F. Grünewald, et al., Martini 3: a general purpose force field for coarse-grained molecular dynamics, Nat. Methods. 18 (2021) 382–388. doi:10.1038/s41592-021-01098-3.
.. [VMD] W. Humphrey, A. Dalke and K. Schulten, "VMD - Visual Molecular Dynamics", J. Molec. Graphics, 1996, vol. 14, pp. 33-38. http://www.ks.uiuc.edu/Research/vmd/.
.. [DSSP] - W.G. Touw, C. Baakman, J. Black, T.A.H. te Beek, E. Krieger, R.P. Joosten, et al., A series of PDB-related databanks for everyday needs, Nucleic Acids Res. 43 (2015) D364–D368. doi:10.1093/nar/gku1028.
- W. Kabsch, C. Sander, Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features., Biopolymers. 22 (1983) 2577–637. doi:10.1002/bip.360221211.
.. [Martini3] P.C.T. Souza, R. Alessandri, J. Barnoud, S. Thallmair, I. Faustino, F. Grünewald, et al., Martini 3: a general purpose force field for coarse-grained molecular dynamics, Nat. Methods. 18 (2021) 382–388. https://doi.org/10.1038/s41592-021-01098-3
.. [martinize1] de Jong, D. H., Singh, G., Bennett, W. F. D., Arnarez, C., Wassenaar, T. a, Schäfer, L. v., Periole, X., Tieleman, D. P., & Marrink, S. J. (2013). Improved Parameters for the Martini Coarse-Grained Protein Force Field. Journal of Chemical Theory and Computation, 9(1), 687–697. https://doi.org/10.1021/ct300646g
.. [VMD] W. Humphrey, A. Dalke and K. Schulten, "VMD - Visual Molecular Dynamics", J. Molec. Graphics, 1996, vol. 14, pp. 33-38. http://www.ks.uiuc.edu/Research/vmd/
.. [DSSP] - W.G. Touw, C. Baakman, J. Black, T.A.H. te Beek, E. Krieger, R.P. Joosten, et al., A series of PDB-related databanks for everyday needs, Nucleic Acids Res. 43 (2015) D364–D368. https://doi.org/10.1093/nar/gku1028
- W. Kabsch, C. Sander, Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features., Biopolymers. 22 (1983) 2577–637. https://doi.org/10.1002/bip.360221211
2 changes: 1 addition & 1 deletion doc/source/graph_algorithms.rst
@@ -47,7 +47,7 @@ the equivalence criteria.
Subgraph isomorphism
++++++++++++++++++++
A subgraph isomorphism is a :ref:`graph_algorithms:graph isomorphism`, but
without the constraint that :math:`|H| = |G|`. Instead, :math:`|H| <= |G|` if
without the constraint that :math:`|H| = |G|`. Instead, :math:`|H| \le |G|` if
:math:`H` is subgraph isomorphic to :math:`G`.
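
A brute-force check makes the size constraint concrete. The sketch below reads :math:`|H|` as the number of nodes and assumes the non-induced reading, in which every edge of :math:`H` must map onto an edge of :math:`G` while :math:`G` may contain additional edges. The function `is_subgraph_isomorphic` is hypothetical, ignores node attributes, and is far too slow for real molecules; it is not how vermouth performs the matching.

```python
from itertools import permutations


def is_subgraph_isomorphic(h_nodes, h_edges, g_nodes, g_edges):
    """Return True if H can be mapped onto a (non-induced) subgraph of G."""
    # |H| <= |G| is a necessary condition: every node of H needs its own image.
    if len(h_nodes) > len(g_nodes):
        return False
    g_edge_set = {frozenset(edge) for edge in g_edges}
    # Try every injective assignment of H's nodes to nodes of G.
    for image in permutations(g_nodes, len(h_nodes)):
        mapping = dict(zip(h_nodes, image))
        if all(frozenset((mapping[a], mapping[b])) in g_edge_set
               for a, b in h_edges):
            return True
    return False


path_nodes, path_edges = [0, 1, 2], [(0, 1), (1, 2)]
tri_nodes, tri_edges = ["a", "b", "c"], [("a", "b"), ("b", "c"), ("a", "c")]

path_in_triangle = is_subgraph_isomorphic(path_nodes, path_edges, tri_nodes, tri_edges)
triangle_in_path = is_subgraph_isomorphic(tri_nodes, tri_edges, path_nodes, path_edges)
```

A three-node path fits inside a triangle, but a triangle does not fit inside the path, even though both graphs have the same number of nodes.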

Induced subgraph isomorphism
44 changes: 23 additions & 21 deletions doc/source/martinize2_workflow.rst
@@ -46,7 +46,7 @@ We take into account the following PDB records: ``MODEL`` and ``ENDMDL`` to
determine which model to parse; ``ATOM`` and ``HETATM``; ``TER``, which can be
used to separate molecules; ``CONECT``, which is used to add edges; and ``END``.

Will issue a ``pdb-alternate`` warning if any atoms in the PDB file have an
We issue a ``pdb-alternate`` warning if any atoms in the PDB file have an
alternate conformation that is not 'A', since those will always be ignored.

Relevant CLI options: ``-f``; ``-model``; ``-ignore``; ``-ignh``.
@@ -67,36 +67,36 @@ same name, nor when there is no :ref:`data:Block` corresponding to the residue
[#]_. Note that this will only ever create edges *within* residues.

Edges will be added based on distance when they are close enough together,
except for a few exceptions (below). Atoms will be considered close enough based
except for a few exceptions (see below). Atoms will be considered close enough based
on their element (taken from either the PDB file directly, or deduced from atom
name [#]_). The distance threshold is multiplied by ``-bonds-fudge`` to allow
for conformations that are slightly out-of-equilibrium. Edges will not be added
from distances in two cases: 1) if edges could be added based on atom names no
edges will be added between atoms that are not bonded in the reference
:ref:`data:Block`. 2) No edges will be added between residues if one of the
atoms involved is a hydrogen atom. Edges added this way are logged as debug
output.
:ref:`data:Block`. 2) If the edge would connect 2 residues, and at least one of
the atoms involved is a hydrogen atom. Edges added based on distance are logged
as debug output.

If your input structure is far from equilibrium and adding edges based on
distance is likely to produce erroneous results, make sure to provide ``CONECT``
records describing at least the edges between residues, and between atoms
involved in modifications, such as termini and PTMs.
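
The distance criterion and the role of ``-bonds-fudge`` can be sketched as below. This is an illustration under stated assumptions, not vermouth's implementation: the function `close_enough`, the radius table, and the choice of half the sum of the two radii as the base threshold are all hypothetical. Only the multiplication of the threshold by the fudge factor is taken from the text.

```python
# Approximate van der Waals radii in nm, used purely for illustration.
ILLUSTRATIVE_RADII = {"H": 0.12, "C": 0.17, "N": 0.155, "O": 0.152}


def close_enough(element_a, element_b, distance, fudge=1.0):
    """Return True if two atoms are within the (fudged) distance threshold."""
    # Assumed base threshold: half the sum of the two element radii.
    threshold = 0.5 * (ILLUSTRATIVE_RADII[element_a] + ILLUSTRATIVE_RADII[element_b])
    # The fudge factor widens the threshold for slightly distorted structures.
    return distance <= threshold * fudge


typical_bond = close_enough("C", "N", 0.147)            # within the threshold
stretched_strict = close_enough("C", "N", 0.17)         # just outside it
stretched_fudged = close_enough("C", "N", 0.17, 1.1)    # accepted with fudge
```

A larger fudge factor accepts more stretched bonds, at the cost of a higher chance of spurious edges between atoms that are merely close in space.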

Will issue a ``general`` warning when it is requested to add edges based on atom
We issue a ``general`` warning when it is requested to add edges based on atom
names, but this cannot be done for some reason. This commonly happens when your
input structure is a homo multimer without ``TER`` record and identical residue
numbers and chain identifiers across the monomers. In this case martinize2
cannot distinguish the atom "N", residue ALA1, chain "A" from the atom "N",
residue ALA1, chain "A" in the next monomer. The easiest solution is to place
strategic ``TER`` records in your PDB file.
residue ALA1, chain "A" in the next monomer. The easiest solution in this case
is to place strategic ``TER`` records in your PDB file.

Relevant CLI options: ``-bond-from``; ``-bonds-fudge``

.. [#] Based on residue name.
.. [#] The method for deriving the element from an atom name is extremely
simplistic: the first letter is used. This will go wrong for two-letter
elements such as 'Fe', 'Cl', and 'Cu'. In those cases, make sure your PDB
file specified the correct element. See also:
file specifies the correct element. See also:
:func:`~vermouth.graph_utils.add_element_attr`
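
The limitation described in this footnote is easy to reproduce. The helper `guess_element` below is a hypothetical re-creation of the "first letter" rule as stated in the text; it is not the code of ``add_element_attr``.

```python
def guess_element(atomname):
    """Guess an element symbol by taking the first letter of an atom name."""
    for char in atomname:
        if char.isalpha():
            return char.upper()
    raise ValueError("No letter found in atom name {!r}".format(atomname))


alpha_carbon = guess_element("CA")   # correct: carbon
hydrogen = guess_element("1HB")      # correct: leading digits are skipped
iron = guess_element("FE")           # wrong: reported as fluorine
chlorine = guess_element("CL")       # wrong: reported as carbon
```

The last two cases show why two-letter elements need an explicit element column in the PDB file.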
Annotate mutations and modifications
@@ -110,8 +110,8 @@ PTMs and termini. This is done in part by
The ``-mutate`` option can be used to change the residue name of one or more
residues. For example, you can specify ``-mutate PHE42:ALA`` to mutate all
residues with residue name "PHE" and residue number 42 to "ALA". Or change all
"HSE" residues to "HIS": ``-mutate HSE:HIS``. Mutations can be specified in a
similar way.
"HSE" residues to "HIS": ``-mutate HSE:HIS``. Modifications can be specified in
a similar way.

The specifications ``nter`` and ``cter`` can be used to quickly refer to all N-
and C-terminal residues respectively [#]_. In addition, the CLI options
@@ -127,8 +127,9 @@ Relevant CLI options: ``-mutate``, ``-modify``, ``-nter``, ``-cter``, ``-nt``

.. [#] N- and C-termini are defined as residues with 1 neighbour and having a
higher or lower residue number than the neighbour, respectively. Note that
this does not include zwitterionic amino acids!
This also means that if your protein has a chain break you'll end up with
this definition also includes termini for non-proteins, but it does not
include zwitterionic amino acids!
This also means that if your polymer has a chain break you'll end up with
more termini than you would otherwise expect.
2) Repair the input graph
@@ -140,7 +141,7 @@ modifications such as PTMs.
Repair graph
------------
The first step is to complete the graph so that it contains all atoms described
by the reference :ref:`data:Block`, and that all atoms have the correct names.
by the reference :ref:`data:Block`, and so that all atoms have the correct names.
These blocks are taken from the input force field based on residue names (taking
any mutations and modifications into account).
:class:`~vermouth.processors.repair_graph.RepairGraph` takes care of all this.
@@ -161,18 +162,19 @@ found. This sorting also speeds up the calculation significantly, so if you're
working with a system containing large residues consider correcting some of the
atom names.

Will issue an ``unknown-residue`` warning if no Block can be retrieved for a
We issue an ``unknown-residue`` warning if no :ref:`data:Block` can be retrieved for a
given residue name. In this case the entire molecule will be removed from the
system.

Identify modifications
----------------------
Secondly, all modifications are identified. `Repair graph`_ will also tag all
Secondly, all modifications are identified. `Repair graph`_ also tags all
atoms it did not recognise, and those are processed by
:class:`~vermouth.processors.canonicalize_modifications.CanonicalizeModifications`.

This is done by finding the solution where all unknown atoms are covered by the
atoms of exactly one :ref:`data:Modification`, where the modification must be an
Modifications are identified by finding the solution where all tagged atoms are
covered by the atoms of exactly one :ref:`data:Modification`, where the
modification must be an
:ref:`induced subgraph <graph_algorithms:Induced subgraph isomorphism>` of the
molecule. Every modification must contain at least one "anchoring" atom, which
is an atom that is also described by a :ref:`data:Block`. Unknown atoms are
@@ -182,7 +184,7 @@ equal if their atom name is equal. Because modifications must be
input structure there can be no missing atoms!

After this step all atoms will have correct atom names, and any residues that
are include modifications will be labelled. This information is later used
include modifications will be labelled. This information is later used
during the :ref:`resolution transformation <martinize2_workflow:3) Resolution transformation>`

An ``unknown-input`` warning will be issued if a modification cannot be
@@ -199,7 +201,7 @@ The resolution transformation is done by
your molecules at the target resolution, based on the available mappings. These
mappings are read from the ``.map`` and ``.mapping`` files available in the
library [#]_. See also :ref:`file_formats:File formats`. In essence these
mappings describe how molecular fragments (atoms and bonds) correspond to a
mappings describe how molecular fragments (nodes and edges) correspond to a
block in the target force field. We find all the ways these mappings can fit
onto the input molecule, and add the corresponding blocks and modifications to
the resulting molecule.
@@ -217,7 +219,7 @@ particles they map to in the output force field will also be connected.
Interactions across separate blocks will be added in the next step.

The processor will do some sanity checking on the resulting molecule, and issue
an ``unmapped-atom`` warning if there are modifications in the input molecule
an ``unmapped-atom`` warning if there are atoms in the input molecule
for which no mapping can be found. In addition, this warning will also be issued
if there are any non-hydrogen atoms that are not mapped to the output molecule.
A more serious ``inconsistent-data`` warning will be issued for the following
16 changes: 16 additions & 0 deletions doc/source/processors.rst
@@ -1,2 +1,18 @@
Processor
=========
:class:`Processors <vermouth.processors.processor.Processor>` are relatively
simple. They form the fundamental steps of the martinize2 pipeline. Processors
are called via their :meth:`~vermouth.processors.processor.Processor.run_system`
method. The default implementation of this method iterates over the molecules
in the system, and runs the :meth:`~vermouth.processors.processor.Processor.run_molecule`
method on them. This means that implementations of Processors must implement
either a ``run_system`` method, or a ``run_molecule`` method. If the processor
can be run on independent molecules the ``run_molecule`` method is preferred;
``run_system`` should be used only for cases where the problem at hand cannot
be separated in tasks-per-molecule.

In their ``run_molecule`` method Processor implementations are free to either
modify :class:`molecules <vermouth.molecule.Molecule>` or create new ones.
Either way, they must return a :class:`~vermouth.molecule.Molecule`. The
``run_system`` will be called with a :class:`~vermouth.system.System`, which
will be modified in place.
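
The contract described in this new section can be sketched with stand-in classes. The `Processor` and `System` classes below are simplified, hypothetical substitutes written so that the example runs without vermouth installed; molecules are represented as plain lists of attribute dictionaries, and `UppercaseAtomNames` is an invented example processor.

```python
class System:
    """Stand-in for a system: just a container of molecules."""

    def __init__(self, molecules):
        self.molecules = list(molecules)


class Processor:
    """Stand-in base class following the behaviour described in the text."""

    def run_system(self, system):
        # Default behaviour: run every molecule independently and store the
        # returned molecules back on the system, modifying it in place.
        system.molecules = [self.run_molecule(mol) for mol in system.molecules]

    def run_molecule(self, molecule):
        raise NotImplementedError


class UppercaseAtomNames(Processor):
    """Example processor that only needs per-molecule information."""

    def run_molecule(self, molecule):
        for atom in molecule:
            atom["atomname"] = atom["atomname"].upper()
        # A processor must always return a molecule, modified or new.
        return molecule


system = System([[{"atomname": "ca"}, {"atomname": "n"}], [{"atomname": "o"}]])
UppercaseAtomNames().run_system(system)
atom_names = [[atom["atomname"] for atom in mol] for mol in system.molecules]
```

Because the work here is independent per molecule, only ``run_molecule`` is overridden; a processor that needs information from several molecules at once would override ``run_system`` instead.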
2 changes: 0 additions & 2 deletions doc/source/tutorials/1_simple_protein_aa/index.rst

This file was deleted.

2 changes: 0 additions & 2 deletions doc/source/tutorials/2_simple_protein_cg/index.rst

This file was deleted.

3 changes: 0 additions & 3 deletions doc/source/tutorials/3_membrane_protein/index.rst

This file was deleted.

2 changes: 0 additions & 2 deletions doc/source/tutorials/4_branched_polymer/index.rst

This file was deleted.

2 changes: 0 additions & 2 deletions doc/source/tutorials/5_glycosylated_protein/index.rst

This file was deleted.
