Implementing gemmi-based mmcif reader (with easy extension to PDB/PDBx and mmJSON) #4712
Open
marinegor wants to merge 33 commits into MDAnalysis:develop from marinegor:feature/mmcif
Commits
aa2a88f Start working on MMCIF parser (marinegor)
218cf43 Add first (not working) version of MMCIFReader and MMCIF topology parser (marinegor)
7f78e02 Do some squashing (marinegor)
6682d6e Remove inherited docs (marinegor)
817f3a0 Try improving the parsing (marinegor)
3cc8c80 Try three independent loops over the model (marinegor)
f1bf325 Merge remote-tracking branch 'upstream/develop' into feature/mmcif (marinegor)
d21c220 Add gemmi dependency (marinegor)
2a1be15 necessary params (marinegor)
77645e6 finished sorting atom attrs (marinegor)
91e6942 add function for transformation into *idx (marinegor)
9a0c086 oh damn seems to finally be working (marinegor)
9c731df remove TODOs (marinegor)
8b40ec7 Remove debug prints (marinegor)
bdcbd73 Merge branch 'develop' into feature/mmcif (marinegor)
401a4d3 try to pack things into separate class in utils? (marinegor)
9c336bd remove unnecessary functions (marinegor)
def88e4 copy all loops into separate functions (marinegor)
cabfd37 Move loops over structures into functions (marinegor)
4c9d930 Move coordinate fetching into function for the coordinate reader as well (marinegor)
184491a Fix imports (marinegor)
3de8565 Start adding documentation (marinegor)
ca6ebbb Reference MMCIFParser in PDBParser (marinegor)
45077ad Add documentation for trajectory and topology parsers (marinegor)
9a1a59a Add mmcif tests (marinegor)
27c10d6 Update format specifications (marinegor)
950cfcf Write simple tests (marinegor)
8d1a8b5 Merge remote-tracking branch 'upstream/develop' into feature/mmcif (marinegor)
ef29338 update github action with gemmi (marinegor)
caca17e fix gemmi import errors (marinegor)
f0e49cc add mmcif testfiles (marinegor)
b7ada7c add mmcif to __all__ (marinegor)
e80632c add black instead of ruff (marinegor)
@@ -0,0 +1,85 @@
import logging
import warnings

import numpy as np

from . import base

try:
    import gemmi

    HAS_GEMMI = True
except ImportError:
    HAS_GEMMI = False

logger = logging.getLogger("MDAnalysis.coordinates.MMCIF")


def get_coordinates(model: "gemmi.Model") -> np.ndarray:
    """Get coordinates of all atoms in the `gemmi.Model` object.

    Parameters
    ----------
    model
        input `gemmi.Model`, e.g. `gemmi.read_structure('file.cif')[0]`

    Returns
    -------
    np.ndarray, shape [n, 3], where `n` is the number of atoms in the structure.
    """
    return np.array(
        [[*at.pos.tolist()] for chain in model for res in chain for at in res]
    )


class MMCIFReader(base.SingleFrameReaderBase):
    """Reads from an MMCIF file using ``gemmi`` library as a backend.

    Notes
    -----

    If the structure represents an ensemble, only the first structure in the ensemble
    is read here (and a warning is issued). Also, if the structure has a placeholder "CRYST1"
    record (1, 1, 1, 90, 90, 90), it's set to ``None`` instead.

    .. versionadded:: 2.8.0
    """

    format = ["cif", "cif.gz", "mmcif"]
    units = {"time": None, "length": "Angstrom"}

    def _read_first_frame(self):
        structure = gemmi.read_structure(self.filename)
        cell_dims = np.array(
            [
                getattr(structure.cell, name)
                for name in ("a", "b", "c", "alpha", "beta", "gamma")
            ]
        )
        # Warn once if the file holds several models; only the first is used.
        if len(structure) > 1:
            warnings.warn(
                f"File {self.filename} has {len(structure)} models, "
                "but only the first one will be read"
            )  # TODO: if the structures represent timestamps, can parse them with :func:`get_coordinates`.

        model = structure[0]
        coords = get_coordinates(model)
        self.n_atoms = len(coords)
        self.ts = self._Timestep.from_coordinates(coords, **self._ts_kwargs)
        if np.allclose(cell_dims, np.array([1.0, 1.0, 1.0, 90.0, 90.0, 90.0])):
            warnings.warn(
                "1 A^3 CRYST1 record,"
                " this is usually a placeholder."
                " Unit cell dimensions will be set to None."
            )
            self.ts.dimensions = None
        else:
            self.ts.dimensions = cell_dims
        self.ts.frame = 0

    def close(self):
        pass
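
The placeholder-cell branch in `_read_first_frame` could plausibly be isolated and tested on its own. The sketch below is an illustration only, using the standard library: `normalize_cell` and `PLACEHOLDER_CELL` are hypothetical names that do not appear in this diff, and `math.isclose` stands in for `np.allclose` with similar (but not formula-identical) tolerances.

```python
import math
from typing import Optional, Sequence, Tuple

# The (a, b, c, alpha, beta, gamma) values the reader treats as "no unit cell".
PLACEHOLDER_CELL = (1.0, 1.0, 1.0, 90.0, 90.0, 90.0)


def normalize_cell(cell: Sequence[float]) -> Optional[Tuple[float, ...]]:
    """Return the cell as a tuple, or None if it matches the placeholder.

    Hypothetical helper for illustration; not part of MDAnalysis or gemmi.
    """
    if len(cell) != 6:
        raise ValueError(f"expected 6 cell parameters, got {len(cell)}")
    # Compare element-wise with a small tolerance instead of exact equality.
    is_placeholder = all(
        math.isclose(value, ref, rel_tol=1e-5, abs_tol=1e-8)
        for value, ref in zip(cell, PLACEHOLDER_CELL)
    )
    return None if is_placeholder else tuple(float(v) for v in cell)
```

Returning `None` rather than the unit cube mirrors what the diff assigns to `self.ts.dimensions`, so downstream code can tell "no box" apart from a real 1 Å cell.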
Please add optional deps down in the optional deps section below.
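
For context on how an optional dependency is usually consumed on the Python side: the diff sets a `HAS_GEMMI` flag at import time, and one possible companion is a check at use time that fails with a clear message. The sketch below is an assumption about how that might look, not code from this PR; `require_optional` is a hypothetical helper name.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, feature: str) -> ModuleType:
    """Import an optional dependency or raise an ImportError naming the feature.

    Hypothetical helper for illustration; not part of MDAnalysis.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # Re-raise with a message that tells the user what to install and why.
        raise ImportError(
            f"{feature} requires the optional package '{module_name}'; "
            "install it to enable this feature"
        ) from err
```

A reader could call something like `require_optional("gemmi", "mmCIF reading")` at the top of `_read_first_frame`, so a missing package surfaces as an `ImportError` instead of a `NameError`.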