New biomol fields #400

Open
wants to merge 23 commits into
base: develop
4 changes: 3 additions & 1 deletion optimade.rst
@@ -2490,7 +2490,9 @@ biomol_chains
- **name**: The chain name/letter.
- **residues**: A list of integers referring to the index of :field:`biomol_residues`, that belong to this chain.
The list SHOULD NOT be empty. The index of the first residue is 0.
-- **types**: A list of tags specifying the type of molecules this chain contains.
+- **types**: A list of custom tags/labels specifying the type of molecules this chain contains (e.g. 'protein').
+  This field is useful as an overview of every chain and as a query target for the structure.
+  Labels in this field are non-standard. Every implementation may use different labels according to its needs.
Member

I would advise standardizing labels at least for the most common molecule types to benefit the queryability of the field. Implementations could use their own labels, but prefixed with their own database-specific prefixes.

Author

Good idea. This way we could save a lot of work.

I have been searching for references for our labels in the current mmCIF format (the future PDB standard) and this is the best I have found. They are meant for assemblies and do not fully suit our needs, but I will try to follow them as closely as I can.

So I suggest the standard labels to be the following: 'PROTEIN', 'NUCLEIC ACID', 'CARBOHYDRATES', 'LIPID', 'MEMBRANE', 'LIGAND', 'ION', 'SOLVENT', 'OTHER'.

If you agree I will commit changes soon.

Contributor

Yes, it seems like a good idea to define these labels.

Member

I very much like basing this on mmCIF. Maybe there is already a JSON representation for mmCIF data?

> So I suggest the standard labels to be the following: 'PROTEIN', 'NUCLEIC ACID', 'CARBOHYDRATES', 'LIPID', 'MEMBRANE', 'LIGAND', 'ION', 'SOLVENT', 'OTHER'.
>
> If you agree I will commit changes soon.

These labels sound very good. I would just render them in lowercase and describe the use of prefixes for custom labels in the form of <prefix>:<label>.

Author

> Maybe there is already a JSON representation for mmCIF data?

In a quick search I found several mmCIF-to-JSON parsers, and one of them seems to be the official one.

> I would just render them in lowercase and describe the use of prefixes for custom labels in the form of <prefix>:<label>.

All right.
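
To make the outcome of this thread concrete, here is a minimal sketch of how an implementation might check a label in the `types` list. It assumes the nine proposed labels rendered in lowercase and custom labels written as `<prefix>:<label>`. The function name `is_valid_type_label` is hypothetical, and the rule that a custom label contains exactly one colon with non-empty parts is an assumption, since the thread does not define the prefix syntax further.

```python
# Proposed standard labels from the thread, rendered in lowercase as suggested.
STANDARD_LABELS = frozenset({
    "protein", "nucleic acid", "carbohydrates", "lipid", "membrane",
    "ligand", "ion", "solvent", "other",
})


def is_valid_type_label(label: str) -> bool:
    """Return True for a standard label or a custom '<prefix>:<label>' one."""
    if label in STANDARD_LABELS:
        return True
    # Assumption: a custom label has exactly one colon separating a
    # non-empty prefix from a non-empty label; the thread says nothing more.
    prefix, sep, custom = label.partition(":")
    return bool(sep) and bool(prefix) and bool(custom) and ":" not in custom
```

With these assumptions, `"protein"` and a made-up custom label such as `"mydb:glycan"` are accepted, while `"PROTEIN"` and `"glycan"` are rejected.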

- **sequences**: A list of residue sequences in current chain.
Contributor

If the same sequence occurs twice in a chain, should it be listed here twice or just once?

- **sequence_types**: A list of tags specifying the type of each sequence in the :property:`sequences` field. The type of a sequence is defined by its components (e.g. 'aminoacids').
- There SHOULD NOT be two or more chains with the same :property:`name`.
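
As an illustration of the constraints in this hunk, the sketch below checks a list of `biomol_chains` entries for duplicate names, empty `residues` lists, and residue indices outside the 0-based range of `biomol_residues`. The helper `check_chains` and the sample data are hypothetical, and the range check is an assumption derived from the statement that the indices refer to `biomol_residues`. All findings are reported as warnings because the text uses SHOULD NOT rather than MUST NOT.

```python
from collections import Counter


def check_chains(chains, n_residues):
    """Collect warnings for the SHOULD-level constraints on biomol_chains."""
    warnings = []
    # There SHOULD NOT be two or more chains with the same name.
    name_counts = Counter(chain["name"] for chain in chains)
    for name, count in name_counts.items():
        if count > 1:
            warnings.append(f"chain name {name!r} is used {count} times")
    for chain in chains:
        residues = chain["residues"]
        # The residues list SHOULD NOT be empty.
        if not residues:
            warnings.append(f"chain {chain['name']!r} has no residues")
        # Assumption: indices are 0-based positions in biomol_residues.
        for index in residues:
            if not 0 <= index < n_residues:
                warnings.append(
                    f"chain {chain['name']!r}: residue index {index} is out of range"
                )
    return warnings


# Made-up sample data: two chains over four residues.
chains = [
    {"name": "A", "residues": [0, 1, 2], "types": ["protein"]},
    {"name": "B", "residues": [3], "types": ["ion"]},
]
```

Calling `check_chains(chains, 4)` on this sample returns an empty list; renaming chain `B` to `A` or emptying a `residues` list produces one warning each.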