From eca24df9cd1adf73b4c3b9744c8c2197a0f8a9d8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dani=20Beltr=C3=A1n?= Date: Wed, 23 Feb 2022 11:40:40 +0100 Subject: [PATCH 01/21] added new biomol fields --- optimade.rst | 104 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 104 insertions(+) diff --git a/optimade.rst b/optimade.rst index e4a72765d..a080658ff 100644 --- a/optimade.rst +++ b/optimade.rst @@ -2465,6 +2465,110 @@ Relationships with calculations MAY be used to indicate provenance where a struc Appendices ========== +Domain Specific Fields +---------------------- + +The fields below are all optional and are only used within specific research fields. + +Every field has a standard domain-specific prefix. + + +biomol_chains +~~~~~~ + +- **Description**: For each chain in the system there is a dictionary that describes this chain. Chains are groups of related residues (e.g. a polymer). + Databases are allowed to add more properties as long as the properties are prefixed with the database specific prefix. +- **Type**: list of dictionaries with the properties: + - :property:`name`: string (REQUIRED) + - :property:`residues`: list of integers (REQUIRED) + - :property:`types`: list of strings + - :property:`sequences`: list of strings + - :property:`sequence_types`: list of strings +- **Requirements/Conventions**: + - **Query**: Support for queries on this property is OPTIONAL. + If supported, only a subset of the filter features MAY be supported. + - **name**: The chain name/letter. + - **residues**: A list of integers referring to the index of :field:`biomol_residues`, that belong to this chain. + The list SHOULD NOT be empty. The index of the first residue is 0. + - **types**: A list of tags specifying the type of molecules this chain contains. + - **sequences**: A list of residue sequences in current chain. + - **sequence_types**: A list of tags specifying the type of each sequence in the :property:`sequences` field. 
The type of a sequence is defined by its components (e.g. 'aminoacids'). + - There SHOULD NOT be two or more chains with the same :property:`name`. + - Values in :property:`name` SHOULD be in capital letters. + - Values in :property:`name` SHOULD NOT be longer than 1 character when the number of chains is not greater than the number of letters in English alphabet (26). + - Values in :property:`sequences` SHOULD be in capital letters. + - Number of values in :property:`sequences` and :property:`sequence_types` MUST match. + +- **Examples**: + +.. code:: jsonc + { + "biomol_chains":[ + { + "name": "A", + "residues":[0,1,2,3, ...], + "types": ['protein', 'ions'], + "sequences": ['MSHHWGYG'], + "sequence_types": ['aminoacids'] + }, + { + "name": "B", + "residues":[54,55,56,57, ...], + "types": ['nucleic acid'], + "sequences": ['GATTACA'], + "sequence_types": ['nucleotides'] + }, + ] + } + +biomol_residues +~~~~~~ + +- **Description**: For each residue in the system there is a dictionary that describes this residue. Residues are groups of related atoms (e.g. an aminoacid). + Databases are allowed to add more properties as long as the properties are prefixed with the database specific prefix. +- **Type**: list of dictionaries with the properties: + - :property:`name`: string (REQUIRED) + - :property:`number`: integer (REQUIRED) + - :property:`insertion_code`: string or null (REQUIRED) + - :property:`sites`: list of integers (REQUIRED) +- **Requirements/Conventions**: + - **Query**: Support for queries on this property is OPTIONAL. + If supported, only a subset of the filter features MAY be supported. + - **name**: The residue name + - **number**: The residue number according to source notation. + - **insertion_code**: The residue insertion code. It MUST NOT be longer than 1 character. It MAY be null. + - **sites**: A list of integers referring to the index of :field:`cartesian_site_positions`, that belong to this residue. + The list SHOULD NOT be empty. 
The index of the first site is 0. + - There MUST NOT be two or more residues with the same integer in :property:`sites`. + - All :property:`name` and :property:`insertion_code` values SHOULD be in capital letters. + +- **Examples**: + +.. code:: jsonc + { + "biomol_residues":[ + { + "name": "PHE", + "number": 17, + "insertion_code": null, + "sites":[0,1,2,3, ...] + }, + { + "name": "ASP", + "number": 18, + "insertion_code": null, + "sites":[17,18,19,20, ...] + }, + { + "name": "LEU", + "number": 18, + "insertion_code": "A", + "sites":[29,30,31, ...] + }, + ] + } + + The Filter Language EBNF Grammar -------------------------------- From 572a29fb0800467af72fbb42262450e59c031060 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dani=20Beltr=C3=A1n?= Date: Wed, 23 Feb 2022 12:46:13 +0100 Subject: [PATCH 02/21] added new biomol sequence fields --- optimade.rst | 42 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 42 insertions(+) diff --git a/optimade.rst b/optimade.rst index a080658ff..1db0b6a3b 100644 --- a/optimade.rst +++ b/optimade.rst @@ -2568,6 +2568,48 @@ biomol_residues ] } +biomol_sequences +~~~~~~ + +- **Description**: A list of residue sequences in current structure. It may be any type of sequence, as this type is further specified in :field:`biomol_sequence_types`. + Sequences may be grouped and ordered in any form (e.g. by chains, by fragments of covalently bonded atoms, etc.) as long as they make sense when querying structures by sequence. +- **Type**: list of strings +- **Requirements/Conventions**: + - **Query**: Support for queries on this property is OPTIONAL. + - There MUST be the same number of values that in :field:`biomol_sequence_types`. + - Values SHOULD be in capital letters. + +- **Examples**: + +.. code:: jsonc + { + "biomol_sequences":[ + 'MSHHWGYG', + 'GATTACA' + ] + } + + +biomol_sequence_types +~~~~~~ + +- **Description**: A list of tags specifying the type of each sequence in the :field:`biomol_sequences` field. 
+ The type of a sequence is defined by its components (e.g. 'aminoacids'). +- **Type**: list of strings +- **Requirements/Conventions**: + - **Query**: Support for queries on this property is OPTIONAL. + - There MUST be the same number of values that in :field:`biomol_sequences`. + +- **Examples**: + +.. code:: jsonc + { + "biomol_sequence_types":[ + 'aminoacids', + 'nucleotides' + ] + } + The Filter Language EBNF Grammar -------------------------------- From 173172fc8099e04cc0b1a6d6167160dd9c6b90c1 Mon Sep 17 00:00:00 2001 From: Dani <44979434+d-beltran@users.noreply.github.com> Date: Wed, 23 Feb 2022 16:53:57 +0100 Subject: [PATCH 03/21] Update optimade.rst Co-authored-by: Johan Bergsma <29785380+JPBergsma@users.noreply.github.com> --- optimade.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/optimade.rst b/optimade.rst index 1db0b6a3b..fec5f3926 100644 --- a/optimade.rst +++ b/optimade.rst @@ -2474,7 +2474,7 @@ Every field has a standard domain-specific prefix. biomol_chains -~~~~~~ +~~~~~~~~~~~~~ - **Description**: For each chain in the system there is a dictionary that describes this chain. Chains are groups of related residues (e.g. a polymer). Databases are allowed to add more properties as long as the properties are prefixed with the database specific prefix. From 5911d534f6bf587be277fee2db7d5acbb51f401f Mon Sep 17 00:00:00 2001 From: Dani <44979434+d-beltran@users.noreply.github.com> Date: Wed, 23 Feb 2022 16:56:44 +0100 Subject: [PATCH 04/21] Title underlines fit --- optimade.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/optimade.rst b/optimade.rst index fec5f3926..29ea75878 100644 --- a/optimade.rst +++ b/optimade.rst @@ -2522,7 +2522,7 @@ biomol_chains } biomol_residues -~~~~~~ +~~~~~~~~~~~~~~~ - **Description**: For each residue in the system there is a dictionary that describes this residue. Residues are groups of related atoms (e.g. an aminoacid). 
Databases are allowed to add more properties as long as the properties are prefixed with the database specific prefix. @@ -2569,7 +2569,7 @@ biomol_residues } biomol_sequences -~~~~~~ +~~~~~~~~~~~~~~~~ - **Description**: A list of residue sequences in current structure. It may be any type of sequence, as this type is further specified in :field:`biomol_sequence_types`. Sequences may be grouped and ordered in any form (e.g. by chains, by fragments of covalently bonded atoms, etc.) as long as they make sense when querying structures by sequence. @@ -2591,7 +2591,7 @@ biomol_sequences biomol_sequence_types -~~~~~~ +~~~~~~~~~~~~~~~~~~~~~ - **Description**: A list of tags specifying the type of each sequence in the :field:`biomol_sequences` field. The type of a sequence is defined by its components (e.g. 'aminoacids'). From 05a8fedb047bcdb401ab67d5782d9507c5288826 Mon Sep 17 00:00:00 2001 From: Dani <44979434+d-beltran@users.noreply.github.com> Date: Thu, 24 Feb 2022 11:10:07 +0100 Subject: [PATCH 05/21] More explained biomol_chain types --- optimade.rst | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/optimade.rst b/optimade.rst index 29ea75878..121b3d5a9 100644 --- a/optimade.rst +++ b/optimade.rst @@ -2490,7 +2490,9 @@ biomol_chains - **name**: The chain name/letter. - **residues**: A list of integers referring to the index of :field:`biomol_residues`, that belong to this chain. The list SHOULD NOT be empty. The index of the first residue is 0. - - **types**: A list of tags specifying the type of molecules this chain contains. + - **types**: A list of custom tags/labels specifying the type of molecules this chain contains (e.g. 'protein'). + This field is useful as an overview of every chain and as a query target for the structure. + Labels in this field are non-standard. Every implementation may use different labels according to its needs. - **sequences**: A list of residue sequences in current chain. 
- **sequence_types**: A list of tags specifying the type of each sequence in the :property:`sequences` field. The type of a sequence is defined by its components (e.g. 'aminoacids'). - There SHOULD NOT be two or more chains with the same :property:`name`. From 76f690c0c366406cc3095ee1d8f42627e7697c63 Mon Sep 17 00:00:00 2001 From: Dani <44979434+d-beltran@users.noreply.github.com> Date: Fri, 25 Feb 2022 15:38:32 +0100 Subject: [PATCH 06/21] Added standard labels for biomol_chain types --- optimade.rst | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/optimade.rst b/optimade.rst index 121b3d5a9..b245bd132 100644 --- a/optimade.rst +++ b/optimade.rst @@ -2490,9 +2490,11 @@ biomol_chains - **name**: The chain name/letter. - **residues**: A list of integers referring to the index of :field:`biomol_residues`, that belong to this chain. The list SHOULD NOT be empty. The index of the first residue is 0. - - **types**: A list of custom tags/labels specifying the type of molecules this chain contains (e.g. 'protein'). This field is useful as an overview of every chain and as a query target for the structure. - Labels in this field are non-standard. Every implementation may use different labels according to its needs. + Standard labels for this field are the following: 'protein', 'nucleic acid', 'carbohydrates', 'lipid', 'membrane', 'ligand', 'ion', 'solvent' and 'other'. + The list SHOULD contain values within the standard labels. + Additional custom labels MAY be used. These labels MUST include the database-provider-specific prefix with the following format: :