From eca24df9cd1adf73b4c3b9744c8c2197a0f8a9d8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dani=20Beltr=C3=A1n?= <daniel.beltran@irbbarcelona.org>
Date: Wed, 23 Feb 2022 11:40:40 +0100
Subject: [PATCH 01/21] added new biomol fields

---
 optimade.rst | 104 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 104 insertions(+)

diff --git a/optimade.rst b/optimade.rst
index e4a72765d..a080658ff 100644
--- a/optimade.rst
+++ b/optimade.rst
@@ -2465,6 +2465,110 @@ Relationships with calculations MAY be used to indicate provenance where a struc
 Appendices
 ==========
 
+Domain Specific Fields
+----------------------
+
+The fields below are all optional and are only used within specific research fields.
+
+Every field has a standard domain-specific prefix.
+
+
+biomol_chains
+~~~~~~
+
+- **Description**: For each chain in the system there is a dictionary that describes this chain. Chains are groups of related residues (e.g. a polymer).
+  Databases are allowed to add more properties as long as the properties are prefixed with the database specific prefix.
+- **Type**: list of dictionaries with the properties:
+   - :property:`name`: string (REQUIRED)
+   - :property:`residues`: list of integers (REQUIRED)
+   - :property:`types`: list of strings
+   - :property:`sequences`: list of strings
+   - :property:`sequence_types`: list of strings
+- **Requirements/Conventions**:
+   - **Query**:  Support for queries on this property is OPTIONAL.
+     If supported, only a subset of the filter features MAY be supported.
+   - **name**: The chain name/letter.
+   - **residues**: A list of integers referring to the index of :field:`biomol_residues`, that belong to this chain.
+     The list SHOULD NOT be empty. The index of the first residue is 0.
+   - **types**: A list of tags specifying the type of molecules this chain contains.
+   - **sequences**: A list of residue sequences in current chain.
+   - **sequence_types**: A list of tags specifying the type of each sequence in the :property:`sequences` field. The type of a sequence is defined by its components (e.g. 'aminoacids').
+   - There SHOULD NOT be two or more chains with the same :property:`name`.
+   - Values in :property:`name` SHOULD be in capital letters.
+   - Values in :property:`name` SHOULD NOT be longer than 1 character when the number of chains is not greater than the number of letters in English alphabet (26).
+   - Values in :property:`sequences` SHOULD be in capital letters.
+   - Number of values in :property:`sequences` and :property:`sequence_types` MUST match.
+
+- **Examples**:
+
+.. code:: jsonc
+  {
+    "biomol_chains":[
+      {
+        "name": "A",
+        "residues":[0,1,2,3, ...],
+        "types": ['protein', 'ions'],
+	      "sequences": ['MSHHWGYG'],
+	      "sequence_types": ['aminoacids']
+      },
+      {
+        "name": "B",
+        "residues":[54,55,56,57, ...],
+        "types": ['nucleic acid'],
+	      "sequences": ['GATTACA'],
+	      "sequence_types": ['nucleotides']
+      },
+    ]
+  }
+
+biomol_residues
+~~~~~~
+
+- **Description**: For each residue in the system there is a dictionary that describes this residue. Residues are groups of related atoms (e.g. an aminoacid).
+  Databases are allowed to add more properties as long as the properties are prefixed with the database specific prefix.
+- **Type**: list of dictionaries with the properties:
+   - :property:`name`: string (REQUIRED)
+   - :property:`number`: integer (REQUIRED)
+   - :property:`insertion_code`: string or null (REQUIRED)
+   - :property:`sites`: list of integers (REQUIRED)
+- **Requirements/Conventions**:
+   - **Query**:  Support for queries on this property is OPTIONAL.
+     If supported, only a subset of the filter features MAY be supported.
+   - **name**: The residue name
+   - **number**: The residue number according to source notation.
+   - **insertion_code**: The residue insertion code. It MUST NOT be longer than 1 character. It MAY be null.
+   - **sites**: A list of integers referring to the index of :field:`cartesian_site_positions`, that belong to this residue.
+     The list SHOULD NOT be empty. The index of the first site is 0.
+   - There MUST NOT be two or more residues with the same integer in :property:`sites`.
+   - All :property:`name` and :property:`insertion_code` values SHOULD be in capital letters.
+
+- **Examples**:
+
+.. code:: jsonc
+  {
+    "biomol_residues":[
+      {
+        "name": "PHE",
+	      "number": 17,
+	      "insertion_code": null,
+        "sites":[0,1,2,3, ...]
+      },
+      {
+        "name": "ASP",
+	      "number": 18,
+	      "insertion_code": null,
+        "sites":[17,18,19,20, ...]
+      },
+      {
+        "name": "LEU",
+	      "number": 18,
+        "insertion_code": "A",
+        "sites":[29,30,31, ...]
+      },
+    ]
+  }
+
+
 The Filter Language EBNF Grammar
 --------------------------------
 

From 572a29fb0800467af72fbb42262450e59c031060 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dani=20Beltr=C3=A1n?= <daniel.beltran@irbbarcelona.org>
Date: Wed, 23 Feb 2022 12:46:13 +0100
Subject: [PATCH 02/21] added new biomol sequence fields

---
 optimade.rst | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/optimade.rst b/optimade.rst
index a080658ff..1db0b6a3b 100644
--- a/optimade.rst
+++ b/optimade.rst
@@ -2568,6 +2568,48 @@ biomol_residues
     ]
   }
 
+biomol_sequences
+~~~~~~
+
+- **Description**: A list of residue sequences in current structure. It may be any type of sequence, as this type is further specified in :field:`biomol_sequence_types`.
+  Sequences may be grouped and ordered in any form (e.g. by chains, by fragments of covalently bonded atoms, etc.) as long as they make sense when querying structures by sequence.
+- **Type**: list of strings
+- **Requirements/Conventions**:
+   - **Query**:  Support for queries on this property is OPTIONAL.
+   - There MUST be the same number of values that in :field:`biomol_sequence_types`.
+   - Values SHOULD be in capital letters.
+
+- **Examples**:
+
+.. code:: jsonc
+  {
+    "biomol_sequences":[
+      'MSHHWGYG',
+      'GATTACA'
+    ]
+  }
+
+
+biomol_sequence_types
+~~~~~~
+
+- **Description**: A list of tags specifying the type of each sequence in the :field:`biomol_sequences` field.
+  The type of a sequence is defined by its components (e.g. 'aminoacids').
+- **Type**: list of strings
+- **Requirements/Conventions**:
+   - **Query**:  Support for queries on this property is OPTIONAL.
+   - There MUST be the same number of values that in :field:`biomol_sequences`.
+
+- **Examples**:
+
+.. code:: jsonc
+  {
+    "biomol_sequence_types":[
+      'aminoacids',
+      'nucleotides'
+    ]
+  }
+
 
 The Filter Language EBNF Grammar
 --------------------------------

From 173172fc8099e04cc0b1a6d6167160dd9c6b90c1 Mon Sep 17 00:00:00 2001
From: Dani <44979434+d-beltran@users.noreply.github.com>
Date: Wed, 23 Feb 2022 16:53:57 +0100
Subject: [PATCH 03/21] Update optimade.rst

Co-authored-by: Johan Bergsma <29785380+JPBergsma@users.noreply.github.com>
---
 optimade.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/optimade.rst b/optimade.rst
index 1db0b6a3b..fec5f3926 100644
--- a/optimade.rst
+++ b/optimade.rst
@@ -2474,7 +2474,7 @@ Every field has a standard domain-specific prefix.
 
 
 biomol_chains
-~~~~~~
+~~~~~~~~~~~~~
 
 - **Description**: For each chain in the system there is a dictionary that describes this chain. Chains are groups of related residues (e.g. a polymer).
   Databases are allowed to add more properties as long as the properties are prefixed with the database specific prefix.

From 5911d534f6bf587be277fee2db7d5acbb51f401f Mon Sep 17 00:00:00 2001
From: Dani <44979434+d-beltran@users.noreply.github.com>
Date: Wed, 23 Feb 2022 16:56:44 +0100
Subject: [PATCH 04/21] Title underlines fit

---
 optimade.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/optimade.rst b/optimade.rst
index fec5f3926..29ea75878 100644
--- a/optimade.rst
+++ b/optimade.rst
@@ -2522,7 +2522,7 @@ biomol_chains
   }
 
 biomol_residues
-~~~~~~
+~~~~~~~~~~~~~~~
 
 - **Description**: For each residue in the system there is a dictionary that describes this residue. Residues are groups of related atoms (e.g. an aminoacid).
   Databases are allowed to add more properties as long as the properties are prefixed with the database specific prefix.
@@ -2569,7 +2569,7 @@ biomol_residues
   }
 
 biomol_sequences
-~~~~~~
+~~~~~~~~~~~~~~~~
 
 - **Description**: A list of residue sequences in current structure. It may be any type of sequence, as this type is further specified in :field:`biomol_sequence_types`.
   Sequences may be grouped and ordered in any form (e.g. by chains, by fragments of covalently bonded atoms, etc.) as long as they make sense when querying structures by sequence.
@@ -2591,7 +2591,7 @@ biomol_sequences
 
 
 biomol_sequence_types
-~~~~~~
+~~~~~~~~~~~~~~~~~~~~~
 
 - **Description**: A list of tags specifying the type of each sequence in the :field:`biomol_sequences` field.
   The type of a sequence is defined by its components (e.g. 'aminoacids').

From 05a8fedb047bcdb401ab67d5782d9507c5288826 Mon Sep 17 00:00:00 2001
From: Dani <44979434+d-beltran@users.noreply.github.com>
Date: Thu, 24 Feb 2022 11:10:07 +0100
Subject: [PATCH 05/21] More explained biomol_chain types

---
 optimade.rst | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/optimade.rst b/optimade.rst
index 29ea75878..121b3d5a9 100644
--- a/optimade.rst
+++ b/optimade.rst
@@ -2490,7 +2490,9 @@ biomol_chains
    - **name**: The chain name/letter.
    - **residues**: A list of integers referring to the index of :field:`biomol_residues`, that belong to this chain.
      The list SHOULD NOT be empty. The index of the first residue is 0.
-   - **types**: A list of tags specifying the type of molecules this chain contains.
+   - **types**: A list of custom tags/labels specifying the type of molecules this chain contains (e.g. 'protein').
+     This field is useful as an overview of every chain and as a query target for the structure.
+     Labels in this field are non-standard. Every implementation may use different labels according to its needs.
    - **sequences**: A list of residue sequences in current chain.
    - **sequence_types**: A list of tags specifying the type of each sequence in the :property:`sequences` field. The type of a sequence is defined by its components (e.g. 'aminoacids').
    - There SHOULD NOT be two or more chains with the same :property:`name`.

From 76f690c0c366406cc3095ee1d8f42627e7697c63 Mon Sep 17 00:00:00 2001
From: Dani <44979434+d-beltran@users.noreply.github.com>
Date: Fri, 25 Feb 2022 15:38:32 +0100
Subject: [PATCH 06/21] Added standard labels for biomol_chain types

---
 optimade.rst | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/optimade.rst b/optimade.rst
index 121b3d5a9..b245bd132 100644
--- a/optimade.rst
+++ b/optimade.rst
@@ -2490,9 +2490,11 @@ biomol_chains
    - **name**: The chain name/letter.
    - **residues**: A list of integers referring to the index of :field:`biomol_residues`, that belong to this chain.
      The list SHOULD NOT be empty. The index of the first residue is 0.
-   - **types**: A list of custom tags/labels specifying the type of molecules this chain contains (e.g. 'protein').
+   - **types**: A list of tags/labels specifying the type of molecules this chain contains (e.g. 'protein').
      This field is useful as an overview of every chain and as a query target for the structure.
-     Labels in this field are non-standard. Every implementation may use different labels according to its needs.
+     Standard labels for this field are the follwoing: 'protein', 'nucleic acid', 'carbohydrates', 'lipid', 'membrane', 'ligand', 'ion', 'solvent' and 'other'.
+     The list SHOULD contain values within the standard labels.
+     Additional custom labels MAY be used. These labels MUST include the database-provider-specific prefix with the following format: <prefix>:<label>.
    - **sequences**: A list of residue sequences in current chain.
    - **sequence_types**: A list of tags specifying the type of each sequence in the :property:`sequences` field. The type of a sequence is defined by its components (e.g. 'aminoacids').
    - There SHOULD NOT be two or more chains with the same :property:`name`.

From 1f5650d7c7dc21bafe97489a62bfec3a9d009f6a Mon Sep 17 00:00:00 2001
From: Dani <44979434+d-beltran@users.noreply.github.com>
Date: Mon, 28 Feb 2022 14:59:35 +0100
Subject: [PATCH 07/21] Added underscore to new fieldnames

---
 optimade.rst | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/optimade.rst b/optimade.rst
index b245bd132..d523cd4bb 100644
--- a/optimade.rst
+++ b/optimade.rst
@@ -2473,7 +2473,7 @@ The fields below are all optional and are only used within specific research fie
 Every field has a standard domain-specific prefix.
 
 
-biomol_chains
+_biomol_chains
 ~~~~~~~~~~~~~
 
 - **Description**: For each chain in the system there is a dictionary that describes this chain. Chains are groups of related residues (e.g. a polymer).
@@ -2525,7 +2525,7 @@ biomol_chains
     ]
   }
 
-biomol_residues
+_biomol_residues
 ~~~~~~~~~~~~~~~
 
 - **Description**: For each residue in the system there is a dictionary that describes this residue. Residues are groups of related atoms (e.g. an aminoacid).
@@ -2572,7 +2572,7 @@ biomol_residues
     ]
   }
 
-biomol_sequences
+_biomol_sequences
 ~~~~~~~~~~~~~~~~
 
 - **Description**: A list of residue sequences in current structure. It may be any type of sequence, as this type is further specified in :field:`biomol_sequence_types`.
@@ -2594,7 +2594,7 @@ biomol_sequences
   }
 
 
-biomol_sequence_types
+_biomol_sequence_types
 ~~~~~~~~~~~~~~~~~~~~~
 
 - **Description**: A list of tags specifying the type of each sequence in the :field:`biomol_sequences` field.

From a8781ac56d8abac9911e8f7d78c4923bb982ea23 Mon Sep 17 00:00:00 2001
From: Dani <44979434+d-beltran@users.noreply.github.com>
Date: Mon, 28 Feb 2022 16:08:00 +0100
Subject: [PATCH 08/21] Amend

---
 optimade.rst | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/optimade.rst b/optimade.rst
index d523cd4bb..343e47fd1 100644
--- a/optimade.rst
+++ b/optimade.rst
@@ -2474,7 +2474,7 @@ Every field has a standard domain-specific prefix.
 
 
 _biomol_chains
-~~~~~~~~~~~~~
+~~~~~~~~~~~~~~
 
 - **Description**: For each chain in the system there is a dictionary that describes this chain. Chains are groups of related residues (e.g. a polymer).
   Databases are allowed to add more properties as long as the properties are prefixed with the database specific prefix.
@@ -2526,7 +2526,7 @@ _biomol_chains
   }
 
 _biomol_residues
-~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~
 
 - **Description**: For each residue in the system there is a dictionary that describes this residue. Residues are groups of related atoms (e.g. an aminoacid).
   Databases are allowed to add more properties as long as the properties are prefixed with the database specific prefix.
@@ -2573,7 +2573,7 @@ _biomol_residues
   }
 
 _biomol_sequences
-~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~
 
 - **Description**: A list of residue sequences in current structure. It may be any type of sequence, as this type is further specified in :field:`biomol_sequence_types`.
   Sequences may be grouped and ordered in any form (e.g. by chains, by fragments of covalently bonded atoms, etc.) as long as they make sense when querying structures by sequence.
@@ -2595,7 +2595,7 @@ _biomol_sequences
 
 
 _biomol_sequence_types
-~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~
 
 - **Description**: A list of tags specifying the type of each sequence in the :field:`biomol_sequences` field.
   The type of a sequence is defined by its components (e.g. 'aminoacids').

From ac6ed658a2f0520e10e61283ffbcdc5095e4a8c7 Mon Sep 17 00:00:00 2001
From: Dani <44979434+d-beltran@users.noreply.github.com>
Date: Tue, 1 Mar 2022 12:29:44 +0100
Subject: [PATCH 09/21] New species property: biomol atom name

---
 optimade.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/optimade.rst b/optimade.rst
index 343e47fd1..e363b1885 100644
--- a/optimade.rst
+++ b/optimade.rst
@@ -2127,6 +2127,7 @@ species
   - :property:`nattached`: list of integers (OPTIONAL)
   - :property:`mass`: list of floats (OPTIONAL)
   - :property:`original_name`: string (OPTIONAL).
+  - :property:`_biomol_atom_name`: string (OPTIONAL).
 
 - **Requirements/Conventions**:
 
@@ -2165,6 +2166,8 @@ species
 
           **Note**: With regards to "source database", we refer to the immediate source being queried via the OPTIMADE API implementation.
           The main use of this field is for source databases that use species names, containing characters that are not allowed (see description of the list property `species_at_sites`_).
+	  
+  - **\_biomol\_atom\_name**: OPTIONAL. Name of the atom according to the biomolecular field standards.
 
   - For systems that have only species formed by a single chemical symbol, and that have at most one species per chemical symbol, SHOULD use the chemical symbol as species name (e.g., :val:`"Ti"` for titanium, :val:`"O"` for oxygen, etc.)
     However, note that this is OPTIONAL, and client implementations MUST NOT assume that the key corresponds to a chemical symbol, nor assume that if the species name is a valid chemical symbol, that it represents a species with that chemical symbol.

From 55f71e2be3f68825308d4d9a34af404fedc751e7 Mon Sep 17 00:00:00 2001
From: Dani <44979434+d-beltran@users.noreply.github.com>
Date: Tue, 1 Mar 2022 12:32:19 +0100
Subject: [PATCH 10/21] Restructured biomol sequences

---
 optimade.rst | 50 ++++++++++++++++++++++----------------------------
 1 file changed, 22 insertions(+), 28 deletions(-)

diff --git a/optimade.rst b/optimade.rst
index e363b1885..db039daa5 100644
--- a/optimade.rst
+++ b/optimade.rst
@@ -2578,42 +2578,36 @@ _biomol_residues
 _biomol_sequences
 ~~~~~~~~~~~~~~~~~
 
-- **Description**: A list of residue sequences in current structure. It may be any type of sequence, as this type is further specified in :field:`biomol_sequence_types`.
-  Sequences may be grouped and ordered in any form (e.g. by chains, by fragments of covalently bonded atoms, etc.) as long as they make sense when querying structures by sequence.
-- **Type**: list of strings
+- **Description**: A list of residue sequences in current structure.
+ Every sequence is a dictionary which includes the sequence itself and the type of sequence it is.
+ Every sequence may include a list of chain and residue indices.
+ Sequences may be grouped and ordered in any form (e.g. by chains, by fragments of covalently bonded atoms, etc.) as long as they make sense when querying structures by sequence.
+- **Type**: list of dictionaries with the properties:
+   - :property:`sequence`: string (REQUIRED)
+   - :property:`type`: string (REQUIRED)
+   - :property:`chains`: list of integers
 - **Requirements/Conventions**:
-   - **Query**:  Support for queries on this property is OPTIONAL.
-   - There MUST be the same number of values that in :field:`biomol_sequence_types`.
-   - Values SHOULD be in capital letters.
+   - **Query**:  Queries on this property SHOULD be supported.
+   - **sequence**: A string with a letter for each residue in the sequence. Letters SHOULD be capital letters.
+   - **type**: The type of a sequence is defined by its components (e.g. 'aminoacids').
+   - **chains**: A list of integers referring to indices in :field:`biomol_chains` for chains which include this sequence totally or partially.
+   There MUST NOT be repeated indices in :property:`chains`.
 
-- **Examples**:
 
-.. code:: jsonc
-  {
-    "biomol_sequences":[
-      'MSHHWGYG',
-      'GATTACA'
-    ]
-  }
-
-
-_biomol_sequence_types
-~~~~~~~~~~~~~~~~~~~~~~
-
-- **Description**: A list of tags specifying the type of each sequence in the :field:`biomol_sequences` field.
-  The type of a sequence is defined by its components (e.g. 'aminoacids').
-- **Type**: list of strings
-- **Requirements/Conventions**:
-   - **Query**:  Support for queries on this property is OPTIONAL.
-   - There MUST be the same number of values that in :field:`biomol_sequences`.
 
 - **Examples**:
 
 .. code:: jsonc
   {
-    "biomol_sequence_types":[
-      'aminoacids',
-      'nucleotides'
+    "biomol_sequences":[
+      {
+        sequence: 'MSHHWGYG',
+        type: 'aminoacids'
+      },
+      {
+        sequence: 'GATTACA',
+        type: 'nucleotides'
+      }
     ]
   }
 

From f865a5a18d69d2d46ab50b31e07fc3814427f3f4 Mon Sep 17 00:00:00 2001
From: Dani <44979434+d-beltran@users.noreply.github.com>
Date: Tue, 13 Sep 2022 17:35:37 +0200
Subject: [PATCH 11/21] Lausanne discussions update

---
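Note (ignored by git am): the constraints this patch adds for
`_biomol_residues_at_sites` and `_biomol_site_sequences` can be checked
mechanically. Below is a minimal sketch of such a check, not an official
validator. The function name `check_biomol_fields`, the plain-dict input
shape, and the zero-based lower bound on residue indices are assumptions
made for illustration; only the field names and the three accepted
sequence types come from the text of the patch.

```python
# Hypothetical helper; not part of the OPTIMADE specification or any library.
ALLOWED_SEQUENCE_TYPES = {
    "polypeptide",
    "polydeoxyribonucleotide",
    "polyribonucleotide",
}


def check_biomol_fields(attributes):
    """Return a list of problems found in one structure's attributes dict."""
    problems = []
    residues = attributes.get("_biomol_residues") or []
    at_sites = attributes.get("_biomol_residues_at_sites")
    nsites = len(attributes.get("cartesian_site_positions") or [])

    if at_sites is not None:
        # One residue index per site is required.
        if len(at_sites) != nsites:
            problems.append(
                f"expected {nsites} residue indices, got {len(at_sites)}"
            )
        # Every index must point at an existing residue dictionary
        # (assumption: indices are zero-based and non-negative).
        for site, index in enumerate(at_sites):
            if not 0 <= index < len(residues):
                problems.append(f"site {site}: residue index {index} out of range")

    # Each sequence dictionary needs an accepted "type" value.
    for position, entry in enumerate(attributes.get("_biomol_site_sequences") or []):
        if entry.get("type") not in ALLOWED_SEQUENCE_TYPES:
            problems.append(f"sequence {position}: unknown type {entry.get('type')!r}")

    return problems
```

Returning a list of messages instead of raising on the first failure lets a
provider report every inconsistency in an entry at once.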
 optimade.rst | 135 ++++++++++++++++++++++-----------------------------
 1 file changed, 57 insertions(+), 78 deletions(-)

diff --git a/optimade.rst b/optimade.rst
index b7d2a0714..5de30906b 100644
--- a/optimade.rst
+++ b/optimade.rst
@@ -3158,59 +3158,6 @@ The fields below are all optional and are only used within specific research fie
 
 Every field has a standard domain-specific prefix.
 
-
-_biomol_chains
-~~~~~~~~~~~~~~
-
-- **Description**: For each chain in the system there is a dictionary that describes this chain. Chains are groups of related residues (e.g. a polymer).
-  Databases are allowed to add more properties as long as the properties are prefixed with the database specific prefix.
-- **Type**: list of dictionaries with the properties:
-   - :property:`name`: string (REQUIRED)
-   - :property:`residues`: list of integers (REQUIRED)
-   - :property:`types`: list of strings
-   - :property:`sequences`: list of strings
-   - :property:`sequence_types`: list of strings
-- **Requirements/Conventions**:
-   - **Query**:  Support for queries on this property is OPTIONAL.
-     If supported, only a subset of the filter features MAY be supported.
-   - **name**: The chain name/letter.
-   - **residues**: A list of integers referring to the index of :field:`biomol_residues`, that belong to this chain.
-     The list SHOULD NOT be empty. The index of the first residue is 0.
-   - **types**: A list of tags/labels specifying the type of molecules this chain contains (e.g. 'protein').
-     This field is useful as an overview of every chain and as a query target for the structure.
-     Standard labels for this field are the follwoing: 'protein', 'nucleic acid', 'carbohydrates', 'lipid', 'membrane', 'ligand', 'ion', 'solvent' and 'other'.
-     The list SHOULD contain values within the standard labels.
-     Additional custom labels MAY be used. These labels MUST include the database-provider-specific prefix with the following format: <prefix>:<label>.
-   - **sequences**: A list of residue sequences in current chain.
-   - **sequence_types**: A list of tags specifying the type of each sequence in the :property:`sequences` field. The type of a sequence is defined by its components (e.g. 'aminoacids').
-   - There SHOULD NOT be two or more chains with the same :property:`name`.
-   - Values in :property:`name` SHOULD be in capital letters.
-   - Values in :property:`name` SHOULD NOT be longer than 1 character when the number of chains is not greater than the number of letters in English alphabet (26).
-   - Values in :property:`sequences` SHOULD be in capital letters.
-   - Number of values in :property:`sequences` and :property:`sequence_types` MUST match.
-
-- **Examples**:
-
-.. code:: jsonc
-  {
-    "biomol_chains":[
-      {
-        "name": "A",
-        "residues":[0,1,2,3, ...],
-        "types": ['protein', 'ions'],
-	      "sequences": ['MSHHWGYG'],
-	      "sequence_types": ['aminoacids']
-      },
-      {
-        "name": "B",
-        "residues":[54,55,56,57, ...],
-        "types": ['nucleic acid'],
-	      "sequences": ['GATTACA'],
-	      "sequence_types": ['nucleotides']
-      },
-    ]
-  }
-
 _biomol_residues
 ~~~~~~~~~~~~~~~~
 
@@ -3220,15 +3167,16 @@ _biomol_residues
    - :property:`name`: string (REQUIRED)
    - :property:`number`: integer (REQUIRED)
    - :property:`insertion_code`: string or null (REQUIRED)
-   - :property:`sites`: list of integers (REQUIRED)
+   - :property:`chain`: string (OPTIONAL)
 - **Requirements/Conventions**:
    - **Query**:  Support for queries on this property is OPTIONAL.
      If supported, only a subset of the filter features MAY be supported.
    - **name**: The residue name
    - **number**: The residue number according to source notation.
    - **insertion_code**: The residue insertion code. It MUST NOT be longer than 1 character. It MAY be null.
-   - **sites**: A list of integers referring to the index of :field:`cartesian_site_positions`, that belong to this residue.
-     The list SHOULD NOT be empty. The index of the first site is 0.
+   - **chain**: The chain number this residue belongs to.
+   - Values in :property:`chain` SHOULD be in capital letters.
+   - Values in :property:`chain` SHOULD NOT be longer than 1 character when the number of chains is not greater than the number of letters in English alphabet (26).
    - There MUST NOT be two or more residues with the same integer in :property:`sites`.
    - All :property:`name` and :property:`insertion_code` values SHOULD be in capital letters.
 
@@ -3236,60 +3184,91 @@ _biomol_residues
 
 .. code:: jsonc
   {
-    "biomol_residues":[
+    "_biomol_residues":[
       {
         "name": "PHE",
-	      "number": 17,
-	      "insertion_code": null,
-        "sites":[0,1,2,3, ...]
+	"number": 17,
+	"insertion_code": null,
+        "chain": "A"
       },
       {
         "name": "ASP",
-	      "number": 18,
-	      "insertion_code": null,
-        "sites":[17,18,19,20, ...]
+	"number": 18,
+	"insertion_code": null,
+        "chain": "A"
       },
       {
         "name": "LEU",
-	      "number": 18,
+	"number": 18,
         "insertion_code": "A",
-        "sites":[29,30,31, ...]
+        "chain": "A"
       },
     ]
   }
+  
+_biomol_residues_at_sites
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+- **Description**: Index of the residue at each site (values are given in the same order as the sites in the property `cartesian_site_positions`_).
+  The properties of the residues are found in the property `_biomol_residues`_.
+- **Type**: list of integers.
+- **Requirements/Conventions**:
+  - **Support**: SHOULD be supported by all biomol implementations, i.e., SHOULD NOT be :val:`null`.
+  - **Query**: Support for queries on this property is OPTIONAL.
+    If supported, filters MAY support only a subset of comparison operators.
+  - MUST have length equal to the number of sites in the structure (first dimension of the list property `cartesian_site_positions`_).
+  - Residue indices mentioned in the `_biomol_residues_at_sites`_ list MUST be lower than the length of the list property `_biomol_residues`_ (i.e. for each value in the `_biomol_residues_at_sites`_ list there MUST exist one dictionary in the `_biomol_residues`_ list with the index equal to the corresponding `_biomol_residues_at_sites`_ value).
 
-_biomol_sequences
-~~~~~~~~~~~~~~~~~
+- **Examples**:
 
-- **Description**: A list of residue sequences in current structure.
- Every sequence is a dictionary which includes the sequence itself and the type of sequence it is.
- Every sequence may include a list of chain and residue indices.
- Sequences may be grouped and ordered in any form (e.g. by chains, by fragments of covalently bonded atoms, etc.) as long as they make sense when querying structures by sequence.
+  - :val:`[0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1, ... ]` indicates that the first 8 sites belong to the first residue in the `_biomol_residues`_ list, while the following 9 sites belong to the second residue.
+
+_biomol_site_sequences
+~~~~~~~~~~~~~~~~~~~~~~
+
+- **Description**: A list of dictionaries, each representing a linear segment of covalently-linked standard or modified amino acids or nucleotides having atoms with coordinates in sites. The order of the elements in the `_biomol_site_sequences`_ list is not relevant. Each dictionary in the list holds two keys: sequence and type. The sequence is a string of one-letter codes identifying each amino acid or nucleotide as defined by the `mmCIF standard <https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Items/_entity_poly.pdbx_seq_one_letter_code.html>`__. The type is a string defining the monomers of the sequence. Accepted values are “polypeptide” for amino acids, “polydeoxyribonucleotide”  for deoxyribonucleotides and “polyribonucleotide” for ribonucleotides, according to the `mmCIF standard <https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Items/_pdbx_reference_linked_entity.link_to_entity_type.html>`__.
 - **Type**: list of dictionaries with the properties:
    - :property:`sequence`: string (REQUIRED)
    - :property:`type`: string (REQUIRED)
-   - :property:`chains`: list of integers
 - **Requirements/Conventions**:
    - **Query**:  Queries on this property SHOULD be supported.
    - **sequence**: A string with a letter for each residue in the sequence. Letters SHOULD be capital letters.
-   - **type**: The type of a sequence is defined by its components (e.g. 'aminoacids').
-   - **chains**: A list of integers referring to indices in :field:`biomol_chains` for chains which include this sequence totally or partially.
-   There MUST NOT be repeated indices in :property:`chains`.
+   - **type**: The type of a sequence is defined by the type of its residues (e.g. "polypeptide").
+
+- **Examples**:
 
+.. code:: jsonc
+  {
+    "_biomol_site_sequences":[
+      {
+        "sequence": "MSHHWGYG",
+        "type": "polypeptide"
+      },
+      {
+        "sequence": "GATTACA",
+        "type": "polydeoxyribonucleotide"
+      }
+    ]
+  }
+  
+_biomol_full_sequences
+~~~~~~~~~~~~~~~~~~~~~~
 
+- **Description**: A list of dictionaries, each representing a linear segment of covalently-linked standard or modified amino acids or nucleotides including residues without coordinates in sites. The order of the elements in the `_biomol_full_sequences`_ list is not relevant.
+  Each element in the list is a dictionary with the same schema defined for `_biomol_site_sequences`_.
 
 - **Examples**:
 
 .. code:: jsonc
   {
-    "biomol_sequences":[
+    "_biomol_full_sequences":[
       {
         sequence: 'MSHHWGYG',
-        type: 'aminoacids'
+        type: 'polypeptide'
       },
       {
         sequence: 'GATTACA',
-        type: 'nucleotides'
+        type: 'polydeoxyribonucleotide'
       }
     ]
   }

From af3c8171a67212cf9714a45fb811144af1d3de6a Mon Sep 17 00:00:00 2001
From: Dani <44979434+d-beltran@users.noreply.github.com>
Date: Tue, 13 Sep 2022 18:17:38 +0200
Subject: [PATCH 12/21] Update optimade.rst

Co-authored-by: Johan Bergsma <29785380+JPBergsma@users.noreply.github.com>
---
 optimade.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/optimade.rst b/optimade.rst
index 5de30906b..4301468b3 100644
--- a/optimade.rst
+++ b/optimade.rst
@@ -3216,7 +3216,7 @@ _biomol_residues_at_sites
   - **Support**: SHOULD be supported by all biomol implementations, i.e., SHOULD NOT be :val:`null`.
   - **Query**: Support for queries on this property is OPTIONAL.
     If supported, filters MAY support only a subset of comparison operators.
-  - MUST have length equal to the number of sites in the structure (first dimension of the list property `cartesian_site_positions`_).
+  - The number of values MUST be equal to :property: `nsites`, i.e. the number of sites in the structure.
   - Residue indices mentioned in the `_biomol_residues_at_sites`_ list MUST be lower than the length of the list property `_biomol_residues`_ (i.e. for each value in the `_biomol_residues_at_sites`_ list there MUST exist one dictionary in the `_biomol_residues`_ list with the index equal to the corresponding `_biomol_residues_at_sites`_ value).
 
 - **Examples**:

From 616019c2fb50522952819873f048d5b59f33548e Mon Sep 17 00:00:00 2001
From: Dani <44979434+d-beltran@users.noreply.github.com>
Date: Tue, 13 Sep 2022 18:18:30 +0200
Subject: [PATCH 13/21] Update optimade.rst

---
 optimade.rst | 1 -
 1 file changed, 1 deletion(-)

diff --git a/optimade.rst b/optimade.rst
index 4301468b3..0601b9dee 100644
--- a/optimade.rst
+++ b/optimade.rst
@@ -3177,7 +3177,6 @@ _biomol_residues
    - **chain**: The chain number this residue belongs to.
    - Values in :property:`chain` SHOULD be in capital letters.
    - Values in :property:`chain` SHOULD NOT be longer than 1 character when the number of chains is not greater than the number of letters in English alphabet (26).
-   - There MUST NOT be two or more residues with the same integer in :property:`sites`.
    - All :property:`name` and :property:`insertion_code` values SHOULD be in capital letters.
 
 - **Examples**:

From bd3e9e176b4f9b1c5ae01e978cf897834ca23370 Mon Sep 17 00:00:00 2001
From: Dani <44979434+d-beltran@users.noreply.github.com>
Date: Tue, 13 Sep 2022 18:21:34 +0200
Subject: [PATCH 14/21] Update optimade.rst

---
 optimade.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/optimade.rst b/optimade.rst
index 0601b9dee..b67d53c54 100644
--- a/optimade.rst
+++ b/optimade.rst
@@ -3212,7 +3212,7 @@ _biomol_residues_at_sites
   The properties of the residues are found in the property `_biomol_residues`_.
 - **Type**: list of integers.
 - **Requirements/Conventions**:
-  - **Support**: SHOULD be supported by all biomol implementations, i.e., SHOULD NOT be :val:`null`.
+  - **Support**: MUST be supported whenever `_biomol_residues`_ is present, i.e., in that case it MUST NOT be :val:`null`.
   - **Query**: Support for queries on this property is OPTIONAL.
     If supported, filters MAY support only a subset of comparison operators.
   - The number of values MUST be equal to :property:`nsites`, i.e. the number of sites in the structure.

From ce93c9d113b1df642576dfe322f35e5fc6b6aa86 Mon Sep 17 00:00:00 2001
From: Dani <44979434+d-beltran@users.noreply.github.com>
Date: Tue, 13 Sep 2022 18:21:54 +0200
Subject: [PATCH 15/21] Update optimade.rst

Co-authored-by: Johan Bergsma <29785380+JPBergsma@users.noreply.github.com>
---
 optimade.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
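
Reviewer note: the constraints on `_biomol_residues_at_sites` that this series settles on (required when `_biomol_residues` is present, one value per site, every value a valid index into `_biomol_residues`) could be checked roughly as below. This is only a sketch under assumptions that the patch does not state: the entry's attributes are taken to be a plain Python dict, indices are taken to be zero-based, and the function name `check_residues_at_sites` is hypothetical.

```python
from typing import Any


def check_residues_at_sites(attributes: dict[str, Any]) -> list[str]:
    """Collect violations of the _biomol_residues_at_sites constraints."""
    problems: list[str] = []
    residues = attributes.get("_biomol_residues")
    at_sites = attributes.get("_biomol_residues_at_sites")

    # The property is only mandatory when _biomol_residues is present.
    if residues is None:
        return problems
    if at_sites is None:
        return ["_biomol_residues_at_sites MUST NOT be null "
                "when _biomol_residues is present"]

    # There must be exactly one value per site.
    nsites = attributes["nsites"]
    if len(at_sites) != nsites:
        problems.append(f"expected {nsites} values, found {len(at_sites)}")

    # Each value must index a dictionary in _biomol_residues
    # (zero-based indexing is an assumption of this sketch).
    for site, index in enumerate(at_sites):
        if not 0 <= index < len(residues):
            problems.append(f"site {site}: residue index {index} is out of range")
    return problems
```

An empty return value means that none of the three rules is violated; anything else lists the offending sites.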

diff --git a/optimade.rst b/optimade.rst
index b67d53c54..a54f20e32 100644
--- a/optimade.rst
+++ b/optimade.rst
@@ -3216,7 +3216,7 @@ _biomol_residues_at_sites
   - **Query**: Support for queries on this property is OPTIONAL.
     If supported, filters MAY support only a subset of comparison operators.
   - The number of values MUST be equal to :property:`nsites`, i.e. the number of sites in the structure.
-  - Residue indices mentioned in the `_biomol_residues_at_sites`_ list MUST be lower than the length of the list property `_biomol_residues`_ (i.e. for each value in the `_biomol_residues_at_sites`_ list there MUST exist one dictionary in the `_biomol_residues`_ list with the index equal to the corresponding `_biomol_residues_at_sites`_ value).
+  - Each value in the `_biomol_residues_at_sites`_ list MUST correspond to the index of one of the dictionaries in the `_biomol_residues`_ list.
 
 - **Examples**:
 

From a875aaf2a82ce07825dc49766682494cc86171a4 Mon Sep 17 00:00:00 2001
From: Dani <44979434+d-beltran@users.noreply.github.com>
Date: Tue, 13 Sep 2022 18:31:10 +0200
Subject: [PATCH 16/21] added a few breaklines with correct indent

---
 optimade.rst | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/optimade.rst b/optimade.rst
index a54f20e32..74034f0ae 100644
--- a/optimade.rst
+++ b/optimade.rst
@@ -3225,7 +3225,10 @@ _biomol_residues_at_sites
 _biomol_site_sequences
 ~~~~~~~~~~~~~~~~~~~~~~
 
-- **Description**: A list of dictionaries, each representing a linear segment of covalently-linked standard or modified amino acids or nucleotides having atoms with coordinates in sites. The order of the elements in the `_biomol_site_sequences`_ list is not relevant. Each dictionary in the list holds two keys: sequence and type. The sequence is a string of one-letter codes identifying each amino acid or nucleotide as defined by the `mmCIF standard <https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Items/_entity_poly.pdbx_seq_one_letter_code.html>`__. The type is a string defining the monomers of the sequence. Accepted values are “polypeptide” for amino acids, “polydeoxyribonucleotide”  for deoxyribonucleotides and “polyribonucleotide” for ribonucleotides, according to the `mmCIF standard <https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Items/_pdbx_reference_linked_entity.link_to_entity_type.html>`__.
+- **Description**: A list of dictionaries, each representing a linear segment of covalently-linked standard or modified amino acids or nucleotides having atoms with coordinates in sites. 
+  The order of the elements in the `_biomol_site_sequences`_ list is not relevant.
+  Each dictionary in the list holds two keys: sequence and type. The sequence is a string of one-letter codes identifying each amino acid or nucleotide as defined by the `mmCIF standard <https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Items/_entity_poly.pdbx_seq_one_letter_code.html>`__.
+  The type is a string defining the monomers of the sequence. Accepted values are “polypeptide” for amino acids, “polydeoxyribonucleotide”  for deoxyribonucleotides and “polyribonucleotide” for ribonucleotides, according to the `mmCIF standard <https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Items/_pdbx_reference_linked_entity.link_to_entity_type.html>`__.
 - **Type**: list of dictionaries with the properties:
    - :property:`sequence`: string (REQUIRED)
    - :property:`type`: string (REQUIRED)
@@ -3253,8 +3256,9 @@ _biomol_site_sequences
 _biomol_full_sequences
 ~~~~~~~~~~~~~~~~~~~~~~
 
-- **Description**: A list of dictionaries, each representing a linear segment of covalently-linked standard or modified amino acids or nucleotides including residues without coordinates in sites. The order of the elements in the `_biomol_full_sequences`_ list is not relevant.
-Each element in the list is a dictionary, with the same schema defined for `_biomol_site_sequences`_.
+- **Description**: A list of dictionaries, each representing a linear segment of covalently-linked standard or modified amino acids or nucleotides including residues without coordinates in sites.
+  The order of the elements in the `_biomol_full_sequences`_ list is not relevant.
+  Each element in the list is a dictionary, with the same schema defined for `_biomol_site_sequences`_.
 
 - **Examples**:
 

From a1ddd88d9bb312fee6460c0e0cbbe5c0bcf398c3 Mon Sep 17 00:00:00 2001
From: Dani <44979434+d-beltran@users.noreply.github.com>
Date: Wed, 14 Sep 2022 12:33:30 +0200
Subject: [PATCH 17/21] Update optimade.rst

Co-authored-by: Johan Bergsma <29785380+JPBergsma@users.noreply.github.com>
---
 optimade.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/optimade.rst b/optimade.rst
index 74034f0ae..c54a32721 100644
--- a/optimade.rst
+++ b/optimade.rst
@@ -3226,7 +3226,7 @@ _biomol_site_sequences
 ~~~~~~~~~~~~~~~~~~~~~~
 
 - **Description**: A list of dictionaries, each representing a linear segment of covalently-linked standard or modified amino acids or nucleotides having atoms with coordinates in sites. 
-  The order of the elements in the `_biomol_site_sequences`_ list is not relevant.
+  The elements in the `_biomol_site_sequences`_ list are unordered.
   Each dictionary in the list holds two keys: sequence and type. The sequence is a string of one-letter codes identifying each amino acid or nucleotide as defined by the `mmCIF standard <https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Items/_entity_poly.pdbx_seq_one_letter_code.html>`__.
   The type is a string defining the monomers of the sequence. Accepted values are “polypeptide” for amino acids, “polydeoxyribonucleotide”  for deoxyribonucleotides and “polyribonucleotide” for ribonucleotides, according to the `mmCIF standard <https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Items/_pdbx_reference_linked_entity.link_to_entity_type.html>`__.
 - **Type**: list of dictionaries with the properties:

From a0ee16a0b41f730003be35e58b13940b41f71ad3 Mon Sep 17 00:00:00 2001
From: Dani <44979434+d-beltran@users.noreply.github.com>
Date: Wed, 14 Sep 2022 12:33:56 +0200
Subject: [PATCH 18/21] Update optimade.rst

Co-authored-by: Johan Bergsma <29785380+JPBergsma@users.noreply.github.com>
---
 optimade.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
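
Reviewer note: the schema shared by `_biomol_site_sequences` and `_biomol_full_sequences` (two required keys, with `type` restricted to the three accepted values) could be validated roughly as follows. This is a sketch only: the function name `check_sequences` and the constant `ACCEPTED_TYPES` are hypothetical, and the content of the `sequence` string is deliberately not checked against the mmCIF one-letter codes.

```python
# Accepted values for "type", as listed in the property description.
ACCEPTED_TYPES = {"polypeptide", "polydeoxyribonucleotide", "polyribonucleotide"}


def check_sequences(sequences: list[dict]) -> list[str]:
    """Collect schema violations in a list of sequence dictionaries."""
    problems: list[str] = []
    for position, entry in enumerate(sequences):
        # Both keys are REQUIRED and must hold strings.
        if not isinstance(entry.get("sequence"), str):
            problems.append(f"entry {position}: 'sequence' must be a string")
        if entry.get("type") not in ACCEPTED_TYPES:
            problems.append(f"entry {position}: unsupported type {entry.get('type')!r}")
    return problems
```

Because the list elements are unordered, the position in each message is only a locator within the given list and carries no meaning of its own.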

diff --git a/optimade.rst b/optimade.rst
index c54a32721..cdae64483 100644
--- a/optimade.rst
+++ b/optimade.rst
@@ -3228,7 +3228,8 @@ _biomol_site_sequences
 - **Description**: A list of dictionaries, each representing a linear segment of covalently-linked standard or modified amino acids or nucleotides having atoms with coordinates in sites. 
   The elements in the `_biomol_site_sequences`_ list are unordered.
   Each dictionary in the list holds two keys: sequence and type. The sequence is a string of one-letter codes identifying each amino acid or nucleotide as defined by the `mmCIF standard <https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Items/_entity_poly.pdbx_seq_one_letter_code.html>`__.
-  The type is a string defining the monomers of the sequence. Accepted values are “polypeptide” for amino acids, “polydeoxyribonucleotide”  for deoxyribonucleotides and “polyribonucleotide” for ribonucleotides, according to the `mmCIF standard <https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Items/_pdbx_reference_linked_entity.link_to_entity_type.html>`__.
+  The type is a string defining what kind of monomers are in the sequence.
+  Accepted values are "polypeptide" for amino acids, "polydeoxyribonucleotide" for deoxyribonucleotides and "polyribonucleotide" for ribonucleotides, according to the `mmCIF standard <https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Items/_pdbx_reference_linked_entity.link_to_entity_type.html>`__.
 - **Type**: list of dictionaries with the properties:
    - :property:`sequence`: string (REQUIRED)
    - :property:`type`: string (REQUIRED)

From c0ea1ac5d5ef55315dd7071f56263188afe3d0da Mon Sep 17 00:00:00 2001
From: Dani <44979434+d-beltran@users.noreply.github.com>
Date: Wed, 14 Sep 2022 12:36:24 +0200
Subject: [PATCH 19/21] Update optimade.rst

Co-authored-by: Johan Bergsma <29785380+JPBergsma@users.noreply.github.com>
---
 optimade.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/optimade.rst b/optimade.rst
index cdae64483..a41aa9de2 100644
--- a/optimade.rst
+++ b/optimade.rst
@@ -3248,7 +3248,7 @@ _biomol_site_sequences
         type: 'polypeptide'
       },
       {
-        sequence: 'GATTACA',
+        sequence: '(DG)(DA)(DT)(DT)(DA)(DC)(DA)',
         type: 'polydeoxyribonucleotide'
       }
     ]

From af14d7231b27811397f6486e3934f3c1f3ab7eaa Mon Sep 17 00:00:00 2001
From: Dani <44979434+d-beltran@users.noreply.github.com>
Date: Wed, 14 Sep 2022 12:36:41 +0200
Subject: [PATCH 20/21] Update optimade.rst

Co-authored-by: Johan Bergsma <29785380+JPBergsma@users.noreply.github.com>
---
 optimade.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/optimade.rst b/optimade.rst
index a41aa9de2..dfd4656ac 100644
--- a/optimade.rst
+++ b/optimade.rst
@@ -3258,7 +3258,7 @@ _biomol_full_sequences
 ~~~~~~~~~~~~~~~~~~~~~~
 
 - **Description**: A list of dictionaries, each representing a linear segment of covalently-linked standard or modified amino acids or nucleotides including residues without coordinates in sites.
-  The order of the elements in the `_biomol_full_sequences`_ list is not relevant.
+  The elements in the `_biomol_full_sequences`_ list are unordered.
   Each element in the list is a dictionary, with the same schema defined for `_biomol_site_sequences`_.
 
 - **Examples**:

From 7610146edf4ded681da2da1f702a79e174be1bb8 Mon Sep 17 00:00:00 2001
From: Dani <44979434+d-beltran@users.noreply.github.com>
Date: Wed, 14 Sep 2022 12:38:12 +0200
Subject: [PATCH 21/21] insertion_code renamed as icode

---
 optimade.rst | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/optimade.rst b/optimade.rst
index dfd4656ac..8b7cdf5f1 100644
--- a/optimade.rst
+++ b/optimade.rst
@@ -3166,18 +3166,18 @@ _biomol_residues
 - **Type**: list of dictionaries with the properties:
    - :property:`name`: string (REQUIRED)
    - :property:`number`: integer (REQUIRED)
-   - :property:`insertion_code`: string or null (REQUIRED)
+   - :property:`icode`: string or null (REQUIRED)
    - :property:`chain`: string (OPTIONAL)
 - **Requirements/Conventions**:
    - **Query**:  Support for queries on this property is OPTIONAL.
      If supported, only a subset of the filter features MAY be supported.
    - **name**: The residue name
    - **number**: The residue number according to source notation.
-   - **insertion_code**: The residue insertion code. It MUST NOT be longer than 1 character. It MAY be null.
+   - **icode**: The residue insertion code. It MUST NOT be longer than 1 character. It MAY be null.
    - **chain**: The chain number this residue belongs to.
    - Values in :property:`chain` SHOULD be in capital letters.
    - Values in :property:`chain` SHOULD NOT be longer than 1 character when the number of chains is not greater than the number of letters in English alphabet (26).
-   - All :property:`name` and :property:`insertion_code` values SHOULD be in capital letters.
+   - All :property:`name` and :property:`icode` values SHOULD be in capital letters.
 
 - **Examples**:
 
@@ -3187,19 +3187,19 @@ _biomol_residues
       {
         "name": "PHE",
 	"number": 17,
-	"insertion_code": null,
+        "icode": null,
         "chain": "A"
       },
       {
         "name": "ASP",
 	"number": 18,
-	"insertion_code": null,
+        "icode": null,
         "chain": "A"
       },
       {
         "name": "LEU",
 	"number": 18,
-        "insertion_code": "A",
+        "icode": "A",
         "chain": "A"
       },
     ]