Introduce SMILES property #4

Draft: wants to merge 15 commits into main
7 changes: 7 additions & 0 deletions src/v0.1.0/entrytypes/structures.yaml
@@ -23,6 +23,13 @@ properties:
      sortable: false
      query-support: "none"
      response-level: "yes"
  _cheminfo_smiles:
    $$inherit: "/properties/structures/_cheminfo_smiles"
    x-optimade-implementation:
      support: "may"
      sortable: false
      query-support: "none"
      response-level: "yes"
  _cheminfo_stdinchikey:
    $$inherit: "/properties/structures/_cheminfo_stdinchikey"
    x-optimade-implementation:
23 changes: 23 additions & 0 deletions src/v0.1.0/properties/structures/_cheminfo_smiles.yaml
@@ -0,0 +1,23 @@
$$schema: "https://schemas.optimade.org/meta/v1.2/optimade/property_definition"
$id: "https://schemas.optimade.org/namespaces/cheminformatics/v0.1/properties/structures/_cheminfo_smiles"
title: "SMILES (Simplified Molecular Input Line Entry System) representation of the structure"
x-optimade-type: "string"
x-optimade-definition:
  kind: "property"
  version: "0.1.0"
  format: "1.2"
  name: "_cheminfo_smiles"
  label: "_cheminfo_smiles_structures"
type:
  - "string"
  - "null"
description: |-
  SMILES (Simplified Molecular Input Line Entry System) representation of the structure.
  Values MUST adhere to the OpenSMILES specification v1.0 (http://opensmiles.org/opensmiles.html).
Member:

The consensus is that we should recommend our favourite specifications rather than necessarily enforcing them.

We can also recommend that implementations announce which flavour of SMILES they are using as human-readable metadata.

Member Author:

> We can also recommend that implementations announce which flavour of SMILES they are using as human-readable metadata

There was a suggestion to use an enumeration of the most widely used values plus a value "other", with a separate field for free-form text.
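A minimal Python sketch of that suggestion, for illustration only. The enumeration members ("opensmiles", "daylight", "other") and the names `SmilesFlavour`, `SmilesFlavourInfo` and `parse_flavour` are hypothetical and not part of any OPTIMADE definition. In this sketch, unrecognised strings are mapped to "other" and the original string is kept as the free-form text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SmilesFlavour(Enum):
    """Hypothetical enumeration of widely used SMILES flavours."""
    OPENSMILES = "opensmiles"
    DAYLIGHT = "daylight"
    OTHER = "other"


@dataclass(frozen=True)
class SmilesFlavourInfo:
    """Flavour metadata: an enumerated value plus optional free-form text."""
    flavour: SmilesFlavour
    description: Optional[str] = None

    def __post_init__(self) -> None:
        # A free-form description is mandatory when the enumerated value is "other".
        if self.flavour is SmilesFlavour.OTHER and not self.description:
            raise ValueError("flavour 'other' requires a free-form description")


def parse_flavour(value: str, description: Optional[str] = None) -> SmilesFlavourInfo:
    """Map a raw metadata string onto the enumeration; unknown values become OTHER."""
    try:
        flavour = SmilesFlavour(value.lower())
    except ValueError:
        # Not one of the enumerated values: keep the raw string as free-form text.
        return SmilesFlavourInfo(SmilesFlavour.OTHER, description or value)
    return SmilesFlavourInfo(flavour, description)
```

With this design, `parse_flavour("OpenSMILES")` yields the enumerated OpenSMILES value, while an unknown dialect name is preserved verbatim under "other".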

Member:

Also, I just remembered this point: either this field should be multivalued, or we need a proper recommendation for how to deal with multiple disconnected chemical subcomponents, or with the case where the SMILES does not include all atoms in the "unit cell".

Member Author:

> Also, I just remembered this point: either this field should be multivalued, or we need a proper recommendation for how to deal with multiple disconnected chemical subcomponents, or with the case where the SMILES does not include all atoms in the "unit cell".

I mentioned this issue in the initial PR message, but please correct me if you meant something different.

The issue of not representing all atoms in the structure (a sort of "unit cell") is a good catch, though.
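For context on the multivalued question: SMILES writes disconnected components in a single string separated by ".", so a single-valued field can already hold several components (e.g. "[Na+].[Cl-]"). Splitting naively on "." is not always safe, because ring-closure bonds may span a dot (e.g. "C1.C1" denotes ethane). The Python sketch below, built around a hypothetical helper `split_components`, splits conservatively: only at dots outside branches while no ring bond is open. It does not validate the input.

```python
from typing import List, Set


def split_components(smiles: str) -> List[str]:
    """Split a SMILES string at dots that separate disconnected components.

    Conservative sketch: a dot is treated as a split point only when it is
    outside any branch and no ring-closure bond is still open.
    """
    parts: List[str] = []
    open_rings: Set[int] = set()
    depth = 0  # parenthesis (branch) nesting level
    start = 0  # start index of the current component
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            # Bracket atoms may contain digits (isotope, H count, charge, class)
            # that are not ring closures, so skip to the closing bracket.
            i = smiles.index("]", i) + 1
            continue
        if ch == "%":
            # Two-digit ring-closure label such as %12: toggle ring bond 12.
            open_rings ^= {int(smiles[i + 1:i + 3])}
            i += 3
            continue
        if ch in "0123456789":
            # Single-digit ring-closure label: toggles between open and closed.
            open_rings ^= {int(ch)}
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "." and depth == 0 and not open_rings:
            parts.append(smiles[start:i])
            start = i + 1
        i += 1
    parts.append(smiles[start:])
    return parts
```

For example, `split_components("[Na+].[Cl-]")` returns `["[Na+]", "[Cl-]"]`, while `split_components("C1.C1")` leaves the string whole because ring bond 1 is still open at the dot.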

  When structures or their parts cannot be unambiguously represented in SMILES according to OpenSMILES recommendations, using the guidelines from Quirós et al. 2018 (https://doi.org/10.1186/s13321-018-0279-6) is RECOMMENDED.
  Providers MAY canonicalize the produced SMILES representations (i.e., apply rules that establish a stable atom order); no particular set of canonicalization rules is recommended.
  In general, providers SHOULD NOT change the SMILES representation of a structure more often than the structure itself is modified.
examples:
  - "c1ccccc1"
  - "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"
x-optimade-unit: "inapplicable"
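One way a provider might follow the recommendation that the SMILES representation should not change more often than the structure is to cache the generated SMILES together with a fingerprint of the structure, and to regenerate it only when that fingerprint changes. This is a minimal Python sketch that assumes an in-memory cache and a caller-supplied `generate` function standing in for whatever toolkit produces the SMILES. The names `SmilesCache` and `structure_fingerprint` are hypothetical, and the fingerprint simply hashes the JSON-serialised structure fields.

```python
import hashlib
import json
from typing import Callable, Dict, Optional, Tuple


def structure_fingerprint(structure: dict) -> str:
    """Hash the structure fields; dictionary key order does not affect the result."""
    payload = json.dumps(structure, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class SmilesCache:
    """Regenerate SMILES only when the structure fingerprint changes (hypothetical)."""

    def __init__(self, generate: Callable[[dict], Optional[str]]) -> None:
        self._generate = generate  # assumed toolkit wrapper; may return None (null)
        self._store: Dict[str, Tuple[str, Optional[str]]] = {}

    def get(self, entry_id: str, structure: dict) -> Optional[str]:
        fingerprint = structure_fingerprint(structure)
        cached = self._store.get(entry_id)
        if cached is not None and cached[0] == fingerprint:
            # Structure unchanged: serve the previously generated string as-is.
            return cached[1]
        smiles = self._generate(structure)
        self._store[entry_id] = (fingerprint, smiles)
        return smiles
```

Because the cached string is reused until the structure changes, the served value stays stable even when the underlying toolkit does not canonicalize or produces a different atom order on each run.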