From b819a1637bf3116295caf24d82917435edd79840 Mon Sep 17 00:00:00 2001 From: Kori Kuzma Date: Fri, 24 May 2024 13:35:17 -0400 Subject: [PATCH 01/16] docs: expand readme close #20 * update readme to include what metaschema processor is/does, installation steps, expected file hierarchy, and how to contribute to the schema/docs --- Makefile | 40 +++++++++++++++++++++ README.md | 102 +++++++++++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 141 insertions(+), 1 deletion(-) create mode 100644 Makefile diff --git a/Makefile b/Makefile new file mode 100644 index 0000000..b14308a --- /dev/null +++ b/Makefile @@ -0,0 +1,40 @@ +PYV:=3.12 +VEDIR=venv/${PYV} + +############################################################################ +#= SETUP, INSTALLATION, PACKAGING + +#=> venv: make a Python 3 virtual environment +.PHONY: venv/% +venv/%: + python$* -m venv $@; \ + source $@/bin/activate; \ + python -m ensurepip --upgrade; \ + pip install --upgrade pip setuptools + +#=> develop: install package in develop mode +.PHONY: develop setup +develop setup: + pip install -e . 
+ +#=> devready: create venv, install prerequisites, install pkg in develop mode +.PHONY: devready +devready: + make ${VEDIR} && source ${VEDIR}/bin/activate && make develop + @echo '#################################################################################' + @echo '### Do not forget to `source ${VEDIR}/bin/activate` to use this environment ###' + @echo '#################################################################################' + +############################################################################ +#= TESTING +# see test configuration in pyproject.toml + +#=> test: execute tests +.PHONY: test +test: + pytest tests/ + +#=> doctest: execute documentation tests (requires extra data) +.PHONY: doctest +doctest: + pytest tests/ --doctest-modules diff --git a/README.md b/README.md index c978fcb..3860bb9 100644 --- a/README.md +++ b/README.md @@ -1,2 +1,102 @@ # gks-metaschema -Tools and scripts for parsing the GKS standards metaschemas + +Tools and scripts for parsing the GA4GH Genomic Knowledge Standards (GKS) metaschemas. +The metaschema processor converts [JSON Schema Version 2020-12](json-schema.org/draft/2020-12/schema) +in YAML to reStructuredText and JSON files. + +Currently used in: + +* [VRS](https://github.com/ga4gh/vrs) +* [VA-Spec](https://github.com/ga4gh/va-spec/) +* [Cat-VRS](https://github.com/ga4gh/cat-vrs) + +## Installing for development + +### Prerequisites + +* Python 3.12: We recommend using [pyenv](https://github.com/pyenv/pyenv). + +### Installation Steps + +Fork the repo at . + + git clone git@github.com:YOUR_GITHUB_ID/gks-metaschema.git + cd gks-metaschema + make devready + source venv/3.12/bin/activate + +### Testing + +To run the tests: + + make test + +## Usage + +### File Hierarchy + +The metaschema processor expects the following hierarchy: + + ├── docs + │ ├── source + │ | ├── ... 
+ │ ├── Makefile + ├── schema + │ ├──gks_schema + │ | ├── gks-schema-source.yaml + │ | ├── Makefile + │ | ├── prune.mk + +* `docs`: [Sphinx](https://www.sphinx-doc.org/en/master/index.html) documentation + directory. **Must** be named `docs`. + * `source`: Directory containing documentation written in reStructuredText and Sphinx + configuration. **Must** be named `source`. + * `Makefile`: Commands to create the reStructuredText files. + This file should not change across GKS projects. +* `schema`: Schema directory. Can also contain submodules for other GKS product schemas. + * `gks_schema`: Schema directory for GKS product. The directory name should reflect + the product, e.g. `vrs`. + * `gks-schema-source.yaml`: Source document for the JSON Schema 2020-12. The file name + should reflect the standard, e.g. `vrs-source.yaml`. The file name **must** end + with `-source.yaml`. + * `Makefile`: Commands to create the reStructuredText and JSON files. + This file should not change across GKS projects. + * `prune.mk`: Cleanup of files in `def` and `json` directories based on source document. + This file should not change across GKS projects. + +### Contributing to the schema + +To create the corresponding `def` (reStructuredText) and `json` files after making +changes to the source document, from the _schema_ directory: + + make all + +The file structure will now look like: + + ├── schema + │ ├──gks_schema + | | ├── def + │ | | ├── ... + | | ├── json + │ | | ├── ... + │ | ├── gks-schema-source.yaml + │ | ├── Makefile + │ | ├── prune.mk + +### Contributing to the docs + +GKS specification documentation is written in reStructuredText and located in +`docs/source`. + +To build documentation locally, you must install [entr](https://eradman.com/entrproject/): + + brew install entr + +Then from the _docs_ directory: + + make clean watch & + +Then, open `docs/build/html/index.html`. The above make command should build docs when +the source changes. 
+ +> **NOTE**: Some types of changes require recleaning and building. From 891ec0bcfb799003ac06cd65576a0cf5b095dacc Mon Sep 17 00:00:00 2001 From: Kori Kuzma Date: Mon, 4 Nov 2024 16:31:54 -0500 Subject: [PATCH 02/16] feat: add property level maturity + ga4gh prefix/keys in RST files close #26 * Add GA4GH Digest table if prefix/keys are provided * Add Flags column for property level maturity status + ordered property in arrays --- src/ga4gh/gks/metaschema/scripts/y2t.py | 81 +++++++++++++++++++ tests/data/gnomAD/json/GnomadCAF | 52 ++++++------ tests/data/vrs/def/Adjacency.rst | 38 ++++++++- tests/data/vrs/def/Allele.rst | 35 +++++++- tests/data/vrs/def/CopyNumber.rst | 15 ++++ tests/data/vrs/def/CopyNumberChange.rst | 35 +++++++- tests/data/vrs/def/CopyNumberCount.rst | 35 +++++++- tests/data/vrs/def/Expression.rst | 8 +- .../data/vrs/def/Ga4ghIdentifiableObject.rst | 10 +++ tests/data/vrs/def/Haplotype.rst | 37 ++++++++- tests/data/vrs/def/LengthExpression.rst | 14 +++- .../vrs/def/LiteralSequenceExpression.rst | 14 +++- tests/data/vrs/def/Range.rst | 4 +- .../vrs/def/ReferenceLengthExpression.rst | 16 +++- tests/data/vrs/def/Residue.rst | 4 +- tests/data/vrs/def/SequenceExpression.rst | 9 +++ tests/data/vrs/def/SequenceLocation.rst | 32 +++++++- tests/data/vrs/def/SequenceReference.rst | 15 +++- tests/data/vrs/def/SequenceString.rst | 4 +- tests/data/vrs/def/Variation.rst | 14 ++++ 20 files changed, 418 insertions(+), 54 deletions(-) diff --git a/src/ga4gh/gks/metaschema/scripts/y2t.py b/src/ga4gh/gks/metaschema/scripts/y2t.py index 076a8ee..5fd4648 100755 --- a/src/ga4gh/gks/metaschema/scripts/y2t.py +++ b/src/ga4gh/gks/metaschema/scripts/y2t.py @@ -1,12 +1,28 @@ #!/usr/bin/env python3 """convert input .yaml to .rst artifacts""" +from io import TextIOWrapper import os import sys import pathlib from ga4gh.gks.metaschema.tools.source_proc import YamlSchemaProcessor +# Mapping to corresponding hex color code and code for maturity status +MATURITY_MAPPING: 
dict[str, tuple[str, str]] = { + "draft": ("D3D3D3", "D"), + "trial_use": ("FFFF99", "TU"), + "normative": ("B6D7A8", "N"), + "deprecated": ("EA9999", "DP") +} + +# Mapping to corresponding code for ordered property in arrays +ORDERED_MAPPING: dict[bool, str] = { + True: "OL", + False: "UL" +} + + def resolve_type(class_property_definition): if 'type' in class_property_definition: if class_property_definition['type'] == 'array': @@ -62,6 +78,66 @@ def get_ancestor_with_attributes(class_name, proc): return class_name + +def add_ga4gh_digest(class_definition: dict, f: TextIOWrapper) -> None: + """Add GA4GH Digest table + + Will only include this table if both ``prefix`` and ``keys`` are provided + + :param class_definition: Model definition + :param f: RST file + """ + ga4gh_digest = class_definition.get("ga4ghDigest") or {} + if ga4gh_digest: + ga4gh_prefix = ga4gh_digest.get("prefix") or "" + ga4gh_keys = ga4gh_digest.get("keys") or [] + if ga4gh_prefix and ga4gh_keys: + print(f""" +**GA4GH Digest** + +.. list-table:: + :class: clean-wrap + :header-rows: 1 + :align: left + :widths: auto + + * - Prefix + - Keys + + * - {ga4gh_prefix} + - {str(ga4gh_digest.get("keys") or [])}\n""", file=f) + + +def resolve_flags(class_property_attributes: dict) -> str: + """Add badges for flags (maturity and ordered property) + + :param class_property_attributes: Property attributes for a class + :return: Output for flag badges + """ + flags = "" + maturity = class_property_attributes.get("maturity") + + if maturity is not None: + background_color, maturity_code = MATURITY_MAPPING.get(maturity, (None, None)) + if background_color and maturity_code: + flags += f""" + .. raw:: html + + {maturity_code}""" + + ordered = class_property_attributes.get("ordered") + ordered_code = ORDERED_MAPPING.get(ordered, None) + + if ordered_code is not None: + if not flags: + flags += """ + .. 
raw:: html\n""" + + flags += f""" + {ordered_code}""" + return flags + + def main(proc_schema): for class_name, class_definition in proc_schema.defs.items(): with open(proc_schema.def_fp / (class_name + '.rst'), "w") as f: @@ -97,6 +173,9 @@ def main(proc_schema): inheritance = f"Some {class_name} attributes are inherited from :ref:`{ancestor}`.\n" else: inheritance = "" + + add_ga4gh_digest(class_definition, f) + print("\n**Information Model**", file=f) print(f""" {inheritance} @@ -107,12 +186,14 @@ def main(proc_schema): :widths: auto * - Field + - Flags - Type - Limits - Description""", file=f) for class_property_name, class_property_attributes in class_definition[p].items(): print(f"""\ * - {class_property_name} + - {resolve_flags(class_property_attributes)} - {resolve_type(class_property_attributes)} - {resolve_cardinality(class_property_name, class_property_attributes, class_definition)} - {class_property_attributes.get('description', '')}""", file=f) diff --git a/tests/data/gnomAD/json/GnomadCAF b/tests/data/gnomAD/json/GnomadCAF index ffd28fa..313a0d4 100644 --- a/tests/data/gnomAD/json/GnomadCAF +++ b/tests/data/gnomAD/json/GnomadCAF @@ -4,32 +4,6 @@ "title": "GnomadCAF", "type": "object", "$defs": { - "GrpMaxFAF95": { - "description": "The group maximum filtering allele frequency at 95% CI", - "protectedClassOf": "GnomadCAF", - "type": "object", - "maturity": "draft", - "properties": { - "frequency": { - "type": "number" - }, - "confidenceInterval": { - "type": "number", - "const": 0.95, - "default": 0.95 - }, - "groupId": { - "type": "string", - "description": "The genetic ancestry group from which the max frequency was calculated." 
- } - }, - "required": [ - "confidenceInterval", - "frequency", - "groupId" - ], - "additionalProperties": false - }, "GnomadCafProperties": { "description": "Additional properties specific to the gnomAD CAF model.", "protectedClassOf": "GnomadCAF", @@ -101,6 +75,32 @@ } }, "required": [] + }, + "GrpMaxFAF95": { + "description": "The group maximum filtering allele frequency at 95% CI", + "protectedClassOf": "GnomadCAF", + "type": "object", + "maturity": "draft", + "properties": { + "frequency": { + "type": "number" + }, + "confidenceInterval": { + "type": "number", + "const": 0.95, + "default": 0.95 + }, + "groupId": { + "type": "string", + "description": "The genetic ancestry group from which the max frequency was calculated." + } + }, + "required": [ + "confidenceInterval", + "frequency", + "groupId" + ], + "additionalProperties": false } }, "maturity": "draft", diff --git a/tests/data/vrs/def/Adjacency.rst b/tests/data/vrs/def/Adjacency.rst index ddfc877..35bd98b 100644 --- a/tests/data/vrs/def/Adjacency.rst +++ b/tests/data/vrs/def/Adjacency.rst @@ -1,13 +1,28 @@ .. warning:: This data class is at a **draft** maturity level and may change - significantly in future releases. Maturity levels are described in + significantly in future releases. Maturity levels are described in the :ref:`maturity-model`. - + **Computational Definition** The `Adjacency` class can represent either the termination of a sequence or the adjoining of the end of a sequence with the beginning of an adjacent sequence, potentially with an intervening linker sequence. +**GA4GH Digest** + +.. list-table:: + :class: clean-wrap + :header-rows: 1 + :align: left + :widths: auto + + * - Prefix + - Keys + + * - AJ + - ['adjoinedSequences', 'linker', 'type'] + + **Information Model** Some Adjacency attributes are inherited from :ref:`Variation`. @@ -19,42 +34,61 @@ Some Adjacency attributes are inherited from :ref:`Variation`. 
:widths: auto * - Field + - Flags - Type - Limits - Description * - id + - - string - 0..1 - The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE). * - label + - - string - 0..1 - A primary label for the entity. * - description + - - string - 0..1 - A free-text description of the entity. * - extensions + - + .. raw:: html + + OL - :ref:`Extension` - 0..m - * - type + - - string - 0..1 - MUST be "Adjacency". * - digest + - - string - 0..1 - A sha512t24u digest created using the VRS Computed Identifier algorithm. * - expressions + - + .. raw:: html + + UL - :ref:`Expression` - 0..m - * - adjoinedSequences + - + .. raw:: html + + OL - :ref:`IRI` | :ref:`Location` - 1..2 - The terminal sequence or pair of adjoined sequences that defines in the adjacency. * - linker + - - :ref:`SequenceExpression` - 0..1 - The sequence found between adjoined sequences. diff --git a/tests/data/vrs/def/Allele.rst b/tests/data/vrs/def/Allele.rst index 3e5fd8c..1a7add7 100644 --- a/tests/data/vrs/def/Allele.rst +++ b/tests/data/vrs/def/Allele.rst @@ -1,13 +1,28 @@ .. warning:: This data class is at a **draft** maturity level and may change - significantly in future releases. Maturity levels are described in + significantly in future releases. Maturity levels are described in the :ref:`maturity-model`. - + **Computational Definition** The state of a molecule at a :ref:`Location`. +**GA4GH Digest** + +.. list-table:: + :class: clean-wrap + :header-rows: 1 + :align: left + :widths: auto + + * - Prefix + - Keys + + * - VA + - ['location', 'state', 'type'] + + **Information Model** Some Allele attributes are inherited from :ref:`Variation`. @@ -19,42 +34,58 @@ Some Allele attributes are inherited from :ref:`Variation`. 
:widths: auto * - Field + - Flags - Type - Limits - Description * - id + - - string - 0..1 - The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE). * - label + - - string - 0..1 - A primary label for the entity. * - description + - - string - 0..1 - A free-text description of the entity. * - extensions + - + .. raw:: html + + OL - :ref:`Extension` - 0..m - * - type + - - string - 0..1 - MUST be "Allele" * - digest + - - string - 0..1 - A sha512t24u digest created using the VRS Computed Identifier algorithm. * - expressions + - + .. raw:: html + + UL - :ref:`Expression` - 0..m - * - location + - - :ref:`IRI` | :ref:`Location` - 1..1 - The location of the Allele * - state + - - :ref:`SequenceExpression` - 1..1 - An expression of the sequence state diff --git a/tests/data/vrs/def/CopyNumber.rst b/tests/data/vrs/def/CopyNumber.rst index 8020d57..b982575 100644 --- a/tests/data/vrs/def/CopyNumber.rst +++ b/tests/data/vrs/def/CopyNumber.rst @@ -13,38 +13,53 @@ Some CopyNumber attributes are inherited from :ref:`Variation`. :widths: auto * - Field + - Flags - Type - Limits - Description * - id + - - string - 0..1 - The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE). * - label + - - string - 0..1 - A primary label for the entity. * - description + - - string - 0..1 - A free-text description of the entity. * - extensions + - + .. raw:: html + + OL - :ref:`Extension` - 0..m - * - type + - - string - 0..1 - * - digest + - - string - 0..1 - A sha512t24u digest created using the VRS Computed Identifier algorithm. * - expressions + - + .. 
raw:: html + + UL - :ref:`Expression` - 0..m - * - location + - - :ref:`IRI` | :ref:`Location` - 1..1 - A location for which the number of systemic copies is described. diff --git a/tests/data/vrs/def/CopyNumberChange.rst b/tests/data/vrs/def/CopyNumberChange.rst index 0441869..19e9abe 100644 --- a/tests/data/vrs/def/CopyNumberChange.rst +++ b/tests/data/vrs/def/CopyNumberChange.rst @@ -1,13 +1,28 @@ .. warning:: This data class is at a **draft** maturity level and may change - significantly in future releases. Maturity levels are described in + significantly in future releases. Maturity levels are described in the :ref:`maturity-model`. - + **Computational Definition** An assessment of the copy number of a :ref:`Location` or a :ref:`Gene` within a system (e.g. genome, cell, etc.) relative to a baseline ploidy. +**GA4GH Digest** + +.. list-table:: + :class: clean-wrap + :header-rows: 1 + :align: left + :widths: auto + + * - Prefix + - Keys + + * - CX + - ['copyChange', 'location', 'type'] + + **Information Model** Some CopyNumberChange attributes are inherited from :ref:`CopyNumber`. @@ -19,42 +34,58 @@ Some CopyNumberChange attributes are inherited from :ref:`CopyNumber`. :widths: auto * - Field + - Flags - Type - Limits - Description * - id + - - string - 0..1 - The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE). * - label + - - string - 0..1 - A primary label for the entity. * - description + - - string - 0..1 - A free-text description of the entity. * - extensions + - + .. raw:: html + + OL - :ref:`Extension` - 0..m - * - type + - - string - 0..1 - MUST be "CopyNumberChange" * - digest + - - string - 0..1 - A sha512t24u digest created using the VRS Computed Identifier algorithm. * - expressions + - + .. 
raw:: html + + UL - :ref:`Expression` - 0..m - * - location + - - :ref:`IRI` | :ref:`Location` - 1..1 - A location for which the number of systemic copies is described. * - copyChange + - - string - 1..1 - MUST be one of "efo:0030069" (complete genomic loss), "efo:0020073" (high-level loss), "efo:0030068" (low-level loss), "efo:0030067" (loss), "efo:0030064" (regional base ploidy), "efo:0030070" (gain), "efo:0030071" (low-level gain), "efo:0030072" (high-level gain). diff --git a/tests/data/vrs/def/CopyNumberCount.rst b/tests/data/vrs/def/CopyNumberCount.rst index 869598e..c1bc418 100644 --- a/tests/data/vrs/def/CopyNumberCount.rst +++ b/tests/data/vrs/def/CopyNumberCount.rst @@ -1,13 +1,28 @@ .. warning:: This data class is at a **draft** maturity level and may change - significantly in future releases. Maturity levels are described in + significantly in future releases. Maturity levels are described in the :ref:`maturity-model`. - + **Computational Definition** The absolute count of discrete copies of a :ref:`Location` or :ref:`Gene`, within a system (e.g. genome, cell, etc.). +**GA4GH Digest** + +.. list-table:: + :class: clean-wrap + :header-rows: 1 + :align: left + :widths: auto + + * - Prefix + - Keys + + * - CN + - ['copies', 'location', 'type'] + + **Information Model** Some CopyNumberCount attributes are inherited from :ref:`CopyNumber`. @@ -19,42 +34,58 @@ Some CopyNumberCount attributes are inherited from :ref:`CopyNumber`. :widths: auto * - Field + - Flags - Type - Limits - Description * - id + - - string - 0..1 - The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE). * - label + - - string - 0..1 - A primary label for the entity. * - description + - - string - 0..1 - A free-text description of the entity. * - extensions + - + .. 
raw:: html + + OL - :ref:`Extension` - 0..m - * - type + - - string - 0..1 - MUST be "CopyNumberCount" * - digest + - - string - 0..1 - A sha512t24u digest created using the VRS Computed Identifier algorithm. * - expressions + - + .. raw:: html + + UL - :ref:`Expression` - 0..m - * - location + - - :ref:`IRI` | :ref:`Location` - 1..1 - A location for which the number of systemic copies is described. * - copies + - - integer | :ref:`Range` - 1..1 - The integral number of copies of the subject in a system diff --git a/tests/data/vrs/def/Expression.rst b/tests/data/vrs/def/Expression.rst index 3add21a..00b7914 100644 --- a/tests/data/vrs/def/Expression.rst +++ b/tests/data/vrs/def/Expression.rst @@ -1,8 +1,8 @@ .. warning:: This data class is at a **draft** maturity level and may change - significantly in future releases. Maturity levels are described in + significantly in future releases. Maturity levels are described in the :ref:`maturity-model`. - + **Computational Definition** @@ -18,18 +18,22 @@ Representation of a variation by a specified nomenclature or syntax for a Variat :widths: auto * - Field + - Flags - Type - Limits - Description * - syntax + - - string - 1..1 - * - value + - - string - 1..1 - * - syntax_version + - - string - 0..1 - diff --git a/tests/data/vrs/def/Ga4ghIdentifiableObject.rst b/tests/data/vrs/def/Ga4ghIdentifiableObject.rst index b64490a..cfb547b 100644 --- a/tests/data/vrs/def/Ga4ghIdentifiableObject.rst +++ b/tests/data/vrs/def/Ga4ghIdentifiableObject.rst @@ -13,30 +13,40 @@ Some Ga4ghIdentifiableObject attributes are inherited from :ref:`gks.core:Entity :widths: auto * - Field + - Flags - Type - Limits - Description * - id + - - string - 0..1 - The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE). 
* - label + - - string - 0..1 - A primary label for the entity. * - description + - - string - 0..1 - A free-text description of the entity. * - extensions + - + .. raw:: html + + OL - :ref:`Extension` - 0..m - * - type + - - string - 0..1 - * - digest + - - string - 0..1 - A sha512t24u digest created using the VRS Computed Identifier algorithm. diff --git a/tests/data/vrs/def/Haplotype.rst b/tests/data/vrs/def/Haplotype.rst index 523880a..4477906 100644 --- a/tests/data/vrs/def/Haplotype.rst +++ b/tests/data/vrs/def/Haplotype.rst @@ -1,13 +1,28 @@ .. warning:: This data class is at a **draft** maturity level and may change - significantly in future releases. Maturity levels are described in + significantly in future releases. Maturity levels are described in the :ref:`maturity-model`. - + **Computational Definition** An ordered set of co-occurring :ref:`variants ` on the same molecule. +**GA4GH Digest** + +.. list-table:: + :class: clean-wrap + :header-rows: 1 + :align: left + :widths: auto + + * - Prefix + - Keys + + * - HT + - ['members', 'type'] + + **Information Model** Some Haplotype attributes are inherited from :ref:`Variation`. @@ -19,38 +34,56 @@ Some Haplotype attributes are inherited from :ref:`Variation`. :widths: auto * - Field + - Flags - Type - Limits - Description * - id + - - string - 0..1 - The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE). * - label + - - string - 0..1 - A primary label for the entity. * - description + - - string - 0..1 - A free-text description of the entity. * - extensions + - + .. raw:: html + + OL - :ref:`Extension` - 0..m - * - type + - - string - 0..1 - MUST be "Haplotype" * - digest + - - string - 0..1 - A sha512t24u digest created using the VRS Computed Identifier algorithm. * - expressions + - + .. 
raw:: html + + UL - :ref:`Expression` - 0..m - * - members + - + .. raw:: html + + OL - :ref:`Adjacency` | :ref:`Allele` | :ref:`IRI` - 2..m - A list of :ref:`Alleles ` and :ref:`Adjacencies ` that comprise a Haplotype. Members must share the same reference sequence as adjacent members. Alleles should not have overlapping or adjacent coordinates with neighboring Alleles. Neighboring alleles should be ordered by ascending coordinates, unless represented on a DNA inversion (following an Adjacency with end-defined adjoinedSequences), in which case they should be ordered in descending coordinates. Sequence references MUST be consistent for all members between and including the end of one Adjacency and the beginning of another. diff --git a/tests/data/vrs/def/LengthExpression.rst b/tests/data/vrs/def/LengthExpression.rst index 8a4c5d2..9aa8bd6 100644 --- a/tests/data/vrs/def/LengthExpression.rst +++ b/tests/data/vrs/def/LengthExpression.rst @@ -1,8 +1,8 @@ .. warning:: This data class is at a **draft** maturity level and may change - significantly in future releases. Maturity levels are described in + significantly in future releases. Maturity levels are described in the :ref:`maturity-model`. - + **Computational Definition** @@ -19,30 +19,40 @@ Some LengthExpression attributes are inherited from :ref:`SequenceExpression`. :widths: auto * - Field + - Flags - Type - Limits - Description * - id + - - string - 0..1 - The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE). * - label + - - string - 0..1 - A primary label for the entity. * - description + - - string - 0..1 - A free-text description of the entity. * - extensions + - + .. 
raw:: html + + OL - :ref:`Extension` - 0..m - * - type + - - string - 1..1 - MUST be "LengthExpression" * - length + - - :ref:`Range` | integer - 0..1 - diff --git a/tests/data/vrs/def/LiteralSequenceExpression.rst b/tests/data/vrs/def/LiteralSequenceExpression.rst index 34acf08..fe6c625 100644 --- a/tests/data/vrs/def/LiteralSequenceExpression.rst +++ b/tests/data/vrs/def/LiteralSequenceExpression.rst @@ -1,8 +1,8 @@ .. warning:: This data class is at a **draft** maturity level and may change - significantly in future releases. Maturity levels are described in + significantly in future releases. Maturity levels are described in the :ref:`maturity-model`. - + **Computational Definition** @@ -19,30 +19,40 @@ Some LiteralSequenceExpression attributes are inherited from :ref:`SequenceExpre :widths: auto * - Field + - Flags - Type - Limits - Description * - id + - - string - 0..1 - The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE). * - label + - - string - 0..1 - A primary label for the entity. * - description + - - string - 0..1 - A free-text description of the entity. * - extensions + - + .. raw:: html + + OL - :ref:`Extension` - 0..m - * - type + - - string - 1..1 - MUST be "LiteralSequenceExpression" * - sequence + - - :ref:`SequenceString` - 1..1 - the literal sequence diff --git a/tests/data/vrs/def/Range.rst b/tests/data/vrs/def/Range.rst index c2f1077..311084c 100644 --- a/tests/data/vrs/def/Range.rst +++ b/tests/data/vrs/def/Range.rst @@ -1,8 +1,8 @@ .. warning:: This data class is at a **draft** maturity level and may change - significantly in future releases. Maturity levels are described in + significantly in future releases. Maturity levels are described in the :ref:`maturity-model`. 
- + **Computational Definition** diff --git a/tests/data/vrs/def/ReferenceLengthExpression.rst b/tests/data/vrs/def/ReferenceLengthExpression.rst index dbcf89d..86dec99 100644 --- a/tests/data/vrs/def/ReferenceLengthExpression.rst +++ b/tests/data/vrs/def/ReferenceLengthExpression.rst @@ -1,8 +1,8 @@ .. warning:: This data class is at a **draft** maturity level and may change - significantly in future releases. Maturity levels are described in + significantly in future releases. Maturity levels are described in the :ref:`maturity-model`. - + **Computational Definition** @@ -19,38 +19,50 @@ Some ReferenceLengthExpression attributes are inherited from :ref:`SequenceExpre :widths: auto * - Field + - Flags - Type - Limits - Description * - id + - - string - 0..1 - The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE). * - label + - - string - 0..1 - A primary label for the entity. * - description + - - string - 0..1 - A free-text description of the entity. * - extensions + - + .. raw:: html + + OL - :ref:`Extension` - 0..m - * - type + - - string - 1..1 - MUST be "ReferenceLengthExpression" * - length + - - integer | :ref:`Range` - 1..1 - The number of residues in the expressed sequence. * - sequence + - - :ref:`SequenceString` - 0..1 - the :ref:`Sequence` encoded by the Reference Length Expression. * - repeatSubunitLength + - - integer - 1..1 - The number of residues in the repeat subunit. diff --git a/tests/data/vrs/def/Residue.rst b/tests/data/vrs/def/Residue.rst index 1311832..e1a8545 100644 --- a/tests/data/vrs/def/Residue.rst +++ b/tests/data/vrs/def/Residue.rst @@ -1,8 +1,8 @@ .. warning:: This data class is at a **draft** maturity level and may change - significantly in future releases. 
Maturity levels are described in + significantly in future releases. Maturity levels are described in the :ref:`maturity-model`. - + **Computational Definition** diff --git a/tests/data/vrs/def/SequenceExpression.rst b/tests/data/vrs/def/SequenceExpression.rst index 8f6f680..b03e2ca 100644 --- a/tests/data/vrs/def/SequenceExpression.rst +++ b/tests/data/vrs/def/SequenceExpression.rst @@ -13,26 +13,35 @@ Some SequenceExpression attributes are inherited from :ref:`gks.core:Entity`. :widths: auto * - Field + - Flags - Type - Limits - Description * - id + - - string - 0..1 - The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE). * - label + - - string - 0..1 - A primary label for the entity. * - description + - - string - 0..1 - A free-text description of the entity. * - extensions + - + .. raw:: html + + OL - :ref:`Extension` - 0..m - * - type + - - string - 1..1 - The SequenceExpression class type. MUST match child class type. diff --git a/tests/data/vrs/def/SequenceLocation.rst b/tests/data/vrs/def/SequenceLocation.rst index 82501a7..fbda247 100644 --- a/tests/data/vrs/def/SequenceLocation.rst +++ b/tests/data/vrs/def/SequenceLocation.rst @@ -1,13 +1,28 @@ .. warning:: This data class is at a **draft** maturity level and may change - significantly in future releases. Maturity levels are described in + significantly in future releases. Maturity levels are described in the :ref:`maturity-model`. - + **Computational Definition** A :ref:`Location` defined by an interval on a referenced :ref:`Sequence`. +**GA4GH Digest** + +.. 
list-table:: + :class: clean-wrap + :header-rows: 1 + :align: left + :widths: auto + + * - Prefix + - Keys + + * - SL + - ['end', 'sequenceReference', 'start', 'type'] + + **Information Model** Some SequenceLocation attributes are inherited from :ref:`Ga4ghIdentifiableObject`. @@ -19,42 +34,55 @@ Some SequenceLocation attributes are inherited from :ref:`Ga4ghIdentifiableObjec :widths: auto * - Field + - Flags - Type - Limits - Description * - id + - - string - 0..1 - The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE). * - label + - - string - 0..1 - A primary label for the entity. * - description + - - string - 0..1 - A free-text description of the entity. * - extensions + - + .. raw:: html + + OL - :ref:`Extension` - 0..m - * - type + - - string - 0..1 - MUST be "SequenceLocation" * - digest + - - string - 0..1 - A sha512t24u digest created using the VRS Computed Identifier algorithm. * - sequenceReference + - - :ref:`IRI` | :ref:`SequenceReference` - 0..1 - A :ref:`SequenceReference`. * - start + - - integer | :ref:`Range` - 0..1 - The start coordinate or range of the SequenceLocation. The minimum value of this coordinate or range is 0. MUST represent a coordinate or range less than or equal to the value of `end`. * - end + - - integer | :ref:`Range` - 0..1 - The end coordinate or range of the SequenceLocation. The minimum value of this coordinate or range is 0. MUST represent a coordinate or range greater than or equal to the value of `start`. diff --git a/tests/data/vrs/def/SequenceReference.rst b/tests/data/vrs/def/SequenceReference.rst index 8dd2e2d..b001a62 100644 --- a/tests/data/vrs/def/SequenceReference.rst +++ b/tests/data/vrs/def/SequenceReference.rst @@ -1,8 +1,8 @@ .. 
warning:: This data class is at a **draft** maturity level and may change - significantly in future releases. Maturity levels are described in + significantly in future releases. Maturity levels are described in the :ref:`maturity-model`. - + **Computational Definition** @@ -19,34 +19,45 @@ Some SequenceReference attributes are inherited from :ref:`gks.core:Entity`. :widths: auto * - Field + - Flags - Type - Limits - Description * - id + - - string - 0..1 - The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE). * - label + - - string - 0..1 - A primary label for the entity. * - description + - - string - 0..1 - A free-text description of the entity. * - extensions + - + .. raw:: html + + OL - :ref:`Extension` - 0..m - * - type + - - string - 0..1 - * - refgetAccession + - - string - 1..1 - A `GA4GH RefGet ` identifier for the referenced sequence, using the sha512t24u digest. * - residueAlphabet + - - string - 0..1 - The interpretation of the character codes referred to by the refget accession, where "aa" specifies an amino acid character set, and "na" specifies a nucleic acid character set. diff --git a/tests/data/vrs/def/SequenceString.rst b/tests/data/vrs/def/SequenceString.rst index 00e4356..ee09e11 100644 --- a/tests/data/vrs/def/SequenceString.rst +++ b/tests/data/vrs/def/SequenceString.rst @@ -1,8 +1,8 @@ .. warning:: This data class is at a **draft** maturity level and may change - significantly in future releases. Maturity levels are described in + significantly in future releases. Maturity levels are described in the :ref:`maturity-model`. 
- + **Computational Definition** diff --git a/tests/data/vrs/def/Variation.rst b/tests/data/vrs/def/Variation.rst index 4d40047..4606eaa 100644 --- a/tests/data/vrs/def/Variation.rst +++ b/tests/data/vrs/def/Variation.rst @@ -13,34 +13,48 @@ Some Variation attributes are inherited from :ref:`Ga4ghIdentifiableObject`. :widths: auto * - Field + - Flags - Type - Limits - Description * - id + - - string - 0..1 - The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE). * - label + - - string - 0..1 - A primary label for the entity. * - description + - - string - 0..1 - A free-text description of the entity. * - extensions + - + .. raw:: html + + OL - :ref:`Extension` - 0..m - * - type + - - string - 0..1 - * - digest + - - string - 0..1 - A sha512t24u digest created using the VRS Computed Identifier algorithm. * - expressions + - + .. raw:: html + + UL - :ref:`Expression` - 0..m - From 1f44b073f93ae039b4acd81d7f46e1580f3b31dd Mon Sep 17 00:00:00 2001 From: Kori Kuzma Date: Mon, 4 Nov 2024 17:31:06 -0500 Subject: [PATCH 03/16] update readme --- README.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 3860bb9..2ca7399 100644 --- a/README.md +++ b/README.md @@ -1,11 +1,13 @@ # gks-metaschema Tools and scripts for parsing the GA4GH Genomic Knowledge Standards (GKS) metaschemas. -The metaschema processor converts [JSON Schema Version 2020-12](json-schema.org/draft/2020-12/schema) -in YAML to reStructuredText and JSON files. +The metaschema processor (MSP) converts +[JSON Schema Version 2020-12](json-schema.org/draft/2020-12/schema) in YAML to +reStructuredText (RST) and JSON files. 
Currently used in: +* [GKS-Core](https://github.com/ga4gh/gks-core) * [VRS](https://github.com/ga4gh/vrs) * [VA-Spec](https://github.com/ga4gh/va-spec/) * [Cat-VRS](https://github.com/ga4gh/cat-vrs) @@ -18,7 +20,8 @@ Currently used in: ### Installation Steps -Fork the repo at . +Fork the repo at , and initialize a development +environment. git clone git@github.com:YOUR_GITHUB_ID/gks-metaschema.git cd gks-metaschema From d505c29d1ae4cad73d2f3587a985cca68ed6b72e Mon Sep 17 00:00:00 2001 From: Kori Kuzma Date: Mon, 4 Nov 2024 23:45:39 -0500 Subject: [PATCH 04/16] deprecated code should be X --- src/ga4gh/gks/metaschema/scripts/y2t.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/ga4gh/gks/metaschema/scripts/y2t.py b/src/ga4gh/gks/metaschema/scripts/y2t.py index 5fd4648..fde3091 100755 --- a/src/ga4gh/gks/metaschema/scripts/y2t.py +++ b/src/ga4gh/gks/metaschema/scripts/y2t.py @@ -13,7 +13,7 @@ "draft": ("D3D3D3", "D"), "trial_use": ("FFFF99", "TU"), "normative": ("B6D7A8", "N"), - "deprecated": ("EA9999", "DP") + "deprecated": ("EA9999", "X") } # Mapping to corresponding code for ordered property in arrays From 3e7bb252b26a05c53dc00ad069383d99484d6f68 Mon Sep 17 00:00:00 2001 From: Kori Kuzma Date: Mon, 4 Nov 2024 23:48:46 -0500 Subject: [PATCH 05/16] ga4gh digest table will show if ga4ghDigest is present --- src/ga4gh/gks/metaschema/scripts/y2t.py | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/src/ga4gh/gks/metaschema/scripts/y2t.py b/src/ga4gh/gks/metaschema/scripts/y2t.py index fde3091..fb54b68 100755 --- a/src/ga4gh/gks/metaschema/scripts/y2t.py +++ b/src/ga4gh/gks/metaschema/scripts/y2t.py @@ -89,10 +89,8 @@ def add_ga4gh_digest(class_definition: dict, f: TextIOWrapper) -> None: """ ga4gh_digest = class_definition.get("ga4ghDigest") or {} if ga4gh_digest: - ga4gh_prefix = ga4gh_digest.get("prefix") or "" - ga4gh_keys = ga4gh_digest.get("keys") or [] - if ga4gh_prefix and ga4gh_keys: - print(f""" + ga4gh_prefix = 
ga4gh_digest.get("prefix") or "None" + print(f""" **GA4GH Digest** .. list-table:: From 75305de211a3ebf13515e51dacb0d3d1f2a3c4fa Mon Sep 17 00:00:00 2001 From: Kori Kuzma Date: Mon, 4 Nov 2024 23:54:27 -0500 Subject: [PATCH 06/16] rm var --- src/ga4gh/gks/metaschema/scripts/y2t.py | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/src/ga4gh/gks/metaschema/scripts/y2t.py b/src/ga4gh/gks/metaschema/scripts/y2t.py index fb54b68..e4b62a0 100755 --- a/src/ga4gh/gks/metaschema/scripts/y2t.py +++ b/src/ga4gh/gks/metaschema/scripts/y2t.py @@ -89,7 +89,6 @@ def add_ga4gh_digest(class_definition: dict, f: TextIOWrapper) -> None: """ ga4gh_digest = class_definition.get("ga4ghDigest") or {} if ga4gh_digest: - ga4gh_prefix = ga4gh_digest.get("prefix") or "None" print(f""" **GA4GH Digest** @@ -102,7 +101,7 @@ def add_ga4gh_digest(class_definition: dict, f: TextIOWrapper) -> None: * - Prefix - Keys - * - {ga4gh_prefix} + * - {ga4gh_digest.get("prefix") or "None"} - {str(ga4gh_digest.get("keys") or [])}\n""", file=f) @@ -143,15 +142,15 @@ def main(proc_schema): if maturity == 'draft': print(""" .. warning:: This data class is at a **draft** maturity level and may change - significantly in future releases. Maturity levels are described in + significantly in future releases. Maturity levels are described in the :ref:`maturity-model`. - + """, file=f) elif maturity == 'trial use': print(""" .. note:: This data class is at a **trial use** maturity level and may change in future releases. Maturity levels are described in the :ref:`maturity-model`. 
- + """, file=f) print("**Computational Definition**\n", file=f) print(class_definition['description'], file=f) From 7730952394d89dcda9b797b0d8421b3d2629fbdf Mon Sep 17 00:00:00 2001 From: Kori Kuzma Date: Mon, 4 Nov 2024 23:55:25 -0500 Subject: [PATCH 07/16] udpate tests --- tests/data/vrs/def/CopyNumber.rst | 15 +++++++++++++++ tests/data/vrs/def/Ga4ghIdentifiableObject.rst | 15 +++++++++++++++ tests/data/vrs/def/LengthExpression.rst | 15 +++++++++++++++ tests/data/vrs/def/LiteralSequenceExpression.rst | 15 +++++++++++++++ tests/data/vrs/def/ReferenceLengthExpression.rst | 15 +++++++++++++++ tests/data/vrs/def/SequenceExpression.rst | 15 +++++++++++++++ tests/data/vrs/def/SequenceReference.rst | 15 +++++++++++++++ tests/data/vrs/def/Variation.rst | 15 +++++++++++++++ 8 files changed, 120 insertions(+) diff --git a/tests/data/vrs/def/CopyNumber.rst b/tests/data/vrs/def/CopyNumber.rst index b982575..9af40ee 100644 --- a/tests/data/vrs/def/CopyNumber.rst +++ b/tests/data/vrs/def/CopyNumber.rst @@ -2,6 +2,21 @@ A measure of the copies of a :ref:`Location` within a system (e.g. genome, cell, etc.) +**GA4GH Digest** + +.. list-table:: + :class: clean-wrap + :header-rows: 1 + :align: left + :widths: auto + + * - Prefix + - Keys + + * - None + - ['location', 'type'] + + **Information Model** Some CopyNumber attributes are inherited from :ref:`Variation`. diff --git a/tests/data/vrs/def/Ga4ghIdentifiableObject.rst b/tests/data/vrs/def/Ga4ghIdentifiableObject.rst index cfb547b..5998f87 100644 --- a/tests/data/vrs/def/Ga4ghIdentifiableObject.rst +++ b/tests/data/vrs/def/Ga4ghIdentifiableObject.rst @@ -2,6 +2,21 @@ A contextual value object for which a GA4GH computed identifier can be created. +**GA4GH Digest** + +.. list-table:: + :class: clean-wrap + :header-rows: 1 + :align: left + :widths: auto + + * - Prefix + - Keys + + * - None + - ['type'] + + **Information Model** Some Ga4ghIdentifiableObject attributes are inherited from :ref:`gks.core:Entity`. 
diff --git a/tests/data/vrs/def/LengthExpression.rst b/tests/data/vrs/def/LengthExpression.rst index 9aa8bd6..fc92452 100644 --- a/tests/data/vrs/def/LengthExpression.rst +++ b/tests/data/vrs/def/LengthExpression.rst @@ -8,6 +8,21 @@ A sequence expressed only by its length. +**GA4GH Digest** + +.. list-table:: + :class: clean-wrap + :header-rows: 1 + :align: left + :widths: auto + + * - Prefix + - Keys + + * - None + - ['length', 'type'] + + **Information Model** Some LengthExpression attributes are inherited from :ref:`SequenceExpression`. diff --git a/tests/data/vrs/def/LiteralSequenceExpression.rst b/tests/data/vrs/def/LiteralSequenceExpression.rst index fe6c625..fd20bb8 100644 --- a/tests/data/vrs/def/LiteralSequenceExpression.rst +++ b/tests/data/vrs/def/LiteralSequenceExpression.rst @@ -8,6 +8,21 @@ An explicit expression of a Sequence. +**GA4GH Digest** + +.. list-table:: + :class: clean-wrap + :header-rows: 1 + :align: left + :widths: auto + + * - Prefix + - Keys + + * - None + - ['sequence', 'type'] + + **Information Model** Some LiteralSequenceExpression attributes are inherited from :ref:`SequenceExpression`. diff --git a/tests/data/vrs/def/ReferenceLengthExpression.rst b/tests/data/vrs/def/ReferenceLengthExpression.rst index 86dec99..6143ceb 100644 --- a/tests/data/vrs/def/ReferenceLengthExpression.rst +++ b/tests/data/vrs/def/ReferenceLengthExpression.rst @@ -8,6 +8,21 @@ An expression of a length of a sequence from a repeating reference. +**GA4GH Digest** + +.. list-table:: + :class: clean-wrap + :header-rows: 1 + :align: left + :widths: auto + + * - Prefix + - Keys + + * - None + - ['length', 'repeatSubunitLength', 'type'] + + **Information Model** Some ReferenceLengthExpression attributes are inherited from :ref:`SequenceExpression`. 
diff --git a/tests/data/vrs/def/SequenceExpression.rst b/tests/data/vrs/def/SequenceExpression.rst index b03e2ca..32b74d1 100644 --- a/tests/data/vrs/def/SequenceExpression.rst +++ b/tests/data/vrs/def/SequenceExpression.rst @@ -2,6 +2,21 @@ An expression describing a :ref:`Sequence`. +**GA4GH Digest** + +.. list-table:: + :class: clean-wrap + :header-rows: 1 + :align: left + :widths: auto + + * - Prefix + - Keys + + * - None + - ['type'] + + **Information Model** Some SequenceExpression attributes are inherited from :ref:`gks.core:Entity`. diff --git a/tests/data/vrs/def/SequenceReference.rst b/tests/data/vrs/def/SequenceReference.rst index b001a62..6fb9be2 100644 --- a/tests/data/vrs/def/SequenceReference.rst +++ b/tests/data/vrs/def/SequenceReference.rst @@ -8,6 +8,21 @@ A sequence of nucleic or amino acid character codes. +**GA4GH Digest** + +.. list-table:: + :class: clean-wrap + :header-rows: 1 + :align: left + :widths: auto + + * - Prefix + - Keys + + * - None + - [] + + **Information Model** Some SequenceReference attributes are inherited from :ref:`gks.core:Entity`. diff --git a/tests/data/vrs/def/Variation.rst b/tests/data/vrs/def/Variation.rst index 4606eaa..1793ae1 100644 --- a/tests/data/vrs/def/Variation.rst +++ b/tests/data/vrs/def/Variation.rst @@ -2,6 +2,21 @@ A representation of the state of one or more biomolecules. +**GA4GH Digest** + +.. list-table:: + :class: clean-wrap + :header-rows: 1 + :align: left + :widths: auto + + * - Prefix + - Keys + + * - None + - ['type'] + + **Information Model** Some Variation attributes are inherited from :ref:`Ga4ghIdentifiableObject`. 
From 5d5f0094252517fd157943934e077365af02de46 Mon Sep 17 00:00:00 2001 From: Kori Kuzma Date: Tue, 5 Nov 2024 08:08:51 -0500 Subject: [PATCH 08/16] update ordered codes --- src/ga4gh/gks/metaschema/scripts/y2t.py | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/ga4gh/gks/metaschema/scripts/y2t.py b/src/ga4gh/gks/metaschema/scripts/y2t.py index e4b62a0..96e0bf2 100755 --- a/src/ga4gh/gks/metaschema/scripts/y2t.py +++ b/src/ga4gh/gks/metaschema/scripts/y2t.py @@ -18,8 +18,8 @@ # Mapping to corresponding code for ordered property in arrays ORDERED_MAPPING: dict[bool, str] = { - True: "OL", - False: "UL" + True: "↓", + False: "⋮" } @@ -120,7 +120,7 @@ def resolve_flags(class_property_attributes: dict) -> str: flags += f""" .. raw:: html - {maturity_code}""" + {maturity_code}""" ordered = class_property_attributes.get("ordered") ordered_code = ORDERED_MAPPING.get(ordered, None) @@ -131,7 +131,7 @@ def resolve_flags(class_property_attributes: dict) -> str: .. raw:: html\n""" flags += f""" - {ordered_code}""" + {ordered_code}""" return flags From 7ca229433e10d4c4c659858c56e696d8a90c138d Mon Sep 17 00:00:00 2001 From: Kori Kuzma Date: Tue, 5 Nov 2024 08:19:51 -0500 Subject: [PATCH 09/16] add title --- src/ga4gh/gks/metaschema/scripts/y2t.py | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/src/ga4gh/gks/metaschema/scripts/y2t.py b/src/ga4gh/gks/metaschema/scripts/y2t.py index 96e0bf2..10985c4 100755 --- a/src/ga4gh/gks/metaschema/scripts/y2t.py +++ b/src/ga4gh/gks/metaschema/scripts/y2t.py @@ -117,21 +117,23 @@ def resolve_flags(class_property_attributes: dict) -> str: if maturity is not None: background_color, maturity_code = MATURITY_MAPPING.get(maturity, (None, None)) if background_color and maturity_code: + title = f"{maturity.replace("_", " ").title()} Maturity Level" flags += f""" .. 
raw:: html - {maturity_code}""" + {maturity_code}""" ordered = class_property_attributes.get("ordered") ordered_code = ORDERED_MAPPING.get(ordered, None) if ordered_code is not None: + title = "Ordered" if ordered else "Unordered" if not flags: flags += """ .. raw:: html\n""" flags += f""" - {ordered_code}""" + {ordered_code}""" return flags From 38945e6fec528f7ae049d1ea200a0973b2ae73dd Mon Sep 17 00:00:00 2001 From: Kori Kuzma Date: Tue, 5 Nov 2024 08:28:45 -0500 Subject: [PATCH 10/16] update structure --- README.md | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index 2ca7399..184d7e5 100644 --- a/README.md +++ b/README.md @@ -49,6 +49,7 @@ The metaschema processor expects the following hierarchy: │ | ├── gks-schema-source.yaml │ | ├── Makefile │ | ├── prune.mk + │ ├── Makefile * `docs`: [Sphinx](https://www.sphinx-doc.org/en/master/index.html) documentation directory. **Must** be named `docs`. @@ -59,13 +60,14 @@ The metaschema processor expects the following hierarchy: * `schema`: Schema directory. Can also contain submodules for other GKS product schemas. * `gks_schema`: Schema directory for GKS product. The directory name should reflect the product, e.g. `vrs`. - * `gks-schema-source.yaml`: Source document for the JSON Schema 2020-12. The file name - should reflect the standard, e.g. `vrs-source.yaml`. The file name **must** end - with `-source.yaml`. + * `gks-schema-source.yaml`: Source document for the JSON Schema 2020-12. The file name + should reflect the standard, e.g. `vrs-source.yaml`. The file name **must** end + with `-source.yaml`. + * `Makefile`: Commands to create the reStructuredText and JSON files. + This file should not change across GKS projects. + * `prune.mk`: Cleanup of files in `def` and `json` directories based on source document. + This file should not change across GKS projects. * `Makefile`: Commands to create the reStructuredText and JSON files. 
- This file should not change across GKS projects. - * `prune.mk`: Cleanup of files in `def` and `json` directories based on source document. - This file should not change across GKS projects. ### Contributing to the schema From d2c4e386b0c0462b4514ab154d1ef924776656cc Mon Sep 17 00:00:00 2001 From: Kori Kuzma Date: Tue, 5 Nov 2024 08:29:51 -0500 Subject: [PATCH 11/16] forgot to update one more --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 184d7e5..40a1c85 100644 --- a/README.md +++ b/README.md @@ -87,6 +87,7 @@ The file structure will now look like: │ | ├── gks-schema-source.yaml │ | ├── Makefile │ | ├── prune.mk + │ ├── Makefile ### Contributing to the docs From e019b807fec493247379bbc2079d1ba8cdcefa97 Mon Sep 17 00:00:00 2001 From: Kyle Ferriter Date: Thu, 7 Nov 2024 13:36:26 -0500 Subject: [PATCH 12/16] GKS hackathon changes tracking (#28) * Add dependencies to pyproject, and optional dev * Delete .requirements.txt * Add latest ruff as dev dependency * Improve README * Add ci.yaml github action. Add ruff cfg to pyproject * Fix easy lint issues * Format code * Add `pre-commit` script * Type hints and comments * Add templates * Make devready install .[dev] * Delete all json/rst files compiled by tests. Aren't used. 
--------- Co-authored-by: Liam Mulhall Co-authored-by: Terry ONeill Co-authored-by: Liam Mulhall Co-authored-by: Kori Kuzma --- .github/workflows/ci.yaml | 33 ++ .gitignore | 3 +- .requirements.txt | 2 - Makefile | 2 +- README.md | 7 + pyproject.toml | 39 ++- scripts/pre-commit | 16 + src/ga4gh/gks/metaschema/scripts/jsy2js.py | 7 +- .../gks/metaschema/scripts/source2classes.py | 9 +- .../gks/metaschema/scripts/source2jsy.py | 5 +- .../metaschema/scripts/source2mergedjsy.py | 5 +- .../gks/metaschema/scripts/source2splitjs.py | 71 ++-- src/ga4gh/gks/metaschema/scripts/y2t.py | 180 ++++++---- src/ga4gh/gks/metaschema/tools/source_proc.py | 319 +++++++++--------- src/templates/maturity | 4 + tests/data/gnomAD/json/GnomadCAF | 116 ------- tests/data/vrs/def/Adjacency.rst | 94 ------ tests/data/vrs/def/Allele.rst | 91 ----- tests/data/vrs/def/CopyNumber.rst | 80 ----- tests/data/vrs/def/CopyNumberChange.rst | 91 ----- tests/data/vrs/def/CopyNumberCount.rst | 91 ----- tests/data/vrs/def/Expression.rst | 39 --- .../data/vrs/def/Ga4ghIdentifiableObject.rst | 67 ---- tests/data/vrs/def/Haplotype.rst | 89 ----- tests/data/vrs/def/LengthExpression.rst | 73 ---- .../vrs/def/LiteralSequenceExpression.rst | 73 ---- tests/data/vrs/def/Location.rst | 3 - tests/data/vrs/def/MolecularVariation.rst | 3 - tests/data/vrs/def/Range.rst | 9 - .../vrs/def/ReferenceLengthExpression.rst | 83 ----- tests/data/vrs/def/Residue.rst | 9 - tests/data/vrs/def/SequenceExpression.rst | 62 ---- tests/data/vrs/def/SequenceLocation.rst | 88 ----- tests/data/vrs/def/SequenceReference.rst | 78 ----- tests/data/vrs/def/SequenceString.rst | 9 - tests/data/vrs/def/SystemicVariation.rst | 3 - tests/data/vrs/def/ValueObject.rst | 3 - tests/data/vrs/def/Variation.rst | 75 ---- tests/data/vrs/json/Adjacency | 91 ----- tests/data/vrs/json/Allele | 85 ----- tests/data/vrs/json/CopyNumberChange | 85 ----- tests/data/vrs/json/CopyNumberCount | 82 ----- tests/data/vrs/json/Expression | 36 -- 
tests/data/vrs/json/Haplotype | 78 ----- tests/data/vrs/json/LengthExpression | 55 --- tests/data/vrs/json/LiteralSequenceExpression | 50 --- tests/data/vrs/json/Location | 11 - tests/data/vrs/json/MolecularVariation | 18 - tests/data/vrs/json/Range | 21 -- tests/data/vrs/json/ReferenceLengthExpression | 67 ---- tests/data/vrs/json/Residue | 9 - tests/data/vrs/json/SequenceExpression | 18 - tests/data/vrs/json/SequenceLocation | 84 ----- tests/data/vrs/json/SequenceReference | 53 --- tests/data/vrs/json/SequenceString | 9 - tests/data/vrs/json/SystemicVariation | 18 - tests/data/vrs/json/Variation | 27 -- tests/test_basic.py | 40 ++- 58 files changed, 450 insertions(+), 2518 deletions(-) create mode 100644 .github/workflows/ci.yaml delete mode 100644 .requirements.txt create mode 100755 scripts/pre-commit create mode 100644 src/templates/maturity delete mode 100644 tests/data/gnomAD/json/GnomadCAF delete mode 100644 tests/data/vrs/def/Adjacency.rst delete mode 100644 tests/data/vrs/def/Allele.rst delete mode 100644 tests/data/vrs/def/CopyNumber.rst delete mode 100644 tests/data/vrs/def/CopyNumberChange.rst delete mode 100644 tests/data/vrs/def/CopyNumberCount.rst delete mode 100644 tests/data/vrs/def/Expression.rst delete mode 100644 tests/data/vrs/def/Ga4ghIdentifiableObject.rst delete mode 100644 tests/data/vrs/def/Haplotype.rst delete mode 100644 tests/data/vrs/def/LengthExpression.rst delete mode 100644 tests/data/vrs/def/LiteralSequenceExpression.rst delete mode 100644 tests/data/vrs/def/Location.rst delete mode 100644 tests/data/vrs/def/MolecularVariation.rst delete mode 100644 tests/data/vrs/def/Range.rst delete mode 100644 tests/data/vrs/def/ReferenceLengthExpression.rst delete mode 100644 tests/data/vrs/def/Residue.rst delete mode 100644 tests/data/vrs/def/SequenceExpression.rst delete mode 100644 tests/data/vrs/def/SequenceLocation.rst delete mode 100644 tests/data/vrs/def/SequenceReference.rst delete mode 100644 tests/data/vrs/def/SequenceString.rst 
delete mode 100644 tests/data/vrs/def/SystemicVariation.rst delete mode 100644 tests/data/vrs/def/ValueObject.rst delete mode 100644 tests/data/vrs/def/Variation.rst delete mode 100644 tests/data/vrs/json/Adjacency delete mode 100644 tests/data/vrs/json/Allele delete mode 100644 tests/data/vrs/json/CopyNumberChange delete mode 100644 tests/data/vrs/json/CopyNumberCount delete mode 100644 tests/data/vrs/json/Expression delete mode 100644 tests/data/vrs/json/Haplotype delete mode 100644 tests/data/vrs/json/LengthExpression delete mode 100644 tests/data/vrs/json/LiteralSequenceExpression delete mode 100644 tests/data/vrs/json/Location delete mode 100644 tests/data/vrs/json/MolecularVariation delete mode 100644 tests/data/vrs/json/Range delete mode 100644 tests/data/vrs/json/ReferenceLengthExpression delete mode 100644 tests/data/vrs/json/Residue delete mode 100644 tests/data/vrs/json/SequenceExpression delete mode 100644 tests/data/vrs/json/SequenceLocation delete mode 100644 tests/data/vrs/json/SequenceReference delete mode 100644 tests/data/vrs/json/SequenceString delete mode 100644 tests/data/vrs/json/SystemicVariation delete mode 100644 tests/data/vrs/json/Variation diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml new file mode 100644 index 0000000..4ad2a30 --- /dev/null +++ b/.github/workflows/ci.yaml @@ -0,0 +1,33 @@ +name: CI + +on: + push: + branches: + - main + pull_request: + +jobs: + test: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: recursive + + - name: Set up Python 3.12 + uses: actions/setup-python@v5 + with: + python-version: '3.12' + architecture: 'x64' + + - name: Install dependencies + run: | + python -m pip install --upgrade pip + pip install '.[dev]' + + - name: Run lint + format + run: | + ruff check src + + - name: Run tests + run: pytest diff --git a/.gitignore b/.gitignore index 8597052..ea2a03c 100644 --- a/.gitignore +++ b/.gitignore @@ -132,4 +132,5 @@ dmypy.json Pipfile* # IDEs 
-.vscode \ No newline at end of file +.vscode +.idea diff --git a/.requirements.txt b/.requirements.txt deleted file mode 100644 index 9eb2b58..0000000 --- a/.requirements.txt +++ /dev/null @@ -1,2 +0,0 @@ -pyyaml -pytest diff --git a/Makefile b/Makefile index b14308a..a656cf0 100644 --- a/Makefile +++ b/Makefile @@ -15,7 +15,7 @@ venv/%: #=> develop: install package in develop mode .PHONY: develop setup develop setup: - pip install -e . + pip install -e '.[dev]' #=> devready: create venv, install prerequisites, install pkg in develop mode .PHONY: devready diff --git a/README.md b/README.md index 40a1c85..dc69525 100644 --- a/README.md +++ b/README.md @@ -1,5 +1,6 @@ # gks-metaschema + Tools and scripts for parsing the GA4GH Genomic Knowledge Standards (GKS) metaschemas. The metaschema processor (MSP) converts [JSON Schema Version 2020-12](json-schema.org/draft/2020-12/schema) in YAML to @@ -28,6 +29,12 @@ environment. make devready source venv/3.12/bin/activate + +Set up the `pre-commit` hook + + cp ./scripts/pre-commit ./.git/hooks/ + + ### Testing To run the tests: diff --git a/pyproject.toml b/pyproject.toml index 8224519..28afb69 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -28,7 +28,17 @@ keywords = [ "variation" ] requires-python = ">=3.12" -dynamic = ["version", "dependencies"] +dependencies = [ + "pyyaml", + "Jinja2" +] +dynamic = ["version"] + +[project.optional-dependencies] +dev = [ + "pytest", + "ruff==0.7.2" +] [project.urls] Homepage = "https://github.com/ga4gh/gks-metaschema" @@ -37,9 +47,6 @@ Changelog = "https://github.com/ga4gh/gks-metaschema/releases" Source = "https://github.com/ga4gh/gks-metaschema" "Bug Tracker" = "https://github.com/ga4gh/gks-metaschema/issues" -[tool.setuptools.dynamic] -dependencies = {file = [".requirements.txt"]} - [tool.setuptools_scm] [project.scripts] @@ -53,3 +60,27 @@ source2classes = "ga4gh.gks.metaschema.scripts.source2classes:cli" [build-system] requires = ["setuptools>=65.3", "setuptools_scm>=8"] 
build-backend = "setuptools.build_meta" + + +[tool.ruff] +line-length = 120 +target-version = "py312" + +[tool.ruff.lint] +select = [ + "C", + "F", + "I", + "E", + "W" +] +fixable = ["ALL"] +ignore = ["C901"] + +[tool.ruff.format] +# Like Black, use double quotes for strings. +quote-style = "double" +# Like Black, indent with spaces, rather than tabs. +indent-style = "space" +# Like Black, respect magic trailing commas. +skip-magic-trailing-comma = false diff --git a/scripts/pre-commit b/scripts/pre-commit new file mode 100755 index 0000000..48d61b8 --- /dev/null +++ b/scripts/pre-commit @@ -0,0 +1,16 @@ +#!/usr/bin/env bash +# This pre-commit script should be placed in the .git/hooks/ directory. +# It runs code quality checks prior to a commit. + + +# Get and change to the root of the repo +project_root_dir=`git rev-parse --show-toplevel` +cd "$project_root_dir" || exit 1 + +# Immediately exit if there's an error. +set -e + +ruff check # Run the linter. +ruff check --select I --fix # Sort imports. +ruff format # Run the formatter. +pytest # Run the test suite. 
diff --git a/src/ga4gh/gks/metaschema/scripts/jsy2js.py b/src/ga4gh/gks/metaschema/scripts/jsy2js.py index 729739d..d8b6265 100755 --- a/src/ga4gh/gks/metaschema/scripts/jsy2js.py +++ b/src/ga4gh/gks/metaschema/scripts/jsy2js.py @@ -1,12 +1,15 @@ #!/usr/bin/env python3 -import yaml import json import sys +import yaml + + def cli(): yaml_schema = yaml.load(sys.stdin, Loader=yaml.SafeLoader) json.dump(yaml_schema, sys.stdout, indent=3) + if __name__ == "__main__": - cli() \ No newline at end of file + cli() diff --git a/src/ga4gh/gks/metaschema/scripts/source2classes.py b/src/ga4gh/gks/metaschema/scripts/source2classes.py index 14494bb..5584345 100755 --- a/src/ga4gh/gks/metaschema/scripts/source2classes.py +++ b/src/ga4gh/gks/metaschema/scripts/source2classes.py @@ -1,8 +1,9 @@ #!/usr/bin/env python3 import argparse -from ga4gh.gks.metaschema.tools.source_proc import YamlSchemaProcessor from pathlib import Path +from ga4gh.gks.metaschema.tools.source_proc import YamlSchemaProcessor + parser = argparse.ArgumentParser() parser.add_argument("infile") @@ -13,10 +14,12 @@ def main(proc): continue print(cls) + def cli(): args = parser.parse_args() p = YamlSchemaProcessor(Path(args.infile)) main(p) -if __name__ == '__main__': - cli() \ No newline at end of file + +if __name__ == "__main__": + cli() diff --git a/src/ga4gh/gks/metaschema/scripts/source2jsy.py b/src/ga4gh/gks/metaschema/scripts/source2jsy.py index 6546f71..d2173b4 100755 --- a/src/ga4gh/gks/metaschema/scripts/source2jsy.py +++ b/src/ga4gh/gks/metaschema/scripts/source2jsy.py @@ -2,12 +2,15 @@ import pathlib import sys + from ga4gh.gks.metaschema.tools.source_proc import YamlSchemaProcessor + def cli(): source_file = pathlib.Path(sys.argv[1]) p = YamlSchemaProcessor(source_file) p.js_yaml_dump(sys.stdout) + if __name__ == "__main__": - cli() \ No newline at end of file + cli() diff --git a/src/ga4gh/gks/metaschema/scripts/source2mergedjsy.py b/src/ga4gh/gks/metaschema/scripts/source2mergedjsy.py index 
3532300..f957867 100644 --- a/src/ga4gh/gks/metaschema/scripts/source2mergedjsy.py +++ b/src/ga4gh/gks/metaschema/scripts/source2mergedjsy.py @@ -2,13 +2,16 @@ import pathlib import sys + from ga4gh.gks.metaschema.tools.source_proc import YamlSchemaProcessor + def cli(): source_file = pathlib.Path(sys.argv[1]) p = YamlSchemaProcessor(source_file) p.merge_imported() p.js_yaml_dump(sys.stdout) + if __name__ == "__main__": - cli() \ No newline at end of file + cli() diff --git a/src/ga4gh/gks/metaschema/scripts/source2splitjs.py b/src/ga4gh/gks/metaschema/scripts/source2splitjs.py index aef0e35..e6281fc 100644 --- a/src/ga4gh/gks/metaschema/scripts/source2splitjs.py +++ b/src/ga4gh/gks/metaschema/scripts/source2splitjs.py @@ -1,42 +1,50 @@ #!/usr/bin/env python3 -from pathlib import Path -from ga4gh.gks.metaschema.tools.source_proc import YamlSchemaProcessor import argparse -import re -import os import copy import json +import os +import re +from pathlib import Path + +from ga4gh.gks.metaschema.tools.source_proc import YamlSchemaProcessor parser = argparse.ArgumentParser() parser.add_argument("infile") -def _redirect_refs(obj, dest_path, root_proc, mode): - frag_re = re.compile(r'(/\$defs|definitions)/(\w+)') +def _redirect_refs(obj: dict | list, dest_path: Path, root_proc: YamlSchemaProcessor, mode: str) -> dict | list: + """Process the list of references and returns the list of classes + + :param obj: list of schema objects + :param dest_path: destination output path + :param root_proc: the root YamlSchemaProcessor + :param mode: output mode of "json" or "yaml" + """ + frag_re = re.compile(r"(/\$defs|definitions)/(\w+)") if isinstance(obj, list): return [_redirect_refs(x, dest_path, root_proc, mode) for x in obj] elif isinstance(obj, dict): for k, v in obj.items(): - if k == '$ref': - parts = v.split('#') + if k == "$ref": + parts = v.split("#") if len(parts) == 2: ref, fragment = parts elif len(parts) == 1: ref = parts[0] - fragment = '' + fragment = "" else: - 
raise ValueError(f'Expected only one fragment operator.') + raise ValueError("Expected only one fragment operator.") if fragment: m = frag_re.match(fragment) assert m is not None ref_class = m.group(2) else: - ref_class = ref.split('/')[-1].split('.')[0] + ref_class = ref.split("/")[-1].split(".")[0] # Test if reference is for internal or external object # and retrieve appropriate processor for export path - if ref == '': + if ref == "": proc = root_proc else: proc = None @@ -44,12 +52,12 @@ def _redirect_refs(obj, dest_path, root_proc, mode): if ref_class in other.defs: proc = other if proc is None: - raise ValueError(f'Could not find {ref_class} in processors') + raise ValueError(f"Could not find {ref_class} in processors") # if reference is protected for the class being processed, return only fragment - if ref == '' and proc.class_is_protected(ref_class): - containing_class = proc.raw_defs[ref_class]['protectedClassOf'] + if ref == "" and proc.class_is_protected(ref_class): + containing_class = proc.raw_defs[ref_class]["protectedClassOf"] if containing_class == dest_path.name: - obj[k] = f'#{fragment}' + obj[k] = f"#{fragment}" return obj obj[k] = proc.get_class_abs_path(ref_class, mode) else: @@ -59,26 +67,31 @@ def _redirect_refs(obj, dest_path, root_proc, mode): return obj -def split_defs_to_js(root_proc, mode='json'): - if mode == 'json': +def split_defs_to_js(root_proc: YamlSchemaProcessor, mode: str = "json") -> None: + """Splits the classes defined in the schema into json files. 
+ + :param root_proc: root YamlSchemaProcessor + :param mode: str, defaults to "json" + """ + if mode == "json": fp = root_proc.json_fp - elif mode == 'yaml': + elif mode == "yaml": fp = root_proc.yaml_fp else: - raise ValueError('mode must be json or yaml') + raise ValueError("mode must be json or yaml") os.makedirs(fp, exist_ok=True) kw = root_proc.schema_def_keyword for cls in root_proc.for_js[kw].keys(): if root_proc.class_is_protected(cls): continue class_def = copy.deepcopy(root_proc.for_js[kw][cls]) - target_path = fp / f'{cls}' + target_path = fp / f"{cls}" out_doc = copy.deepcopy(root_proc.for_js) if cls in root_proc.has_protected_members: - def_dict = dict() + def_dict = {} keep = False for protected_cls in root_proc.has_protected_members[cls]: - if root_proc.raw_defs[protected_cls]['protectedClassOf'] == cls: + if root_proc.raw_defs[protected_cls]["protectedClassOf"] == cls: def_dict[protected_cls] = copy.deepcopy(root_proc.defs[protected_cls]) keep = True if keep: @@ -89,15 +102,17 @@ def split_defs_to_js(root_proc, mode='json'): out_doc.pop(kw, None) class_def = _redirect_refs(class_def, target_path, root_proc, mode) out_doc.update(class_def) - out_doc['title'] = cls - out_doc['$id'] = root_proc.get_class_uri(cls, mode) - with open(target_path, 'w') as f: + out_doc["title"] = cls + out_doc["$id"] = root_proc.get_class_uri(cls, mode) + with open(target_path, "w") as f: json.dump(out_doc, f, indent=3, sort_keys=False) + def cli(): args = parser.parse_args() p = YamlSchemaProcessor(Path(args.infile)) split_defs_to_js(p) -if __name__ == '__main__': - cli() \ No newline at end of file + +if __name__ == "__main__": + cli() diff --git a/src/ga4gh/gks/metaschema/scripts/y2t.py b/src/ga4gh/gks/metaschema/scripts/y2t.py index 10985c4..422e004 100755 --- a/src/ga4gh/gks/metaschema/scripts/y2t.py +++ b/src/ga4gh/gks/metaschema/scripts/y2t.py @@ -1,84 +1,100 @@ #!/usr/bin/env python3 """convert input .yaml to .rst artifacts""" -from io import TextIOWrapper import 
os -import sys import pathlib +import sys +from io import TextIOWrapper +from pathlib import Path + +from jinja2 import Environment, FileSystemLoader + from ga4gh.gks.metaschema.tools.source_proc import YamlSchemaProcessor +templates_dir = Path(__file__).resolve().parents[4] / "templates" +env = Environment(loader=FileSystemLoader(templates_dir)) # Mapping to corresponding hex color code and code for maturity status MATURITY_MAPPING: dict[str, tuple[str, str]] = { "draft": ("D3D3D3", "D"), "trial_use": ("FFFF99", "TU"), "normative": ("B6D7A8", "N"), - "deprecated": ("EA9999", "X") + "deprecated": ("EA9999", "X"), } # Mapping to corresponding code for ordered property in arrays -ORDERED_MAPPING: dict[bool, str] = { - True: "↓", - False: "⋮" -} +ORDERED_MAPPING: dict[bool, str] = {True: "↓", False: "⋮"} + +def resolve_type(class_property_definition: dict) -> str: + """Resolves a class definition to a concrete type. -def resolve_type(class_property_definition): - if 'type' in class_property_definition: - if class_property_definition['type'] == 'array': - return resolve_type(class_property_definition['items']) - return class_property_definition['type'] - elif '$ref' in class_property_definition: - ref = class_property_definition['$ref'] - identifier = ref.split('/')[-1] - return f':ref:`{identifier}`' - elif '$refCurie' in class_property_definition: - ref = class_property_definition['$refCurie'] - identifier = ref.split('/')[-1] - return f':ref:`{identifier}`' - elif 'oneOf' in class_property_definition or 'anyOf' in class_property_definition: - kw = 'oneOf' - if 'anyOf' in class_property_definition: - kw = 'anyOf' - deprecated_types = class_property_definition.get('deprecated', list()) - resolved_deprecated = list() - resolved_active = list() + :param class_property_definition: type definition, "_Not Specified_" if undetermined + """ + if "type" in class_property_definition: + if class_property_definition["type"] == "array": + return 
resolve_type(class_property_definition["items"]) + return class_property_definition["type"] + elif "$ref" in class_property_definition: + ref = class_property_definition["$ref"] + identifier = ref.split("/")[-1] + return f":ref:`{identifier}`" + elif "$refCurie" in class_property_definition: + ref = class_property_definition["$refCurie"] + identifier = ref.split("/")[-1] + return f":ref:`{identifier}`" + elif "oneOf" in class_property_definition or "anyOf" in class_property_definition: + kw = "oneOf" + if "anyOf" in class_property_definition: + kw = "anyOf" + deprecated_types = class_property_definition.get("deprecated", []) + resolved_deprecated = [] + resolved_active = [] for property_type in class_property_definition[kw]: resolved_type = resolve_type(property_type) if property_type in deprecated_types: - resolved_deprecated.append(resolved_type + f' (deprecated)') + resolved_deprecated.append(resolved_type + " (deprecated)") else: resolved_active.append(resolved_type) - return ' | '.join(resolved_active + resolved_deprecated) + return " | ".join(resolved_active + resolved_deprecated) else: return "_Not Specified_" -def resolve_cardinality(class_property_name, class_property_attributes, class_definition): - """Resolve class property cardinality from yaml definition""" - if class_property_name in class_definition.get('required', []): - min_count = '1' - elif class_property_name in class_definition.get('heritableRequired', []): - min_count = '1' +def resolve_cardinality(class_property_name: str, class_property_attributes: dict, class_definition: dict) -> str: + """Resolves class property cardinality from YAML definition. 
+ + :param class_property_name: class property name + :param class_property_attributes: class property attributes + :param class_definition: class definition + """ + if class_property_name in class_definition.get("required", []): + min_count = "1" + elif class_property_name in class_definition.get("heritableRequired", []): + min_count = "1" else: - min_count = '0' - if class_property_attributes.get('type') == 'array': - max_count = class_property_attributes.get('maxItems', 'm') - min_count = class_property_attributes.get('minItems', 0) + min_count = "0" + if class_property_attributes.get("type") == "array": + max_count = class_property_attributes.get("maxItems", "m") + min_count = class_property_attributes.get("minItems", 0) else: - max_count = '1' - return f'{min_count}..{max_count}' + max_count = "1" + return f"{min_count}..{max_count}" + +def get_ancestor_with_attributes(class_name: str, proc: YamlSchemaProcessor) -> str: + """Returns the ancestor class of the class name -def get_ancestor_with_attributes(class_name, proc): + :param class_name: class name + :param proc: yaml schema processor + """ if proc.class_is_passthrough(class_name): raw_def, proc = proc.get_local_or_inherited_class(class_name, raw=True) - ancestor = raw_def.get('inherits') + ancestor = raw_def.get("inherits") return get_ancestor_with_attributes(ancestor, proc) return class_name - def add_ga4gh_digest(class_definition: dict, f: TextIOWrapper) -> None: """Add GA4GH Digest table @@ -89,7 +105,8 @@ def add_ga4gh_digest(class_definition: dict, f: TextIOWrapper) -> None: """ ga4gh_digest = class_definition.get("ga4ghDigest") or {} if ga4gh_digest: - print(f""" + print( + f""" **GA4GH Digest** .. 
list-table:: @@ -102,7 +119,9 @@ def add_ga4gh_digest(class_definition: dict, f: TextIOWrapper) -> None: - Keys * - {ga4gh_digest.get("prefix") or "None"} - - {str(ga4gh_digest.get("keys") or [])}\n""", file=f) + - {str(ga4gh_digest.get("keys") or [])}\n""", + file=f, + ) def resolve_flags(class_property_attributes: dict) -> str: @@ -121,7 +140,7 @@ def resolve_flags(class_property_attributes: dict) -> str: flags += f""" .. raw:: html - {maturity_code}""" + {maturity_code}""" # noqa: E501 ordered = class_property_attributes.get("ordered") ordered_code = ORDERED_MAPPING.get(ordered, None) @@ -133,40 +152,43 @@ def resolve_flags(class_property_attributes: dict) -> str: .. raw:: html\n""" flags += f""" - {ordered_code}""" + {ordered_code}""" # noqa: E501 return flags -def main(proc_schema): +def main(proc_schema: YamlSchemaProcessor) -> None: + """ + Generates the .rst file for each of the classes in the schema + + :param proc_schema: schema processor object + """ for class_name, class_definition in proc_schema.defs.items(): - with open(proc_schema.def_fp / (class_name + '.rst'), "w") as f: - maturity = class_definition.get('maturity', '') - if maturity == 'draft': - print(""" -.. warning:: This data class is at a **draft** maturity level and may change - significantly in future releases. Maturity levels are described in - the :ref:`maturity-model`. - - """, file=f) - elif maturity == 'trial use': - print(""" -.. note:: This data class is at a **trial use** maturity level and may change - in future releases. Maturity levels are described in the :ref:`maturity-model`. 
- """, file=f) + with open(proc_schema.def_fp / (class_name + ".rst"), "w") as f: + maturity = class_definition.get("maturity", "") + template = env.get_template("maturity") + if maturity == "draft": + print( + template.render(info="warning", maturity_level="draft", modifier="significantly"), + file=f, + ) + elif maturity == "trial use": + print( + template.render(info="note", maturity_level="trial use", modifier=""), + file=f, + ) print("**Computational Definition**\n", file=f) - print(class_definition['description'], file=f) + print(class_definition["description"], file=f) if proc_schema.class_is_passthrough(class_name): continue - if 'heritableProperties' in class_definition: - p = 'heritableProperties' - elif 'properties' in class_definition: - p = 'properties' + if "heritableProperties" in class_definition: + p = "heritableProperties" + elif "properties" in class_definition: + p = "properties" elif proc_schema.class_is_primitive(class_name): continue else: raise ValueError(class_name, class_definition) - ancestor = proc_schema.raw_defs[class_name].get('inherits') + ancestor = proc_schema.raw_defs[class_name].get("inherits") if ancestor: ancestor = get_ancestor_with_attributes(ancestor, proc_schema) inheritance = f"Some {class_name} attributes are inherited from :ref:`{ancestor}`.\n" @@ -176,7 +198,8 @@ def main(proc_schema): add_ga4gh_digest(class_definition, f) print("\n**Information Model**", file=f) - print(f""" + print( + f""" {inheritance} ..
list-table:: :class: clean-wrap @@ -188,14 +211,20 @@ def main(proc_schema): - Flags - Type - Limits - - Description""", file=f) + - Description""", + file=f, + ) for class_property_name, class_property_attributes in class_definition[p].items(): - print(f"""\ + print( + f"""\ * - {class_property_name} - {resolve_flags(class_property_attributes)} - {resolve_type(class_property_attributes)} - {resolve_cardinality(class_property_name, class_property_attributes, class_definition)} - - {class_property_attributes.get('description', '')}""", file=f) + - {class_property_attributes.get('description', '')}""", + file=f, + ) + def cli(): source_file = pathlib.Path(sys.argv[1]) @@ -205,5 +234,6 @@ def cli(): exit(0) main(p) + if __name__ == "__main__": - cli() \ No newline at end of file + cli() diff --git a/src/ga4gh/gks/metaschema/tools/source_proc.py b/src/ga4gh/gks/metaschema/tools/source_proc.py index 2bdde49..53c8763 100755 --- a/src/ga4gh/gks/metaschema/tools/source_proc.py +++ b/src/ga4gh/gks/metaschema/tools/source_proc.py @@ -1,53 +1,54 @@ #!/usr/bin/env python3 """convert yaml on stdin to json on stdout""" + import copy import json -import yaml import re -from pathlib import Path from collections import defaultdict +from pathlib import Path from urllib.parse import urlparse +import yaml + SCHEMA_DEF_KEYWORD_BY_VERSION = { "https://json-schema.org/draft-07/schema": "definitions", - "https://json-schema.org/draft/2020-12/schema": "$defs" + "https://json-schema.org/draft/2020-12/schema": "$defs", } -ref_re = re.compile(r':ref:`(.*?)(\s?<.*>)?`') -link_re = re.compile(r'`(.*?)\s?\<(.*)\>`_') -curie_re = re.compile(r'(\S+):(\S+)') -defs_re = re.compile(r'#/(\$defs|definitions)/.*') +ref_re = re.compile(r":ref:`(.*?)(\s?<.*>)?`") +link_re = re.compile(r"`(.*?)\s?\<(.*)\>`_") +curie_re = re.compile(r"(\S+):(\S+)") +defs_re = re.compile(r"#/(\$defs|definitions)/.*") class YamlSchemaProcessor: - def __init__(self, schema_fp, root_fp=None): self.schema_fp = Path(schema_fp) 
self.imported = root_fp is not None self.root_schema_fp = root_fp self.raw_schema = self.load_schema(schema_fp) - self.id = self.raw_schema['$id'] - self.yaml_key = self.raw_schema.get('yaml-target', 'yaml') - self.json_key = self.raw_schema.get('json-target', 'json') - self.defs_key = self.raw_schema.get('def-target', f'def') + self.id = self.raw_schema["$id"] + self.yaml_key = self.raw_schema.get("yaml-target", "yaml") + self.json_key = self.raw_schema.get("json-target", "json") + self.defs_key = self.raw_schema.get("def-target", "def") # schema_root_name = str(self.schema_fp.stem)[:-7] # removes "-source" self.yaml_fp = self.schema_fp.parent / self.yaml_key self.json_fp = self.schema_fp.parent / self.json_key self.def_fp = self.schema_fp.parent / self.defs_key # self.def_fp = self.schema_fp.parent / self.raw_schema.get('def-target', f'def/{schema_root_name}') - self.namespaces = self.raw_schema.get('namespaces', list()) - self.schema_def_keyword = SCHEMA_DEF_KEYWORD_BY_VERSION[self.raw_schema['$schema']] + self.namespaces = self.raw_schema.get("namespaces", []) + self.schema_def_keyword = SCHEMA_DEF_KEYWORD_BY_VERSION[self.raw_schema["$schema"]] self.raw_defs = self.raw_schema.get(self.schema_def_keyword, None) - self.imports = dict() + self.imports = {} self.import_dependencies() - self.strict = self.raw_schema.get('strict', False) - self.enforce_ordered = self.raw_schema.get('enforce_ordered', self.strict) + self.strict = self.raw_schema.get("strict", False) + self.enforce_ordered = self.raw_schema.get("enforce_ordered", self.strict) self._init_from_raw() def _init_from_raw(self): - self.has_children_urls = dict() - self.has_children = dict() + self.has_children_urls = {} + self.has_children = {} self.build_inheritance_dicts() self.has_protected_members = defaultdict(set) self.processed_schema = copy.deepcopy(self.raw_schema) @@ -62,31 +63,31 @@ def build_inheritance_dicts(self): # If an abstract class, register oneOf enumerations # If it inherits from a 
class, register the inheritance for cls, cls_def in self.raw_defs.items(): - cls_url = f'#/{self.schema_def_keyword}/{cls}' - if self.class_is_abstract(cls) and ('oneOf' in cls_def or '$ref' in cls_def): + cls_url = f"#/{self.schema_def_keyword}/{cls}" + if self.class_is_abstract(cls) and ("oneOf" in cls_def or "$ref" in cls_def): maps_to_urls = self.has_children_urls.get(cls_url, set()) maps_to = self.has_children.get(cls, set()) - if 'oneOf' in cls_def: - records = cls_def['oneOf'] + if "oneOf" in cls_def: + records = cls_def["oneOf"] else: - records = [{'$ref': cls_def['$ref']}] + records = [{"$ref": cls_def["$ref"]}] for record in records: if not isinstance(record, dict): continue assert len(record) == 1 - if '$ref' in record: - mapped = record['$ref'] - elif '$refCurie' in record: - mapped = self.resolve_curie(record['$refCurie']) + if "$ref" in record: + mapped = record["$ref"] + elif "$refCurie" in record: + mapped = self.resolve_curie(record["$refCurie"]) maps_to_urls.add(mapped) - maps_to.add(mapped.split('/')[-1]) + maps_to.add(mapped.split("/")[-1]) self.has_children_urls[cls_url] = maps_to_urls self.has_children[cls] = maps_to - if 'inherits' in cls_def: - target = cls_def['inherits'] - if ':' in target: + if "inherits" in cls_def: + target = cls_def["inherits"] + if ":" in target: continue # Ignore mappings from definitions in other sources - target_url = f'#/{self.schema_def_keyword}/{target}' + target_url = f"#/{self.schema_def_keyword}/{target}" maps_to_urls = self.has_children_urls.get(target_url, set()) maps_to = self.has_children.get(target, set()) maps_to_urls.add(cls_url) @@ -104,9 +105,9 @@ def get_all_descendants(self, cls): def merge_imported(self): # register all import namespaces and create process order # note: relying on max_recursion_depth errors and not checking for cyclic imports - self.import_locations = dict() - self.import_processors = dict() - self.import_process_order = list() + self.import_locations = {} + self.import_processors 
= {} + self.import_process_order = [] self._register_merge_import(self) # check that all classes defined in imports are unique @@ -117,9 +118,9 @@ def merge_imported(self): defined_classes.update(other.processed_classes) for key in self.import_process_order: - self.namespaces[key] = f'#/{self.schema_def_keyword}/' + self.namespaces[key] = f"#/{self.schema_def_keyword}/" other = self.import_processors[key] - other_ns = other.raw_schema.get('namespaces', list()) + other_ns = other.raw_schema.get("namespaces", []) if other_ns: for ns in other_ns: if ns not in self.import_process_order: @@ -129,18 +130,18 @@ def merge_imported(self): # revise all class.inherits attributes from CURIE to local defs for cls in defined_classes: - cls_inherits_prop = self.raw_defs[cls].get('inherits', '') + cls_inherits_prop = self.raw_defs[cls].get("inherits", "") if curie_re.match(cls_inherits_prop): - self.raw_defs[cls]['inherits'] = cls_inherits_prop.split(':')[1] + self.raw_defs[cls]["inherits"] = cls_inherits_prop.split(":")[1] # check all class.properties match expected definitions style self.raw_defs[cls] = self._check_local_defs_property(self.raw_defs[cls]) # clear imports - self.imports = dict() + self.imports = {} # update title - self.raw_schema['title'] = self.raw_schema['title'] + '-Merged-Imports' + self.raw_schema["title"] = self.raw_schema["title"] + "-Merged-Imports" # reprocess raw_schema self.raw_defs = self.raw_schema.get(self.schema_def_keyword, None) @@ -152,7 +153,7 @@ def _check_local_defs_property(self, obj): if isinstance(v, dict): obj[k] = self._check_local_defs_property(v) elif isinstance(v, list): - l = list() + l = [] # noqa: E741 for element in v: l.append(self._check_local_defs_property(element)) obj[k] = l @@ -184,8 +185,8 @@ def load_schema(schema_fp): return schema def import_dependencies(self): - for dependency in self.raw_schema.get('imports', list()): - fp = Path(self.raw_schema['imports'][dependency]) + for dependency in self.raw_schema.get("imports", 
[]): + fp = Path(self.raw_schema["imports"][dependency]) if not fp.is_absolute(): base_path = self.schema_fp.parent fp = base_path.joinpath(fp) @@ -204,36 +205,38 @@ def process_schema(self): def class_is_abstract(self, schema_class): schema_class_def, _ = self.get_local_or_inherited_class(schema_class, raw=True) - return 'properties' not in schema_class_def and not self.class_is_primitive(schema_class) + return "properties" not in schema_class_def and not self.class_is_primitive(schema_class) def class_is_protected(self, schema_class): schema_class_def, _ = self.get_local_or_inherited_class(schema_class, raw=True) - return 'protectedClassOf' in schema_class_def + return "protectedClassOf" in schema_class_def def class_is_ga4gh_identifiable(self, schema_class): schema_class_def, _ = self.get_local_or_inherited_class(schema_class, raw=True) - return 'ga4ghDigest' in schema_class_def and 'prefix' in schema_class_def['ga4ghDigest'] + return "ga4ghDigest" in schema_class_def and "prefix" in schema_class_def["ga4ghDigest"] def class_is_passthrough(self, schema_class): if not self.class_is_abstract(schema_class): return False raw_class_definition, _ = self.get_local_or_inherited_class(schema_class, raw=True) - if 'heritableProperties' not in raw_class_definition \ - and 'properties' not in raw_class_definition \ - and raw_class_definition.get('inherits', False): + if ( + "heritableProperties" not in raw_class_definition + and "properties" not in raw_class_definition + and raw_class_definition.get("inherits", False) + ): return True return False def class_is_primitive(self, schema_class): schema_class_def, _ = self.get_local_or_inherited_class(schema_class, raw=True) - schema_class_type = schema_class_def.get('type', 'abstract') - if schema_class_type not in ['abstract', 'object']: + schema_class_type = schema_class_def.get("type", "abstract") + if schema_class_type not in ["abstract", "object"]: return True return False def class_is_subclass(self, schema_class, 
parent_class): - schema_class_fragment = f'#/{self.schema_def_keyword}/{schema_class}' - parent_class_fragment = f'#/{self.schema_def_keyword}/{parent_class}' + schema_class_fragment = f"#/{self.schema_def_keyword}/{schema_class}" + parent_class_fragment = f"#/{self.schema_def_keyword}/{parent_class}" children = self.concretize_class_ref(parent_class_fragment) return schema_class_fragment in children @@ -244,22 +247,22 @@ def js_yaml_dump(self, stream): yaml.dump(self.for_js, stream, sort_keys=False) def resolve_curie(self, curie): - namespace, identifier = curie.split(':') + namespace, identifier = curie.split(":") base_url = self.namespaces[namespace] return base_url + identifier def process_property_tree_refs(self, raw_node, processed_node): if isinstance(raw_node, dict): for k, v in raw_node.items(): - if k.endswith('Curie'): + if k.endswith("Curie"): new_k = k[:-5] processed_node[new_k] = self.resolve_curie(v) - del (processed_node[k]) - elif k == '$ref' and v.startswith('#/') and self.imported: + del processed_node[k] + elif k == "$ref" and v.startswith("#/") and self.imported: # TODO: fix below hard-coded name convention, yuck. 
rel_root = self.schema_fp.parent.relative_to(self.root_schema_fp.parent, walk_up=True) - schema_stem = self.schema_fp.stem.split('-')[0] - processed_node[k] = str(rel_root / f'{schema_stem}.json{v}') + schema_stem = self.schema_fp.stem.split("-")[0] + processed_node[k] = str(rel_root / f"{schema_stem}.json{v}") else: self.process_property_tree_refs(raw_node[k], processed_node[k]) elif isinstance(raw_node, list): @@ -268,7 +271,7 @@ def process_property_tree_refs(self, raw_node, processed_node): return def get_local_or_inherited_class(self, schema_class, raw=False): - components = schema_class.split(':') + components = schema_class.split(":") if len(components) == 1: inherited_class_name = components[0] if raw: @@ -281,11 +284,9 @@ def get_local_or_inherited_class(self, schema_class, raw=False): inherited_class_name = components[1] proc = self.imports[components[0]] if raw: - inherited_class = \ - proc.raw_schema[proc.schema_def_keyword][inherited_class_name] + inherited_class = proc.raw_schema[proc.schema_def_keyword][inherited_class_name] else: - inherited_class = \ - proc.processed_schema[proc.schema_def_keyword][inherited_class_name] + inherited_class = proc.processed_schema[proc.schema_def_keyword][inherited_class_name] else: raise ValueError return inherited_class, proc @@ -293,18 +294,18 @@ def get_local_or_inherited_class(self, schema_class, raw=False): def get_class_uri(self, schema_class, mode): abs_path = self.get_class_abs_path(schema_class, mode) parsed_url = urlparse(self.id) - return f'{parsed_url.scheme}://{parsed_url.netloc}{abs_path}' + return f"{parsed_url.scheme}://{parsed_url.netloc}{abs_path}" def get_class_abs_path(self, schema_class, mode): - if mode == 'json': + if mode == "json": export_key = self.json_key - elif mode == 'yaml': + elif mode == "yaml": export_key = self.yaml_key else: - raise ValueError('mode must be json or yaml') + raise ValueError("mode must be json or yaml") if self.class_is_protected(schema_class): - 
frag_containing_class = self.raw_defs[schema_class]['protectedClassOf'] - class_ref = f'{frag_containing_class}#/{self.schema_def_keyword}/{schema_class}' + frag_containing_class = self.raw_defs[schema_class]["protectedClassOf"] + class_ref = f"{frag_containing_class}#/{self.schema_def_keyword}/{schema_class}" else: class_ref = schema_class parsed_url = urlparse(self.id) @@ -320,11 +321,11 @@ def process_schema_class(self, schema_class): # Check GKS maturity model on all public, concrete classes if not (self.class_is_protected(schema_class) or self.class_is_abstract(schema_class)): - assert 'maturity' in processed_class_def, schema_class - assert processed_class_def['maturity'] in ['draft', 'trial use', 'normative', 'deprecated'], schema_class + assert "maturity" in processed_class_def, schema_class + assert processed_class_def["maturity"] in ["draft", "trial use", "normative", "deprecated"], schema_class if self.class_is_protected(schema_class): - containing_class = self.raw_defs[schema_class]['protectedClassOf'] + containing_class = self.raw_defs[schema_class]["protectedClassOf"] self.has_protected_members[containing_class].add(schema_class) if containing_class in self.has_children: for descendant in self.get_all_descendants(containing_class): @@ -333,52 +334,51 @@ def process_schema_class(self, schema_class): if self.class_is_primitive(schema_class): self.processed_classes.add(schema_class) return - inherited_properties = dict() + inherited_properties = {} inherited_required = set() - inherits = processed_class_def.get('inherits', None) + inherits = processed_class_def.get("inherits", None) if inherits is not None: inherited_class, proc = self.get_local_or_inherited_class(inherits) # extract properties / heritableProperties and required / heritableRequired from inherited_class # currently assumes inheritance from abstract classes only–will break otherwise - inherited_properties |= copy.deepcopy(inherited_class['heritableProperties']) - inherited_required |= 
set(inherited_class.get('heritableRequired', list())) + inherited_properties |= copy.deepcopy(inherited_class["heritableProperties"]) + inherited_required |= set(inherited_class.get("heritableRequired", [])) # inherit ga4ghDigest keys - if 'ga4ghDigest' in processed_class_def or 'ga4ghDigest' in inherited_class: - if 'ga4ghDigest' not in processed_class_def: - assert self.class_is_abstract(schema_class), \ - f'{schema_class} is missing a defined prefix.' - processed_class_def['ga4ghDigest'] = copy.deepcopy(inherited_class['ga4ghDigest']) - elif 'ga4ghDigest' not in inherited_class: + if "ga4ghDigest" in processed_class_def or "ga4ghDigest" in inherited_class: + if "ga4ghDigest" not in processed_class_def: + assert self.class_is_abstract(schema_class), f"{schema_class} is missing a defined prefix." + processed_class_def["ga4ghDigest"] = copy.deepcopy(inherited_class["ga4ghDigest"]) + elif "ga4ghDigest" not in inherited_class: pass else: - ga4ghDigest_keys = set(inherited_class['ga4ghDigest']['keys']) - ga4ghDigest_keys |= set(processed_class_def['ga4ghDigest'].get('keys', list())) - processed_class_def['ga4ghDigest']['keys'] = sorted(list(ga4ghDigest_keys)) + ga4ghDigest_keys = set(inherited_class["ga4ghDigest"]["keys"]) + ga4ghDigest_keys |= set(processed_class_def["ga4ghDigest"].get("keys", [])) + processed_class_def["ga4ghDigest"]["keys"] = sorted(ga4ghDigest_keys) if self.class_is_abstract(schema_class): - prop_k = 'heritableProperties' - req_k = 'heritableRequired' + prop_k = "heritableProperties" + req_k = "heritableRequired" else: - prop_k = 'properties' - req_k = 'required' - raw_class_properties = raw_class_def.get(prop_k, dict()) # Nested inheritance! - processed_class_properties = processed_class_def.get(prop_k, dict()) + prop_k = "properties" + req_k = "required" + raw_class_properties = raw_class_def.get(prop_k, {}) # Nested inheritance! 
+ processed_class_properties = processed_class_def.get(prop_k, {}) processed_class_required = set(processed_class_def.get(req_k, [])) # Process refs self.process_property_tree_refs(raw_class_properties, processed_class_properties) for prop, prop_attribs in processed_class_properties.items(): # Mix in inherited properties - if 'extends' in prop_attribs: + if "extends" in prop_attribs: # assert that the extended property is in inherited properties - assert prop_attribs['extends'] in inherited_properties - extended_property = prop_attribs['extends'] + assert prop_attribs["extends"] in inherited_properties + extended_property = prop_attribs["extends"] # fix $ref and oneOf $ref inheritance if "$ref" in prop_attribs: - if 'oneOf' in inherited_properties[extended_property]: + if "oneOf" in inherited_properties[extended_property]: inherited_properties[extended_property].pop("oneOf") - elif 'anyOf' in inherited_properties[extended_property]: + elif "anyOf" in inherited_properties[extended_property]: inherited_properties[extended_property].pop("anyOf") if "oneOf" in prop_attribs or "anyOf" in prop_attribs: if "$ref" in inherited_properties[extended_property]: @@ -386,98 +386,103 @@ def process_schema_class(self, schema_class): # merge and clean up inherited properties processed_class_properties[prop] = inherited_properties[extended_property] processed_class_properties[prop].update(prop_attribs) - processed_class_properties[prop].pop('extends') + processed_class_properties[prop].pop("extends") inherited_properties.pop(extended_property) # update required field if extended_property in inherited_required: inherited_required.remove(extended_property) processed_class_required.add(prop) # Validate required array attribute for GKS specs - if self.enforce_ordered and prop_attribs.get('type', '') == 'array': - assert 'ordered' in prop_attribs, f'{schema_class}.{prop} missing ordered attribute.' 
- assert isinstance(prop_attribs['ordered'], bool) - if self.strict and prop_attribs.get('type', '') == 'object': - assert prop_attribs.get('additionalProperties', None) is not None, \ - f'"additionalProperties" expected to be defined in {schema_class}.{prop}' + if self.enforce_ordered and prop_attribs.get("type", "") == "array": + assert "ordered" in prop_attribs, f"{schema_class}.{prop} missing ordered attribute." + assert isinstance(prop_attribs["ordered"], bool) + if self.strict and prop_attribs.get("type", "") == "object": + assert ( + prop_attribs.get("additionalProperties", None) is not None + ), f'"additionalProperties" expected to be defined in {schema_class}.{prop}' # Validate class structures for GKS specs if self.class_is_abstract(schema_class): - assert 'type' not in processed_class_def, schema_class + assert "type" not in processed_class_def, schema_class else: - assert 'type' in processed_class_def, schema_class - assert processed_class_def['type'] == 'object', schema_class + assert "type" in processed_class_def, schema_class + assert processed_class_def["type"] == "object", schema_class if self.class_is_ga4gh_identifiable(schema_class): - assert isinstance(processed_class_def['ga4ghDigest']['prefix'], str), schema_class - assert processed_class_def['ga4ghDigest']['prefix'] != '', schema_class - l = len(processed_class_def['ga4ghDigest']['keys']) - assert l >= 2, \ - f'GA4GH identifiable objects are expected to be defined by at least 2 properties, {schema_class} has {l}.' - assert 'type' in processed_class_def['ga4ghDigest']['keys'], \ - f'GA4GH identifiable objects are expected to include the class type but not included for {schema_class}.' 
- # Two properites should be `type` and at least one other field + assert isinstance(processed_class_def["ga4ghDigest"]["prefix"], str), schema_class + assert processed_class_def["ga4ghDigest"]["prefix"] != "", schema_class + l = len(processed_class_def["ga4ghDigest"]["keys"]) # noqa: E741 + assert ( + l >= 2 + ), f"GA4GH identifiable objects are expected to be defined by at least 2 properties, {schema_class} has {l}." # noqa: E501 + assert ( + "type" in processed_class_def["ga4ghDigest"]["keys"] + ), f"GA4GH identifiable objects are expected to include the class type but not included for {schema_class}." # noqa: E501 + # Two properties should be `type` and at least one other field processed_class_def[prop_k] = inherited_properties | processed_class_properties - processed_class_def[req_k] = sorted(list(inherited_required | processed_class_required)) + processed_class_def[req_k] = sorted(inherited_required | processed_class_required) if self.strict and not self.class_is_abstract(schema_class): - processed_class_def['additionalProperties'] = False + processed_class_def["additionalProperties"] = False self.processed_classes.add(schema_class) @staticmethod def _scrub_rst_markup(string): - string = ref_re.sub(r'\g<1>', string) - string = link_re.sub(r'[\g<1>](\g<2>)', string) - string = string.replace('\n', ' ') + string = ref_re.sub(r"\g<1>", string) + string = link_re.sub(r"[\g<1>](\g<2>)", string) + string = string.replace("\n", " ") return string def clean_for_js(self): - self.for_js.pop('namespaces', None) - self.for_js.pop('strict', None) - self.for_js.pop('enforce_ordered', None) - self.for_js.pop('imports', None) - abstract_class_removals = list() - for schema_class, schema_definition in self.for_js.get(self.schema_def_keyword, dict()).items(): - schema_definition.pop('inherits', None) - schema_definition.pop('protectedClassOf', None) + self.for_js.pop("namespaces", None) + self.for_js.pop("strict", None) + self.for_js.pop("enforce_ordered", None) +
self.for_js.pop("imports", None) + abstract_class_removals = [] + for schema_class, schema_definition in self.for_js.get(self.schema_def_keyword, {}).items(): + schema_definition.pop("inherits", None) + schema_definition.pop("protectedClassOf", None) if self.class_is_abstract(schema_class): - schema_definition.pop('heritableProperties', None) - schema_definition.pop('heritableRequired', None) - schema_definition.pop('ga4ghDigest', None) - schema_definition.pop('header_level', None) + schema_definition.pop("heritableProperties", None) + schema_definition.pop("heritableRequired", None) + schema_definition.pop("ga4ghDigest", None) + schema_definition.pop("header_level", None) self.concretize_js_object(schema_definition) - if 'oneOf' not in schema_definition and 'allOf' not in schema_definition and '$ref' not in schema_definition: + if ( + "oneOf" not in schema_definition + and "allOf" not in schema_definition + and "$ref" not in schema_definition + ): abstract_class_removals.append(schema_class) - if 'description' in schema_definition: - schema_definition['description'] = \ - self._scrub_rst_markup(schema_definition['description']) - if 'properties' in schema_definition: - for p, p_def in schema_definition['properties'].items(): - if 'description' in p_def: - p_def['description'] = \ - self._scrub_rst_markup(p_def['description']) + if "description" in schema_definition: + schema_definition["description"] = self._scrub_rst_markup(schema_definition["description"]) + if "properties" in schema_definition: + for p, p_def in schema_definition["properties"].items(): + if "description" in p_def: + p_def["description"] = self._scrub_rst_markup(p_def["description"]) self.concretize_js_object(p_def) for cls in abstract_class_removals: self.for_js[self.schema_def_keyword].pop(cls) def concretize_js_object(self, js_obj): - if '$ref' in js_obj: - descendents = self.concretize_class_ref(js_obj['$ref']) - if descendents != {js_obj['$ref']}: - js_obj.pop('$ref') - js_obj['oneOf'] = 
self._build_ref_list(descendents) - elif 'oneOf' in js_obj: + if "$ref" in js_obj: + descendents = self.concretize_class_ref(js_obj["$ref"]) + if descendents != {js_obj["$ref"]}: + js_obj.pop("$ref") + js_obj["oneOf"] = self._build_ref_list(descendents) + elif "oneOf" in js_obj: # do the same check for each member - ref_list = js_obj['oneOf'] + ref_list = js_obj["oneOf"] descendents = set() - inlined = list() + inlined = [] for ref in ref_list: - if '$ref' not in ref: + if "$ref" not in ref: inlined.append(ref) else: - descendents.update(self.concretize_class_ref(ref['$ref'])) - js_obj['oneOf'] = self._build_ref_list(descendents) + inlined - elif js_obj.get('type', '') == 'array': - self.concretize_js_object(js_obj['items']) + descendents.update(self.concretize_class_ref(ref["$ref"])) + js_obj["oneOf"] = self._build_ref_list(descendents) + inlined + elif js_obj.get("type", "") == "array": + self.concretize_js_object(js_obj["items"]) def concretize_class_ref(self, cls_url): children = self.has_children_urls.get(cls_url, None) @@ -490,4 +495,4 @@ def concretize_class_ref(self, cls_url): @staticmethod def _build_ref_list(cls_urls): - return [{'$ref': url} for url in sorted(cls_urls)] + return [{"$ref": url} for url in sorted(cls_urls)] diff --git a/src/templates/maturity b/src/templates/maturity new file mode 100644 index 0000000..fcb5f32 --- /dev/null +++ b/src/templates/maturity @@ -0,0 +1,4 @@ +.. {{ info }}:: This data class is at a **{{ maturity_level }}** maturity level and may change + {{ modifier }} in future releases. Maturity levels are described in + the :ref:`maturity-model`. 
+ diff --git a/tests/data/gnomAD/json/GnomadCAF b/tests/data/gnomAD/json/GnomadCAF deleted file mode 100644 index 313a0d4..0000000 --- a/tests/data/gnomAD/json/GnomadCAF +++ /dev/null @@ -1,116 +0,0 @@ -{ - "$schema": "https://json-schema.org/draft/2020-12/schema", - "$id": "https://w3id.org/ga4gh/schema/gk-pilot/main/gnomAD/json/GnomadCAF", - "title": "GnomadCAF", - "type": "object", - "$defs": { - "GnomadCafProperties": { - "description": "Additional properties specific to the gnomAD CAF model.", - "protectedClassOf": "GnomadCAF", - "type": "object", - "maturity": "draft", - "properties": { - "ancillaryResults": { - "type": "object", - "properties": { - "grpMaxFAF95": { - "$ref": "#/$defs/GrpMaxFAF95" - }, - "jointGrpMaxFAF95": { - "description": "The Group Max Filtering Allele Frequency (95% confidence interval) calculated jointly from genome and exome data.", - "$ref": "#/$defs/GrpMaxFAF95" - }, - "homozygotes": { - "type": "integer" - }, - "hemizygotes": { - "type": "integer" - } - }, - "additionalProperties": false - }, - "qualityMeasures": { - "type": "object", - "properties": { - "meanDepth": { - "description": "The mean depth of coverage.", - "type": "number" - }, - "fractionCoverage20x": { - "description": "The fraction of individuals with at least 20x coverage.", - "type": "number" - }, - "qcFilters": { - "type": "array", - "items": { - "type": "string" - } - }, - "monoallelic": { - "description": "All samples are homozygous alternate for the variant.", - "type": "boolean" - }, - "lowComplexityRegion": { - "description": "This flag indicates the variant is found in a low complexity region. 
These regions were identified with the symmetric DUST algorithm at a score threshold of 30.", - "type": "boolean" - }, - "lowConfidenceLossOfFunctionError": { - "description": "Low confidence in predicted Loss of Function (pLoF), where variant is determined by LOFTEE to be unlikely loss of function for a transcript.", - "type": "boolean" - }, - "lossOfFunctionWarning": { - "description": "A warning provided by LOFTEE to use caution when interpreting the transcript or variant.", - "type": "boolean" - }, - "noncodingTranscriptError": { - "description": "Marked in a putative loss of function category by VEP (essential splice, stop-gained, or frameshift) but appears on a non-protein-coding transcript.", - "type": "boolean" - }, - "heterozygousSkewedAlleleCount": { - "description": "The count of individuals called as heterozygous for this variant with a skewed allele balance, indicating some of these individuals may be miscalled homozygous alternative allele.", - "type": "integer" - } - }, - "additionalProperties": false - } - }, - "required": [] - }, - "GrpMaxFAF95": { - "description": "The group maximum filtering allele frequency at 95% CI", - "protectedClassOf": "GnomadCAF", - "type": "object", - "maturity": "draft", - "properties": { - "frequency": { - "type": "number" - }, - "confidenceInterval": { - "type": "number", - "const": 0.95, - "default": 0.95 - }, - "groupId": { - "type": "string", - "description": "The genetic ancestry group from which the max frequency was calculated." - } - }, - "required": [ - "confidenceInterval", - "frequency", - "groupId" - ], - "additionalProperties": false - } - }, - "maturity": "draft", - "description": "The GA4GH Cohort Allele Frequency model, with additional schema properties specific to the gnomAD resource. 
", - "allOf": [ - { - "$ref": "/ga4gh/schema/va-spec/1.x/profiles/caf/json/CohortAlleleFrequency" - }, - { - "$ref": "#/$defs/GnomadCafProperties" - } - ] -} \ No newline at end of file diff --git a/tests/data/vrs/def/Adjacency.rst b/tests/data/vrs/def/Adjacency.rst deleted file mode 100644 index 35bd98b..0000000 --- a/tests/data/vrs/def/Adjacency.rst +++ /dev/null @@ -1,94 +0,0 @@ - -.. warning:: This data class is at a **draft** maturity level and may change - significantly in future releases. Maturity levels are described in - the :ref:`maturity-model`. - - -**Computational Definition** - -The `Adjacency` class can represent either the termination of a sequence or the adjoining of the end of a sequence with the beginning of an adjacent sequence, potentially with an intervening linker sequence. - -**GA4GH Digest** - -.. list-table:: - :class: clean-wrap - :header-rows: 1 - :align: left - :widths: auto - - * - Prefix - - Keys - - * - AJ - - ['adjoinedSequences', 'linker', 'type'] - - -**Information Model** - -Some Adjacency attributes are inherited from :ref:`Variation`. - -.. list-table:: - :class: clean-wrap - :header-rows: 1 - :align: left - :widths: auto - - * - Field - - Flags - - Type - - Limits - - Description - * - id - - - - string - - 0..1 - - The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE). - * - label - - - - string - - 0..1 - - A primary label for the entity. - * - description - - - - string - - 0..1 - - A free-text description of the entity. - * - extensions - - - .. raw:: html - - OL - - :ref:`Extension` - - 0..m - - - * - type - - - - string - - 0..1 - - MUST be "Adjacency". - * - digest - - - - string - - 0..1 - - A sha512t24u digest created using the VRS Computed Identifier algorithm. - * - expressions - - - .. 
raw:: html - - UL - - :ref:`Expression` - - 0..m - - - * - adjoinedSequences - - - .. raw:: html - - OL - - :ref:`IRI` | :ref:`Location` - - 1..2 - - The terminal sequence or pair of adjoined sequences that defines in the adjacency. - * - linker - - - - :ref:`SequenceExpression` - - 0..1 - - The sequence found between adjoined sequences. diff --git a/tests/data/vrs/def/Allele.rst b/tests/data/vrs/def/Allele.rst deleted file mode 100644 index 1a7add7..0000000 --- a/tests/data/vrs/def/Allele.rst +++ /dev/null @@ -1,91 +0,0 @@ - -.. warning:: This data class is at a **draft** maturity level and may change - significantly in future releases. Maturity levels are described in - the :ref:`maturity-model`. - - -**Computational Definition** - -The state of a molecule at a :ref:`Location`. - -**GA4GH Digest** - -.. list-table:: - :class: clean-wrap - :header-rows: 1 - :align: left - :widths: auto - - * - Prefix - - Keys - - * - VA - - ['location', 'state', 'type'] - - -**Information Model** - -Some Allele attributes are inherited from :ref:`Variation`. - -.. list-table:: - :class: clean-wrap - :header-rows: 1 - :align: left - :widths: auto - - * - Field - - Flags - - Type - - Limits - - Description - * - id - - - - string - - 0..1 - - The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE). - * - label - - - - string - - 0..1 - - A primary label for the entity. - * - description - - - - string - - 0..1 - - A free-text description of the entity. - * - extensions - - - .. raw:: html - - OL - - :ref:`Extension` - - 0..m - - - * - type - - - - string - - 0..1 - - MUST be "Allele" - * - digest - - - - string - - 0..1 - - A sha512t24u digest created using the VRS Computed Identifier algorithm. - * - expressions - - - .. 
raw:: html - - UL - - :ref:`Expression` - - 0..m - - - * - location - - - - :ref:`IRI` | :ref:`Location` - - 1..1 - - The location of the Allele - * - state - - - - :ref:`SequenceExpression` - - 1..1 - - An expression of the sequence state diff --git a/tests/data/vrs/def/CopyNumber.rst b/tests/data/vrs/def/CopyNumber.rst deleted file mode 100644 index 9af40ee..0000000 --- a/tests/data/vrs/def/CopyNumber.rst +++ /dev/null @@ -1,80 +0,0 @@ -**Computational Definition** - -A measure of the copies of a :ref:`Location` within a system (e.g. genome, cell, etc.) - -**GA4GH Digest** - -.. list-table:: - :class: clean-wrap - :header-rows: 1 - :align: left - :widths: auto - - * - Prefix - - Keys - - * - None - - ['location', 'type'] - - -**Information Model** - -Some CopyNumber attributes are inherited from :ref:`Variation`. - -.. list-table:: - :class: clean-wrap - :header-rows: 1 - :align: left - :widths: auto - - * - Field - - Flags - - Type - - Limits - - Description - * - id - - - - string - - 0..1 - - The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE). - * - label - - - - string - - 0..1 - - A primary label for the entity. - * - description - - - - string - - 0..1 - - A free-text description of the entity. - * - extensions - - - .. raw:: html - - OL - - :ref:`Extension` - - 0..m - - - * - type - - - - string - - 0..1 - - - * - digest - - - - string - - 0..1 - - A sha512t24u digest created using the VRS Computed Identifier algorithm. - * - expressions - - - .. raw:: html - - UL - - :ref:`Expression` - - 0..m - - - * - location - - - - :ref:`IRI` | :ref:`Location` - - 1..1 - - A location for which the number of systemic copies is described. 
diff --git a/tests/data/vrs/def/CopyNumberChange.rst b/tests/data/vrs/def/CopyNumberChange.rst deleted file mode 100644 index 19e9abe..0000000 --- a/tests/data/vrs/def/CopyNumberChange.rst +++ /dev/null @@ -1,91 +0,0 @@ - -.. warning:: This data class is at a **draft** maturity level and may change - significantly in future releases. Maturity levels are described in - the :ref:`maturity-model`. - - -**Computational Definition** - -An assessment of the copy number of a :ref:`Location` or a :ref:`Gene` within a system (e.g. genome, cell, etc.) relative to a baseline ploidy. - -**GA4GH Digest** - -.. list-table:: - :class: clean-wrap - :header-rows: 1 - :align: left - :widths: auto - - * - Prefix - - Keys - - * - CX - - ['copyChange', 'location', 'type'] - - -**Information Model** - -Some CopyNumberChange attributes are inherited from :ref:`CopyNumber`. - -.. list-table:: - :class: clean-wrap - :header-rows: 1 - :align: left - :widths: auto - - * - Field - - Flags - - Type - - Limits - - Description - * - id - - - - string - - 0..1 - - The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE). - * - label - - - - string - - 0..1 - - A primary label for the entity. - * - description - - - - string - - 0..1 - - A free-text description of the entity. - * - extensions - - - .. raw:: html - - OL - - :ref:`Extension` - - 0..m - - - * - type - - - - string - - 0..1 - - MUST be "CopyNumberChange" - * - digest - - - - string - - 0..1 - - A sha512t24u digest created using the VRS Computed Identifier algorithm. - * - expressions - - - .. raw:: html - - UL - - :ref:`Expression` - - 0..m - - - * - location - - - - :ref:`IRI` | :ref:`Location` - - 1..1 - - A location for which the number of systemic copies is described. 
- * - copyChange - - - - string - - 1..1 - - MUST be one of "efo:0030069" (complete genomic loss), "efo:0020073" (high-level loss), "efo:0030068" (low-level loss), "efo:0030067" (loss), "efo:0030064" (regional base ploidy), "efo:0030070" (gain), "efo:0030071" (low-level gain), "efo:0030072" (high-level gain). diff --git a/tests/data/vrs/def/CopyNumberCount.rst b/tests/data/vrs/def/CopyNumberCount.rst deleted file mode 100644 index c1bc418..0000000 --- a/tests/data/vrs/def/CopyNumberCount.rst +++ /dev/null @@ -1,91 +0,0 @@ - -.. warning:: This data class is at a **draft** maturity level and may change - significantly in future releases. Maturity levels are described in - the :ref:`maturity-model`. - - -**Computational Definition** - -The absolute count of discrete copies of a :ref:`Location` or :ref:`Gene`, within a system (e.g. genome, cell, etc.). - -**GA4GH Digest** - -.. list-table:: - :class: clean-wrap - :header-rows: 1 - :align: left - :widths: auto - - * - Prefix - - Keys - - * - CN - - ['copies', 'location', 'type'] - - -**Information Model** - -Some CopyNumberCount attributes are inherited from :ref:`CopyNumber`. - -.. list-table:: - :class: clean-wrap - :header-rows: 1 - :align: left - :widths: auto - - * - Field - - Flags - - Type - - Limits - - Description - * - id - - - - string - - 0..1 - - The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE). - * - label - - - - string - - 0..1 - - A primary label for the entity. - * - description - - - - string - - 0..1 - - A free-text description of the entity. - * - extensions - - - .. 
raw:: html - - OL - - :ref:`Extension` - - 0..m - - - * - type - - - - string - - 0..1 - - MUST be "CopyNumberCount" - * - digest - - - - string - - 0..1 - - A sha512t24u digest created using the VRS Computed Identifier algorithm. - * - expressions - - - .. raw:: html - - UL - - :ref:`Expression` - - 0..m - - - * - location - - - - :ref:`IRI` | :ref:`Location` - - 1..1 - - A location for which the number of systemic copies is described. - * - copies - - - - integer | :ref:`Range` - - 1..1 - - The integral number of copies of the subject in a system diff --git a/tests/data/vrs/def/Expression.rst b/tests/data/vrs/def/Expression.rst deleted file mode 100644 index 00b7914..0000000 --- a/tests/data/vrs/def/Expression.rst +++ /dev/null @@ -1,39 +0,0 @@ - -.. warning:: This data class is at a **draft** maturity level and may change - significantly in future releases. Maturity levels are described in - the :ref:`maturity-model`. - - -**Computational Definition** - -Representation of a variation by a specified nomenclature or syntax for a Variation object. Common examples of expressions for the description of molecular variation include the HGVS and ISCN nomenclatures. - -**Information Model** - - -.. list-table:: - :class: clean-wrap - :header-rows: 1 - :align: left - :widths: auto - - * - Field - - Flags - - Type - - Limits - - Description - * - syntax - - - - string - - 1..1 - - - * - value - - - - string - - 1..1 - - - * - syntax_version - - - - string - - 0..1 - - diff --git a/tests/data/vrs/def/Ga4ghIdentifiableObject.rst b/tests/data/vrs/def/Ga4ghIdentifiableObject.rst deleted file mode 100644 index 5998f87..0000000 --- a/tests/data/vrs/def/Ga4ghIdentifiableObject.rst +++ /dev/null @@ -1,67 +0,0 @@ -**Computational Definition** - -A contextual value object for which a GA4GH computed identifier can be created. - -**GA4GH Digest** - -.. 
list-table:: - :class: clean-wrap - :header-rows: 1 - :align: left - :widths: auto - - * - Prefix - - Keys - - * - None - - ['type'] - - -**Information Model** - -Some Ga4ghIdentifiableObject attributes are inherited from :ref:`gks.core:Entity`. - -.. list-table:: - :class: clean-wrap - :header-rows: 1 - :align: left - :widths: auto - - * - Field - - Flags - - Type - - Limits - - Description - * - id - - - - string - - 0..1 - - The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE). - * - label - - - - string - - 0..1 - - A primary label for the entity. - * - description - - - - string - - 0..1 - - A free-text description of the entity. - * - extensions - - - .. raw:: html - - OL - - :ref:`Extension` - - 0..m - - - * - type - - - - string - - 0..1 - - - * - digest - - - - string - - 0..1 - - A sha512t24u digest created using the VRS Computed Identifier algorithm. diff --git a/tests/data/vrs/def/Haplotype.rst b/tests/data/vrs/def/Haplotype.rst deleted file mode 100644 index 4477906..0000000 --- a/tests/data/vrs/def/Haplotype.rst +++ /dev/null @@ -1,89 +0,0 @@ - -.. warning:: This data class is at a **draft** maturity level and may change - significantly in future releases. Maturity levels are described in - the :ref:`maturity-model`. - - -**Computational Definition** - -An ordered set of co-occurring :ref:`variants ` on the same molecule. - -**GA4GH Digest** - -.. list-table:: - :class: clean-wrap - :header-rows: 1 - :align: left - :widths: auto - - * - Prefix - - Keys - - * - HT - - ['members', 'type'] - - -**Information Model** - -Some Haplotype attributes are inherited from :ref:`Variation`. - -.. 
list-table:: - :class: clean-wrap - :header-rows: 1 - :align: left - :widths: auto - - * - Field - - Flags - - Type - - Limits - - Description - * - id - - - - string - - 0..1 - - The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE). - * - label - - - - string - - 0..1 - - A primary label for the entity. - * - description - - - - string - - 0..1 - - A free-text description of the entity. - * - extensions - - - .. raw:: html - - OL - - :ref:`Extension` - - 0..m - - - * - type - - - - string - - 0..1 - - MUST be "Haplotype" - * - digest - - - - string - - 0..1 - - A sha512t24u digest created using the VRS Computed Identifier algorithm. - * - expressions - - - .. raw:: html - - UL - - :ref:`Expression` - - 0..m - - - * - members - - - .. raw:: html - - OL - - :ref:`Adjacency` | :ref:`Allele` | :ref:`IRI` - - 2..m - - A list of :ref:`Alleles ` and :ref:`Adjacencies ` that comprise a Haplotype. Members must share the same reference sequence as adjacent members. Alleles should not have overlapping or adjacent coordinates with neighboring Alleles. Neighboring alleles should be ordered by ascending coordinates, unless represented on a DNA inversion (following an Adjacency with end-defined adjoinedSequences), in which case they should be ordered in descending coordinates. Sequence references MUST be consistent for all members between and including the end of one Adjacency and the beginning of another. diff --git a/tests/data/vrs/def/LengthExpression.rst b/tests/data/vrs/def/LengthExpression.rst deleted file mode 100644 index fc92452..0000000 --- a/tests/data/vrs/def/LengthExpression.rst +++ /dev/null @@ -1,73 +0,0 @@ - -.. warning:: This data class is at a **draft** maturity level and may change - significantly in future releases. 
Maturity levels are described in - the :ref:`maturity-model`. - - -**Computational Definition** - -A sequence expressed only by its length. - -**GA4GH Digest** - -.. list-table:: - :class: clean-wrap - :header-rows: 1 - :align: left - :widths: auto - - * - Prefix - - Keys - - * - None - - ['length', 'type'] - - -**Information Model** - -Some LengthExpression attributes are inherited from :ref:`SequenceExpression`. - -.. list-table:: - :class: clean-wrap - :header-rows: 1 - :align: left - :widths: auto - - * - Field - - Flags - - Type - - Limits - - Description - * - id - - - - string - - 0..1 - - The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE). - * - label - - - - string - - 0..1 - - A primary label for the entity. - * - description - - - - string - - 0..1 - - A free-text description of the entity. - * - extensions - - - .. raw:: html - - OL - - :ref:`Extension` - - 0..m - - - * - type - - - - string - - 1..1 - - MUST be "LengthExpression" - * - length - - - - :ref:`Range` | integer - - 0..1 - - diff --git a/tests/data/vrs/def/LiteralSequenceExpression.rst b/tests/data/vrs/def/LiteralSequenceExpression.rst deleted file mode 100644 index fd20bb8..0000000 --- a/tests/data/vrs/def/LiteralSequenceExpression.rst +++ /dev/null @@ -1,73 +0,0 @@ - -.. warning:: This data class is at a **draft** maturity level and may change - significantly in future releases. Maturity levels are described in - the :ref:`maturity-model`. - - -**Computational Definition** - -An explicit expression of a Sequence. - -**GA4GH Digest** - -.. 
list-table:: - :class: clean-wrap - :header-rows: 1 - :align: left - :widths: auto - - * - Prefix - - Keys - - * - None - - ['sequence', 'type'] - - -**Information Model** - -Some LiteralSequenceExpression attributes are inherited from :ref:`SequenceExpression`. - -.. list-table:: - :class: clean-wrap - :header-rows: 1 - :align: left - :widths: auto - - * - Field - - Flags - - Type - - Limits - - Description - * - id - - - - string - - 0..1 - - The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE). - * - label - - - - string - - 0..1 - - A primary label for the entity. - * - description - - - - string - - 0..1 - - A free-text description of the entity. - * - extensions - - - .. raw:: html - - OL - - :ref:`Extension` - - 0..m - - - * - type - - - - string - - 1..1 - - MUST be "LiteralSequenceExpression" - * - sequence - - - - :ref:`SequenceString` - - 1..1 - - the literal sequence diff --git a/tests/data/vrs/def/Location.rst b/tests/data/vrs/def/Location.rst deleted file mode 100644 index 0d8a4a7..0000000 --- a/tests/data/vrs/def/Location.rst +++ /dev/null @@ -1,3 +0,0 @@ -**Computational Definition** - -A contiguous segment of a biological sequence. diff --git a/tests/data/vrs/def/MolecularVariation.rst b/tests/data/vrs/def/MolecularVariation.rst deleted file mode 100644 index cfebddc..0000000 --- a/tests/data/vrs/def/MolecularVariation.rst +++ /dev/null @@ -1,3 +0,0 @@ -**Computational Definition** - -A :ref:`variation` on a contiguous molecule. diff --git a/tests/data/vrs/def/Range.rst b/tests/data/vrs/def/Range.rst deleted file mode 100644 index 311084c..0000000 --- a/tests/data/vrs/def/Range.rst +++ /dev/null @@ -1,9 +0,0 @@ - -.. warning:: This data class is at a **draft** maturity level and may change - significantly in future releases. 
Maturity levels are described in - the :ref:`maturity-model`. - - -**Computational Definition** - -An inclusive range of values bounded by one or more integers. diff --git a/tests/data/vrs/def/ReferenceLengthExpression.rst b/tests/data/vrs/def/ReferenceLengthExpression.rst deleted file mode 100644 index 6143ceb..0000000 --- a/tests/data/vrs/def/ReferenceLengthExpression.rst +++ /dev/null @@ -1,83 +0,0 @@ - -.. warning:: This data class is at a **draft** maturity level and may change - significantly in future releases. Maturity levels are described in - the :ref:`maturity-model`. - - -**Computational Definition** - -An expression of a length of a sequence from a repeating reference. - -**GA4GH Digest** - -.. list-table:: - :class: clean-wrap - :header-rows: 1 - :align: left - :widths: auto - - * - Prefix - - Keys - - * - None - - ['length', 'repeatSubunitLength', 'type'] - - -**Information Model** - -Some ReferenceLengthExpression attributes are inherited from :ref:`SequenceExpression`. - -.. list-table:: - :class: clean-wrap - :header-rows: 1 - :align: left - :widths: auto - - * - Field - - Flags - - Type - - Limits - - Description - * - id - - - - string - - 0..1 - - The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE). - * - label - - - - string - - 0..1 - - A primary label for the entity. - * - description - - - - string - - 0..1 - - A free-text description of the entity. - * - extensions - - - .. raw:: html - - OL - - :ref:`Extension` - - 0..m - - - * - type - - - - string - - 1..1 - - MUST be "ReferenceLengthExpression" - * - length - - - - integer | :ref:`Range` - - 1..1 - - The number of residues in the expressed sequence. - * - sequence - - - - :ref:`SequenceString` - - 0..1 - - the :ref:`Sequence` encoded by the Reference Length Expression. 
- * - repeatSubunitLength - - - - integer - - 1..1 - - The number of residues in the repeat subunit. diff --git a/tests/data/vrs/def/Residue.rst b/tests/data/vrs/def/Residue.rst deleted file mode 100644 index e1a8545..0000000 --- a/tests/data/vrs/def/Residue.rst +++ /dev/null @@ -1,9 +0,0 @@ - -.. warning:: This data class is at a **draft** maturity level and may change - significantly in future releases. Maturity levels are described in - the :ref:`maturity-model`. - - -**Computational Definition** - -A character representing a specific residue (i.e., molecular species) or groupings of these ("ambiguity codes"), using `one-letter IUPAC abbreviations `_ for nucleic acids and amino acids. diff --git a/tests/data/vrs/def/SequenceExpression.rst b/tests/data/vrs/def/SequenceExpression.rst deleted file mode 100644 index 32b74d1..0000000 --- a/tests/data/vrs/def/SequenceExpression.rst +++ /dev/null @@ -1,62 +0,0 @@ -**Computational Definition** - -An expression describing a :ref:`Sequence`. - -**GA4GH Digest** - -.. list-table:: - :class: clean-wrap - :header-rows: 1 - :align: left - :widths: auto - - * - Prefix - - Keys - - * - None - - ['type'] - - -**Information Model** - -Some SequenceExpression attributes are inherited from :ref:`gks.core:Entity`. - -.. list-table:: - :class: clean-wrap - :header-rows: 1 - :align: left - :widths: auto - - * - Field - - Flags - - Type - - Limits - - Description - * - id - - - - string - - 0..1 - - The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE). - * - label - - - - string - - 0..1 - - A primary label for the entity. - * - description - - - - string - - 0..1 - - A free-text description of the entity. - * - extensions - - - .. 
raw:: html - - OL - - :ref:`Extension` - - 0..m - - - * - type - - - - string - - 1..1 - - The SequenceExpression class type. MUST match child class type. diff --git a/tests/data/vrs/def/SequenceLocation.rst b/tests/data/vrs/def/SequenceLocation.rst deleted file mode 100644 index fbda247..0000000 --- a/tests/data/vrs/def/SequenceLocation.rst +++ /dev/null @@ -1,88 +0,0 @@ - -.. warning:: This data class is at a **draft** maturity level and may change - significantly in future releases. Maturity levels are described in - the :ref:`maturity-model`. - - -**Computational Definition** - -A :ref:`Location` defined by an interval on a referenced :ref:`Sequence`. - -**GA4GH Digest** - -.. list-table:: - :class: clean-wrap - :header-rows: 1 - :align: left - :widths: auto - - * - Prefix - - Keys - - * - SL - - ['end', 'sequenceReference', 'start', 'type'] - - -**Information Model** - -Some SequenceLocation attributes are inherited from :ref:`Ga4ghIdentifiableObject`. - -.. list-table:: - :class: clean-wrap - :header-rows: 1 - :align: left - :widths: auto - - * - Field - - Flags - - Type - - Limits - - Description - * - id - - - - string - - 0..1 - - The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE). - * - label - - - - string - - 0..1 - - A primary label for the entity. - * - description - - - - string - - 0..1 - - A free-text description of the entity. - * - extensions - - - .. raw:: html - - OL - - :ref:`Extension` - - 0..m - - - * - type - - - - string - - 0..1 - - MUST be "SequenceLocation" - * - digest - - - - string - - 0..1 - - A sha512t24u digest created using the VRS Computed Identifier algorithm. - * - sequenceReference - - - - :ref:`IRI` | :ref:`SequenceReference` - - 0..1 - - A :ref:`SequenceReference`. 
- * - start - - - - integer | :ref:`Range` - - 0..1 - - The start coordinate or range of the SequenceLocation. The minimum value of this coordinate or range is 0. MUST represent a coordinate or range less than or equal to the value of `end`. - * - end - - - - integer | :ref:`Range` - - 0..1 - - The end coordinate or range of the SequenceLocation. The minimum value of this coordinate or range is 0. MUST represent a coordinate or range greater than or equal to the value of `start`. diff --git a/tests/data/vrs/def/SequenceReference.rst b/tests/data/vrs/def/SequenceReference.rst deleted file mode 100644 index 6fb9be2..0000000 --- a/tests/data/vrs/def/SequenceReference.rst +++ /dev/null @@ -1,78 +0,0 @@ - -.. warning:: This data class is at a **draft** maturity level and may change - significantly in future releases. Maturity levels are described in - the :ref:`maturity-model`. - - -**Computational Definition** - -A sequence of nucleic or amino acid character codes. - -**GA4GH Digest** - -.. list-table:: - :class: clean-wrap - :header-rows: 1 - :align: left - :widths: auto - - * - Prefix - - Keys - - * - None - - [] - - -**Information Model** - -Some SequenceReference attributes are inherited from :ref:`gks.core:Entity`. - -.. list-table:: - :class: clean-wrap - :header-rows: 1 - :align: left - :widths: auto - - * - Field - - Flags - - Type - - Limits - - Description - * - id - - - - string - - 0..1 - - The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE). - * - label - - - - string - - 0..1 - - A primary label for the entity. - * - description - - - - string - - 0..1 - - A free-text description of the entity. - * - extensions - - - .. 
raw:: html - - OL - - :ref:`Extension` - - 0..m - - - * - type - - - - string - - 0..1 - - - * - refgetAccession - - - - string - - 1..1 - - A `GA4GH RefGet ` identifier for the referenced sequence, using the sha512t24u digest. - * - residueAlphabet - - - - string - - 0..1 - - The interpretation of the character codes referred to by the refget accession, where "aa" specifies an amino acid character set, and "na" specifies a nucleic acid character set. diff --git a/tests/data/vrs/def/SequenceString.rst b/tests/data/vrs/def/SequenceString.rst deleted file mode 100644 index ee09e11..0000000 --- a/tests/data/vrs/def/SequenceString.rst +++ /dev/null @@ -1,9 +0,0 @@ - -.. warning:: This data class is at a **draft** maturity level and may change - significantly in future releases. Maturity levels are described in - the :ref:`maturity-model`. - - -**Computational Definition** - -A character string of :ref:`Residues ` that represents a biological sequence using the conventional sequence order (5’-to-3’ for nucleic acid sequences, and amino-to-carboxyl for amino acid sequences). IUPAC ambiguity codes are permitted in Sequence Strings. diff --git a/tests/data/vrs/def/SystemicVariation.rst b/tests/data/vrs/def/SystemicVariation.rst deleted file mode 100644 index e5cb71d..0000000 --- a/tests/data/vrs/def/SystemicVariation.rst +++ /dev/null @@ -1,3 +0,0 @@ -**Computational Definition** - -A Variation of multiple molecules in the context of a system, e.g. a genome, sample, or homologous chromosomes. diff --git a/tests/data/vrs/def/ValueObject.rst b/tests/data/vrs/def/ValueObject.rst deleted file mode 100644 index 0f1de8b..0000000 --- a/tests/data/vrs/def/ValueObject.rst +++ /dev/null @@ -1,3 +0,0 @@ -**Computational Definition** - -A contextual value whose equality is based on value, not identity. See https://en.wikipedia.org/wiki/Value_object for more on Value Objects. 
diff --git a/tests/data/vrs/def/Variation.rst b/tests/data/vrs/def/Variation.rst deleted file mode 100644 index 1793ae1..0000000 --- a/tests/data/vrs/def/Variation.rst +++ /dev/null @@ -1,75 +0,0 @@ -**Computational Definition** - -A representation of the state of one or more biomolecules. - -**GA4GH Digest** - -.. list-table:: - :class: clean-wrap - :header-rows: 1 - :align: left - :widths: auto - - * - Prefix - - Keys - - * - None - - ['type'] - - -**Information Model** - -Some Variation attributes are inherited from :ref:`Ga4ghIdentifiableObject`. - -.. list-table:: - :class: clean-wrap - :header-rows: 1 - :align: left - :widths: auto - - * - Field - - Flags - - Type - - Limits - - Description - * - id - - - - string - - 0..1 - - The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE). - * - label - - - - string - - 0..1 - - A primary label for the entity. - * - description - - - - string - - 0..1 - - A free-text description of the entity. - * - extensions - - - .. raw:: html - - OL - - :ref:`Extension` - - 0..m - - - * - type - - - - string - - 0..1 - - - * - digest - - - - string - - 0..1 - - A sha512t24u digest created using the VRS Computed Identifier algorithm. - * - expressions - - - .. 
raw:: html - - UL - - :ref:`Expression` - - 0..m - - diff --git a/tests/data/vrs/json/Adjacency b/tests/data/vrs/json/Adjacency deleted file mode 100644 index dc9bf64..0000000 --- a/tests/data/vrs/json/Adjacency +++ /dev/null @@ -1,91 +0,0 @@ -{ - "$schema": "https://json-schema.org/draft/2020-12/schema", - "$id": "https://w3id.org/ga4gh/schema/vrs/2.x/json/Adjacency", - "title": "Adjacency", - "type": "object", - "maturity": "draft", - "ga4ghDigest": { - "prefix": "AJ", - "keys": [ - "adjoinedSequences", - "linker", - "type" - ] - }, - "description": "The `Adjacency` class can represent either the termination of a sequence or the adjoining of the end of a sequence with the beginning of an adjacent sequence, potentially with an intervening linker sequence.", - "properties": { - "id": { - "type": "string", - "description": "The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE)." - }, - "label": { - "type": "string", - "description": "A primary label for the entity." - }, - "description": { - "type": "string", - "description": "A free-text description of the entity." - }, - "extensions": { - "type": "array", - "ordered": true, - "items": { - "$ref": "/ga4gh/schema/gks-common/1.x/json/Extension" - } - }, - "type": { - "type": "string", - "const": "Adjacency", - "default": "Adjacency", - "description": "MUST be \"Adjacency\"." 
- }, - "digest": { - "description": "A sha512t24u digest created using the VRS Computed Identifier algorithm.", - "type": "string", - "pattern": "^[0-9A-Za-z_\\-]{32}$" - }, - "expressions": { - "type": "array", - "ordered": false, - "items": { - "$ref": "/ga4gh/schema/vrs/2.x/json/Expression" - } - }, - "adjoinedSequences": { - "type": "array", - "uniqueItems": false, - "ordered": true, - "items": { - "oneOf": [ - { - "$ref": "/ga4gh/schema/vrs/2.x/json/SequenceLocation" - }, - { - "$ref": "/ga4gh/schema/gks-common/1.x/json/IRI" - } - ] - }, - "description": "The terminal sequence or pair of adjoined sequences that defines in the adjacency.", - "minItems": 1, - "maxItems": 2 - }, - "linker": { - "description": "The sequence found between adjoined sequences.", - "oneOf": [ - { - "$ref": "/ga4gh/schema/vrs/2.x/json/LengthExpression" - }, - { - "$ref": "/ga4gh/schema/vrs/2.x/json/LiteralSequenceExpression" - }, - { - "$ref": "/ga4gh/schema/vrs/2.x/json/ReferenceLengthExpression" - } - ] - } - }, - "required": [ - "adjoinedSequences" - ], - "additionalProperties": false -} \ No newline at end of file diff --git a/tests/data/vrs/json/Allele b/tests/data/vrs/json/Allele deleted file mode 100644 index b25d4ad..0000000 --- a/tests/data/vrs/json/Allele +++ /dev/null @@ -1,85 +0,0 @@ -{ - "$schema": "https://json-schema.org/draft/2020-12/schema", - "$id": "https://w3id.org/ga4gh/schema/vrs/2.x/json/Allele", - "title": "Allele", - "type": "object", - "maturity": "draft", - "ga4ghDigest": { - "prefix": "VA", - "keys": [ - "location", - "state", - "type" - ] - }, - "description": "The state of a molecule at a Location.", - "properties": { - "id": { - "type": "string", - "description": "The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE)." 
- }, - "label": { - "type": "string", - "description": "A primary label for the entity." - }, - "description": { - "type": "string", - "description": "A free-text description of the entity." - }, - "extensions": { - "type": "array", - "ordered": true, - "items": { - "$ref": "/ga4gh/schema/gks-common/1.x/json/Extension" - } - }, - "type": { - "type": "string", - "const": "Allele", - "default": "Allele", - "description": "MUST be \"Allele\"" - }, - "digest": { - "description": "A sha512t24u digest created using the VRS Computed Identifier algorithm.", - "type": "string", - "pattern": "^[0-9A-Za-z_\\-]{32}$" - }, - "expressions": { - "type": "array", - "ordered": false, - "items": { - "$ref": "/ga4gh/schema/vrs/2.x/json/Expression" - } - }, - "location": { - "oneOf": [ - { - "$ref": "/ga4gh/schema/vrs/2.x/json/SequenceLocation" - }, - { - "$ref": "/ga4gh/schema/gks-common/1.x/json/IRI" - } - ], - "description": "The location of the Allele" - }, - "state": { - "description": "An expression of the sequence state", - "oneOf": [ - { - "$ref": "/ga4gh/schema/vrs/2.x/json/LengthExpression" - }, - { - "$ref": "/ga4gh/schema/vrs/2.x/json/LiteralSequenceExpression" - }, - { - "$ref": "/ga4gh/schema/vrs/2.x/json/ReferenceLengthExpression" - } - ] - } - }, - "required": [ - "location", - "state" - ], - "additionalProperties": false -} \ No newline at end of file diff --git a/tests/data/vrs/json/CopyNumberChange b/tests/data/vrs/json/CopyNumberChange deleted file mode 100644 index f5d3465..0000000 --- a/tests/data/vrs/json/CopyNumberChange +++ /dev/null @@ -1,85 +0,0 @@ -{ - "$schema": "https://json-schema.org/draft/2020-12/schema", - "$id": "https://w3id.org/ga4gh/schema/vrs/2.x/json/CopyNumberChange", - "title": "CopyNumberChange", - "type": "object", - "maturity": "draft", - "ga4ghDigest": { - "keys": [ - "copyChange", - "location", - "type" - ], - "prefix": "CX" - }, - "description": "An assessment of the copy number of a Location or a Gene within a system (e.g. 
genome, cell, etc.) relative to a baseline ploidy.", - "properties": { - "id": { - "type": "string", - "description": "The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE)." - }, - "label": { - "type": "string", - "description": "A primary label for the entity." - }, - "description": { - "type": "string", - "description": "A free-text description of the entity." - }, - "extensions": { - "type": "array", - "ordered": true, - "items": { - "$ref": "/ga4gh/schema/gks-common/1.x/json/Extension" - } - }, - "type": { - "type": "string", - "const": "CopyNumberChange", - "default": "CopyNumberChange", - "description": "MUST be \"CopyNumberChange\"" - }, - "digest": { - "description": "A sha512t24u digest created using the VRS Computed Identifier algorithm.", - "type": "string", - "pattern": "^[0-9A-Za-z_\\-]{32}$" - }, - "expressions": { - "type": "array", - "ordered": false, - "items": { - "$ref": "/ga4gh/schema/vrs/2.x/json/Expression" - } - }, - "location": { - "oneOf": [ - { - "$ref": "/ga4gh/schema/vrs/2.x/json/SequenceLocation" - }, - { - "$ref": "/ga4gh/schema/gks-common/1.x/json/IRI" - } - ], - "description": "A location for which the number of systemic copies is described." - }, - "copyChange": { - "type": "string", - "enum": [ - "efo:0030069", - "efo:0020073", - "efo:0030068", - "efo:0030067", - "efo:0030064", - "efo:0030070", - "efo:0030071", - "efo:0030072" - ], - "description": "MUST be one of \"efo:0030069\" (complete genomic loss), \"efo:0020073\" (high-level loss), \"efo:0030068\" (low-level loss), \"efo:0030067\" (loss), \"efo:0030064\" (regional base ploidy), \"efo:0030070\" (gain), \"efo:0030071\" (low-level gain), \"efo:0030072\" (high-level gain)." 
- } - }, - "required": [ - "copyChange", - "location" - ], - "additionalProperties": false -} \ No newline at end of file diff --git a/tests/data/vrs/json/CopyNumberCount b/tests/data/vrs/json/CopyNumberCount deleted file mode 100644 index 6bb8b9e..0000000 --- a/tests/data/vrs/json/CopyNumberCount +++ /dev/null @@ -1,82 +0,0 @@ -{ - "$schema": "https://json-schema.org/draft/2020-12/schema", - "$id": "https://w3id.org/ga4gh/schema/vrs/2.x/json/CopyNumberCount", - "title": "CopyNumberCount", - "type": "object", - "maturity": "draft", - "ga4ghDigest": { - "keys": [ - "copies", - "location", - "type" - ], - "prefix": "CN" - }, - "description": "The absolute count of discrete copies of a Location or Gene, within a system (e.g. genome, cell, etc.).", - "properties": { - "id": { - "type": "string", - "description": "The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE)." - }, - "label": { - "type": "string", - "description": "A primary label for the entity." - }, - "description": { - "type": "string", - "description": "A free-text description of the entity." 
- }, - "extensions": { - "type": "array", - "ordered": true, - "items": { - "$ref": "/ga4gh/schema/gks-common/1.x/json/Extension" - } - }, - "type": { - "type": "string", - "const": "CopyNumberCount", - "default": "CopyNumberCount", - "description": "MUST be \"CopyNumberCount\"" - }, - "digest": { - "description": "A sha512t24u digest created using the VRS Computed Identifier algorithm.", - "type": "string", - "pattern": "^[0-9A-Za-z_\\-]{32}$" - }, - "expressions": { - "type": "array", - "ordered": false, - "items": { - "$ref": "/ga4gh/schema/vrs/2.x/json/Expression" - } - }, - "location": { - "oneOf": [ - { - "$ref": "/ga4gh/schema/vrs/2.x/json/SequenceLocation" - }, - { - "$ref": "/ga4gh/schema/gks-common/1.x/json/IRI" - } - ], - "description": "A location for which the number of systemic copies is described." - }, - "copies": { - "oneOf": [ - { - "$ref": "/ga4gh/schema/vrs/2.x/json/Range" - }, - { - "type": "integer" - } - ], - "description": "The integral number of copies of the subject in a system" - } - }, - "required": [ - "copies", - "location" - ], - "additionalProperties": false -} \ No newline at end of file diff --git a/tests/data/vrs/json/Expression b/tests/data/vrs/json/Expression deleted file mode 100644 index bf3b5fd..0000000 --- a/tests/data/vrs/json/Expression +++ /dev/null @@ -1,36 +0,0 @@ -{ - "$schema": "https://json-schema.org/draft/2020-12/schema", - "$id": "https://w3id.org/ga4gh/schema/vrs/2.x/json/Expression", - "title": "Expression", - "type": "object", - "privateTo": "Variation", - "maturity": "draft", - "description": "Representation of a variation by a specified nomenclature or syntax for a Variation object. 
Common examples of expressions for the description of molecular variation include the HGVS and ISCN nomenclatures.", - "properties": { - "syntax": { - "type": "string", - "enum": [ - "hgvs.c", - "hgvs.p", - "hgvs.g", - "hgvs.m", - "hgvs.n", - "hgvs.r", - "iscn", - "gnomad", - "spdi" - ] - }, - "value": { - "type": "string" - }, - "syntax_version": { - "type": "string" - } - }, - "required": [ - "syntax", - "value" - ], - "additionalProperties": false -} \ No newline at end of file diff --git a/tests/data/vrs/json/Haplotype b/tests/data/vrs/json/Haplotype deleted file mode 100644 index dadcc11..0000000 --- a/tests/data/vrs/json/Haplotype +++ /dev/null @@ -1,78 +0,0 @@ -{ - "$schema": "https://json-schema.org/draft/2020-12/schema", - "$id": "https://w3id.org/ga4gh/schema/vrs/2.x/json/Haplotype", - "title": "Haplotype", - "type": "object", - "maturity": "draft", - "ga4ghDigest": { - "prefix": "HT", - "keys": [ - "members", - "type" - ] - }, - "description": "An ordered set of co-occurring variants on the same molecule.", - "properties": { - "id": { - "type": "string", - "description": "The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE)." - }, - "label": { - "type": "string", - "description": "A primary label for the entity." - }, - "description": { - "type": "string", - "description": "A free-text description of the entity." 
- }, - "extensions": { - "type": "array", - "ordered": true, - "items": { - "$ref": "/ga4gh/schema/gks-common/1.x/json/Extension" - } - }, - "type": { - "type": "string", - "const": "Haplotype", - "default": "Haplotype", - "description": "MUST be \"Haplotype\"" - }, - "digest": { - "description": "A sha512t24u digest created using the VRS Computed Identifier algorithm.", - "type": "string", - "pattern": "^[0-9A-Za-z_\\-]{32}$" - }, - "expressions": { - "type": "array", - "ordered": false, - "items": { - "$ref": "/ga4gh/schema/vrs/2.x/json/Expression" - } - }, - "members": { - "type": "array", - "ordered": true, - "minItems": 2, - "uniqueItems": false, - "items": { - "oneOf": [ - { - "$ref": "/ga4gh/schema/vrs/2.x/json/Adjacency" - }, - { - "$ref": "/ga4gh/schema/vrs/2.x/json/Allele" - }, - { - "$ref": "/ga4gh/schema/gks-common/1.x/json/IRI" - } - ] - }, - "description": "A list of Alleles that comprise a Haplotype. Members must share the same reference sequence as adjacent members. Alleles should not have overlapping or adjacent coordinates with neighboring Alleles. Neighboring alleles should be ordered by ascending coordinates, unless represented on a DNA inversion (following an Adjacency with end-defined adjoinedSequences), in which case they should be ordered in descending coordinates. Sequence references MUST be consistent for all members between and including the end of one Adjacency and the beginning of another." 
- } - }, - "required": [ - "members" - ], - "additionalProperties": false -} \ No newline at end of file diff --git a/tests/data/vrs/json/LengthExpression b/tests/data/vrs/json/LengthExpression deleted file mode 100644 index 397c48c..0000000 --- a/tests/data/vrs/json/LengthExpression +++ /dev/null @@ -1,55 +0,0 @@ -{ - "$schema": "https://json-schema.org/draft/2020-12/schema", - "$id": "https://w3id.org/ga4gh/schema/vrs/2.x/json/LengthExpression", - "title": "LengthExpression", - "type": "object", - "maturity": "draft", - "ga4ghDigest": { - "keys": [ - "length", - "type" - ] - }, - "description": "A sequence expressed only by its length.", - "properties": { - "id": { - "type": "string", - "description": "The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE)." - }, - "label": { - "type": "string", - "description": "A primary label for the entity." - }, - "description": { - "type": "string", - "description": "A free-text description of the entity." 
- }, - "extensions": { - "type": "array", - "ordered": true, - "items": { - "$ref": "/ga4gh/schema/gks-common/1.x/json/Extension" - } - }, - "type": { - "type": "string", - "const": "LengthExpression", - "default": "LengthExpression", - "description": "MUST be \"LengthExpression\"" - }, - "length": { - "oneOf": [ - { - "$ref": "/ga4gh/schema/vrs/2.x/json/Range" - }, - { - "type": "integer" - } - ] - } - }, - "required": [ - "type" - ], - "additionalProperties": false -} \ No newline at end of file diff --git a/tests/data/vrs/json/LiteralSequenceExpression b/tests/data/vrs/json/LiteralSequenceExpression deleted file mode 100644 index d958b3d..0000000 --- a/tests/data/vrs/json/LiteralSequenceExpression +++ /dev/null @@ -1,50 +0,0 @@ -{ - "$schema": "https://json-schema.org/draft/2020-12/schema", - "$id": "https://w3id.org/ga4gh/schema/vrs/2.x/json/LiteralSequenceExpression", - "title": "LiteralSequenceExpression", - "type": "object", - "maturity": "draft", - "ga4ghDigest": { - "keys": [ - "sequence", - "type" - ] - }, - "description": "An explicit expression of a Sequence.", - "properties": { - "id": { - "type": "string", - "description": "The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE)." - }, - "label": { - "type": "string", - "description": "A primary label for the entity." - }, - "description": { - "type": "string", - "description": "A free-text description of the entity." 
- }, - "extensions": { - "type": "array", - "ordered": true, - "items": { - "$ref": "/ga4gh/schema/gks-common/1.x/json/Extension" - } - }, - "type": { - "type": "string", - "const": "LiteralSequenceExpression", - "default": "LiteralSequenceExpression", - "description": "MUST be \"LiteralSequenceExpression\"" - }, - "sequence": { - "$ref": "/ga4gh/schema/vrs/2.x/json/SequenceString", - "description": "the literal sequence" - } - }, - "required": [ - "sequence", - "type" - ], - "additionalProperties": false -} \ No newline at end of file diff --git a/tests/data/vrs/json/Location b/tests/data/vrs/json/Location deleted file mode 100644 index 4612118..0000000 --- a/tests/data/vrs/json/Location +++ /dev/null @@ -1,11 +0,0 @@ -{ - "$schema": "https://json-schema.org/draft/2020-12/schema", - "$id": "https://w3id.org/ga4gh/schema/vrs/2.x/json/Location", - "title": "Location", - "type": "object", - "description": "A contiguous segment of a biological sequence.", - "$ref": "/ga4gh/schema/vrs/2.x/json/SequenceLocation", - "discriminator": { - "propertyName": "type" - } -} \ No newline at end of file diff --git a/tests/data/vrs/json/MolecularVariation b/tests/data/vrs/json/MolecularVariation deleted file mode 100644 index f2de4e1..0000000 --- a/tests/data/vrs/json/MolecularVariation +++ /dev/null @@ -1,18 +0,0 @@ -{ - "$schema": "https://json-schema.org/draft/2020-12/schema", - "$id": "https://w3id.org/ga4gh/schema/vrs/2.x/json/MolecularVariation", - "title": "MolecularVariation", - "type": "object", - "description": "A variation on a contiguous molecule.", - "oneOf": [ - { - "$ref": "/ga4gh/schema/vrs/2.x/json/Allele" - }, - { - "$ref": "/ga4gh/schema/vrs/2.x/json/Haplotype" - } - ], - "discriminator": { - "propertyName": "type" - } -} \ No newline at end of file diff --git a/tests/data/vrs/json/Range b/tests/data/vrs/json/Range deleted file mode 100644 index 5ae7fda..0000000 --- a/tests/data/vrs/json/Range +++ /dev/null @@ -1,21 +0,0 @@ -{ - "$schema": 
"https://json-schema.org/draft/2020-12/schema", - "$id": "https://w3id.org/ga4gh/schema/vrs/2.x/json/Range", - "title": "Range", - "type": "array", - "maturity": "draft", - "description": "An inclusive range of values bounded by one or more integers.", - "ordered": true, - "items": { - "oneOf": [ - { - "type": "integer" - }, - { - "type": "null" - } - ] - }, - "maxItems": 2, - "minItems": 2 -} \ No newline at end of file diff --git a/tests/data/vrs/json/ReferenceLengthExpression b/tests/data/vrs/json/ReferenceLengthExpression deleted file mode 100644 index b4687a9..0000000 --- a/tests/data/vrs/json/ReferenceLengthExpression +++ /dev/null @@ -1,67 +0,0 @@ -{ - "$schema": "https://json-schema.org/draft/2020-12/schema", - "$id": "https://w3id.org/ga4gh/schema/vrs/2.x/json/ReferenceLengthExpression", - "title": "ReferenceLengthExpression", - "type": "object", - "maturity": "draft", - "ga4ghDigest": { - "keys": [ - "length", - "repeatSubunitLength", - "type" - ] - }, - "description": "An expression of a length of a sequence from a repeating reference.", - "properties": { - "id": { - "type": "string", - "description": "The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE)." - }, - "label": { - "type": "string", - "description": "A primary label for the entity." - }, - "description": { - "type": "string", - "description": "A free-text description of the entity." 
- }, - "extensions": { - "type": "array", - "ordered": true, - "items": { - "$ref": "/ga4gh/schema/gks-common/1.x/json/Extension" - } - }, - "type": { - "type": "string", - "const": "ReferenceLengthExpression", - "default": "ReferenceLengthExpression", - "description": "MUST be \"ReferenceLengthExpression\"" - }, - "length": { - "oneOf": [ - { - "$ref": "/ga4gh/schema/vrs/2.x/json/Range" - }, - { - "type": "integer" - } - ], - "description": "The number of residues in the expressed sequence." - }, - "sequence": { - "$ref": "/ga4gh/schema/vrs/2.x/json/SequenceString", - "description": "the Sequence encoded by the Reference Length Expression." - }, - "repeatSubunitLength": { - "type": "integer", - "description": "The number of residues in the repeat subunit." - } - }, - "required": [ - "length", - "repeatSubunitLength", - "type" - ], - "additionalProperties": false -} \ No newline at end of file diff --git a/tests/data/vrs/json/Residue b/tests/data/vrs/json/Residue deleted file mode 100644 index d96de64..0000000 --- a/tests/data/vrs/json/Residue +++ /dev/null @@ -1,9 +0,0 @@ -{ - "$schema": "https://json-schema.org/draft/2020-12/schema", - "$id": "https://w3id.org/ga4gh/schema/vrs/2.x/json/Residue", - "title": "Residue", - "type": "string", - "maturity": "draft", - "description": "A character representing a specific residue (i.e., molecular species) or groupings of these (\"ambiguity codes\"), using [one-letter IUPAC abbreviations](https://en.wikipedia.org/wiki/International_Union_of_Pure_and_Applied_Chemistry#Amino_acid_and_nucleotide_base_codes) for nucleic acids and amino acids.", - "pattern": "[A-Z*\\-]" -} \ No newline at end of file diff --git a/tests/data/vrs/json/SequenceExpression b/tests/data/vrs/json/SequenceExpression deleted file mode 100644 index 4150c3f..0000000 --- a/tests/data/vrs/json/SequenceExpression +++ /dev/null @@ -1,18 +0,0 @@ -{ - "$schema": "https://json-schema.org/draft/2020-12/schema", - "$id": 
"https://w3id.org/ga4gh/schema/vrs/2.x/json/SequenceExpression", - "title": "SequenceExpression", - "type": "object", - "description": "An expression describing a Sequence.", - "oneOf": [ - { - "$ref": "/ga4gh/schema/vrs/2.x/json/LiteralSequenceExpression" - }, - { - "$ref": "/ga4gh/schema/vrs/2.x/json/ReferenceLengthExpression" - } - ], - "discriminator": { - "propertyName": "type" - } -} \ No newline at end of file diff --git a/tests/data/vrs/json/SequenceLocation b/tests/data/vrs/json/SequenceLocation deleted file mode 100644 index 6aa9252..0000000 --- a/tests/data/vrs/json/SequenceLocation +++ /dev/null @@ -1,84 +0,0 @@ -{ - "$schema": "https://json-schema.org/draft/2020-12/schema", - "$id": "https://w3id.org/ga4gh/schema/vrs/2.x/json/SequenceLocation", - "title": "SequenceLocation", - "type": "object", - "maturity": "draft", - "ga4ghDigest": { - "keys": [ - "end", - "sequenceReference", - "start", - "type" - ], - "prefix": "SL" - }, - "description": "A Location defined by an interval on a referenced Sequence.", - "properties": { - "id": { - "type": "string", - "description": "The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE)." - }, - "label": { - "type": "string", - "description": "A primary label for the entity." - }, - "description": { - "type": "string", - "description": "A free-text description of the entity." 
- }, - "extensions": { - "type": "array", - "ordered": true, - "items": { - "$ref": "/ga4gh/schema/gks-common/1.x/json/Extension" - } - }, - "type": { - "type": "string", - "const": "SequenceLocation", - "default": "SequenceLocation", - "description": "MUST be \"SequenceLocation\"" - }, - "digest": { - "description": "A sha512t24u digest created using the VRS Computed Identifier algorithm.", - "type": "string", - "pattern": "^[0-9A-Za-z_\\-]{32}$" - }, - "sequenceReference": { - "oneOf": [ - { - "$ref": "/ga4gh/schema/vrs/2.x/json/SequenceReference" - }, - { - "$ref": "/ga4gh/schema/gks-common/1.x/json/IRI" - } - ], - "description": "A SequenceReference." - }, - "start": { - "oneOf": [ - { - "$ref": "/ga4gh/schema/vrs/2.x/json/Range" - }, - { - "type": "integer" - } - ], - "description": "The start coordinate or range of the SequenceLocation. The minimum value of this coordinate or range is 0. MUST represent a coordinate or range less than or equal to the value of `end`." - }, - "end": { - "oneOf": [ - { - "$ref": "/ga4gh/schema/vrs/2.x/json/Range" - }, - { - "type": "integer" - } - ], - "description": "The end coordinate or range of the SequenceLocation. The minimum value of this coordinate or range is 0. MUST represent a coordinate or range greater than or equal to the value of `start`." 
- } - }, - "required": [], - "additionalProperties": false -} \ No newline at end of file diff --git a/tests/data/vrs/json/SequenceReference b/tests/data/vrs/json/SequenceReference deleted file mode 100644 index 3b0a932..0000000 --- a/tests/data/vrs/json/SequenceReference +++ /dev/null @@ -1,53 +0,0 @@ -{ - "$schema": "https://json-schema.org/draft/2020-12/schema", - "$id": "https://w3id.org/ga4gh/schema/vrs/2.x/json/SequenceReference", - "title": "SequenceReference", - "type": "object", - "maturity": "draft", - "ga4ghDigest": { - "assigned": true - }, - "description": "A sequence of nucleic or amino acid character codes.", - "properties": { - "id": { - "type": "string", - "description": "The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE)." - }, - "label": { - "type": "string", - "description": "A primary label for the entity." - }, - "description": { - "type": "string", - "description": "A free-text description of the entity." 
- }, - "extensions": { - "type": "array", - "ordered": true, - "items": { - "$ref": "/ga4gh/schema/gks-common/1.x/json/Extension" - } - }, - "type": { - "type": "string", - "const": "SequenceReference" - }, - "refgetAccession": { - "description": "A `GA4GH RefGet ` identifier for the referenced sequence, using the sha512t24u digest.", - "type": "string", - "pattern": "^SQ.[0-9A-Za-z_\\-]{32}$" - }, - "residueAlphabet": { - "type": "string", - "description": "The interpretation of the character codes referred to by the refget accession, where \"aa\" specifies an amino acid character set, and \"na\" specifies a nucleic acid character set.", - "enum": [ - "aa", - "na" - ] - } - }, - "required": [ - "refgetAccession" - ], - "additionalProperties": false -} \ No newline at end of file diff --git a/tests/data/vrs/json/SequenceString b/tests/data/vrs/json/SequenceString deleted file mode 100644 index 40c5158..0000000 --- a/tests/data/vrs/json/SequenceString +++ /dev/null @@ -1,9 +0,0 @@ -{ - "$schema": "https://json-schema.org/draft/2020-12/schema", - "$id": "https://w3id.org/ga4gh/schema/vrs/2.x/json/SequenceString", - "title": "SequenceString", - "type": "string", - "maturity": "draft", - "description": "A character string of Residues that represents a biological sequence using the conventional sequence order (5\u2019-to-3\u2019 for nucleic acid sequences, and amino-to-carboxyl for amino acid sequences). 
IUPAC ambiguity codes are permitted in Sequence Strings.", - "pattern": "^[A-Z*\\-]*$" -} \ No newline at end of file diff --git a/tests/data/vrs/json/SystemicVariation b/tests/data/vrs/json/SystemicVariation deleted file mode 100644 index d7e4d61..0000000 --- a/tests/data/vrs/json/SystemicVariation +++ /dev/null @@ -1,18 +0,0 @@ -{ - "$schema": "https://json-schema.org/draft/2020-12/schema", - "$id": "https://w3id.org/ga4gh/schema/vrs/2.x/json/SystemicVariation", - "title": "SystemicVariation", - "type": "object", - "description": "A Variation of multiple molecules in the context of a system, e.g. a genome, sample, or homologous chromosomes.", - "oneOf": [ - { - "$ref": "/ga4gh/schema/vrs/2.x/json/CopyNumberChange" - }, - { - "$ref": "/ga4gh/schema/vrs/2.x/json/CopyNumberCount" - } - ], - "discriminator": { - "propertyName": "type" - } -} \ No newline at end of file diff --git a/tests/data/vrs/json/Variation b/tests/data/vrs/json/Variation deleted file mode 100644 index 65add7a..0000000 --- a/tests/data/vrs/json/Variation +++ /dev/null @@ -1,27 +0,0 @@ -{ - "$schema": "https://json-schema.org/draft/2020-12/schema", - "$id": "https://w3id.org/ga4gh/schema/vrs/2.x/json/Variation", - "title": "Variation", - "type": "object", - "description": "A representation of the state of one or more biomolecules.", - "oneOf": [ - { - "$ref": "/ga4gh/schema/vrs/2.x/json/Adjacency" - }, - { - "$ref": "/ga4gh/schema/vrs/2.x/json/Allele" - }, - { - "$ref": "/ga4gh/schema/vrs/2.x/json/CopyNumberChange" - }, - { - "$ref": "/ga4gh/schema/vrs/2.x/json/CopyNumberCount" - }, - { - "$ref": "/ga4gh/schema/vrs/2.x/json/Haplotype" - } - ], - "discriminator": { - "propertyName": "type" - } -} \ No newline at end of file diff --git a/tests/test_basic.py b/tests/test_basic.py index ddbe756..61eee9f 100644 --- a/tests/test_basic.py +++ b/tests/test_basic.py @@ -1,36 +1,38 @@ -import yaml -import pytest -import shutil import os +import shutil from pathlib import Path -from 
ga4gh.gks.metaschema.tools.source_proc import YamlSchemaProcessor -from ga4gh.gks.metaschema.scripts.y2t import main as y2t -from ga4gh.gks.metaschema.scripts.source2splitjs import split_defs_to_js +import pytest +import yaml + from ga4gh.gks.metaschema.scripts.source2classes import main as s2c +from ga4gh.gks.metaschema.scripts.source2splitjs import split_defs_to_js +from ga4gh.gks.metaschema.scripts.y2t import main as y2t +from ga4gh.gks.metaschema.tools.source_proc import YamlSchemaProcessor root = Path(__file__).parent -processor = YamlSchemaProcessor(root / 'data/vrs/vrs-source.yaml') -processor.js_yaml_dump(open(root /'data/vrs/vrs.yaml', 'w')) -target = yaml.load(open(root /'data/vrs/vrs.yaml'), Loader=yaml.SafeLoader) +processor = YamlSchemaProcessor(root / "data/vrs/vrs-source.yaml") +processor.js_yaml_dump(open(root / "data/vrs/vrs.yaml", "w")) +target = yaml.load(open(root / "data/vrs/vrs.yaml"), Loader=yaml.SafeLoader) + def test_mv_is_passthrough(): - assert processor.class_is_passthrough('MolecularVariation') + assert processor.class_is_passthrough("MolecularVariation") def test_se_not_passthrough(): - assert not processor.class_is_passthrough('SequenceExpression') + assert not processor.class_is_passthrough("SequenceExpression") def test_class_is_subclass(): - assert processor.class_is_subclass('Haplotype', 'Variation') - assert not processor.class_is_subclass('Haplotype', 'Location') + assert processor.class_is_subclass("Haplotype", "Variation") + assert not processor.class_is_subclass("Haplotype", "Location") def test_yaml_create(): - p = YamlSchemaProcessor(root /'data/gks-common/core-source.yaml') - p.js_yaml_dump(open(root /'data/gks-common/core.yaml', 'w')) + p = YamlSchemaProcessor(root / "data/gks-common/core-source.yaml") + p.js_yaml_dump(open(root / "data/gks-common/core.yaml", "w")) assert True @@ -40,14 +42,14 @@ def test_yaml_target_match(): def test_merged_create(): - p = YamlSchemaProcessor(root /'data/vrs/vrs-source.yaml') + p = 
YamlSchemaProcessor(root / "data/vrs/vrs-source.yaml") p.merge_imported() assert True def test_split_create(): split_defs_to_js(processor) - p = YamlSchemaProcessor(root /'data/gnomAD/gnomad-caf-source.yaml') + p = YamlSchemaProcessor(root / "data/gnomAD/gnomad-caf-source.yaml") split_defs_to_js(p) assert True @@ -63,3 +65,7 @@ def test_docs_create(): os.makedirs(defs) y2t(processor) assert True + + +if __name__ == "__main__": + pytest.main([__file__]) From 1cef800c686f2ad8808b2ce025d3cdd7c53271b6 Mon Sep 17 00:00:00 2001 From: Kori Kuzma Date: Thu, 7 Nov 2024 17:06:55 -0500 Subject: [PATCH 13/16] fix: resolve UnboundLocalError for getting `template` in y2t.py --- src/ga4gh/gks/metaschema/scripts/y2t.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/ga4gh/gks/metaschema/scripts/y2t.py b/src/ga4gh/gks/metaschema/scripts/y2t.py index 422e004..3d60c4c 100755 --- a/src/ga4gh/gks/metaschema/scripts/y2t.py +++ b/src/ga4gh/gks/metaschema/scripts/y2t.py @@ -165,8 +165,8 @@ def main(proc_schema: YamlSchemaProcessor) -> None: for class_name, class_definition in proc_schema.defs.items(): with open(proc_schema.def_fp / (class_name + ".rst"), "w") as f: maturity = class_definition.get("maturity", "") + template = env.get_template("maturity") if maturity == "draft": - template = env.get_template("maturity") print( template.render(info="warning", maturity_level="draft", modifier="significantly"), file=f, From 40bcc498a2bfec5c2281a8901bddb833a47416db Mon Sep 17 00:00:00 2001 From: Kori Kuzma Date: Thu, 7 Nov 2024 17:10:34 -0500 Subject: [PATCH 14/16] fix: trial use should be two words in `MATURITY_MAPPING` --- src/ga4gh/gks/metaschema/scripts/y2t.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/ga4gh/gks/metaschema/scripts/y2t.py b/src/ga4gh/gks/metaschema/scripts/y2t.py index 3d60c4c..94aba24 100755 --- a/src/ga4gh/gks/metaschema/scripts/y2t.py +++ b/src/ga4gh/gks/metaschema/scripts/y2t.py @@ -17,7 +17,7 @@ # Mapping to 
corresponding hex color code and code for maturity status MATURITY_MAPPING: dict[str, tuple[str, str]] = { "draft": ("D3D3D3", "D"), - "trial_use": ("FFFF99", "TU"), + "trial use": ("FFFF99", "TU"), "normative": ("B6D7A8", "N"), "deprecated": ("EA9999", "X"), } @@ -136,7 +136,7 @@ def resolve_flags(class_property_attributes: dict) -> str: if maturity is not None: background_color, maturity_code = MATURITY_MAPPING.get(maturity, (None, None)) if background_color and maturity_code: - title = f"{maturity.replace("_", " ").title()} Maturity Level" + title = f"{maturity.title()} Maturity Level" flags += f""" .. raw:: html From 047686c3bd362092ea3aa1cd6f22914b96a64257 Mon Sep 17 00:00:00 2001 From: Kori Kuzma Date: Fri, 8 Nov 2024 08:09:43 -0500 Subject: [PATCH 15/16] fix: remove new lines + extra spaces in maturity template close #33 --- src/templates/maturity | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/templates/maturity b/src/templates/maturity index fcb5f32..7c99364 100644 --- a/src/templates/maturity +++ b/src/templates/maturity @@ -1,4 +1,4 @@ -.. {{ info }}:: This data class is at a **{{ maturity_level }}** maturity level and may change - {{ modifier }} in future releases. Maturity levels are described in - the :ref:`maturity-model`. +.. {{ info }}:: This data class is at a **{{ maturity_level }}** maturity level and may \ + change{{ ' ' if modifier else '' }}{{ modifier }} in future releases. Maturity \ + levels are described in the :ref:`maturity-model`. From c5c8f9aeadd16a54bb6f0600395efc2c54f21cc3 Mon Sep 17 00:00:00 2001 From: "Alex H. 
Wagner, PhD" Date: Fri, 22 Nov 2024 12:17:15 -0500 Subject: [PATCH 16/16] add class maturity inheritance checks --- src/ga4gh/gks/metaschema/tools/source_proc.py | 29 ++++- tests/data/catvrs/catvrs-source.yaml | 1 + tests/data/gks-common/core-source.yaml | 3 + tests/data/gnomAD/json/GnomadCAF | 116 ++++++++++++++++++ .../data/va-spec/core-im/core-im-source.yaml | 5 + tests/data/vrs/def/CopyNumber.rst | 56 +++++++++ .../data/vrs/def/Ga4ghIdentifiableObject.rst | 48 ++++++++ tests/data/vrs/def/Location.rst | 9 ++ tests/data/vrs/def/MolecularVariation.rst | 9 ++ tests/data/vrs/def/SequenceExpression.rst | 44 +++++++ tests/data/vrs/def/SystemicVariation.rst | 9 ++ tests/data/vrs/def/ValueObject.rst | 9 ++ tests/data/vrs/def/Variation.rst | 52 ++++++++ tests/data/vrs/json/Location | 12 ++ tests/data/vrs/json/MolecularVariation | 19 +++ tests/data/vrs/json/SequenceExpression | 19 +++ tests/data/vrs/json/SystemicVariation | 19 +++ tests/data/vrs/json/Variation | 28 +++++ tests/data/vrs/vrs-source.yaml | 8 ++ tests/data/vrs/vrs.yaml | 5 + 20 files changed, 496 insertions(+), 4 deletions(-) create mode 100644 tests/data/gnomAD/json/GnomadCAF create mode 100644 tests/data/vrs/def/CopyNumber.rst create mode 100644 tests/data/vrs/def/Ga4ghIdentifiableObject.rst create mode 100644 tests/data/vrs/def/Location.rst create mode 100644 tests/data/vrs/def/MolecularVariation.rst create mode 100644 tests/data/vrs/def/SequenceExpression.rst create mode 100644 tests/data/vrs/def/SystemicVariation.rst create mode 100644 tests/data/vrs/def/ValueObject.rst create mode 100644 tests/data/vrs/def/Variation.rst create mode 100644 tests/data/vrs/json/Location create mode 100644 tests/data/vrs/json/MolecularVariation create mode 100644 tests/data/vrs/json/SequenceExpression create mode 100644 tests/data/vrs/json/SystemicVariation create mode 100644 tests/data/vrs/json/Variation diff --git a/src/ga4gh/gks/metaschema/tools/source_proc.py b/src/ga4gh/gks/metaschema/tools/source_proc.py index 
53c8763..90d2247 100755 --- a/src/ga4gh/gks/metaschema/tools/source_proc.py +++ b/src/ga4gh/gks/metaschema/tools/source_proc.py @@ -21,6 +21,12 @@ curie_re = re.compile(r"(\S+):(\S+)") defs_re = re.compile(r"#/(\$defs|definitions)/.*") +maturity_levels = { + 'deprecated': 0, + 'draft': 1, + 'trial use': 2, + 'normative': 3 +} class YamlSchemaProcessor: def __init__(self, schema_fp, root_fp=None): @@ -55,6 +61,7 @@ def _init_from_raw(self): self.defs = self.processed_schema.get(self.schema_def_keyword, None) self.processed_classes = set() self.process_schema() + self.check_processed_schema() self.for_js = copy.deepcopy(self.processed_schema) self.clean_for_js() @@ -203,6 +210,20 @@ def process_schema(self): for schema_class in self.defs: self.process_schema_class(schema_class) + def check_processed_schema(self): + for cls in self.processed_classes: + cls_def = self.defs[cls] + if 'inherits' in cls_def: + inherited_cls_name = cls_def['inherits'] + if ':' in inherited_cls_name: + namespace, inherited_cls_split_name = inherited_cls_name.split(':') + inherited_cls_def = self.imports[namespace].defs[inherited_cls_split_name] + else: + inherited_cls_def = self.defs[inherited_cls_name] + assert maturity_levels[inherited_cls_def['maturity']] >= maturity_levels[cls_def['maturity']], \ + f"Maturity of {cls} is greater than parent class {inherited_cls_name}."
+ pass + def class_is_abstract(self, schema_class): schema_class_def, _ = self.get_local_or_inherited_class(schema_class, raw=True) return "properties" not in schema_class_def and not self.class_is_primitive(schema_class) @@ -319,10 +340,10 @@ def process_schema_class(self, schema_class): return processed_class_def = self.processed_schema[self.schema_def_keyword][schema_class] - # Check GKS maturity model on all public, concrete classes - if not (self.class_is_protected(schema_class) or self.class_is_abstract(schema_class)): - assert "maturity" in processed_class_def, schema_class - assert processed_class_def["maturity"] in ["draft", "trial use", "normative", "deprecated"], schema_class + # Check GKS maturity model on all defined classes + # if not (self.class_is_protected(schema_class) or self.class_is_abstract(schema_class)): + assert 'maturity' in processed_class_def, schema_class + assert processed_class_def['maturity'] in maturity_levels, schema_class if self.class_is_protected(schema_class): containing_class = self.raw_defs[schema_class]["protectedClassOf"] diff --git a/tests/data/catvrs/catvrs-source.yaml b/tests/data/catvrs/catvrs-source.yaml index 0fffe45..3eaef7d 100644 --- a/tests/data/catvrs/catvrs-source.yaml +++ b/tests/data/catvrs/catvrs-source.yaml @@ -23,6 +23,7 @@ $defs: CategoricalVariation: inherits: gks.core:DomainEntity + maturity: draft description: >- A representation of a categorically-defined domain for variation, in which individual contextual variation instances may be members of the domain. diff --git a/tests/data/gks-common/core-source.yaml b/tests/data/gks-common/core-source.yaml index cbb4978..0063c53 100644 --- a/tests/data/gks-common/core-source.yaml +++ b/tests/data/gks-common/core-source.yaml @@ -5,6 +5,7 @@ strict: true $defs: Entity: + maturity: draft description: >- Entity is the root class of ‘core’ classes model - those that have identifiers and other general metadata like labels, xrefs, urls, descriptions, etc. 
All core classes descend from and inherit @@ -30,6 +31,7 @@ $defs: MappableEntity: inherits: Entity + maturity: draft description: an Entity that is mappable to codings in other terminology systems. heritableProperties: mappings: @@ -125,6 +127,7 @@ $defs: DomainEntity: inherits: MappableEntity + maturity: draft description: >- An Entity that is specific to a particular biomedical domain such as disease, therapeutics, or genes. diff --git a/tests/data/gnomAD/json/GnomadCAF b/tests/data/gnomAD/json/GnomadCAF new file mode 100644 index 0000000..313a0d4 --- /dev/null +++ b/tests/data/gnomAD/json/GnomadCAF @@ -0,0 +1,116 @@ +{ + "$schema": "https://json-schema.org/draft/2020-12/schema", + "$id": "https://w3id.org/ga4gh/schema/gk-pilot/main/gnomAD/json/GnomadCAF", + "title": "GnomadCAF", + "type": "object", + "$defs": { + "GnomadCafProperties": { + "description": "Additional properties specific to the gnomAD CAF model.", + "protectedClassOf": "GnomadCAF", + "type": "object", + "maturity": "draft", + "properties": { + "ancillaryResults": { + "type": "object", + "properties": { + "grpMaxFAF95": { + "$ref": "#/$defs/GrpMaxFAF95" + }, + "jointGrpMaxFAF95": { + "description": "The Group Max Filtering Allele Frequency (95% confidence interval) calculated jointly from genome and exome data.", + "$ref": "#/$defs/GrpMaxFAF95" + }, + "homozygotes": { + "type": "integer" + }, + "hemizygotes": { + "type": "integer" + } + }, + "additionalProperties": false + }, + "qualityMeasures": { + "type": "object", + "properties": { + "meanDepth": { + "description": "The mean depth of coverage.", + "type": "number" + }, + "fractionCoverage20x": { + "description": "The fraction of individuals with at least 20x coverage.", + "type": "number" + }, + "qcFilters": { + "type": "array", + "items": { + "type": "string" + } + }, + "monoallelic": { + "description": "All samples are homozygous alternate for the variant.", + "type": "boolean" + }, + "lowComplexityRegion": { + "description": "This flag 
indicates the variant is found in a low complexity region. These regions were identified with the symmetric DUST algorithm at a score threshold of 30.", + "type": "boolean" + }, + "lowConfidenceLossOfFunctionError": { + "description": "Low confidence in predicted Loss of Function (pLoF), where variant is determined by LOFTEE to be unlikely loss of function for a transcript.", + "type": "boolean" + }, + "lossOfFunctionWarning": { + "description": "A warning provided by LOFTEE to use caution when interpreting the transcript or variant.", + "type": "boolean" + }, + "noncodingTranscriptError": { + "description": "Marked in a putative loss of function category by VEP (essential splice, stop-gained, or frameshift) but appears on a non-protein-coding transcript.", + "type": "boolean" + }, + "heterozygousSkewedAlleleCount": { + "description": "The count of individuals called as heterozygous for this variant with a skewed allele balance, indicating some of these individuals may be miscalled homozygous alternative allele.", + "type": "integer" + } + }, + "additionalProperties": false + } + }, + "required": [] + }, + "GrpMaxFAF95": { + "description": "The group maximum filtering allele frequency at 95% CI", + "protectedClassOf": "GnomadCAF", + "type": "object", + "maturity": "draft", + "properties": { + "frequency": { + "type": "number" + }, + "confidenceInterval": { + "type": "number", + "const": 0.95, + "default": 0.95 + }, + "groupId": { + "type": "string", + "description": "The genetic ancestry group from which the max frequency was calculated." + } + }, + "required": [ + "confidenceInterval", + "frequency", + "groupId" + ], + "additionalProperties": false + } + }, + "maturity": "draft", + "description": "The GA4GH Cohort Allele Frequency model, with additional schema properties specific to the gnomAD resource. 
", + "allOf": [ + { + "$ref": "/ga4gh/schema/va-spec/1.x/profiles/caf/json/CohortAlleleFrequency" + }, + { + "$ref": "#/$defs/GnomadCafProperties" + } + ] +} \ No newline at end of file diff --git a/tests/data/va-spec/core-im/core-im-source.yaml b/tests/data/va-spec/core-im/core-im-source.yaml index 9faa107..8913f72 100644 --- a/tests/data/va-spec/core-im/core-im-source.yaml +++ b/tests/data/va-spec/core-im/core-im-source.yaml @@ -17,6 +17,7 @@ namespaces: $defs: InformationEntity: inherits: gks.core:Entity + maturity: draft description: >- InformationEntities are abstract (non-physical) entities that are about something (i.e. they carry information about things in the real world). @@ -170,6 +171,7 @@ $defs: required: [ "value" ] Statement: inherits: InformationEntity + maturity: draft description: >- A Statement (aka ‘Assertion’) represents a claim of purported truth as made by a particular agent, on a particular occasion. @@ -201,6 +203,7 @@ $defs: - direction VariantStatement: inherits: Statement + maturity: draft description: >- A :ref:`Statement` describing the impact of a variant. heritableProperties: @@ -213,6 +216,7 @@ $defs: description: A variant that is the subject of the Statement. VariantClassification: inherits: VariantStatement + maturity: draft description: >- A :ref:`VariantStatement` classifying the impact of a variant. heritableProperties: @@ -226,6 +230,7 @@ $defs: - classification VariantStudySummary: inherits: VariantStatement + maturity: draft description: >- A :ref:`Statement` summarizing evidence about the impact of a variant from one or more studies. diff --git a/tests/data/vrs/def/CopyNumber.rst b/tests/data/vrs/def/CopyNumber.rst new file mode 100644 index 0000000..78b6eb9 --- /dev/null +++ b/tests/data/vrs/def/CopyNumber.rst @@ -0,0 +1,56 @@ + +.. warning:: This data class is at a **draft** maturity level and may change + significantly in future releases. Maturity levels are described in + the :ref:`maturity-model`. 
+ + +**Computational Definition** + +A measure of the copies of a :ref:`Location` within a system (e.g. genome, cell, etc.) + +**Information Model** + +Some CopyNumber attributes are inherited from :ref:`Variation`. + +.. list-table:: + :class: clean-wrap + :header-rows: 1 + :align: left + :widths: auto + + * - Field + - Type + - Limits + - Description + * - id + - string + - 0..1 + - The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE). + * - label + - string + - 0..1 + - A primary label for the entity. + * - description + - string + - 0..1 + - A free-text description of the entity. + * - extensions + - :ref:`Extension` + - 0..m + - + * - type + - string + - 0..1 + - + * - digest + - string + - 0..1 + - A sha512t24u digest created using the VRS Computed Identifier algorithm. + * - expressions + - :ref:`Expression` + - 0..m + - + * - location + - :ref:`IRI` | :ref:`Location` + - 1..1 + - A location for which the number of systemic copies is described. diff --git a/tests/data/vrs/def/Ga4ghIdentifiableObject.rst b/tests/data/vrs/def/Ga4ghIdentifiableObject.rst new file mode 100644 index 0000000..13b1cda --- /dev/null +++ b/tests/data/vrs/def/Ga4ghIdentifiableObject.rst @@ -0,0 +1,48 @@ + +.. warning:: This data class is at a **draft** maturity level and may change + significantly in future releases. Maturity levels are described in + the :ref:`maturity-model`. + + +**Computational Definition** + +A contextual value object for which a GA4GH computed identifier can be created. + +**Information Model** + +Some Ga4ghIdentifiableObject attributes are inherited from :ref:`gks.core:Entity`. + +.. 
list-table:: + :class: clean-wrap + :header-rows: 1 + :align: left + :widths: auto + + * - Field + - Type + - Limits + - Description + * - id + - string + - 0..1 + - The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE). + * - label + - string + - 0..1 + - A primary label for the entity. + * - description + - string + - 0..1 + - A free-text description of the entity. + * - extensions + - :ref:`Extension` + - 0..m + - + * - type + - string + - 0..1 + - + * - digest + - string + - 0..1 + - A sha512t24u digest created using the VRS Computed Identifier algorithm. diff --git a/tests/data/vrs/def/Location.rst b/tests/data/vrs/def/Location.rst new file mode 100644 index 0000000..1de8531 --- /dev/null +++ b/tests/data/vrs/def/Location.rst @@ -0,0 +1,9 @@ + +.. warning:: This data class is at a **draft** maturity level and may change + significantly in future releases. Maturity levels are described in + the :ref:`maturity-model`. + + +**Computational Definition** + +A contiguous segment of a biological sequence. diff --git a/tests/data/vrs/def/MolecularVariation.rst b/tests/data/vrs/def/MolecularVariation.rst new file mode 100644 index 0000000..9661993 --- /dev/null +++ b/tests/data/vrs/def/MolecularVariation.rst @@ -0,0 +1,9 @@ + +.. warning:: This data class is at a **draft** maturity level and may change + significantly in future releases. Maturity levels are described in + the :ref:`maturity-model`. + + +**Computational Definition** + +A :ref:`variation` on a contiguous molecule. diff --git a/tests/data/vrs/def/SequenceExpression.rst b/tests/data/vrs/def/SequenceExpression.rst new file mode 100644 index 0000000..ff940b3 --- /dev/null +++ b/tests/data/vrs/def/SequenceExpression.rst @@ -0,0 +1,44 @@ + +.. 
warning:: This data class is at a **draft** maturity level and may change + significantly in future releases. Maturity levels are described in + the :ref:`maturity-model`. + + +**Computational Definition** + +An expression describing a :ref:`Sequence`. + +**Information Model** + +Some SequenceExpression attributes are inherited from :ref:`gks.core:Entity`. + +.. list-table:: + :class: clean-wrap + :header-rows: 1 + :align: left + :widths: auto + + * - Field + - Type + - Limits + - Description + * - id + - string + - 0..1 + - The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE). + * - label + - string + - 0..1 + - A primary label for the entity. + * - description + - string + - 0..1 + - A free-text description of the entity. + * - extensions + - :ref:`Extension` + - 0..m + - + * - type + - string + - 1..1 + - The SequenceExpression class type. MUST match child class type. diff --git a/tests/data/vrs/def/SystemicVariation.rst b/tests/data/vrs/def/SystemicVariation.rst new file mode 100644 index 0000000..ab5e004 --- /dev/null +++ b/tests/data/vrs/def/SystemicVariation.rst @@ -0,0 +1,9 @@ + +.. warning:: This data class is at a **draft** maturity level and may change + significantly in future releases. Maturity levels are described in + the :ref:`maturity-model`. + + +**Computational Definition** + +A Variation of multiple molecules in the context of a system, e.g. a genome, sample, or homologous chromosomes. diff --git a/tests/data/vrs/def/ValueObject.rst b/tests/data/vrs/def/ValueObject.rst new file mode 100644 index 0000000..c9eba68 --- /dev/null +++ b/tests/data/vrs/def/ValueObject.rst @@ -0,0 +1,9 @@ + +.. warning:: This data class is at a **draft** maturity level and may change + significantly in future releases. 
Maturity levels are described in + the :ref:`maturity-model`. + + +**Computational Definition** + +A contextual value whose equality is based on value, not identity. See https://en.wikipedia.org/wiki/Value_object for more on Value Objects. diff --git a/tests/data/vrs/def/Variation.rst b/tests/data/vrs/def/Variation.rst new file mode 100644 index 0000000..bede2c7 --- /dev/null +++ b/tests/data/vrs/def/Variation.rst @@ -0,0 +1,52 @@ + +.. warning:: This data class is at a **draft** maturity level and may change + significantly in future releases. Maturity levels are described in + the :ref:`maturity-model`. + + +**Computational Definition** + +A representation of the state of one or more biomolecules. + +**Information Model** + +Some Variation attributes are inherited from :ref:`Ga4ghIdentifiableObject`. + +.. list-table:: + :class: clean-wrap + :header-rows: 1 + :align: left + :widths: auto + + * - Field + - Type + - Limits + - Description + * - id + - string + - 0..1 + - The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system. The identified entity may have a different 'id' in a different system, or may refer to an 'id' for the shared concept in another system (e.g. a CURIE). + * - label + - string + - 0..1 + - A primary label for the entity. + * - description + - string + - 0..1 + - A free-text description of the entity. + * - extensions + - :ref:`Extension` + - 0..m + - + * - type + - string + - 0..1 + - + * - digest + - string + - 0..1 + - A sha512t24u digest created using the VRS Computed Identifier algorithm. 
+ * - expressions + - :ref:`Expression` + - 0..m + - diff --git a/tests/data/vrs/json/Location b/tests/data/vrs/json/Location new file mode 100644 index 0000000..ce6be7c --- /dev/null +++ b/tests/data/vrs/json/Location @@ -0,0 +1,12 @@ +{ + "$schema": "https://json-schema.org/draft/2020-12/schema", + "$id": "https://w3id.org/ga4gh/schema/vrs/2.x/json/Location", + "title": "Location", + "type": "object", + "maturity": "draft", + "description": "A contiguous segment of a biological sequence.", + "$ref": "/ga4gh/schema/vrs/2.x/json/SequenceLocation", + "discriminator": { + "propertyName": "type" + } +} \ No newline at end of file diff --git a/tests/data/vrs/json/MolecularVariation b/tests/data/vrs/json/MolecularVariation new file mode 100644 index 0000000..1d84ef1 --- /dev/null +++ b/tests/data/vrs/json/MolecularVariation @@ -0,0 +1,19 @@ +{ + "$schema": "https://json-schema.org/draft/2020-12/schema", + "$id": "https://w3id.org/ga4gh/schema/vrs/2.x/json/MolecularVariation", + "title": "MolecularVariation", + "type": "object", + "maturity": "draft", + "description": "A variation on a contiguous molecule.", + "oneOf": [ + { + "$ref": "/ga4gh/schema/vrs/2.x/json/Allele" + }, + { + "$ref": "/ga4gh/schema/vrs/2.x/json/Haplotype" + } + ], + "discriminator": { + "propertyName": "type" + } +} \ No newline at end of file diff --git a/tests/data/vrs/json/SequenceExpression b/tests/data/vrs/json/SequenceExpression new file mode 100644 index 0000000..85eba9e --- /dev/null +++ b/tests/data/vrs/json/SequenceExpression @@ -0,0 +1,19 @@ +{ + "$schema": "https://json-schema.org/draft/2020-12/schema", + "$id": "https://w3id.org/ga4gh/schema/vrs/2.x/json/SequenceExpression", + "title": "SequenceExpression", + "type": "object", + "maturity": "draft", + "description": "An expression describing a Sequence.", + "oneOf": [ + { + "$ref": "/ga4gh/schema/vrs/2.x/json/LiteralSequenceExpression" + }, + { + "$ref": "/ga4gh/schema/vrs/2.x/json/ReferenceLengthExpression" + } + ], + "discriminator": 
{ + "propertyName": "type" + } +} \ No newline at end of file diff --git a/tests/data/vrs/json/SystemicVariation b/tests/data/vrs/json/SystemicVariation new file mode 100644 index 0000000..4ac06dc --- /dev/null +++ b/tests/data/vrs/json/SystemicVariation @@ -0,0 +1,19 @@ +{ + "$schema": "https://json-schema.org/draft/2020-12/schema", + "$id": "https://w3id.org/ga4gh/schema/vrs/2.x/json/SystemicVariation", + "title": "SystemicVariation", + "type": "object", + "maturity": "draft", + "description": "A Variation of multiple molecules in the context of a system, e.g. a genome, sample, or homologous chromosomes.", + "oneOf": [ + { + "$ref": "/ga4gh/schema/vrs/2.x/json/CopyNumberChange" + }, + { + "$ref": "/ga4gh/schema/vrs/2.x/json/CopyNumberCount" + } + ], + "discriminator": { + "propertyName": "type" + } +} \ No newline at end of file diff --git a/tests/data/vrs/json/Variation b/tests/data/vrs/json/Variation new file mode 100644 index 0000000..74e0ab7 --- /dev/null +++ b/tests/data/vrs/json/Variation @@ -0,0 +1,28 @@ +{ + "$schema": "https://json-schema.org/draft/2020-12/schema", + "$id": "https://w3id.org/ga4gh/schema/vrs/2.x/json/Variation", + "title": "Variation", + "type": "object", + "maturity": "draft", + "description": "A representation of the state of one or more biomolecules.", + "oneOf": [ + { + "$ref": "/ga4gh/schema/vrs/2.x/json/Adjacency" + }, + { + "$ref": "/ga4gh/schema/vrs/2.x/json/Allele" + }, + { + "$ref": "/ga4gh/schema/vrs/2.x/json/CopyNumberChange" + }, + { + "$ref": "/ga4gh/schema/vrs/2.x/json/CopyNumberCount" + }, + { + "$ref": "/ga4gh/schema/vrs/2.x/json/Haplotype" + } + ], + "discriminator": { + "propertyName": "type" + } +} \ No newline at end of file diff --git a/tests/data/vrs/vrs-source.yaml b/tests/data/vrs/vrs-source.yaml index f024454..ada1f18 100644 --- a/tests/data/vrs/vrs-source.yaml +++ b/tests/data/vrs/vrs-source.yaml @@ -31,12 +31,14 @@ $defs: ValueObject: inherits: gks.core:Entity + maturity: draft description: >- A contextual 
value whose equality is based on value, not identity. See https://en.wikipedia.org/wiki/Value_object for more on Value Objects. Ga4ghIdentifiableObject: inherits: ValueObject + maturity: draft description: >- A contextual value object for which a GA4GH computed identifier can be created. ga4ghDigest: @@ -52,6 +54,7 @@ $defs: Variation: inherits: Ga4ghIdentifiableObject + maturity: draft description: >- A representation of the state of one or more biomolecules. oneOf: @@ -88,6 +91,7 @@ $defs: MolecularVariation: inherits: Variation + maturity: draft description: >- A :ref:`variation` on a contiguous molecule. oneOf: @@ -98,6 +102,7 @@ $defs: SystemicVariation: inherits: Variation + maturity: draft description: >- A Variation of multiple molecules in the context of a system, e.g. a genome, sample, or homologous chromosomes. @@ -186,6 +191,7 @@ $defs: # SystemicVariation CopyNumber: + maturity: draft ga4ghDigest: keys: - location @@ -300,6 +306,7 @@ $defs: Location: inherits: Ga4ghIdentifiableObject + maturity: draft description: >- A contiguous segment of a biological sequence. $ref: "#/$defs/SequenceLocation" @@ -388,6 +395,7 @@ $defs: SequenceExpression: inherits: ValueObject + maturity: draft ga4ghDigest: keys: - type diff --git a/tests/data/vrs/vrs.yaml b/tests/data/vrs/vrs.yaml index 294f51e..f5d4c47 100644 --- a/tests/data/vrs/vrs.yaml +++ b/tests/data/vrs/vrs.yaml @@ -4,6 +4,7 @@ title: GA4GH-VRS-Definitions type: object $defs: Variation: + maturity: draft description: A representation of the state of one or more biomolecules. oneOf: - $ref: '#/$defs/Adjacency' @@ -42,6 +43,7 @@ $defs: - value additionalProperties: false MolecularVariation: + maturity: draft description: A variation on a contiguous molecule. oneOf: - $ref: '#/$defs/Allele' @@ -49,6 +51,7 @@ $defs: discriminator: propertyName: type SystemicVariation: + maturity: draft description: A Variation of multiple molecules in the context of a system, e.g. a genome, sample, or homologous chromosomes. 
oneOf: @@ -304,6 +307,7 @@ $defs: - location additionalProperties: false Location: + maturity: draft description: A contiguous segment of a biological sequence. $ref: '#/$defs/SequenceLocation' discriminator: @@ -412,6 +416,7 @@ $defs: - refgetAccession additionalProperties: false SequenceExpression: + maturity: draft description: An expression describing a Sequence. oneOf: - $ref: '#/$defs/LiteralSequenceExpression'
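The class maturity inheritance check introduced in PATCH 16/16 can be illustrated in isolation. The following is a minimal sketch, assuming maturity is compared by its rank in the `maturity_levels` mapping rather than as a raw string; `parent_is_mature_enough` is a hypothetical helper written for illustration and is not part of the package.

```python
# Ranks copied from the maturity_levels mapping added to source_proc.py.
maturity_levels = {"deprecated": 0, "draft": 1, "trial use": 2, "normative": 3}


def parent_is_mature_enough(parent_maturity: str, child_maturity: str) -> bool:
    """Hypothetical helper: True when the parent ranks at least as high as the child."""
    return maturity_levels[parent_maturity] >= maturity_levels[child_maturity]


# A "trial use" parent may have a "draft" child, but not a "normative" one.
print(parent_is_mature_enough("trial use", "draft"))
print(parent_is_mature_enough("trial use", "normative"))

# Comparing the raw strings instead would order them alphabetically,
# so "trial use" would wrongly count as at least as mature as "normative".
print("trial use" >= "normative")
```

Looking up the rank first keeps the check aligned with the ordering the mapping defines, and an unknown maturity value raises a `KeyError` instead of passing silently.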