From de029e539d58182beeb4fa093b19c4d004b8039d Mon Sep 17 00:00:00 2001 From: Larry Babb Date: Mon, 16 Dec 2024 14:36:18 -0500 Subject: [PATCH] update design decision record * GA4GH Inherent Properties * IRIs over CURIEs * VRS identifier syntax and versioning --------- Co-authored-by: Alex H. Wagner, PhD --- docs/source/appendices/design_decisions.rst | 91 ++++++++++++++++++ docs/source/appendices/index.rst | 1 + docs/source/appendices/maturity_model.rst | 8 +- .../2023-connect-gks-identifier-proposal.png | Bin 0 -> 542360 bytes 4 files changed, 96 insertions(+), 4 deletions(-) create mode 100644 docs/source/appendices/design_decisions.rst create mode 100644 docs/source/images/2023-connect-gks-identifier-proposal.png diff --git a/docs/source/appendices/design_decisions.rst b/docs/source/appendices/design_decisions.rst new file mode 100644 index 00000000..1a53bbb8 --- /dev/null +++ b/docs/source/appendices/design_decisions.rst @@ -0,0 +1,91 @@ +.. _design_decisions: + +Design Decisions +!!!!!!!!!!!!!!!! + +The following design decisions were made in the development of the VRS: + +GA4GH Inherent Properties over Value Objects +-------------------------------------------- + +In VRS 1.0 we operated under the principle that all identifiable objects in VRS (e.g. Allele, SequenceLocation, etc.) +would be *value objects*. This meant that they should be immutable and contain only required fields that are +necessary to uniquely identify the object. This approach somewhat simplified the ability to generate the digests by +allowing the computation of the digest to be based on the entire object. An exception was made for properties with a +leading underscore (namely, the *_id* property), which were removed from the object before a digest was calculated. + +In VRS 2.0 we extended the principle of excepting designated attributes by explicitly defining *inherent properties* +that constitute the properties used to compute an object digest.
This was done to increase the expressivity of VRS, +enabling implementations to pass common, descriptive metadata as part of the identifiable objects without sacrificing +the ability to create globally unique, federated identifiers from VRS 1.3. + +As a result, we had to introduce a new field in the digest model called *ga4gh.inherent*, which is described in detail +in the section on :ref:`ga4gh-inherent-properties`. + +IRIs over CURIEs +---------------- + +In VRS 2.0 we moved away from the use of CURIEs in favor of :ref:`iriReference`. Several factors played a role in +this decision. + +JSON Schema, the default data model for GKS specifications, does not allow for encoding of CURIE namespaces as is done +in other frameworks such as JSON-LD or XML. As a result, namespaces must be captured from custom data structures, API +endpoints, or documentation that may not persist as messages are exchanged between systems. To address this, references +in GKS specs now use IRIs to reference objects explicitly. + +IRI-References over IRIs +------------------------ +We opted for the general use of IRI-References as a way to provide a more flexible approach to the use of IRIs +in most GKS message structures. IRI-references (relative IRIs) benefit users by allowing compact representation +of concepts that are accessible within a system (e.g. a directory structure or web API). + +VRS identifier syntax and versioning +------------------------------------ + +The :ref:`versioning` section describes the versioning and release naming conventions for the VRS product. +Approved releases will be assigned the version number alone, but connect, ballot and snapshot releases will +include the context term and date in addition to the target version number. + +During the GA4GH Connect April 2023 meeting the maturity model was discussed at length and the following +proposal was presented for instance and class GKS identifiers. + +.. 
image:: ../images/2023-connect-gks-identifier-proposal.png + :alt: GKS Identifiers Proposal from 2023 April Connect Session + :align: center + +As an example, the GitHub JSON Schema URL ($id) for the VRS 2.0.0 Allele is: + +.. code-block:: json + + { + "$schema": "https://json-schema.org/draft/2020-12/schema", + "$id": "https://w3id.org/ga4gh/schema/vrs/2.0.0/json/Allele", + ... + } + +During the **release and versioning** discussion at the GA4GH Connect April 2023 meeting the proposal +delved into the idea of including the major version number in the VRS identifier itself. Proponents of +this approach cited concern about the change in digests (and their derived identifiers) between major +versions of the same VRS object, which would become clearly visible in the identifier itself if the +major version were included. + +Opponents of this approach argued that new identifiers would be required for every type of VRS object +for every major version release, meaning that even if a given type of object has no change that would +result in a new digest, a new identifier would still be required for the new major version. + +After much discussion, the decision was made to NOT include the major version number in the VRS identifier +itself. Therefore, the :ref:`identifier-construction` does NOT contain the version number, resulting in +the following syntax: + +**CURIE namespace resolution** + +.. code-block:: + + ga4gh:VA.Oop4kjdTtKcg1kiZjIJAAR3bp7qi4aNT + +**URI Syntax** + +.. 
code-block:: + + https://w3id.org/ga4gh/vrs/VA.Oop4kjdTtKcg1kiZjIJAAR3bp7qi4aNT + diff --git a/docs/source/appendices/index.rst b/docs/source/appendices/index.rst index 82ac3a98..18970e52 100644 --- a/docs/source/appendices/index.rst +++ b/docs/source/appendices/index.rst @@ -9,4 +9,5 @@ Appendices ga4gh_identifiers resource_identifiers truncated_digest_collision_analysis + design_decisions glossary diff --git a/docs/source/appendices/maturity_model.rst b/docs/source/appendices/maturity_model.rst index 8658b956..d677fe9d 100644 --- a/docs/source/appendices/maturity_model.rst +++ b/docs/source/appendices/maturity_model.rst @@ -131,7 +131,7 @@ Product Versioning and Releases Versions are used to identify releases of the entire specification, not to individual product features. Technical specification development is intrinsically linked to policy surrounding major and minor version -identification, which follow [semantic versioning v2](https://semver.org) practices for API versioning. +identification, which follow `semantic versioning v2 <https://semver.org>`__ practices for API versioning. Versioning examples ################### @@ -167,10 +167,10 @@ $$$$$$$$$$$$$$$$$$$$$$$ - Addition of implementation guidance, tests, or other supporting product features that do not directly affect data compatibility -Versioning of approved GA4GH standards additionally follow the procedures for [GA4GH Product Updates](https://www.ga4gh.org/our-products/development-and-approval-process/#section_7). +Versioning of approved GA4GH standards additionally follow the procedures for `GA4GH Product Updates <https://www.ga4gh.org/our-products/development-and-approval-process/#section_7>`__. 
Specifically, advancement of data classes to the trial use or normative levels must be accompanied by a minor release increment, and therefore may only be included in a release following an appropriate community -and PRC consultation process ([GA4GH Product Development 32](https://www.ga4gh.org/our-products/development-and-approval-process/#section_7:~:text=32.%20Public%20comment,reduced%20or%20omitted.)). +and PRC consultation process (`GA4GH Product Development 32 <https://www.ga4gh.org/our-products/development-and-approval-process/#section_7:~:text=32.%20Public%20comment,reduced%20or%20omitted.>`__). Releases ######## @@ -196,7 +196,7 @@ These pre-release labels are appended to the major, minor, and patch components a pre-release version following the SemVer ..-