Dumpers and loaders: Separate concept of syntax and datamodel #687

Open
cmungall opened this issue Dec 6, 2023 · 1 comment

Comments

@cmungall
Collaborator

cmungall commented Dec 6, 2023

Currently the Dumper.dump() method accepts a syntax argument; in other frameworks (e.g. ROBOT) this is sometimes called format.

The problem is that this is frequently ambiguous, especially in the context of OAK, which is pluralistic and supports multiple ways of modeling and serializing ontologies: e.g. OWL mapped to RDF and serialized as RDF/XML, or SKOS (natively RDF) serialized as Turtle.

This is compounded for loaders, where we might want to use a suffix to guess the underlying model and serialization and choose the appropriate parser. Unlike the OWL API, rdflib requires the RDF format to be known in advance (in my experience this is a good thing: there is a lot of confusion caused by the OWL API cycling through multiple parsers and models).

Examples:

  • .owl clearly means the OWL data model. In the OBO universe this is conventionally mapped to RDF and serialized as RDF/XML, but outside this universe the serialization is more typically Turtle, and it may not be an RDF serialization at all
  • .xml means OWL/XML as far as the OWL API is concerned, but rdflib uses it to mean RDF/XML (which is very different!)
  • .rdf might typically mean some kind of RDF serialization of OWL, but SKOS is also valid for the ontology-like artefacts in OAK, and it too can be serialized as .rdf. The same holds for the extended RDFS-like model used by schema.org. On top of this, again, we don't know whether this means RDF/XML, Turtle, N-Quads, ...

In addition, there are various aliases (e.g. ttl vs turtle). Frameworks like pyoxigraph use MIME types to enforce some kind of standard, but this seems overkill.

Proposal:

  • Loaders and dumpers take an optional model argument
  • If absent, this is inferred using syntax and sensible defaults
  • We encourage (but do not mandate) bipartite file suffixes to reduce ambiguity and facilitate default arguments
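A minimal sketch of the proposed "optional model, inferred from syntax" behaviour. The helper resolve_model and its defaults table are hypothetical illustrations, not an existing OAK API; defaulting the RDF syntaxes to owl is an assumption chosen to match the non-canonical defaults proposed further down.

```python
from typing import Optional

# Hypothetical defaults (an assumption, not OAK's actual table): the model
# implied by a syntax when the caller does not pass one. RDF syntaxes default
# to OWL-on-RDF, in line with the non-canonical defaults proposed in this issue.
DEFAULT_MODEL_FOR_SYNTAX = {
    "owx": "owl",
    "ofn": "owl",
    "omn": "owl",
    "ttl": "owl",
    "nt": "owl",
    "rdfxml": "owl",
    "jsonld": "owl",
}


def resolve_model(syntax: str, model: Optional[str] = None) -> str:
    """Return the explicit model if given, else infer one from the syntax."""
    if model is not None:
        return model  # an explicit model always wins over the default
    if syntax not in DEFAULT_MODEL_FOR_SYNTAX:
        raise ValueError(f"cannot infer a model for syntax {syntax!r}; pass model explicitly")
    return DEFAULT_MODEL_FOR_SYNTAX[syntax]


print(resolve_model("ttl"))                # owl
print(resolve_model("ttl", model="skos"))  # skos
```

The explicit argument always takes precedence, so a SKOS file in Turtle only needs model="skos" when its suffix does not already say so.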

Bipartite suffixes would take the form .model.syntax, for example .owl.ttl or .skos.nt.
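One way a loader might split a bipartite suffix, sketched with the standard library's pathlib. The function split_suffix and the KNOWN_MODELS set are hypothetical; only the last two suffixes are inspected, and the penultimate one counts as a model only if it is a known model name, so version numbers such as hp-v1.2.ofn are not misread as models.

```python
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical set of model names recognised in a .model.syntax suffix.
KNOWN_MODELS = {"owl", "skos"}


def split_suffix(filename: str) -> Tuple[Optional[str], Optional[str]]:
    """Split 'name.model.syntax' into (model, syntax).

    The last suffix is always the syntax. The one before it is treated as
    the model only if it is a known model name; otherwise model is None.
    """
    suffixes = [s.lstrip(".") for s in Path(filename).suffixes]
    if not suffixes:
        return None, None
    syntax = suffixes[-1]
    if len(suffixes) >= 2 and suffixes[-2] in KNOWN_MODELS:
        return suffixes[-2], syntax
    return None, syntax


print(split_suffix("go.owl.ttl"))   # ('owl', 'ttl')
print(split_suffix("hp-v1.2.ofn"))  # (None, 'ofn')
```

Note that a bare go.owl comes back as (None, 'owl'), i.e. the model name in the syntax position, which is exactly the non-canonical case that needs separate default handling.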

There is a potential argument for a tripartite scheme here, because OWL is itself mapped to RDF, and because it would reduce the ambiguity of .owl.xml. However, this is likely overkill.

Something like:

Unambiguous OWL syntaxes

  • .owx (OWL/XML)
  • .ofn (OWL Functional Syntax)
  • .omn (Manchester Syntax)

The model is optional; if specified, it MUST be owl.
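The rule above (model optional, but if given it must be owl) can be sketched as a small validation step. check_owl_only is a hypothetical name used for illustration only.

```python
from typing import Optional

# The three OWL-only syntaxes listed above: the suffix alone fixes the model.
OWL_ONLY_SYNTAXES = {"owx", "ofn", "omn"}


def check_owl_only(syntax: str, model: Optional[str] = None) -> str:
    """Return 'owl' for an OWL-only syntax, rejecting any other explicit model."""
    if syntax not in OWL_ONLY_SYNTAXES:
        raise ValueError(f"{syntax!r} is not an OWL-only syntax")
    if model is not None and model != "owl":
        raise ValueError(f"syntax {syntax!r} requires model 'owl', got {model!r}")
    return "owl"


print(check_owl_only("ofn"))  # owl
```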

OWL layered on RDF

  • .owl.ttl (aka turtle)
  • .owl.nt (aka ntriples)
  • .owl.rdfxml (maps to the xml format in rdflib)
  • .owl.jsonld
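For OWL layered on RDF, the syntax part of the suffix ultimately has to become an rdflib format name. A sketch of that translation, assuming a hypothetical rdflib_format helper; the right-hand values ("turtle", "nt", "xml", "json-ld") are format names rdflib accepts, and the mapping of rdfxml to xml follows the note above.

```python
# Hypothetical mapping from the proposed syntax names for OWL-on-RDF files
# to the format strings rdflib accepts in Graph.parse() / Graph.serialize().
RDF_SYNTAX_TO_RDFLIB = {
    "ttl": "turtle",
    "nt": "nt",
    "rdfxml": "xml",  # rdflib's name for RDF/XML
    "jsonld": "json-ld",
}


def rdflib_format(syntax: str) -> str:
    """Translate a proposed syntax name into an rdflib format string."""
    if syntax not in RDF_SYNTAX_TO_RDFLIB:
        raise ValueError(f"no rdflib format for syntax {syntax!r}")
    return RDF_SYNTAX_TO_RDFLIB[syntax]


print(rdflib_format("rdfxml"))  # xml
```

With rdflib installed, the result would be passed explicitly, e.g. rdflib.Graph().parse("go.owl.ttl", format=rdflib_format("ttl")), so rdflib never has to guess.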

Non-canonical

  • .owl.xml - discouraged, but default interpretation is .owx
  • .owl.rdf - discouraged, but default interpretation is OWL layered on RDF and serialized as Turtle
  • .owl - discouraged, but default interpretation is OWL layered on RDF and serialized as Turtle(?) (or: RDF/XML, as per OBO)
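A sketch of how the discouraged suffixes might be read, emitting a warning each time. interpret_noncanonical is hypothetical. Because the bare .owl default is still open (Turtle, or RDF/XML as per OBO), it is exposed as a parameter rather than hard-coded.

```python
import warnings
from typing import Tuple


def interpret_noncanonical(suffix: str, bare_owl_syntax: str = "ttl") -> Tuple[str, str]:
    """Return the default (model, syntax) reading of a discouraged suffix."""
    defaults = {
        ".owl.xml": ("owl", "owx"),  # read as OWL/XML, not RDF/XML
        ".owl.rdf": ("owl", "ttl"),  # OWL layered on RDF, serialized as Turtle
        ".owl": ("owl", bare_owl_syntax),  # open question: "ttl" or "rdfxml"
    }
    if suffix not in defaults:
        raise ValueError(f"{suffix!r} is not a recognised non-canonical suffix")
    # Still accepted, but nudge users towards an explicit .model.syntax suffix.
    warnings.warn(f"suffix {suffix!r} is discouraged; prefer .model.syntax", stacklevel=2)
    return defaults[suffix]


print(interpret_noncanonical(".owl.xml"))                         # ('owl', 'owx')
print(interpret_noncanonical(".owl", bare_owl_syntax="rdfxml"))   # ('owl', 'rdfxml')
```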

SKOS

  • .skos.{syntax}

Syntaxes as per OWL layered on RDF.
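Since SKOS reuses the RDF syntaxes, a .skos.{syntax} suffix can be validated against the same set as OWL-on-RDF. validate_rdf_suffix and RDF_LAYERED_MODELS are hypothetical names; the syntax set is taken from the OWL-on-RDF list.

```python
from typing import Tuple

# Syntaxes available for RDF-layered models (from the OWL-on-RDF list).
RDF_SYNTAXES = {"ttl", "nt", "rdfxml", "jsonld"}
# Hypothetical: models that are RDF-based and therefore share those syntaxes.
RDF_LAYERED_MODELS = {"owl", "skos"}


def validate_rdf_suffix(model: str, syntax: str) -> Tuple[str, str]:
    """Accept .skos.{syntax} (and .owl.{syntax}) only for RDF syntaxes."""
    if model not in RDF_LAYERED_MODELS:
        raise ValueError(f"{model!r} is not an RDF-layered model")
    if syntax not in RDF_SYNTAXES:
        raise ValueError(f"{syntax!r} is not an RDF syntax usable with {model!r}")
    return model, syntax


print(validate_rdf_suffix("skos", "nt"))  # ('skos', 'nt')
```

A suffix such as .skos.ofn is rejected, because OWL Functional Syntax is not an RDF serialization.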

OBO Format and OBOGraphs

TODO

Aliases

TBD: favor shorter form (i.e. suffix) as the canonical format name?

  • ttl = turtle
  • rdfx = rdfxml
  • nt = ntriples
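A sketch of alias normalization. The issue leaves the canonical form TBD; this assumes the shorter (suffix) form wins, and normalize_syntax is a hypothetical name.

```python
# Alias groups from the list above; the first entry of each group is treated
# as canonical. Choosing the short (suffix) form is an assumption, since the
# issue leaves this TBD.
ALIAS_GROUPS = [
    ("ttl", "turtle"),
    ("rdfx", "rdfxml"),
    ("nt", "ntriples"),
]

# Flatten into a lookup: every alias (including the canonical name) -> canonical.
CANONICAL = {alias: group[0] for group in ALIAS_GROUPS for alias in group}


def normalize_syntax(name: str) -> str:
    """Map a known alias (case-insensitive, leading dot allowed) to its canonical form."""
    key = name.lower().lstrip(".")
    return CANONICAL.get(key, key)  # unknown names pass through unchanged


print(normalize_syntax("Turtle"))     # ttl
print(normalize_syntax(".ntriples"))  # nt
```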
@balhoff
Member

balhoff commented Dec 6, 2023

> .owl - discouraged, but default interpretation is OWL layered on RDF and serialized as Turtle(?) (or: RDF/XML, as per OBO)

Most packages I'm familiar with assume that .owl is RDF/XML (e.g., Jena, Blazegraph).

cmungall added a commit that referenced this issue Dec 7, 2023
cmungall added a commit that referenced this issue Dec 21, 2023
cmungall added a commit that referenced this issue Mar 12, 2024
cmungall added a commit that referenced this issue Mar 13, 2024
cmungall added a commit that referenced this issue Mar 14, 2024
* First pass at oboformat and obographs conformance suite.

See

- owlcollab/oboformat#146
- geneontology/obographs#106

Note that this may move out of the OAK repo in the longer term.

* Allow choice of format when exporting OWL.
Adding missing features to OBO conversion.

* lint

* lint

* Lint.
Removing prints.

* Added format_utilities.py -- see #687

* lint

* Fixing spelling mistakes

* dumper

* Fixed tests and dumper

---------

Co-authored-by: Nico Matentzoglu