remove neo.owl from go-lego.owl used by minerva #260

Closed
5 of 6 tasks
goodb opened this issue Nov 13, 2019 · 19 comments

Comments

@goodb
Contributor

goodb commented Nov 13, 2019

  • When minerva loads a GO-CAM RDF model, dynamically retrieve and add in upper-level type information for all the gene products in the model such that these are accessible to the Arachne reasoner and the shex validator. When reasoning and validation complete, remove these from the model.
  • Deploy new minerva code everywhere (merge dev)
  • Change the noctua-golr loader to load both go-lego.owl and neo.owl as the latter will no longer be part of the former (action to @kltm )
  • Change go-lego.owl to remove the neo.owl import.
  • restart dev minerva to allow testing
  • Test reasoning and validation

Per October 2019 discussions:

geneontology/neo#47
https://docs.google.com/document/d/1rOXCoJ-ZKGCGQ_0LpJOlsVVfVyKgUez52Kdq_VnUxEk/edit?ts=5d7ff0d8#
https://docs.google.com/document/d/1h_vnzkP94YC5l3ZmxKyZVOuTzIHIsHRsOLtBSGyB25k/edit?pli=1
In summary, we want to remove the dependency on loading classes for all gene products from all species into neo for access in minerva. This doesn't scale well. To accomplish this, the current plan is to move the class information (including the label and main upper-level type, e.g. protein, for gene product instances) out of neo and into GOLR alone. Minerva will then access this information dynamically as it builds and reasons over models.

@goodb
Contributor Author

goodb commented Nov 13, 2019

noting that the GOLR instance to be used for development here is http://noctua-golr.berkeleybop.org/

Example gene product ids and parent list for a human protein and yeast gene:

UniProtKB:P32241-1
http://purl.obolibrary.org/obo/CHEBI_51143 nitrogen molecular entity
http://purl.obolibrary.org/obo/CHEBI_33839 macromolecule
http://purl.obolibrary.org/obo/CHEBI_33256 primary amide
http://purl.obolibrary.org/obo/CHEBI_33675 p-block molecular entity
http://purl.obolibrary.org/obo/CHEBI_36963 organooxygen compound
http://purl.obolibrary.org/obo/CHEBI_33579 main group molecular entity
http://purl.obolibrary.org/obo/CHEBI_32988 amide
http://purl.obolibrary.org/obo/CHEBI_25806 oxygen molecular entity
http://purl.obolibrary.org/obo/CHEBI_33285 heteroorganic entity
http://purl.obolibrary.org/obo/CHEBI_33582 carbon group molecular entity
http://purl.obolibrary.org/obo/CHEBI_36357 polyatomic entity
http://purl.obolibrary.org/obo/CHEBI_37622 carboxamide
http://purl.obolibrary.org/obo/CHEBI_23367 molecular entity
http://purl.obolibrary.org/obo/BFO_0000030 object
http://purl.obolibrary.org/obo/CHEBI_50047 organic amino compound
http://purl.obolibrary.org/obo/CHEBI_24431 chemical entity
http://purl.obolibrary.org/obo/CHEBI_50860 organic molecular entity
http://purl.obolibrary.org/obo/CHEBI_33302 pnictogen molecular entity
http://purl.obolibrary.org/obo/CHEBI_33304 chalcogen molecular entity
http://purl.obolibrary.org/obo/CHEBI_35352 organonitrogen compound
http://purl.obolibrary.org/obo/CHEBI_36962 organochalcogen compound
http://purl.obolibrary.org/obo/CHEBI_33694 biomacromolecule
http://purl.obolibrary.org/obo/CHEBI_33695 information biomacromolecule
http://purl.obolibrary.org/obo/CHEBI_36080 protein
http://purl.obolibrary.org/obo/CHEBI_16670 peptide
http://purl.obolibrary.org/obo/PR_000000001 protein

SGD:S000005952
http://purl.obolibrary.org/obo/BFO_0000030 object
http://purl.obolibrary.org/obo/CHEBI_33839 macromolecule
http://purl.obolibrary.org/obo/CHEBI_33582 carbon group molecular entity
http://purl.obolibrary.org/obo/CHEBI_36357 polyatomic entity
http://purl.obolibrary.org/obo/CHEBI_24431 chemical entity
http://purl.obolibrary.org/obo/CHEBI_33694 biomacromolecule
http://purl.obolibrary.org/obo/CHEBI_50860 organic molecular entity
http://purl.obolibrary.org/obo/CHEBI_33695 information biomacromolecule
http://purl.obolibrary.org/obo/CHEBI_33675 p-block molecular entity
http://purl.obolibrary.org/obo/CHEBI_23367 molecular entity
http://purl.obolibrary.org/obo/CHEBI_33579 main group molecular entity

@kltm @cmungall @balhoff my thinking here is to give just the most specific parent as the type of any instance, e.g. instances of UniProtKB:P32241-1 get rdf:type CHEBI_36080 (protein) and instances of SGD:S000005952 get rdf:type CHEBI_33695 (information biomacromolecule).

Can you think of any other types that we will need to look for and cover here?

I see some complexes in the load, e.g. https://www.ebi.ac.uk/complexportal/complex/CPX-900 , but these are typed just like genes. They would default to being called information biomacromolecules CHEBI_33695
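The typing rule proposed above can be sketched in a few lines. This is only an illustration, not the Minerva implementation: the function name `pick_upper_type` and the priority list are assumptions, and the two IRIs are the ones quoted in this comment. It checks the flat parent list for the more specific type first and falls back to the more general one.

```python
# Illustrative sketch only; names here are hypothetical, not Minerva code.
PROTEIN = "http://purl.obolibrary.org/obo/CHEBI_36080"
INFO_BIOMACROMOLECULE = "http://purl.obolibrary.org/obo/CHEBI_33695"

# Candidate upper-level types, ordered from most to least specific.
TYPE_PRIORITY = [PROTEIN, INFO_BIOMACROMOLECULE]


def pick_upper_type(parents):
    """Return the most specific candidate type found in a parent list.

    `parents` is the flat list of parent IRIs for one gene product.
    Returns None when no candidate type is present.
    """
    parent_set = set(parents)
    for candidate in TYPE_PRIORITY:
        if candidate in parent_set:
            return candidate
    return None
```

With the parent lists shown above, the UniProtKB entry would resolve to protein, while the SGD gene (and anything typed like a gene, such as the complexes) would resolve to information biomacromolecule.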

@goodb
Contributor Author

goodb commented Nov 14, 2019

@balhoff and @kltm could you take a look at this when you have time? See recent commit. I think it might be done. I tested it with a local noctua that had go_lego loaded without the neo import and it worked as I wanted. The reasoner worked, and the gene product parent types were added and saved for genes/proteins.

Certainly some optimizations could be done, though it seems fast enough now. e.g. there are two requests to golr per incoming instance when there could be one. What else should I check on?

Here is an example OWL file generated with this on.
example_go_cam_lego_lite.txt

And screenshot.
Screen Shot 2019-11-14 at 9 24 23 AM

@kltm
Member

kltm commented Nov 14, 2019

Discussed with @goodb after software call, with adjustments made to initial working list. There is probably more for conversation there, especially the final item concerning SynGO and how we want to treat exotic vs endemic models (e.g. categories as required add-in or not).

goodb added a commit that referenced this issue Nov 14, 2019
goodb pushed a commit that referenced this issue Nov 21, 2019
Can run from the minerva command line as it sits. Most likely not useful code once we are done here. Could run it off a branch, or merge it in and consider removing it later, or just leave it in; it is not harmful. Example from minerva-cli:

--update-gene-product-types -i /Users/bgood/Documents/GitHub/noctua-models/models/ -o /Users/bgood/test/typed_master/ -n /Users/bgood/gocam_ontology/neo_full_nov20_2019.owl -c /Users/bgood/gocam_ontology/catalog-no-import.xml
@goodb
Contributor Author

goodb commented Dec 6, 2019

Changing first requirement here to a dynamic approach that never caches the rdf:type upper-type in the models that are displayed and saved, but only adds them as needed prior to reasoning.

Was - - [ ] When a gene product instance is created in Noctua, add the high level rdf:type (protein or information biomacromolecule) from GOLR .
is now - [ ] When minerva loads a GO-CAM RDF model, dynamically retrieve and add in upper-level type information for all the gene products in the model such that these are accessible to the Arachne reasoner and the shex validator. When reasoning and validation complete, remove these from the model.

Important that the main client interface does not see any changes as a result of this.
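A minimal sketch of this add-then-remove pattern follows. It assumes a model can be treated as a mutable set of `(subject, predicate, object)` triples; the helper name `temporary_types` and that representation are hypothetical and do not come from Minerva. Only the triples the helper itself added are removed afterwards, so anything already in the model is left alone.

```python
from contextlib import contextmanager

RDF_TYPE = "rdf:type"


@contextmanager
def temporary_types(model, instance_types):
    """Temporarily add upper-level rdf:type triples to a model.

    model: mutable set of (subject, predicate, object) triples (assumed shape).
    instance_types: dict mapping an instance id to its upper-level type.
    """
    added = set()
    for instance, upper_type in instance_types.items():
        triple = (instance, RDF_TYPE, upper_type)
        if triple not in model:
            model.add(triple)
            added.add(triple)
    try:
        # Reasoning and validation would run while the types are present.
        yield model
    finally:
        # Strip only what was added, so the saved model is unchanged.
        model.difference_update(added)
```

Reasoning and validation would run inside the `with temporary_types(model, types):` block; once the block exits, the model handed back to the client is the same as before.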

@goodb goodb self-assigned this Dec 6, 2019
@goodb
Contributor Author

goodb commented Dec 10, 2019

@kltm when you are able, I would like to test #265 on dev. I believe it resolves our Thanksgiving issue as well as reducing the number of other tasks on this issue list. If my local testing is reflective of dev and master server states, I think we are ready to tick down the rest of the checkboxes here.

@kltm
Member

kltm commented Dec 17, 2019

Slowed down with the Alliance meeting. Now on dev.

@vanaukenk

I tested the UI on noctua-dev this morning with entries for several different species and entity types.
All looks fine in the display on the form and graph editors.

image

image

Note: we will still need to discuss validation errors based on entity type values, however. Various aspects of that discussion are already in multiple tickets on geneontology/go-shapes.

@kltm
Member

kltm commented Jan 10, 2020

Noting that the version @vanaukenk tested on dev was: 95bc102
The current production version is: 5ae8bf1

@goodb
Contributor Author

goodb commented Mar 25, 2020

@kltm although the mechanism evolved a bit, I think the task list on top remains accurate.

@goodb goodb changed the title from "conversion to NEO lite and GOLR" to "remove neo.owl from go-lego.owl used by minerva" Mar 25, 2020
@kltm
Member

kltm commented Mar 26, 2020

@goodb Okay, but my list there seems a little garbled to me now at this point, especially as I intended it to be ordered and I think we've accomplished some of these out of order already. It would be nice to go over this with you tomorrow either on the call or later on.

@goodb
Contributor Author

goodb commented Mar 31, 2020

@kltm trying to summarize "the plan" below. I think it takes three key files to make a system that will work for all the models in dev (and thus also master), including reactome, and will not pollute the global type-ahead system with reactome entities.

Pipeline produces:

  • go-lego.owl = merged, pre-reasoned, collection of the ontologies listed in go-lego-edt.ofn (notably including neo.owl):
    -- purpose: provides all content for the GOLR type-ahead term search service
  • go-lego-no-neo-with-reacto.owl = go-lego.owl without neo.owl with reacto.owl added
    -- purpose: provide Minerva with the ability to reason efficiently while allowing neo to grow
  • go-lego-with-reacto.jnl = blazegraph RDF version of go-lego.owl with reacto.owl added
    -- purpose: allow minerva to function without loading neo.owl

Services:
GOLR - purpose: type ahead search over ontologies (and genes)
- input: go-lego.owl
MINERVA - purpose: web service backing the Noctua application, command-line client for go-cam processing
- input: go-lego-no-neo-with-reacto.owl , go-lego-with-reacto.jnl

@goodb
Contributor Author

goodb commented Mar 31, 2020

@kltm if there were a way to filter GOLR requests to eliminate the reactome entities that would simplify things a bit (just add reacto to the import list that builds go-lego and no more need for -with-reacto everywhere). We would still need one OWL ontology that does not contain neo for minerva.

@kltm
Member

kltm commented Apr 1, 2020

@goodb Does this sound right to you?

Current ontologies being made:

  • go-lego.owl - (merged) product of main pipeline
  • reacto.owl - product of main pipeline

Current ontology journals (for minerva):

  • Current: blazegraph-go-lego.jnl.gz

Ontologies we need:

  • go-lego-no-neo-with-reacto.owl (likely in main pipeline to sync with rest)

Ontology journals we need (for minerva):

  • blazegraph-go-lego-with-reacto.jnl.gz (replacing blazegraph-go-lego.jnl.gz in issue-35-neo-test branch)

@goodb
Contributor Author

goodb commented Apr 1, 2020

Assuming there isn't a way to deal with excluding reacto entities at the golr level, yes, this is it.

@kltm
Member

kltm commented Apr 1, 2020

@goodb Okay, what I'm trying to balance here is the simplest and least obscure product set with adding a set of required synchronized changes of client code (filtering reactome).

Currently, we filter with 'regulates_closure', 'CHEBI:23367'. In addition, we could add a namespace filter. While we could set up the experiment, with what you know, would these be enough to keep reacto items out?

Assuming we went the GOlr route, that would only save us the replacement of the journal above, correct? We'd still need some form of the other data products, right? It seems like as far as pipeline complexity goes, we save little. However, it does seem to make the products a little less stilted...
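For illustration, the client-side filtering discussed in this comment could be expressed as Solr filter queries along the following lines. This is a sketch under assumptions: GOlr is queried with standard Solr parameters (`q`, `fq`, `wt`), the `regulates_closure` field is the one named above, and the `id` field and the `REACTO` prefix used for the namespace filter are hypothetical placeholders rather than confirmed GOlr schema details.

```python
from urllib.parse import urlencode


def build_golr_params(text, closure_id="CHEBI:23367", excluded_prefix=None):
    """Build a query string with a closure filter and an optional
    namespace exclusion (the exclusion field and prefix are assumptions)."""
    params = [
        ("q", text),
        ("wt", "json"),
        # Existing filter from the discussion: restrict to a closure.
        ("fq", f'regulates_closure:"{closure_id}"'),
    ]
    if excluded_prefix:
        # Hypothetical namespace filter: drop ids starting with the prefix.
        params.append(("fq", f"-id:{excluded_prefix}\\:*"))
    return urlencode(params)


# Example: type-ahead text with reacto-style ids excluded.
query_string = build_golr_params("kinase", excluded_prefix="REACTO")
```

Either way, whether such a filter would actually keep reacto items out depends on how those entities are indexed, which is the open question raised here.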

@goodb
Contributor Author

goodb commented Apr 2, 2020

I'm not really clear on what the regulates closure entails. CHEBI:23367 would include all of reacto as it stands. We could do a lot to tune it up to match a filter if we want, as it isn't used for anything aside from these models.

Given an uncertain future for reacto, I'd lean towards building the reacto-specific products rather than tuning up golr to avoid it for now.

@kltm
Member

kltm commented Apr 2, 2020

From the conversation on today's software call, let's go ahead and take the "data" approach, producing the specific data products needed (rather than trying to fudge the clients). Producing the products should probably occur on geneontology/pipeline, with the issue-35-neo-test branch. geneontology/pipeline#35

kltm added a commit to geneontology/pipeline that referenced this issue Apr 4, 2020
@goodb
Contributor Author

goodb commented Apr 18, 2020

Related - pr to make reacto in the go makefile geneontology/go-ontology#19288.

@goodb
Contributor Author

goodb commented May 9, 2020

Closing. The only remaining thing to do is to get it running on master. I believe that is its own issue.

@goodb goodb closed this as completed May 9, 2020