Blazegraph for go lego experiment #297

Merged
merged 30 commits into from
Mar 23, 2020

goodb
Contributor

@goodb goodb commented Mar 22, 2020

This contains a completed (I hope) migration to using a blazegraph instance to handle all tbox-related requirements for minerva except OWL reasoning. This seems to work great in local testing. Looking forward to trying it out on noctua-dev. There are a few things to be aware of:

  1. For Minerva server startup and for the minerva-cli, there is a new parameter to specify the location of the journal, e.g. :
    --ontojournal /tmp/blazegraph-lego.jnl
    A journal can be provided at that location or, if nothing is there, it will automatically download it from
    http://skyhook.berkeleybop.org/issue-35-neo-test/products/blazegraph/blazegraph-go-lego.jnl.gz
    and will save it at the same location. Subsequent startups will just read the file in. Clearly we will want to change that download location when the pipeline is stable (see "Add the NEO into the main pipeline", pipeline#35).

  2. This no longer uses noctua-golr to get labels for ontology entities (including genes) to show in the Noctua editor. It pulls them from the same journal. This results in dramatically faster model loading, at least from home. We will want to watch for any complaints about labeling differences.

  3. neo.owl needs to be part of the blazegraph journal here. However, it does not need to be part of the OWL version of go-lego that gets loaded into Minerva, and in fact it should not be loaded, as it will slow down and, I imagine, confuse the server. I handle this locally with a catalog that redirects neo.owl to an empty file. We should decide how we want to do this with regard to the pipeline and its products.

  4. This bundles in a bunch of work on the model search API, including:
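
For reviewers who want the startup behaviour from item 1 in concrete terms, here is a minimal Python sketch of the "use the journal if present, otherwise download and unpack it" logic. It is not Minerva's Java implementation: the function name `ensure_journal`, the `opener` parameter, and the temporary `.part` file are assumptions made for illustration. Only the URL and the `--ontojournal` path come from the description above.

```python
import gzip
import shutil
import urllib.request
from pathlib import Path

# URL quoted in item 1 of the description.
JOURNAL_URL = (
    "http://skyhook.berkeleybop.org/issue-35-neo-test/"
    "products/blazegraph/blazegraph-go-lego.jnl.gz"
)


def ensure_journal(path, url=JOURNAL_URL, opener=urllib.request.urlopen):
    """Return the journal path, downloading and gunzipping it only if absent.

    `opener` is a hypothetical injection point so the download can be
    replaced in tests; it must return a readable, closable byte stream.
    """
    journal = Path(path)
    if journal.exists():
        # Subsequent startups just read the existing file.
        return journal
    journal.parent.mkdir(parents=True, exist_ok=True)
    partial = journal.with_name(journal.name + ".part")
    with opener(url) as response, \
            gzip.GzipFile(fileobj=response) as unzipped, \
            open(partial, "wb") as out:
        shutil.copyfileobj(unzipped, out)
    # Rename only after a complete download, so a failed run leaves no
    # half-written journal at the configured location.
    partial.replace(journal)
    return journal
```

Calling `ensure_journal("/tmp/blazegraph-lego.jnl")` would mirror the `--ontojournal /tmp/blazegraph-lego.jnl` example: the first call fetches the file, later calls return immediately.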

goodb added 20 commits March 6, 2020 12:35
Using this to load go-lego-merged into a blazegraph journal.
more testing needed and need a way to gain access to journal e.g. from travis or anywhere else that can't build one from go_lego.owl.
Have also added patch to ontology using to load tbox. This version is better. Most tests are working now. Still don't know why the previous commit was failing.
This would have caught a bunch of errors in the currently publicly released merged go_lego build. It's a decent start on error checking for that process. Not sure how best to wire it, or something like it, into the pipeline.
Identified the conversion of the OWL model into the JSON structure as by far the slowest part of the whole thing. A tricky recursive function is to blame. Adding the reasoner appears to slow things down a lot, but in fact it's just because it gives the renderer more work to do.
no reasoning:
2020-03-16 16:56:44,438 INFO  (JsonOrJsonpBatchHandler:218) model rendering is slowing things down...
2020-03-16 16:56:49,175 INFO  (JsonOrJsonpBatchHandler:220) See...

with reasoning
2020-03-16 16:57:00,648 INFO  (JsonOrJsonpBatchHandler:218) model rendering is slowing things down...
2020-03-16 16:57:48,679 INFO  (JsonOrJsonpBatchHandler:220) See...
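
To put numbers on the two log excerpts above, here is a small Python sketch that subtracts the timestamps. It assumes the `:218` and `:220` lines bracket the rendering step, which the messages suggest but the logs do not state outright. The helper name `elapsed_seconds` is made up for this example.

```python
from datetime import datetime

# Timestamp layout used by the log lines above: "YYYY-MM-DD HH:MM:SS,mmm".
STAMP_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
STAMP_LENGTH = 23  # number of characters in the timestamp prefix


def elapsed_seconds(start_line, end_line):
    """Seconds between the timestamps that prefix two log lines."""
    start = datetime.strptime(start_line[:STAMP_LENGTH], STAMP_FORMAT)
    end = datetime.strptime(end_line[:STAMP_LENGTH], STAMP_FORMAT)
    return (end - start).total_seconds()


no_reasoning = elapsed_seconds(
    "2020-03-16 16:56:44,438 INFO  (JsonOrJsonpBatchHandler:218)",
    "2020-03-16 16:56:49,175 INFO  (JsonOrJsonpBatchHandler:220)",
)
with_reasoning = elapsed_seconds(
    "2020-03-16 16:57:00,648 INFO  (JsonOrJsonpBatchHandler:218)",
    "2020-03-16 16:57:48,679 INFO  (JsonOrJsonpBatchHandler:220)",
)
print(no_reasoning)    # 4.737
print(with_reasoning)  # 48.031
```

Under that assumption, the step takes about 4.7 s without reasoning and about 48 s with it, roughly a tenfold difference.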
Also added code to download the journal if it is not present at the specific location given to the constructor.
currently always set on.
builds an index on startup and uses that to do the search
need to add an update method to update the index when models are saved
e.g.
search/?exactdate=2020-02-07
search/?date=2018-08-20&dateend=2019-12-02
search/?date=2018-08-20   (no end so anything after)
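
The intended semantics of the three date parameters can be sketched as below. This is an illustration, not the server code: the function name `date_matches` is hypothetical, and treating both `date` and `dateend` as inclusive bounds is an assumption the commit message does not confirm.

```python
import datetime as dt


def date_matches(model_date, params):
    """Check an ISO date string against search-style query parameters.

    `params` uses the query keys shown above: "exactdate", "date", "dateend".
    Both range bounds are treated as inclusive (an assumption).
    """
    day = dt.date.fromisoformat(model_date)
    if "exactdate" in params:
        return day == dt.date.fromisoformat(params["exactdate"])
    if "date" in params and day < dt.date.fromisoformat(params["date"]):
        return False
    if "dateend" in params and day > dt.date.fromisoformat(params["dateend"]):
        return False
    return True


# search/?date=2018-08-20 has no end, so anything on or after that day matches.
print(date_matches("2020-02-07", {"date": "2018-08-20"}))
```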
e.g. :6800/taxa/

returns {"taxa":["NCBITaxon:9606","NCBITaxon:7955","NCBITaxon:6239","NCBITaxon:559292"]}

Note that it's a different root from /search/: it's /taxa/.
Calls for labels to noctua.golr.org were proving very slow and unpredictable. Replaced these with label lookups on the blazegraph tbox graph service. It works much faster, and it's less confusing.
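
A client could consume the `/taxa/` response along these lines. This is a sketch only: the helper names are hypothetical, and the `localhost` host in the usage is an assumption, since only the port `6800` and the path appear above.

```python
import json


def taxa_url(base):
    """Build the taxa endpoint URL; note the root is /taxa/, not /search/."""
    return base.rstrip("/") + "/taxa/"


def parse_taxa(body):
    """Extract the list of taxon CURIEs from a /taxa/ response body."""
    return list(json.loads(body)["taxa"])


# Response body copied from the example above.
sample = '{"taxa":["NCBITaxon:9606","NCBITaxon:7955","NCBITaxon:6239","NCBITaxon:559292"]}'
print(taxa_url("http://localhost:6800"))  # host is a placeholder
print(parse_taxa(sample))
```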
@goodb goodb requested review from balhoff and kltm March 22, 2020 23:01
goodb added 8 commits March 22, 2020 16:10
could probably work around this to do client things that don't require it.
…interest

as they could reflect differences between the current ontology and label getting set up and the previous.
should actually delete this and all external lookup service things.
was failing due to slight differences in subclass closures (getting a couple more parents now than during previous tests).
…ing disposed and adding in a path for the go-lego journal
@kltm
Member

kltm commented Mar 23, 2020

@goodb I'm a little confused about the practical effects of 3 above.
A typical minerva restart cycle looks like:

[stop minerva-dev]
./node_modules/.bin/gulp batch-minerva-destroy-journal
./node_modules/.bin/gulp batch-minerva-create-journal
[start minerva-dev]

It sounds like 1 will have a natural fallback, so 3 may be the only thing that needs to change for the moment. Currently, minerva restarts with our doctored file:///tmp/go-lego.owl, what should that value now be for startup on dev?

@goodb
Contributor Author

goodb commented Mar 23, 2020

> @goodb I'm a little confused about the practical effects of 3 above.
> A typical minerva restart cycle looks like:
>
> [stop minerva-dev]
> ./node_modules/.bin/gulp batch-minerva-destroy-journal
> ./node_modules/.bin/gulp batch-minerva-create-journal
> [start minerva-dev]
>
> It sounds like 1 will have a natural fallback, so 3 may be the only thing that needs to change for the moment. Currently, minerva restarts with our doctored file:///tmp/go-lego.owl, what should that value now be for startup on dev?

Could you share that go-lego.owl file with me, just to make sure we are on the same page? It's been a while. I think it will work the way it is set up now, but I want to make sure.

To facilitate easy and consistent restarts we should probably move controls for ontology loads, catalogs etc. into the configuration file.

One option I see is to make 2 merged go-lego.owl builds, one with neo and one without. Have the purl for go-lego point to the one that is lacking neo. Then minerva wouldn't have to do anything different than it is doing now and anyone that tried to open a go-cam ttl file might stand a chance. The version that includes neo (e.g. call it go-lego-neo.owl ...) would only be used by the pipeline to populate the blazegraph journal and the solr index.

@kltm
Member

kltm commented Mar 23, 2020

@goodb The go-lego.owl that I've been using is: https://gist.github.com/kltm/5ac3a40f321c79efc00c8c8a17d62da2

What I read you saying above is that there would be a go-lego.owl and go-lego-sans-neo.owl, which would hopefully satisfy our current use cases. This would mean that we're no longer doing NEO-lite, but rather "go-lego-lite" (i.e. sans NEO) to meet these needs?
#260
geneontology/neo#47

@goodb
Contributor Author

goodb commented Mar 23, 2020

That file might explain why the server over there seems slower than I would expect. Unless you have a catalog set up, it's loading all of neo, and it does not need to. Could you delete this line and give it a try? Other than that, it seems fine.
Import(http://purl.obolibrary.org/obo/go/noctua/neo.owl)

(I leave it all just like that and use the catalog, just to have one way of doing this, but it should be the same to take it out and let the rest of the imports happen.) This will produce a go-lego-lite.

I think go-lego-lite is indeed a better description of what I am proposing should happen. Maybe we should just stick with the long names go-lego-with-neo.owl and go-lego-without-neo.owl so we stand more of a chance of knowing what we are talking about.
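
For anyone scripting the edit suggested above rather than doing it by hand, here is a minimal Python sketch that drops the neo.owl import line from an OWL functional-syntax file. The function name `drop_import` is hypothetical. Matching both the bare form shown above and the angle-bracketed form `Import(<...>)` is an assumption, made because the brackets may have been lost when the line was pasted here.

```python
NEO_IRI = "http://purl.obolibrary.org/obo/go/noctua/neo.owl"


def drop_import(owl_text, iri=NEO_IRI):
    """Return owl_text without the line that imports `iri`.

    Every other line, including the remaining imports, is kept unchanged.
    """
    targets = {f"Import({iri})", f"Import(<{iri}>)"}
    kept = [
        line
        for line in owl_text.splitlines(keepends=True)
        if line.strip() not in targets
    ]
    return "".join(kept)


# Hypothetical two-import fragment; only the neo.owl line is removed.
example = (
    "Import(<http://purl.obolibrary.org/obo/go/extensions/go-lego.owl>)\n"
    "Import(<http://purl.obolibrary.org/obo/go/noctua/neo.owl>)\n"
)
print(drop_import(example), end="")
```

Redirecting neo.owl to an empty file through a catalog gives the same result without touching the file, which is the approach described in the comment above.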

@kltm
Member

kltm commented Mar 23, 2020

Agreed that knowing what is in any of these files at any point is troubling. I prefer to have the shorter names for the default concept if possible (i.e. go-lego.owl would be the conceptual default and everything), but would be happy with anything explicit as well. I'm happy to defer to you, Chris, and Jim for best naming practices.

I've deleted Import(http://purl.obolibrary.org/obo/go/noctua/neo.owl) from /tmp/go-lego.owl; this will be the file used on restart.

2 participants