Include viruses and bacteria in NEO #77
Noting that this is a ~350k-line uncompressed GPI 1.2 file. Including it would mean roughly a 25% increase.
I believe this could be hacked in at lines 41 to 45 in 10210c1.
Probably best to time the addition for after the next update cycle.
@pgaudet Would it be possible to get this as a compressed file from upstream, like the others, for consistency and size?
@alexsign Can you please provide this data as a compressed file like the other GPIs? Thanks, Pascale
@pgaudet The file is gzipped now and will be compressed in future releases.
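For anyone eyeballing the size of the gzipped file, here is a minimal sketch that streams it without decompressing to disk and counts entities. It assumes only that header lines start with `!` and that rows are tab-separated, as in GPI 1.2; the function names are hypothetical and not part of the NEO build.

```python
import gzip
from typing import Iterator, List


def iter_gpi_rows(path: str) -> Iterator[List[str]]:
    """Yield tab-split rows from a gzipped GPI file, skipping '!' headers and blank lines."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")  # keep trailing tabs so empty columns survive
            if not line or line.startswith("!"):
                continue
            yield line.split("\t")


def count_entities(path: str) -> int:
    """Count data rows, e.g. to check the ~350k figure against a local copy."""
    return sum(1 for _ in iter_gpi_rows(path))
```

Usage would be something like `count_entities("uniprot_reviewed_virus_bacteria.gpi.gz")`, where the local filename is an assumption.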
@alexsign Great, thank you.
@cmungall Part of the Makefile is running
Suffix with the taxon ID for now. Obviously this is not super-friendly, but we should progress incrementally. It's better to have some disambiguator than an autocomplete flooded by 1000 rplNs. When we rewrite my hacky old scripts from Perl to Python, we will fix the whole naming strategy.
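To illustrate the interim disambiguation idea, here is a minimal sketch that appends the numeric taxon ID to a GPI 1.2 symbol. The `symbol (taxon)` label format and the helper name are hypothetical, not what the NEO scripts actually emit. It assumes the GPI 1.2 column order, with DB_Object_Symbol in column 3 and Taxon (e.g. `taxon:562`) in column 7.

```python
from typing import List

# Assumed GPI 1.2 column positions (0-based): DB_Object_Symbol is column 3, Taxon is column 7.
SYMBOL_COL = 2
TAXON_COL = 6


def taxon_suffixed_label(row: List[str]) -> str:
    """Return a label like 'rplN (562)' so same-named genes from different taxa stay distinct."""
    symbol = row[SYMBOL_COL]
    # Drop any 'taxon:' or 'NCBITaxon:' style prefix and keep only the numeric ID.
    taxon_id = row[TAXON_COL].rsplit(":", 1)[-1]
    return f"{symbol} ({taxon_id})" if taxon_id else symbol
```

With this scheme, two `rplN` rows from different taxa get distinct autocomplete labels. A row with an empty taxon column falls back to the bare symbol.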
@cmungall Clarifying work: I'll extend … What about the ontology ID, then?
@alexsign: thanks for doing this, awesome! Can you populate the properties field? I assume all should have … @kltm: Should we not document this here, https://github.com/geneontology/go-site/blob/master/metadata/datasets/goa.yaml, together with the inclusion/exclusion criteria (I assume this is only SP)?
Just want to record the implications here:
This is fine, no discussion necessary; just recording this here in case there is any confusion later.
@cmungall Yes, I thought about that, but:
I'm happy to go a more "normal" path as well, but would need to move a little slower.
Well, we could list all 6k taxa in the YAML, but I agree this is suboptimal.
Totally fair; let's just proceed for now.
…filter feature to script to fill in taxon id in some cases; work on #77
@cmungall Here is a locally tested PR that may close this issue: #79.
Currently running the full post-merge test.
@kltm - do we need to do any testing on the Noctua autocompletes?
From a discussion with @cmungall yesterday, I wanted to produce a file that could be eyeballed. A major concern was that this could flood out other entries (a 25% increase in size, ~350k entities). I'm testing the build now, but we could defer rolling this out until somebody is available to look at it live.
It'd be good to have the Swiss-Prot curators test for IDs they'd expect to curate, and I'm happy to do other ID testing just in case.
Where can this be tested? Is this on Noctua or on some test server? |
Just confirmed with @pmasson55 that the Swiss-Prot reviewed set is OK (for the record, this is also in response to #77 (comment)).
Created working branch https://github.com/geneontology/neo/tree/issue-80-new-virus-bacteria
After talking to @vanaukenk, we'll temporarily switch back to EcoCyc to get a NEO release out before continuing work.
This is now a dupe of #82.
The file is here:
http://ftp.ebi.ac.uk/pub/contrib/goa/uniprot_reviewed_virus_bacteria.gpi
@kltm please let me know if you need more information.
Thanks, Pascale