adding assembly_name attribute #1650
Okay, we need @andrewkern's input here, I think. Observations:
So: should we have … I feel like I'm over-complicating things, maybe?
So, looking this over, do we need the assembly in the
So the … Having said that, I do like the idea of a test that could yield a warning if the user chose to, say, use a genetic map from an assembly that doesn't match the genome. But maybe we should tag this as a TODO and put it off until after the release we are working on?
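As a rough illustration of the warning idea, here is a minimal sketch. The Genome and GeneticMap classes are hypothetical, simplified stand-ins (not stdpopsim's actual classes), and check_genetic_map_assembly is a made-up name; the function only compares assembly names and emits a UserWarning rather than raising, so a mismatched map can still be used deliberately.

```python
import warnings
from dataclasses import dataclass


@dataclass
class Genome:
    # Hypothetical, simplified stand-in for a species genome.
    assembly_name: str


@dataclass
class GeneticMap:
    # Hypothetical, simplified stand-in for a genetic map resource.
    id: str
    assembly_name: str


def check_genetic_map_assembly(genome, genetic_map):
    """Warn (rather than fail) if the map was built on a different assembly.

    Returns True when the assembly names match, False otherwise.
    """
    if genetic_map.assembly_name != genome.assembly_name:
        warnings.warn(
            f"Genetic map {genetic_map.id!r} is based on assembly "
            f"{genetic_map.assembly_name!r}, but the genome uses "
            f"{genome.assembly_name!r}; consider a liftover.",
            UserWarning,
        )
        return False
    return True
```

Warning instead of raising matches the "nice to have, not blocking" tone of the discussion; a stricter version could raise a ValueError instead.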
I think I'm in the same place as @andrewkern here: this would be a nice thing to do, but let's put it off for now. There are a number of other things to think through (and implement) here, and we'd like to do that properly, but it's not urgent. I think the only things that we do want to do now are:
I've opened an issue for those: #1651.
So, if you agree, let's close this, @silastittes? I'll close #1252 as well. If you'd like to write up a summary in another issue for the future (which can refer to this PR), that would be helpful.
That sounds good to me! Regarding your two points:
The annotation files for PhoSin include the assembly name (
Seems like we're good to close?
Here's a draft solution for #1252.
@grahamgower made a patch dealing with genetic map intervals that don't match those of the assembly, described further in #719.
I think the concern remains that there can be mismatches between assemblies and intervals along the assemblies (annotations and genetic maps).
I've started adding assembly names to the Annotation and GeneticMap classes, which would make it easier to check that all of a species' resources are defined on the same assembly.
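A minimal sketch of what the attribute could look like, assuming hypothetical simplified dataclasses rather than stdpopsim's real Annotation and GeneticMap definitions. Here assembly_name defaults to None (meaning "not recorded"), and the made-up helper mismatched_resources lists any resource whose assembly differs from the expected one, including resources where it was never set.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Annotation:
    # Hypothetical stand-in; assembly_name records which assembly the
    # annotation intervals were defined on (None = not recorded).
    id: str
    assembly_name: Optional[str] = None


@dataclass
class GeneticMap:
    # Hypothetical stand-in with the same new attribute.
    id: str
    assembly_name: Optional[str] = None


def mismatched_resources(assembly_name: str, resources: Iterable) -> List[str]:
    """Return ids of resources not recorded as being on assembly_name."""
    return [r.id for r in resources if r.assembly_name != assembly_name]


resources = [
    Annotation("ann1", "ASM1"),
    GeneticMap("map1", "ASM0"),
    GeneticMap("map2"),  # assembly never recorded
]
print(mismatched_resources("ASM1", resources))  # ['map1', 'map2']
```

Treating a missing name as a mismatch is a design choice: it forces each new resource to declare its assembly during QC rather than silently passing.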
It's hard to write explicit tests for this using the interval information, especially for annotations, because the tests likely won't fail if the annotation intervals still fall within the bounds of the (incorrect) assembly. I think this largely comes down to humans carefully checking that the annotations and genetic maps match the assembly. Adding the attribute really just serves as a helpful reminder during the QC process to verify this, and to do a liftover if it's wrong.
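To make the limitation concrete, here is a small sketch of a purely interval-based check. The chromosome length and intervals are invented for illustration, and intervals_within_bounds is a hypothetical helper. Intervals taken from a different assembly pass the check as long as their coordinates happen to fit on the current chromosome, so the mismatch goes undetected; only coordinates that run off the end are caught.

```python
def intervals_within_bounds(intervals, chrom_length):
    """Return True if every half-open [start, end) interval fits on the chromosome."""
    return all(0 <= start < end <= chrom_length for start, end in intervals)


# Hypothetical length of the chromosome on the assembly the catalogue uses.
chrom_length_current = 1_000_000

# Hypothetical intervals that were actually defined on a different assembly:
# their coordinates are wrong for this genome but still lie inside its bounds.
intervals_from_other_assembly = [(10_000, 12_500), (400_000, 401_000)]

# Passes, so the bounds check cannot reveal the assembly mismatch.
print(intervals_within_bounds(intervals_from_other_assembly, chrom_length_current))  # True

# Only intervals that overrun the chromosome are flagged.
print(intervals_within_bounds([(999_000, 1_002_000)], chrom_length_current))  # False
```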
If this is the direction we want to go, a few more things need to be done:
- Add assembly_name to the stubs for generating new species in the catalogue, OR
In this draft PR, we can write simple checks like:
That could be added as a unit test.
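One way such a check could be written as a pytest-style unit test is sketched below. The Species and Resource classes and the CATALOGUE list are hypothetical stand-ins invented for this example (stdpopsim's real catalogue structures differ); the test simply asserts that every annotation and genetic map carries the same assembly name as its species' genome, with a message naming the offending resource.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    # Hypothetical stand-in for an annotation or genetic map entry.
    id: str
    assembly_name: str


@dataclass
class Species:
    # Hypothetical stand-in for a catalogue species and its genome assembly.
    id: str
    assembly_name: str
    annotations: List[Resource] = field(default_factory=list)
    genetic_maps: List[Resource] = field(default_factory=list)


# Hypothetical catalogue contents, used only for this example.
CATALOGUE = [
    Species(
        id="SpeA",
        assembly_name="ASM1",
        annotations=[Resource("SpeA_ann", "ASM1")],
        genetic_maps=[Resource("SpeA_map", "ASM1")],
    ),
]


def test_resources_match_genome_assembly():
    # Every annotation and genetic map must name its species' assembly.
    for species in CATALOGUE:
        for resource in species.annotations + species.genetic_maps:
            assert resource.assembly_name == species.assembly_name, (
                f"{species.id}: {resource.id} is on {resource.assembly_name}, "
                f"expected {species.assembly_name}"
            )


if __name__ == "__main__":
    test_resources_match_genome_assembly()
    print("all resources match their genome assembly")
```

As noted above, a test like this only checks the recorded names; it cannot confirm that the intervals themselves were lifted over correctly.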