Reduce inference in cattle outbreak #66

Merged
trvrb merged 2 commits into master from reduce-inference-in-cattle-outbreak
Jun 27, 2024

Conversation

joverlee521
Contributor

@joverlee521 joverlee521 commented Jun 27, 2024

Description of proposed changes

Resolves #65 by

  1. setting divergence as default view
  2. removing augur traits to stop inference of location

Checklist

  • Checks pass
  • Trial build

Switching to divergence view as the default view since the data from
SRA does not include precise dates.

Part of #65
@trvrb
Member

trvrb commented Jun 27, 2024

@joverlee521: I don't want to completely remove ancestral trait inference. The Texas origin is important. I'm working on a strategy here that will keep inference for known locations but won't infer locations for SRA sequences. Could you revert 828f977 and I'll append to this PR?

@joverlee521 joverlee521 force-pushed the reduce-inference-in-cattle-outbreak branch from 828f977 to bd425cb Compare June 27, 2024 22:35
This commit adds a simple --sampling-bias-correction that inflates uncertainty in traits inference. This causes SRA tips to go from highly confident in location to not at all confident in location. For example, A/CATTLE/USA/24-013021-002/2024 goes from 68% to 20% confident in Ohio location.
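
To illustrate why a rate correction can inflate uncertainty, here is a toy sketch. It is not TreeTime's actual computation. It assumes an equal-rates model over `num_states` discrete states and assumes the correction acts as a multiplier on the overall transition rate. The function name, the number of states, the branch length and the correction factors are all hypothetical values chosen for illustration.

```python
import math


def prob_same_state(num_states, rate, branch_length):
    """Equal-rates toy model: probability a tip is in the same state as its parent.

    P = 1/k + (1 - 1/k) * exp(-k * rate * t / (k - 1))
    """
    decay = math.exp(-num_states * rate * branch_length / (num_states - 1))
    return 1.0 / num_states + (1.0 - 1.0 / num_states) * decay


if __name__ == "__main__":
    num_states = 10       # hypothetical number of divisions
    branch_length = 0.05  # hypothetical branch length
    base_rate = 1.0       # hypothetical uncorrected rate
    for correction in (1, 5, 25, 125):
        # Assumption: the correction simply scales the rate.
        p = prob_same_state(num_states, base_rate * correction, branch_length)
        print(f"correction={correction:>3}: P(same state as parent) = {p:.2f}")
```

As the multiplier grows, the probability decays toward the uniform value `1 / num_states`, which is the qualitative behavior described above for tips without a known location.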

This _should_ cause all these tips to be colored gray on the tree while leaving the deeper nodes and early GenBank tips appropriately colored (which is exactly what we wanted). However, this has revealed a bug in Auspice: it currently colors tips according to their most likely state, unlike its logic for internal branches, which are grayed out as entropy in state confidence increases.
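
A minimal sketch of the entropy idea, assuming a normalized Shannon entropy over the per-state confidences. This is not Auspice's actual code or formula. The function name is hypothetical, and the distributions below are made up apart from the 68% and 20% Ohio values quoted above.

```python
import math


def normalized_entropy(confidence):
    """Shannon entropy of a {state: probability} dict, scaled to [0, 1]."""
    if len(confidence) < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in confidence.values() if p > 0)
    return entropy / math.log(len(confidence))


if __name__ == "__main__":
    # Hypothetical distributions; only the Ohio values come from the discussion.
    before = {"Ohio": 0.68, "state_b": 0.08, "state_c": 0.08,
              "state_d": 0.08, "state_e": 0.08}
    after = {"Ohio": 0.20, "state_b": 0.20, "state_c": 0.20,
             "state_d": 0.20, "state_e": 0.20}
    for label, dist in (("before", before), ("after", after)):
        top = max(dist, key=dist.get)
        print(f"{label}: most likely = {top}, "
              f"normalized entropy = {normalized_entropy(dist):.2f}")
```

Coloring by most likely state ignores the second number entirely, whereas an entropy-based rule would fade a near-uniform tip toward gray.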

However, I think this commit is still appropriate to merge in avian-flu even if we're waiting on an Auspice update to properly display.
@trvrb
Member

trvrb commented Jun 27, 2024

I tried a couple different approaches before landing on the commit above:

  1. Modify the traits rule to produce an inferred trait labeled as division_inferred. Update Auspice config to have separate division (direct from metadata) and division_inferred (via traits) colorings. However, because lat/longs don't have division_inferred, this made the map not work appropriately.

  2. Modify the traits rule to swap ? in the input metadata to Unknown location. However, this was causing funkiness where deep internal states were assigned as Unknown location due to there being so many tips with Unknown location. It's possible that --weights could have corrected for this but I decided not to explore further.
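
For reference, the relabeling in approach (2) could be sketched roughly as below. This is not the code that was tried in the workflow. The function name and the sample rows are hypothetical; only the `division` column, the `?` placeholder and the `Unknown location` label come from the description above.

```python
import csv
import io


def relabel_missing(rows, column="division", missing="?", label="Unknown location"):
    """Return copies of the rows with the missing-value marker replaced by a label."""
    return [
        {**row, column: label} if row.get(column) == missing else dict(row)
        for row in rows
    ]


if __name__ == "__main__":
    # Hypothetical metadata; strain names are placeholders.
    tsv = "strain\tdivision\nsample_1\tTexas\nsample_2\t?\n"
    rows = list(csv.DictReader(io.StringIO(tsv), delimiter="\t"))
    for row in relabel_missing(rows):
        print(row["strain"], row["division"], sep="\t")
```

Once relabeled, `Unknown location` is treated as an ordinary state, which is consistent with the deep internal nodes being pulled toward it when many tips carry that label.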

Instead I went with:

  3. Add a heavy amount of uncertainty via --sampling-bias-correction. This causes A/CATTLE/USA/24-013021-002/2024 to go from 68% to 20% confident in Ohio location, while leaving the deeper nodes adjacent to resolved GenBank samples confident.
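
A rough sketch of how such a call might be assembled, written in Python for illustration only. The file paths and the correction value are hypothetical and are not taken from this PR. The use of `--confidence` and of `division` as the column are assumptions about the workflow.

```python
import shlex


def build_traits_command(tree, metadata, output, columns=("division",),
                         sampling_bias_correction=None):
    """Assemble an `augur traits` argument list without running it."""
    cmd = [
        "augur", "traits",
        "--tree", tree,
        "--metadata", metadata,
        "--columns", *columns,
        "--confidence",
        "--output-node-data", output,
    ]
    if sampling_bias_correction is not None:
        cmd += ["--sampling-bias-correction", str(sampling_bias_correction)]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths and value, not the ones used in the workflow.
    cmd = build_traits_command("results/tree.nwk", "results/metadata.tsv",
                               "results/traits.json", sampling_bias_correction=5)
    print(shlex.join(cmd))
```

Building the argument list separately makes it easy to compare runs with and without the correction before changing the workflow rule itself.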

As expressed in the commit message, this revealed a bug in Auspice: it currently colors tips according to their most likely state, unlike its logic for internal branches, which are grayed out as entropy in state confidence increases. However, I think this commit is still appropriate to merge in avian-flu even if we're waiting on an Auspice update to properly display.

We could still decide to implement (1) in addition to (3) to get an observed-only geo coloring, but I'd like to see how the Auspice fix looks before doing so.

@trvrb
Member

trvrb commented Jun 27, 2024

I believe we can merge this PR as it stands, update the live build, and wait for the Auspice update before doing further work here.

@trvrb trvrb merged commit 5763ebb into master Jun 27, 2024
6 checks passed
@trvrb trvrb deleted the reduce-inference-in-cattle-outbreak branch June 27, 2024 23:20
Development

Successfully merging this pull request may close these issues.

Reduce inference of unknown data/location for SRA sequences
2 participants