Geographic filtering #1

Open
trvrb opened this issue Feb 28, 2021 · 1 comment
Labels
enhancement New feature or request

Comments

trvrb commented Feb 28, 2021

I really like the general direction, and having the prototype is super helpful. My immediate main use case / interest is in situations like B.1.526, à la the "New York variant", which is of interest due to its constellation of spike mutations alongside its recent rise in New York:

[Image: newyork-logistic-B.1.526]

At the moment, there are 748 B.1.526 viruses in GISAID, but if we look at this in current Nextstrain we have just 2 in the North American build and just 37 in the SPHERES New York build. This makes accurate estimation of frequencies quite difficult.

So, in the "nextfrequencies" case, I'd want a JSON with enough granularity to filter to country USA or division New York and look at the frequency of B.1.526 (or the frequency of 253G+484K).

I would first think this could be accomplished by adding "region", "country" and "division" columns to the list of "traits" in the data/frequencies.json file and then exposing the ability in the app to both (1) "group by" and (2) "filter by" elements in "traits".
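As a rough sketch of what "filter by" and "group by" over "traits" could look like, the Python below operates on a made-up record layout. The real schema of data/frequencies.json is not shown in this thread, so the `traits` / `counts` fields, the sample values, and the helper names `filter_by` and `group_by` are all assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical layout: one record per haplotype, carrying its traits and
# sequence counts at three timepoints. Values are placeholders, not real data.
RECORDS = [
    {"traits": {"lineage": "B.1.526", "country": "USA", "division": "New York"},
     "counts": [1, 4, 9]},
    {"traits": {"lineage": "B.1.1.7", "country": "USA", "division": "New York"},
     "counts": [2, 3, 6]},
    {"traits": {"lineage": "B.1.1.7", "country": "USA", "division": "Florida"},
     "counts": [5, 8, 12]},
    {"traits": {"lineage": "B.1.1.7", "country": "United Kingdom", "division": "England"},
     "counts": [20, 30, 40]},
]


def filter_by(records, **wanted):
    """Keep records whose traits match every trait=value pair in `wanted`."""
    return [
        r for r in records
        if all(r["traits"].get(k) == v for k, v in wanted.items())
    ]


def group_by(records, trait):
    """Sum counts per value of `trait`, then convert to per-timepoint frequencies."""
    if not records:
        return {}
    n = len(records[0]["counts"])
    sums = defaultdict(lambda: [0] * n)
    for r in records:
        key = r["traits"][trait]
        for i, c in enumerate(r["counts"]):
            sums[key][i] += c
    # Total count at each timepoint across all groups.
    totals = [sum(col) for col in zip(*sums.values())]
    return {
        k: [c / t if t else 0.0 for c, t in zip(v, totals)]
        for k, v in sums.items()
    }


# Filter to division New York, then group by lineage.
new_york = filter_by(RECORDS, division="New York")
frequencies = group_by(new_york, "lineage")
```

Frequencies here are normalised within the filtered subset, so the B.1.526 values are its share of the New York records only.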

In this case you'd hope that the resulting JSON wouldn't be too bloated by splitting "haplotypes" based on geography. However, this seems doubtful as we have 1411 divisions currently categorized and you could easily imagine a >100X increase in JSON size from incorporating division.

Thus, it seems necessary to pre-build a series of JSONs filtered to various geographies. This would be quick, and it wouldn't be difficult to serve a number of different JSON files along the lines of:

- ncov_north-america_frequencies.json
- ncov_USA_frequencies.json
- ncov_New-York_frequencies.json

and then we just need an interface to select the JSON file of interest.
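A minimal sketch of such a pre-build step, in Python, might look like the following. It assumes a flat list of per-sequence metadata records with `region`, `country` and `division` keys, which is a hypothetical layout. The helper names and the rule of lowercasing only region names are guesses made to reproduce the three example filenames above. The actual frequency estimation is omitted; the sketch just writes the filtered subset.

```python
import json
from pathlib import Path


def output_name(name, level):
    """Build a filename like ncov_New-York_frequencies.json.

    Assumption: spaces become hyphens, and only region names are lowercased,
    which matches the three example filenames in this thread.
    """
    slug = name.replace(" ", "-")
    if level == "region":
        slug = slug.lower()
    return f"ncov_{slug}_frequencies.json"


def subset(records, level, name):
    """Return the records whose geographic `level` equals `name`."""
    return [r for r in records if r.get(level) == name]


def prebuild(records, geographies, out_dir):
    """Write one JSON file per (level, name) pair and return the filenames."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for level, name in geographies:
        path = out_dir / output_name(name, level)
        # Placeholder: a real build would estimate frequencies here.
        path.write_text(json.dumps(subset(records, level, name)))
        written.append(path.name)
    return written


# Hypothetical list of geographies to pre-build.
GEOGRAPHIES = [
    ("region", "North America"),
    ("country", "USA"),
    ("division", "New York"),
]
```

Calling `prebuild(records, GEOGRAPHIES, "data")` would then write the three files listed above, and the interface only needs the returned filenames to populate a selector.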

And as discussed you could imagine an interface to compare multiple JSON files, which could "color by" the same type across multiple frequency panels. This could expose things like B.1.1.7 frequencies across multiple countries or could compare clade frequency predictions across different models. This approach is nice in that you can treat geographic "filtering" in the same fashion as different prediction models.
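To make the multi-panel idea concrete, here is a small Python sketch that treats each loaded JSON file as one panel, assigns every trait value the same color in all panels, and pulls out one value's trajectory per panel. The record layout (`traits`, `frequencies`), the palette, the panel labels, and the numbers are placeholders invented for illustration, not real estimates or the app's actual format.

```python
# Hypothetical fixed palette; any list of colors would do.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"]

# Each panel stands in for one loaded JSON file (a geography or a model).
PANELS = {
    "USA": [
        {"traits": {"lineage": "B.1.1.7"}, "frequencies": [0.1, 0.2, 0.3]},
        {"traits": {"lineage": "B.1.526"}, "frequencies": [0.0, 0.1, 0.2]},
    ],
    "United Kingdom": [
        {"traits": {"lineage": "B.1.1.7"}, "frequencies": [0.5, 0.7, 0.9]},
    ],
}


def shared_colors(panels, trait):
    """Map each value of `trait` seen in any panel to one color.

    Values are sorted so the assignment is stable across reloads.
    """
    values = sorted(
        {r["traits"][trait] for records in panels.values() for r in records}
    )
    return {v: PALETTE[i % len(PALETTE)] for i, v in enumerate(values)}


def compare(panels, trait, value):
    """Return the frequency trajectory of `value` in each panel, or None if absent."""
    result = {}
    for label, records in panels.items():
        match = [r["frequencies"] for r in records if r["traits"].get(trait) == value]
        result[label] = match[0] if match else None
    return result


colors = shared_colors(PANELS, "lineage")
b117 = compare(PANELS, "lineage", "B.1.1.7")
b1526 = compare(PANELS, "lineage", "B.1.526")
```

Because the panels are just labelled lists of records, the same two functions would work whether the labels are countries or prediction models.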

Does this seem like a reasonable approach?

@trvrb trvrb added the enhancement New feature or request label Feb 28, 2021

trvrb commented Feb 28, 2021

Also, I think it's pretty instructive to look at how the well-thought-out covidcg.org does things. It provides frequencies of clades, lineages and AAs across the entire genome (one at a time). It's fully expressive in terms of filtering by geography, but this means a very long list of check boxes for different regions, countries and admin divisions. I like how you can get to exactly the view you're interested in. However, it's too much clicking much of the time.

Here is S:253G across a few different states:

[Image: 253G across states]

Here is B.1.526 in New York:

[Image: B.1.526 in NY]
