Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use weighted sampling for recent samples in global builds #1161

Merged
merged 1 commit into from
Dec 17, 2024

Conversation

victorlin
Copy link
Member

@victorlin victorlin commented Nov 16, 2024

This should have been a part of ae764bb but got lost in translation.

GISAID trial build links:

This should have been a part of "Use population-based weighted sampling
for global builds" (ae764bb) but got lost in translation.
@victorlin victorlin self-assigned this Nov 16, 2024
@victorlin victorlin marked this pull request as ready for review November 18, 2024 23:58
@victorlin victorlin requested a review from trvrb November 18, 2024 23:59
@victorlin
Copy link
Member Author

Merging without review since this seems like a trivial fix.

@victorlin victorlin merged commit 2f32f78 into master Dec 17, 2024
13 of 14 checks passed
@victorlin victorlin deleted the victorlin/weight-global-recent branch December 17, 2024 23:51
@corneliusroemer
Copy link
Member

corneliusroemer commented Dec 20, 2024

@victorlin I think this breaks things: the number of recent samples collapsed from ~4k to ~1.2k after the merge

Before:
Google Chrome Beta 2024-12-20 13 37 15
https://next.nextstrain.org/ncov/gisaid/global/1m@2024-12-17

After:
Google Chrome Beta 2024-12-20 13 37 26
https://next.nextstrain.org/ncov/gisaid/global/1m@2024-12-18

In retrospect this isn't a trivial change, I hope you agree, trial build would have been good :)

Reverting without review, hope that makes sense given this was merged without review and trial builds and people might be heading towards Christams.

@victorlin
Copy link
Member Author

Thanks for catching and addressing this @corneliusroemer. I added trial build links in the PR description but should have looked at the numbers before merging.

@victorlin
Copy link
Member Author

I've fixed the PR description to link to the commit that this addressed. Note that prior to ae764bb, the number of samples for gisaid/global/1m was ~2.7k. See before @2024-09-30 vs. after @2024-10-01. The difference is that ae764bb changed from region-level weights encoded via separate augur filter calls to a single augur filter call with uniform sampling across countries. This should have been paired with weighted sampling across countries, which is what this PR added. We're now back to what seems to be an incomplete shift towards country-level weights.

@corneliusroemer can you check whether 2024-09-30 or 2024-10-01 seems to make more sense? It would be interesting if you say 2024-10-01 – in that case we may want to consider removing weighted sampling for all 1m/2m builds.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants