From 4f60ced103b99af0a2366691a9475778953644aa Mon Sep 17 00:00:00 2001
From: Trevor Bedford
Date: Thu, 16 Jan 2025 16:17:42 -0800
Subject: [PATCH 1/2] Update location threshold

This commit updates the location threshold for both clades and Pango lineages
to be 1000 sequences from a location in the previous 150 days. The move from
30 days to 150 days makes it so that the location inclusion is based on the
full analysis window. This is nicely simplifying.

I also removed the previous prune_seq_days. This had existed because exciting
variants were previously submitted to databases faster. This is no longer the
case.

I also dropped the minimum number of sequences to break out a clade from 2000
to 500.

I've updated the text on the viz site correspondingly
---
 config/config.yaml | 24 ++++++++++--------------
 viz/src/App.jsx    |  4 ++--
 2 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/config/config.yaml b/config/config.yaml
index e79024d..898d8c9 100644
--- a/config/config.yaml
+++ b/config/config.yaml
@@ -16,19 +16,17 @@ prepare_data:
     nextstrain_clades:
       global:
         included_days: 150
-        location_min_seq: 50
-        location_min_seq_days: 30
+        location_min_seq: 1000
+        location_min_seq_days: 150
         excluded_locations: "defaults/global_excluded_locations.txt"
-        prune_seq_days: 12
-        clade_min_seq: 2000
+        clade_min_seq: 500
         clade_min_seq_days: 150
     pango_lineages:
       global:
         included_days: 150
-        location_min_seq: 150
-        location_min_seq_days: 30
+        location_min_seq: 1000
+        location_min_seq_days: 150
         excluded_locations: "defaults/global_excluded_locations.txt"
-        prune_seq_days: 12
         clade_min_seq: 1
         clade_min_seq_days: 150
         collapse_threshold: 350
@@ -36,19 +34,17 @@ prepare_data:
     nextstrain_clades:
       global:
         included_days: 150
-        location_min_seq: 50
-        location_min_seq_days: 30
+        location_min_seq: 1000
+        location_min_seq_days: 150
         excluded_locations: "defaults/global_excluded_locations.txt"
-        prune_seq_days: 12
-        clade_min_seq: 2000
+        clade_min_seq: 500
         clade_min_seq_days: 150
     pango_lineages:
       global:
         included_days: 150
-        location_min_seq: 150
-        location_min_seq_days: 30
+        location_min_seq: 1000
+        location_min_seq_days: 150
         excluded_locations: "defaults/global_excluded_locations.txt"
-        prune_seq_days: 12
         clade_min_seq: 1
         clade_min_seq_days: 150
         collapse_threshold: 350
diff --git a/viz/src/App.jsx b/viz/src/App.jsx
index 6412d8a..d828a0a 100644
--- a/viz/src/App.jsx
+++ b/viz/src/App.jsx
@@ -31,7 +31,7 @@ function App() {

           Each line represents the estimated frequency of a particular clade through time. Equivalent Pango
           lineage is given in parenthesis, eg clade 23A (lineage XBB.1.5). Only
-          locations with more than 50 sequences from samples collected in the previous 30 days are
+          locations with more than 1000 sequences from samples collected in the previous 150 days are
           included. Results last updated {mlrCladesData?.modelData?.get('updated') || 'loading'}.

         {/* surrounding div(s) used for static-images.js script */}
@@ -54,7 +54,7 @@ function App() {

           Each line represents the estimated frequency of a particular Pango lineage through time. Lineages
           with fewer than 350 observations are collapsed into parental lineage. Only
-          locations with more than 150 sequences from samples collected in the previous 30 days are
+          locations with more than 1000 sequences from samples collected in the previous 150 days are
           included. Results last updated {mlrLineagesData?.modelData?.get('updated') || 'loading'}.

From be4ec05d88754bacc47a6c32579f182bd058aa19 Mon Sep 17 00:00:00 2001
From: Jover Lee
Date: Thu, 16 Jan 2025 16:26:47 -0800
Subject: [PATCH 2/2] README: remove outdated config description

Removing instead of updating to match new params so that we don't have to
maintain it. The location and clade thresholds are explained in plain language
in `viz/src/App.jsx`.
---
 README.md | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/README.md b/README.md
index 3d4fd72..2998bc1 100644
--- a/README.md
+++ b/README.md
@@ -96,14 +96,6 @@ The current available options for `geo_resolutions` are
 The `prepare_data` params in `config/config.yaml` are used to subset the full case counts and
 clades counts data to specific date range, locations, and clades.
 
-As of 2023-04-04, the config for the automated pipeline is set to only include data from:
-
-- the past 150 days
-    - excluding sequences from the last 12 days since they may be overly enriched for variants
-- locations that have at least 500 sequences in the last 30 days
-    - excluding locations specifically listed in `defaults/global_excluded_locations.txt`
-- clades that have at least 5000 sequences in the last 150 days
-
 ### Model configurations
 
 The specific model configurations are housed in separate config YAML files or each model.
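
Since PATCH 2/2 drops the README summary of the `prepare_data` params, a rough sketch of how the new thresholds from PATCH 1/2 might interact could help reviewers. This is not the pipeline's actual code. The row layout, the function names (`totals_in_window`, `subset_rows`), the `>=` comparisons (the viz text says "more than"), the order of the two filters, and the relabelling of sparse clades as `"other"` are all assumptions made for illustration. Only the config keys and values come from the diff.

```python
from collections import Counter
from datetime import date, timedelta

# Values copied from the updated nextstrain_clades/global block in PATCH 1/2.
CONFIG = {
    "included_days": 150,
    "location_min_seq": 1000,
    "location_min_seq_days": 150,
    "clade_min_seq": 500,
    "clade_min_seq_days": 150,
}


def totals_in_window(rows, key, days, today):
    """Sum sequence counts per `key` over rows dated within the last `days` days."""
    cutoff = today - timedelta(days=days)
    totals = Counter()
    for row in rows:
        if row["date"] >= cutoff:
            totals[row[key]] += row["sequences"]
    return totals


def subset_rows(rows, config, today):
    """Hypothetical subsetting step (assumed behaviour, not the real script)."""
    # 1. Keep only rows inside the analysis window.
    included_cutoff = today - timedelta(days=config["included_days"])
    recent = [r for r in rows if r["date"] >= included_cutoff]

    # 2. Keep only locations that meet the sequence threshold in their window.
    loc_totals = totals_in_window(
        recent, "location", config["location_min_seq_days"], today
    )
    keep_locations = {
        loc for loc, n in loc_totals.items() if n >= config["location_min_seq"]
    }
    kept = [r for r in recent if r["location"] in keep_locations]

    # 3. Relabel clades below the threshold as "other" (assumed convention).
    clade_totals = totals_in_window(
        kept, "clade", config["clade_min_seq_days"], today
    )
    return [
        r if clade_totals[r["clade"]] >= config["clade_min_seq"]
        else {**r, "clade": "other"}
        for r in kept
    ]
```

With `location_min_seq_days` now equal to `included_days`, steps 1 and 2 use the same window, which is the simplification the first commit message describes.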