Fix RKI transform #490

Merged
merged 2 commits into from
Jan 30, 2025

Conversation

joverlee521
Contributor

@joverlee521 joverlee521 commented Jan 30, 2025

Description of proposed changes

Fix and update the RKI transformation for pango_lineage.

  • Default to '?' if the RKI data does not include lineages to avoid the IndexError.
  • Filter to PANGOLIN_LATEST to get the latest lineage assignment.

Related issue(s)

Resolves #489
Resolves #478

Checklist

It is unclear whether there is an upstream data issue, but the latest
GenBank/open workflow failed during `transform_rki_data` because the RKI
data contained no lineages. Default to '?' if the RKI data does not
include lineages to avoid the IndexError.

Resolves <#489>

Based on <#476 (comment)>,
we can filter for `PANGOLIN_LATEST` to get the latest lineage assignment.

If none of the lineages are marked as `PANGOLIN_LATEST`, then just use
the first one in the list since this was the behavior before the change.
If there are multiple `PANGOLIN_LATEST` lineages, then just use the
first one and output a warning. I've removed the assertion because this
should not block the whole ncov-ingest workflow.

Resolves <#478>
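
The selection rules described in the two commit messages above can be sketched roughly as follows. This is only an illustration, not the code merged in this PR: the helper name `pick_pango_lineage` and the `method` key used to mark a `PANGOLIN_LATEST` assignment are assumptions, and only the `lineage` key is confirmed by the discussion in this thread.

```python
import sys


def pick_pango_lineage(lineages):
    """Pick a single lineage string from a list of RKI lineage dicts.

    Hypothetical helper. The "method" key is an assumed field name;
    the "lineage" key matches the snippet quoted in this conversation.
    """
    # No lineages at all: fall back to '?' instead of raising IndexError.
    if not lineages:
        return "?"

    # Keep only the assignments marked as PANGOLIN_LATEST.
    latest = [item for item in lineages if item.get("method") == "PANGOLIN_LATEST"]

    # Several PANGOLIN_LATEST entries: warn, but do not block the workflow.
    if len(latest) > 1:
        print(
            f"WARNING: found {len(latest)} PANGOLIN_LATEST lineages, using the first",
            file=sys.stderr,
        )

    # None marked PANGOLIN_LATEST: use the first entry, as before the change.
    chosen = latest[0] if latest else lineages[0]
    return chosen["lineage"]
```

With this sketch, an empty list yields '?', a list without any `PANGOLIN_LATEST` entry yields its first lineage, and a list with several `PANGOLIN_LATEST` entries yields the first of those plus a warning on stderr.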
@joverlee521
Contributor Author

Will merge tomorrow, before the automated workflow runs, if the trial run completes successfully.

@joverlee521 joverlee521 merged commit 74bb91c into master Jan 30, 2025
10 checks passed
@joverlee521 joverlee521 deleted the fix-rki-transform branch January 30, 2025 17:50
@jameshadfield
Member

Nice one! I was going to say it's great that the assertion we added proved useful, but I actually think the error was an IndexError from the line immediately before the assertion?

@joverlee521
Contributor Author

> Nice one! I was going to say it's great that the assertion we added proved useful, but I actually think the error was an IndexError from the line immediately before the assertion?

Ah, yeah. If the assertion had been added on the line before `entry['pango_lineage'] = lineage_dict[0]['lineage']`, this would have been an AssertionError instead of the IndexError.
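
A small sketch of why the order matters. The assertion condition below is a guess, since the original assertion is not quoted in this thread; the function names are hypothetical and only the assignment line is taken from the comment above.

```python
def assign_then_assert(entry, lineage_dict):
    # Indexing runs first, so an empty list raises IndexError here
    # and the assertion below is never reached.
    entry["pango_lineage"] = lineage_dict[0]["lineage"]
    assert len(lineage_dict) == 1, "expected exactly one lineage"


def assert_then_assign(entry, lineage_dict):
    # The assertion runs first, so an empty list raises AssertionError.
    assert len(lineage_dict) == 1, "expected exactly one lineage"
    entry["pango_lineage"] = lineage_dict[0]["lineage"]


def error_name(func, lineage_dict):
    """Return the name of the exception raised by func, or None."""
    try:
        func({}, lineage_dict)
    except Exception as error:
        return type(error).__name__
    return None


print(error_name(assign_then_assert, []))  # IndexError
print(error_name(assert_then_assign, []))  # AssertionError
```

Either way the run fails on empty input, which is why the merged change defaults to '?' rather than relying on an assertion.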

Successfully merging this pull request may close these issues.

  • GenBank/open failure during transform_rki_data
  • RKI: filter for PANGOLIN_LATEST
2 participants