Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rn/use root of tree as reference #33

Merged
merged 7 commits into from
May 29, 2024

Conversation

rneher
Copy link
Member

@rneher rneher commented May 12, 2024

Add inferred cattle outbreak root and gff to the repo, use these to infer ancestral sequences and mutations on the tree.

@rneher
Copy link
Member Author

rneher commented May 12, 2024

This LINK now loads nextclade with the H5 genome tree.

The real link would use the live build instead of the test build used here.

@rneher rneher requested a review from jameshadfield May 14, 2024 20:44
@jameshadfield
Copy link
Member

Thanks @rneher - I've been out most of this week so will get to this next week, but the changes look in the right direction.

I'm probably missing something -- why do we add the full genome A/goose/Guangdong/1/1996(H5N1) GenBank instead of the previous situation where we used the (same) one produced via rule join_genbank?

And how did you generate h5_cattle_genome_root.fasta?

With the upcoming changes to nextclade (nextstrain/nextclade#1455) we would no longer need the gff3 here, right?

@rneher
Copy link
Member Author

rneher commented May 17, 2024

yes, we wouldn't need the gff3 anymore.

regarding the root sequence... I did an ancestral reconstruction of the current dataset and pulled the sequence out of the nt_muts.json.

The genbank file is also the cattle root. need to label it better to avoid confusing it with the GD/1996 sequence. I just replaced the sequence in the genbank record with the cattle root.

@jameshadfield
Copy link
Member

The root-sequence changes here look great - thanks for adding them (I mistakenly thought the sequence was the same as A/goose/Guangdong/1/1996(H5N1) but as you've clarified we're just using that co-ordinate system but with an updated ancestrally-reconstruced sequence)

Given my understanding of nextstrain/nextclade#1455 we shouldn't need any of the files defined in the JSON here (and committed to the repo), which includes config/h5_cattle_genome_root.gff3 and config/h5_cattle_genome_root.fasta as nextclade should read all the necessary info from the JSON itself. I made and uploaded two datasets to /staging:

I expected Nextclade to treat them identically, but only your dataset works! (It works great, BTW.) Behind the scenes, the GFF3/fasta files aren't being fetched by nextclade (and nor should they be), so I don't know why my version isn't working

P.S. When I say my version isn't working, it's not working in 2 ways: when I drop on an example FASTA file the "suggesting" button spins forever and the "run" button is never enabled. Console error & screenshot:

Uncaught (in promise) Error: Internal Error: Tried to run minimizer search without minimizer index available. This is an internal issue, likely due to a programming mistake. Please report it to developers!
image

If I disable "suggest automatically" then I can run the data however I get the hard error:

When parsing raw Nextclade params: When parsing Auspice Tree JSON contents: When parsing JSON: missing field `files` at line 1 column 5359

@rneher
Copy link
Member Author

rneher commented May 24, 2024

in the previous set-up, the pathogen.json has mandatory field schemaVersion and files. files is necessary to tell the app what files to wait for and fetch (and error if they are missing). We should probably keep them mandatory to still allow providing additional files via url-params (or cli params) if required (like change logs, readme, etc).

But given that the whole meta.extension.nextclade is optional, an empty files:{} should be ok.

@ivan-aksamentov
Copy link
Member

ivan-aksamentov commented May 24, 2024

I made .meta.extensions.nextclade.pathogen.files field optional and applied some other fixes (see PR discussion), so now the second link should also work.

P.S. files: {} is no longer required, same as any of the fields in it. The checks for whether reference sequence is provided at least one of the possible ways is done on runtime now (I mean manually, rather just using struct field validation).

@jameshadfield
Copy link
Member

jameshadfield commented May 24, 2024

Thanks both! @rneher happy for this to be merged with or without the "files" field, but my preference would be to remove the .fasta and .gff3 files since they're no longer used by nextclade. Similarly for the clade_membership hack -- since Ivan's made this optional we could (should) rip this out of this PR.

@rneher
Copy link
Member Author

rneher commented May 29, 2024

I removed the unnecessary files, the clades hack, and additions to the json.

@rneher rneher merged commit b623aed into james/ref-gaps May 29, 2024
6 checks passed
@rneher rneher deleted the rn/use-root-of-tree-as-reference branch May 29, 2024 15:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants