Minority mutations used for final assembly (metaspades, hybrid ONT-SR) #1350

Open
1 task done
lborcard opened this issue Aug 9, 2024 · 4 comments

Comments

lborcard commented Aug 9, 2024

Description of bug

We performed a hybrid assembly using metaSPAdes (ONT + short reads) with very high coverage. Nonetheless, in one of the final assemblies we found 5 positions where the minority SNP was incorporated into the assembly instead of the majority one.
SR_Assembly_igv_5332-5371

spades.log

SPAdesHybrid-TBEV-Neud.log

params.txt

params.txt

SPAdes version

SPAdes version: 3.15.3

Operating System

Linux-4.18.0-513.11.1.el8_9.x86_64-x86_64-with-glibc2.28

Python Version

Python version: 3.9.6

Method of SPAdes installation

container

No errors reported in spades.log

  • Yes
Member

asl commented Aug 9, 2024

Can you rerun with read error correction disabled (--only-assembler)?
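A minimal sketch of what such a rerun could look like, written as a small Python helper that builds the command line. The read file names and output directory are hypothetical placeholders, and the layout assumes paired-end short reads plus one ONT read file; take the real inputs from the params.txt of the original run.

```python
import shlex


def build_spades_command(sr_forward, sr_reverse, ont_reads, out_dir):
    """Build a metaSPAdes hybrid command line that skips read error correction."""
    return [
        "spades.py",
        "--meta",            # metagenomic mode
        "--only-assembler",  # skip the read error correction stage
        "-1", sr_forward,    # forward short reads
        "-2", sr_reverse,    # reverse short reads
        "--nanopore", ont_reads,
        "-o", out_dir,
    ]


# Hypothetical file names, for illustration only.
cmd = build_spades_command(
    "sr_R1.fastq.gz", "sr_R2.fastq.gz", "ont.fastq.gz", "spades_only_assembler"
)
print(shlex.join(cmd))
# To actually launch the run: subprocess.run(cmd, check=True)
```

Comparing the contigs from this run with the original ones at the affected positions would show whether error correction played a role.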

Overall, since you are running in metagenomic mode, this is expected, as the assembler assumes that there are multiple strains and the result is a consensus assembly.

Author

lborcard commented Aug 20, 2024

What is surprising is that it chose the lower-frequency variant, which is not the consensus at all.
Medaka (from Nanopore) picked the correct variant.
Edit: to add to the first point, this only happened for this sample and was caught later in the process. At which stage should one verify that this is not happening? Does SPAdes create a VCF of some sort?

@lborcard
Author

Sorry to insist, but we would really like to understand this unexpected behaviour.

Member

asl commented Sep 4, 2024

It is very important to understand that assembly does not take a "consensus" from the reads. This is neither feasible nor intended.

This is especially true in metagenomic mode, as the assembler also tries to collapse strain differences to produce a backbone assembly for the metagenome. Note that the output of a metagenome assembly is not a combined assembly of individual species: for example, some between-species variation can be collapsed, with these repetitive sequences then resolved further.

Moreover, the assembler does not operate at the level of individual nucleotides, so it has no notion of "rare variants" and the like. This is not a variant-calling problem, since no reference is available at all.

In your case, you can try disabling read error correction (--only-assembler), which might help. In general, though, do not expect nucleotide-level resolution from metagenomic assemblies.
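If nucleotide-level correctness matters downstream, one way to check it after assembly is to map the reads back to the contigs and compare each contig base with the most frequent read base at that position. The sketch below is only an illustration of that comparison: it assumes per-position base counts have already been extracted with a pileup tool of your choice, positions are 0-based, and the function name, depth threshold, and toy data are all hypothetical.

```python
from collections import Counter


def minority_positions(contig, base_counts, min_depth=10):
    """Return positions where the contig base is not the most frequent read base.

    contig      -- contig sequence as a string
    base_counts -- dict mapping 0-based position to a Counter of read bases
    min_depth   -- positions with fewer reads than this are skipped
    """
    flagged = []
    for pos, counts in sorted(base_counts.items()):
        depth = sum(counts.values())
        if depth < min_depth:
            continue
        majority_base, majority_count = counts.most_common(1)[0]
        contig_base = contig[pos].upper()
        if contig_base != majority_base:
            flagged.append((pos, contig_base, majority_base, majority_count / depth))
    return flagged


# Toy data, made up for illustration only.
contig = "ACGTACGT"
base_counts = {
    2: Counter({"G": 95, "A": 5}),     # contig base G agrees with the majority
    5: Counter({"T": 900, "C": 100}),  # contig base C is the minority base
}
flagged = minority_positions(contig, base_counts)
print(flagged)
```

Positions reported this way are candidates for polishing or manual inspection in a viewer such as IGV.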

See https://pubmed.ncbi.nlm.nih.gov/28298430/ for more information on metagenomic assembly methods and output.
