metaSPAdes stuck on read error correction #152
Comments
Thank you for your interest in SPAdes. This is a known issue; we will try to fix it in SPAdes 3.12.1.
I think I have run into this same issue: 10 samples out of 53 got hung up at this step with 3.12.0. Samples were prefiltered for quality, adapters, polyG and human reads. Would reverting to an older version help, or is there any potential workaround?
The only "workaround" so far is to skip the read error correction step via --only-assembler. The results ought to be sub-optimal though and the assembly would require much more time & memory. |
I believe I've hit this same issue on three different assemblies. I was running 3.12.0 and just downloaded 3.13.0 and had the same issue. I would use the word "looping" instead of "stuck", since metaSPAdes is using all the CPU I gave it. Is this the same issue as above? And is it still open, with the only workaround being --only-assembler? Thank you, Lisa
It's likely the same issue. We hoped to get it fixed, but most probably some corner cases still remain.
Is this still an issue in the latest version of SPAdes (3.13.0)? I think I am running into similar issues in v3.12.0. SPAdes seems to stop at this step:
@ammaraziz There is no problem in your case. Subclustering might be slow depending on your dataset. That said, the hang could be caused by I/O problems on your end.
Hi @asl, you are correct; there was an issue the first time around that killed the job at the subclustering stage, and because it takes a while (~10 hours for me), I assumed it had gotten stuck on the second run. Thank you for the quick reply, and thank you for SPAdes!
I think I encountered the same issue here. Using SPAdes 3.13.0.
I am also using SPAdes 3.13.0 and ran into the same issue where the pipeline stopped during error correction at "processing reads". Because many of my assemblies are very large, it is difficult to distinguish stuck jobs from ones that are just taking a long time to assemble. It would be great to have a fix for this, and I'm happy to provide samples that get stuck for debugging purposes. One difference from other users is that I am also specifying merged reads in addition to forward and reverse reads for my metagenome assembly. Thank you! Log file attached.
Hello. This is a known problem with the CQF library we're using. For some reason it just gets stuck. We hope to be able to at least somehow work around this issue in the next SPAdes release.
As I commented in #355, skipping pre-metaSPAdes quality filtering (I was using fastp) seemed to solve this issue for my problematic samples.
Hi, I am using SPAdes v3.13.1 to assemble Illumina short paired-end reads (2x101). In one of my libraries I have 96 samples, all of which assembled quite nicely, except for one sample where SPAdes hung on the read error correction step. Specifically, it seems to have a problem counting k-mers. I am wondering if I have used the correct settings for 2x101 or whether I am missing something. Thank you so much for this great tool and for your help in advance.
@ralsallaq This is a known problem with the CQF implementation. So far there is no workaround. You may want to try --only-assembler after quality trimming.
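A minimal sketch of that suggestion, combining external quality trimming with an assembly-only run (fastp is used here only because it was mentioned earlier in the thread; file names and resource values are illustrative):

```bash
# 1) Quality/adapter trim the raw reads (illustrative fastp invocation).
fastp -i raw_R1.fastq.gz -I raw_R2.fastq.gz \
      -o trimmed_R1.fastq.gz -O trimmed_R2.fastq.gz

# 2) Assemble the trimmed reads, skipping SPAdes' own read error correction.
metaspades.py -1 trimmed_R1.fastq.gz -2 trimmed_R2.fastq.gz \
              --only-assembler -t 16 -m 250 -o assembly_out
```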
Hi,
No work was done in 3.15 to address this issue due to lack of bandwidth. You may skip the read error correction; the running time and memory consumption might be larger, though.
Skipping the error correction results in assembly differences compared to assembly with error correction (for samples where error correction worked). The run time is less of an issue, but the differences in assembly are a problem for our use case: they result in higher strain heterogeneity in CheckM.
The differences in assembly results are certainly expected. Note that read error correction could (falsely) collapse the variation, as it might be hard to distinguish sequencing artefacts from strain variability in low-abundance strains.
I understand the consequences of read error correction. We still need to use it. Is there any creative workaround? Performing the error correction on a concatenated file of R1 and R2, or something like that?
Hi all, I am using the latest SPAdes 3.15.4 to assemble 12 datasets. It worked quite well for the first four, and then got stuck after processing the forward reads from the 5th dataset. I found this issue and wonder whether there is any update regarding this problem. Thanks in advance.
@Z-DAI So far there have been no changes. Use `--only-assembler`.
Hi there. It appears that I have the same issue that brymerr921 described. I am including forward, reverse, and merged reads using the --only-error-correction argument, and it's been stuck on the same metagenome for 4 days. I am wondering if the way to bypass this is to run error correction on forward and reverse reads, then merge them afterwards? Or should I just run error correction on the merged reads and exclude the F/R reads? Thank you so much for your help.
I had the same issue and worked around it like so:
Doing this on some datasets that did not get stuck, I saw that the outcome is not identical to a proper run with error correction and assembly in one. The assemblies from files made as described above contain fewer scaffolds, less total bp, and fewer CDSs > 300 bp (predicted with Prodigal).
Somewhat in line with #152 (comment), I was wondering, @asl: is the error correction step deterministic? If so, can it be performed on individual reads of a pair separately? And if so, can the error-corrected reads then be used as input for a subsequent assembly-only run?

This would allow the whole process to be split into subprocesses that could be performed on individual nodes. In an HPC setting, the error correction could occur on nodes with less RAM, reserving nodes with more RAM only for the assembly step. From my empirical observations, the error correction step is largely CPU bound, while for things related to the Hamming graph, I/O seems to be an important factor too.

[UPDATE] Looking forward to your input and thanks for all the support! Best wishes and stay safe, Cedric
Certainly not, as the read error correction routine uses information from other reads to perform the correction. If you have 1000 reads, you can extract some information and perform such a correction. If you split them into 100 pieces of 10 reads each, then... you're out of luck, and the results will be suboptimal.

Sure, this would work.
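For what it's worth, a minimal sketch of such a two-stage split under the assumptions discussed above (run error correction and assembly as separate SPAdes invocations, each library corrected whole rather than split into pieces). The output paths and corrected-read file names below are placeholders and should be checked against what SPAdes actually writes:

```bash
# Stage 1 (e.g. on a lower-memory node): read error correction only.
metaspades.py -1 sample_R1.fastq.gz -2 sample_R2.fastq.gz \
              --only-error-correction -t 16 -o ec_out

# Stage 2 (e.g. on a high-memory node): assembly only, feeding in the
# corrected reads written under ec_out/corrected/. The file names here are
# placeholders; use the actual names produced in that directory.
metaspades.py \
    -1 ec_out/corrected/sample_R1_corrected.fastq.gz \
    -2 ec_out/corrected/sample_R2_corrected.fastq.gz \
    --only-assembler -t 32 -m 500 -o assembly_out
```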
@asl Thanks a lot for your reply and the clarification.
I've been running metaSPAdes on my dataset for about one week, but it looks to be stuck on the error correction step of the pipeline and hasn't progressed since. I find it strange since I've run metaSPAdes on other datasets before to completion, so perhaps it has something to do with my data (I can send this over if needed). From the log, it looks to be stuck on the file GS-Blade-bottom-all_merged.fq.gz, which makes me think this file has a problem in it. But earlier in the SPAdes pipeline it was processed, so I'm not sure what's happening?

Attachments: params.txt, spades.log