metaSPAdes stuck on read error correction #152
Comments
Thank you for your interest in SPAdes. This is a known issue; we will try to fix it in SPAdes 3.12.1.
I think I have run into this same issue: 10 samples out of 53 got hung up at this step with 3.12.0. Samples were prefiltered for quality, adapters, polyG and human reads. Would reverting to an older version help, or is there any potential workaround?
The only "workaround" so far is to skip the read error correction step via --only-assembler. The results ought to be sub-optimal though and the assembly would require much more time & memory. |
I believe I've hit this same issue on three different assemblies. I was running 3.12.0 and just downloaded 3.13.0 and had the same issue. I would use the word "looping" instead of "stuck", since metaSPAdes is using all the CPU I gave it. Is this the same issue as above? And is it still open, with the only workaround being --only-assembler? Thank you, Lisa
It's likely the same issue. We hoped to get it fixed, but most probably some corner cases still remain.
Is this still an issue in the latest version of SPAdes (3.13.0)? I think I am running into similar issues in v3.12.0. SPAdes seems to stop at this step:
@ammaraziz There is no problem in your case. Subclustering might be slow depending on your dataset. That said, the hang could be caused by I/O problems on your end.
Hi @asl, you are correct; there was an issue the first time around that killed the job at the subclustering stage, and because it takes a while (~10 hours for me), I assumed it had gotten stuck on the second run. Thank you for the quick reply, and thank you for SPAdes!
I think I encountered the same issue here. Using SPAdes 3.13.0.
I am also using SPAdes 3.13.0 and ran into the same issue where the pipeline stopped during error correction at "processing reads". Because many of my assemblies are very large, it is difficult to distinguish stuck jobs from ones that are just taking a long time to assemble. It would be great to have a fix for this, and I'm happy to provide samples that get stuck for debugging purposes. One difference from other users is that I am also specifying merged reads in addition to forward and reverse reads for my metagenome assembly. Thank you! Log file attached.
Hello. This is a known problem with the CQF library we're using. For some reason it just gets stuck. We hope to be able to at least somehow work around this issue in the next SPAdes release.
As I commented in #355, skipping pre-metaSPAdes quality filtering (I was using fastp) seemed to solve this issue for my problematic samples.
Hi, I am using SPAdes v3.13.1 to assemble Illumina short paired-end reads (2x101). In one of my libraries I have 96 samples, all of which assembled quite nicely, except for one sample where SPAdes hung on the read error correction step. Specifically, it seems to have a problem counting k-mers. I am wondering if I have used the correct settings for 2x101 or whether I am missing something. Thank you so much for this great tool and for your help in advance.
@ralsallaq This is a known problem with the CQF implementation. So far there is no workaround. You may want to try --only-assembler after quality trimming.
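A minimal sketch of that suggestion, combining external quality trimming with an assembly-only run (fastp is used here only because it was mentioned earlier in the thread; file names and resource values are illustrative):

```bash
# 1) Quality/adapter trim the raw reads (illustrative fastp invocation).
fastp -i raw_R1.fastq.gz -I raw_R2.fastq.gz \
      -o trimmed_R1.fastq.gz -O trimmed_R2.fastq.gz

# 2) Assemble the trimmed reads, skipping SPAdes' own read error correction.
metaspades.py -1 trimmed_R1.fastq.gz -2 trimmed_R2.fastq.gz \
              --only-assembler -t 16 -m 250 -o assembly_out
```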
Hi,
No work was done in 3.15 to address this issue due to lack of bandwidth. You may skip the read error correction; the running time and memory consumption might be larger, though.
Skipping the error correction results in assembly differences compared to assembly with error correction (for samples where error correction worked). The run time is less of an issue, but the differences in assembly are a problem for our use case: they result in higher strain heterogeneity in CheckM.
The differences in assembly results are certainly expected. Note that read error correction could (falsely) collapse the variation, as it might be hard to distinguish sequencing artefacts from strain variability in low-abundance strains.
I understand the consequences of read error correction. We still need to use it. Is there any creative workaround? Performing the error correction on a concatenated file of R1 and R2, or something like that?
Hi all, I am using the latest SPAdes 3.15.4 to assemble 12 datasets. It worked quite well for the first four, and then got stuck after processing the forward reads from the 5th dataset. I found this issue and wonder whether there is any update regarding this problem. Thanks in advance.
@Z-DAI So far there have been no changes. Use `--only-assembler`.
Hi there. It appears that I have the same issue that brymerr921 described. I am including forward, reverse, and merged reads using the --only-error-correction argument, and it's been stuck on the same metagenome for 4 days. I am wondering if the way to bypass this is to run error correction on forward and reverse reads, then merge them afterwards? Or should I just run error correction on the merged reads and exclude the F/R reads? Thank you so much for your help.
I had the same issue and worked around it like so:
Doing this on some datasets that did not get stuck, I saw that the outcome is not identical to a proper run with error correction and assembly in one. The assemblies from files made as described above contain fewer scaffolds, less total bp, and fewer CDSs > 300 bp (predicted with Prodigal).
Somewhat in line with #152 (comment), I was wondering, @asl: is the error correction step deterministic? If so, can it be performed on individual reads of a pair separately? And if so, can the error-corrected reads then be used as input for a subsequent assembly-only run?

This would allow the whole process to be split into subprocesses that could be performed on individual nodes. In an HPC setting, the error correction could occur on nodes with less RAM, reserving nodes with more RAM only for the assembly step. From my empirical observations, the error correction step is largely CPU bound, while for things related to the Hamming graph, I/O seems to be an important factor too.

[UPDATE] Looking forward to your input and thanks for all the support! Best wishes and stay safe, Cedric
Certainly not, as the read error correction routine uses information from other reads to perform the correction. If you have 1000 reads, you can extract some information and perform such a correction. If you split them into 100 pieces of 10 reads each, then... you're out of luck, and the results will be suboptimal.

Sure, this would work.
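For what it's worth, a minimal sketch of such a two-stage split under the assumptions discussed above (run error correction and assembly as separate SPAdes invocations, each library corrected whole rather than split into pieces). The output paths and corrected-read file names below are placeholders and should be checked against what SPAdes actually writes:

```bash
# Stage 1 (e.g. on a lower-memory node): read error correction only.
metaspades.py -1 sample_R1.fastq.gz -2 sample_R2.fastq.gz \
              --only-error-correction -t 16 -o ec_out

# Stage 2 (e.g. on a high-memory node): assembly only, feeding in the
# corrected reads written under ec_out/corrected/. The file names here are
# placeholders; use the actual names produced in that directory.
metaspades.py \
    -1 ec_out/corrected/sample_R1_corrected.fastq.gz \
    -2 ec_out/corrected/sample_R2_corrected.fastq.gz \
    --only-assembler -t 32 -m 500 -o assembly_out
```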
@asl Thanks a lot for your reply and the clarification.
I've been running metaSPAdes on my dataset for about one week, but it looks to be stuck on the error correction step of the pipeline and hasn't progressed since. I find it strange since I've run metaSPAdes on other datasets before to completion, so perhaps it has something to do with my data (I can send this over if needed). From the log, it looks to be stuck on the file GS-Blade-bottom-all_merged.fq.gz, which makes me think this file has a problem in it. But earlier in the SPAdes pipeline it was processed, so I'm not sure what's happening?

Attachments: params.txt, spades.log