Add separate Nanopore input option #275

Open
ljmesi opened this issue Feb 22, 2022 · 14 comments · May be fixed by #718
Labels
enhancement New feature or request

Comments

@ljmesi

ljmesi commented Feb 22, 2022

Is your feature request related to a problem? Please describe

At the moment, the only input option requires short reads.

Describe the solution you'd like

An additional option to use Nanopore long-read FASTQ files as input would be good to have.

@ljmesi ljmesi added the enhancement New feature or request label Feb 22, 2022
@ljmesi ljmesi self-assigned this Feb 22, 2022
@d4straub
Collaborator

Totally agree. That would be a major enhancement, but also some significant work. It requires a Nanopore-only assembler, e.g. Flye, and mapping processes for contig/MAG quantification. If you have favorite programs, let us know.
This is definitely on my wish list as well, but it might take a while. You are welcome to add it if you feel like it!

@ljmesi
Author

ljmesi commented Mar 7, 2022

Thank you for your feedback @d4straub! I'm wondering whether these parts could be included initially:

  1. Add Nanopore reads as input to the pipeline
  2. Use Porechop for adapter/quality trimming
  3. Remove host reads with minimap2
  4. Classify the reads with Centrifuge/Kraken

Do these sound like good steps for breaking this broader task into smaller subtasks?

@d4straub
Collaborator

d4straub commented Mar 7, 2022

Oh, there might be a slight misunderstanding:

Currently, it is already possible to use Nanopore data, but only in addition to Illumina data, not on its own (i.e. Nanopore-only).
Also, adapter trimming (Porechop), quality visualisation (NanoPlot), quality filtering (Filtlong) and lambda phage read removal (NanoLyse) are implemented. Direct host read removal is not yet implemented. It only happens indirectly via Filtlong, which depends on Illumina reads: Nanopore reads that are not covered by Illumina reads are discarded, so if the Illumina reads contain no host data, the filtered Nanopore reads will not either. Hybrid (Illumina & Nanopore) assembly is already realized with hybridSPAdes.

Having said that, the pipeline does not (yet) support Nanopore-only assembly, and this is what I was referring to. If there are no Illumina reads, Filtlong doesn't work (because its settings require Illumina reads), and no Nanopore-only assembler (such as Flye) is implemented in the pipeline yet. Additionally, Nanopore reads are currently not used in Centrifuge/Kraken (that might actually be relatively easy to add).

@ljmesi
Author

ljmesi commented Mar 10, 2022

Thank you @d4straub for the clarification! So, if I understand correctly, there should be a standalone way of providing Nanopore read FASTQ files as input. I work for Genomic Medicine Sweden, and we have been hoping to classify Nanopore-only reads directly, without assembly (possibly using Centrifuge/Kraken2, if that seems suitable for this pipeline). Would these steps be acceptable additions to the pipeline? With Kraken2 at least, we have experience classifying Nanopore reads, with seemingly good results.

@d4straub
Collaborator

Yes, there could be a way of providing Nanopore read FASTQ files without Illumina data as input, and that would be desirable in this pipeline. But it would be important that those Nanopore reads are used not only for Kraken2 but also for assembly, because this is an assembly-focused pipeline. And as far as I understand, assembly is not your primary objective (please correct me if I am wrong).

There is a new pipeline in the making, see https://nf-co.re/taxprofiler, that focuses only on taxonomic profiling. However, it might not allow Nanopore input yet, and it is under construction. So if you are not interested in assembly and you are considering implementing it yourself, I'd recommend participating in nf-core/taxprofiler.

@ljmesi
Copy link
Author

ljmesi commented Mar 17, 2022

Thank you for your response @d4straub, and especially for the recommendation about taxprofiler! It looks like taxprofiler matches what we need at Genomic Medicine Sweden more closely, so I will contribute to adding the feature to taxprofiler instead. I will remove myself as an assignee but will not close the issue, in case someone else would like to contribute to adding Nanopore assembly-based classification.

@ljmesi ljmesi removed their assignment Mar 17, 2022
@abu85

abu85 commented Apr 12, 2023

I thought my questions fit here.
I have a question about adding a pooled Nanopore sample to the pipeline, and a question about a subsequent analysis based on it.
I want a comprehensive hybrid assembly from both short reads (Illumina) and long reads (Nanopore). Unfortunately, I had to pool samples before Nanopore sequencing, so I have fifty individual samples of short reads but only one (combined) Nanopore sample. So my questions are:

  1. How should I add the Nanopore sample to the samplesheet? (My plan is to add it alongside one of the short-read samples.)
  2. I would like to do group-wise binning on this hybrid assembly based on the short-read samples. Will this samplesheet setup cause a problem later on?
  3. How can I classify Nanopore reads in this pipeline (there is no Kraken2 classification option for long reads)? Any suggestions?
  4. I have many FASTQ files in the Nanopore sample. Should I combine them all into one before running?

Thanks for your attention.

@d4straub
Collaborator

That question would be better asked in the nf-core Slack (see https://nf-co.re/join), channel "mag". But since I am already here, short answers:

  1. Once per row, i.e. once per Illumina sample, is the only way, but that would generate a huge overhead in the pipeline. I am not sure I understood correctly, but of course you cannot make a co-assembly that way (using --coassemble_group).
  2. Group-wise binning is no problem, because it only depends on the short reads; it does not use the long reads.
  3. Use nf-core/taxprofiler, which is now released.
  4. Yes, but your data is non-optimal because the Nanopore data is not separated into samples. Are you sure that your "many FASTQ files" are not separated by sample after all? Nanopore also allows [de]multiplexing.

@abu85
Copy link

abu85 commented Apr 12, 2023

Thanks,

  1. I want to utilize this pooled long-read sample (where a bit of every sample was merged into one). I thought I could include it to improve the analysis, but from your answer it seems that I cannot, or did I misunderstand? Do you have any suggestions here?
  4. No, they are not separated by sample.

@dawnmy

dawnmy commented Jun 19, 2023

Agree. It is important to support long-read-only input data, as long-read sequencing is becoming more and more popular.

@willros
Contributor

willros commented Aug 30, 2023

Hi,

Any updates or fresh thoughts on adding a pure long-read track to the pipeline? I was checking out a few other nf-core pipelines and noticed that some, like viralrecon, have already embraced this idea. I would like to help set up a dedicated nanopore/long read track for this pipeline.

Should this discussion be moved to the Slack channel instead?

Thanks!
William

@d4straub
Collaborator

As far as I know, there are no new thoughts, except that the pipeline is getting huge, so additions should be kept to a minimum. I still think that Nanopore-only assembly should be possible within the nf-core/mag pipeline.
I think general planning and updates belong here; more interactive discussions are more convenient in Slack, imho.

@willros
Copy link
Contributor

willros commented Sep 7, 2023

Hi again,

We're a group of people involved in clinical genomics in Sweden, and we're eager to introduce a dedicated long-read track for metagenomic genome assembly. After chatting with @jfy133, we've decided to first get together to figure out what features and functionality we want to include, and then we'll dive into the how and where of adding this new track.

We're well aware that there's an ongoing discussion about the existing code base, and it might be a bit tricky to shoehorn something new into the current metagenomic assembly process, especially with the potential need for significant changes and rebuilds. So, one idea would be to start fresh with a completely new pipeline for long read implementation.

Perhaps we can keep the discussion going here, so others can participate with their thoughts on architecture and functionality.

Thanks!
William

@jfy133
Member

jfy133 commented Sep 7, 2023

Small comment for now: I don't think we need an entire rewrite of the pipeline per se, but the purely long-read functionality could be a separate, fresh workflow (like viralrecon with Illumina vs. Nanopore data).

@muabnezor muabnezor linked a pull request Dec 12, 2024 that will close this issue
@jfy133 jfy133 linked a pull request Jan 20, 2025 that will close this issue

7 participants