Merge branch 'dev' into add-param-for-phred-encoding

nf-core · Jan 27, 2025 · commit 8f91ef6
2 parents 0256b37 + e5e627d

Showing 58 changed files with 509 additions and 203 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -7,14 +7,39 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### `Added`
 
+- [#798](https://github.com/nf-core/ampliseq/pull/798) - Added SILVA version 138.2 of the DADA2 taxonomy database: `silva=138.2` or `silva` as parameter to `--dada_ref_taxonomy`
 - [#801](https://github.com/nf-core/ampliseq/pull/801) - Parameter `--quality_type` allows specifying the type of quality scores in raw read data, by default `Auto` (i.e. default behavior did not change)
+- [#804](https://github.com/nf-core/ampliseq/pull/804) - Added UNITE version 10.0 as `unite-fungi=10.0` and `unite-alleuk=10.0` for `--dada_ref_taxonomy`; `unite-fungi` and `unite-alleuk` now default to version 10.0 (issue [#768](https://github.com/nf-core/ampliseq/issues/768))
+- [#803](https://github.com/nf-core/ampliseq/pull/803) - Introduced new parameters related to `--mergepairs_strategy` (see table below); they only take effect with `--mergepairs_strategy consensus`.
+- [#807](https://github.com/nf-core/ampliseq/pull/807) - TreeSummarizedExperiment R objects are now exported by default and can be omitted with `--skip_tse`; phyloseq R object generation can now be skipped with `--skip_phyloseq`
+
+| **Parameter**                              | **Description**                                                                           | **Default Value** |
+| ------------------------------------------ | ----------------------------------------------------------------------------------------- | ----------------- |
+| **mergepairs_consensus_match**             | The score assigned for each matching base pair during sequence alignment.                 | 1                 |
+| **mergepairs_consensus_mismatch**          | The penalty score assigned for each mismatched base pair during sequence alignment.       | -2                |
+| **mergepairs_consensus_gap**               | The penalty score assigned for each gap introduced during sequence alignment.             | -4                |
+| **mergepairs_consensus_minoverlap**        | The minimum number of overlapping base pairs required to merge forward and reverse reads. | 12                |
+| **mergepairs_consensus_maxmismatch**       | The maximum number of mismatches allowed within the overlapping region for merging reads. | 0                 |
+| **mergepairs_consensus_percentile_cutoff** | The percentile cutoff determining the minimum observed overlap in the dataset.            | 0.001             |
 
 ### `Changed`
 
+- [#803](https://github.com/nf-core/ampliseq/pull/803) - Changed DADA2_DENOISING: renamed `--concatenate_reads` to `--mergepairs_strategy` and changed its values from TRUE/FALSE (boolean) to one of ["merge", "concatenate", "consensus"]; the new "consensus" method is selected with `--mergepairs_strategy consensus`.
+- [#818](https://github.com/nf-core/ampliseq/pull/818) - Users can now choose not to increase the stack size in vsearch clustering.
+
 ### `Fixed`
 
+- [#800](https://github.com/nf-core/ampliseq/pull/800) - Fixed SH files for UNITE 9.0, which were missing some entries due to a bug caused by an API update in PlutoF
+- [#808](https://github.com/nf-core/ampliseq/pull/808) - Added missing library declaration in R script.
+
 ### `Dependencies`
 
+- [#797](https://github.com/nf-core/ampliseq/pull/797) - Update QIIME2
+
+| software | previously | now     |
+| -------- | ---------- | ------- |
+| QIIME2   | 2023.7     | 2024.10 |
+
 ### `Removed`
 
 ## nf-core/ampliseq version 2.12.0 - 2024-11-14

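How the consensus parameters in the table above interact can be illustrated with a deliberately simplified sketch. It compares two already-aligned overlap strings position by position, so it is ungapped: `mergepairs_consensus_gap` and DADA2's end-gap handling are not modelled, and this is not DADA2's `mergePairs`/`nwalign` implementation. The helper name `score_overlap` and the `CONSENSUS_*` environment variables are hypothetical; their defaults are taken from the table.

```bash
#!/usr/bin/env bash
# Simplified, ungapped illustration of the consensus-mode thresholds.
# NOT DADA2's nwalign/mergePairs: gaps and end gaps are not modelled,
# so mergepairs_consensus_gap has no effect here.
# Hypothetical helper: score_overlap FWD_OVERLAP REV_OVERLAP
score_overlap() {
    local a=$1 b=$2
    # Defaults from the consensus parameter table (hypothetical env overrides)
    local match=${CONSENSUS_MATCH:-1}
    local mismatch=${CONSENSUS_MISMATCH:--2}
    local minoverlap=${CONSENSUS_MINOVERLAP:-12}
    local maxmismatch=${CONSENSUS_MAXMISMATCH:-0}

    if [ "${#a}" -ne "${#b}" ]; then
        echo "error: overlap strings must have equal length" >&2
        return 2
    fi

    # Count matching and mismatching positions
    local i matches=0 mismatches=0
    for (( i = 0; i < ${#a}; i++ )); do
        if [ "${a:i:1}" = "${b:i:1}" ]; then
            matches=$(( matches + 1 ))
        else
            mismatches=$(( mismatches + 1 ))
        fi
    done

    local score=$(( matches * match + mismatches * mismatch ))
    local verdict=reject
    # Merge only if the overlap is long enough and has few enough mismatches
    if [ "${#a}" -ge "$minoverlap" ] && [ "$mismatches" -le "$maxmismatch" ]; then
        verdict=merge
    fi
    echo "length=${#a} mismatches=$mismatches score=$score $verdict"
}
```

For example, `score_overlap ACGTACGTACGT ACGTACGTACGA` reports one mismatch and a score of 9 (11 x 1 + 1 x -2), and the pair is rejected under the default `maxMismatch = 0`.
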
diff --git a/CITATIONS.md b/CITATIONS.md
@@ -149,6 +149,10 @@
 
   > McMurdie PJ, Holmes S (2013). “phyloseq: An R package for reproducible interactive analysis and graphics of microbiome census data.” PLoS ONE, 8(4), e61217.
 
+- [TreeSummarizedExperiment](https://doi.org/10.12688/f1000research.26669.2)
+
+  > Huang R, Soneson C, Ernst FGM et al. TreeSummarizedExperiment: a S4 class for data with hierarchical structure [version 2; peer review: 3 approved]. F1000Research 2021, 9:1246.
+
 ### Non-default tools
 
 - [ITSx](https://besjournals.onlinelibrary.wiley.com/doi/10.1111/2041-210X.12073)

diff --git a/assets/report_template.Rmd b/assets/report_template.Rmd
@@ -107,6 +107,7 @@ params:
     picrust_pathways: FALSE
     sbdi: FALSE
     phyloseq: FALSE
+    tse: FALSE
 ---
 
 <!-- Load libraries -->
@@ -1615,19 +1616,37 @@ but if you run nf-core/ampliseq with a sample metadata table (`--metadata`) any
 "))
 ```
 
-<!-- Section on PHYLOSEQ results -->
+<!-- Section on exported R objects -->
 
-```{r, eval = !isFALSE(params$phyloseq), results='asis'}
+```{r, results='asis'}
+any_robject <- !isFALSE(params$phyloseq) || !isFALSE(params$tse)
+```
+
+```{r, eval = !isFALSE(any_robject), results='asis'}
 cat(paste0("
-# Phyloseq
+# R objects
 
-[Phyloseq](https://doi.org/10.1371/journal.pone.0061217)
-is a popular R package to analyse and visualize microbiom data.
-The produced RDS files contain phyloseq objects and can be loaded directely into R and phyloseq.
+Microbiome data can be analysed and visualized with certain R packages. For convenience, R objects in RDS format are provided.
+"))
+
+if ( !isFALSE(params$phyloseq) ) {
+    cat(paste0("
+[Phyloseq](https://doi.org/10.1371/journal.pone.0061217) objects can be loaded directly into R with package 'phyloseq'.
 The objects contain an ASV abundance table and a taxonomy table.
 If available, metadata and phylogenetic tree will also be included in the phyloseq object.
 The files can be found in folder [phyloseq](../phyloseq/).
-"))
+    "))
+}
+
+if ( !isFALSE(params$tse) ) {
+    cat(paste0("
+[TreeSummarizedExperiment](https://doi.org/10.12688/f1000research.26669.2) (TreeSE, TSE)
+objects can be loaded into R with package 'TreeSummarizedExperiment' and contain an ASV abundance table,
+a taxonomy table, and sequences.
+If available, metadata and phylogenetic tree will also be included in the object.
+The files can be found in folder [treesummarizedexperiment](../treesummarizedexperiment/).
+    "))
+}
 ```
 
 <!-- Section on methods -->

diff --git a/bin/taxref_reformat_standard.sh b/bin/taxref_reformat_standard.sh
@@ -5,4 +5,4 @@
 gunzip -c *train*gz > assignTaxonomy.fna
 
 # and the file for add species, identified by containing "species" in the name, is renamed
-mv *species*gz addSpecies.fna.gz
+mv *assign*gz addSpecies.fna.gz
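
The reason for the changed glob can be checked by rerunning the renaming steps on stand-in files that carry the SILVA 138.2 file names from `conf/ref_databases.config`. The sketch below creates tiny placeholder gzip files (their contents are made up) in a temporary directory. Because shell globs are case-sensitive, `*species*gz` matches neither `silva_nr99_v138.2_toSpecies_trainset.fa.gz` nor `silva_v138.2_assignSpecies.fa.gz`, while `*assign*gz` picks up the species-assignment file. The 138.1 name `silva_species_assignment_v138.1.fa.gz` also contains `assign`, so the new pattern should still match it.

```bash
#!/usr/bin/env bash
# Re-run the renaming steps of bin/taxref_reformat_standard.sh on stand-in
# files named like the SILVA 138.2 downloads (file contents are made up).
set -euo pipefail
shopt -s nullglob   # unmatched globs expand to nothing instead of themselves

workdir=$(mktemp -d)
cd "$workdir"
printf '>seq1\nACGT\n' | gzip > silva_nr99_v138.2_toSpecies_trainset.fa.gz
printf '>seq2\nACGT\n' | gzip > silva_v138.2_assignSpecies.fa.gz

# Globs are case-sensitive: "toSpecies"/"assignSpecies" contain a capital S
old_match=( *species*gz )
new_match=( *assign*gz )
echo "'*species*gz' matches ${#old_match[@]} file(s); '*assign*gz' matches ${#new_match[@]}"

# The two commands shown in the script after this commit
gunzip -c *train*gz > assignTaxonomy.fna
mv *assign*gz addSpecies.fna.gz
ls
```
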
diff --git a/conf/modules.config b/conf/modules.config
@@ -235,9 +235,12 @@ process {
         ].join(',').replaceAll('(,)*$', "")
         // setting from https://rdrr.io/bioc/dada2/man/mergePairs.html & https://rdrr.io/bioc/dada2/man/nwalign.html & match = getDadaOpt("MATCH"), mismatch = getDadaOpt("MISMATCH"), gap = getDadaOpt("GAP_PENALTY"), missing from the list below is: 'band = -1'
         ext.args2 = [
-            'minOverlap = 12, maxMismatch = 0, returnRejects = FALSE, propagateCol = character(0), trimOverhang = FALSE, match = 1, mismatch = -64, gap = -64, homo_gap = NULL, endsfree = TRUE, vec = FALSE',
-            params.concatenate_reads ? "justConcatenate = TRUE" : "justConcatenate = FALSE"
+            "homo_gap = NULL, endsfree = TRUE, vec = FALSE, propagateCol = character(0), trimOverhang = FALSE",
+            params.mergepairs_strategy == "consensus" ?
+                "returnRejects = TRUE, match = ${params.mergepairs_consensus_match}, mismatch = ${params.mergepairs_consensus_mismatch}, minOverlap = ${params.mergepairs_consensus_minoverlap}, maxMismatch = ${params.mergepairs_consensus_maxmismatch}, gap = ${params.mergepairs_consensus_gap}" :
+                "justConcatenate = ${params.mergepairs_strategy == 'concatenate' ? 'TRUE' : 'FALSE'}, returnRejects = FALSE, match = 1, mismatch = -64, gap = -64, minOverlap = 12, maxMismatch = 0"
         ].join(',').replaceAll('(,)*$', "")
+        ext.quantile = "${params.mergepairs_consensus_percentile_cutoff}"
         publishDir = [
             [
                 path: { "${params.outdir}/dada2/args" },
@@ -1060,6 +1063,15 @@ process {
             pattern: "*.rds"
         ]
     }
+
+    withName: TREESUMMARIZEDEXPERIMENT {
+        publishDir = [
+            path: { "${params.outdir}/treesummarizedexperiment" },
+            mode: params.publish_dir_mode,
+            pattern: "*.rds"
+        ]
+    }
+
     withName: 'MULTIQC' {
         ext.args   = { params.multiqc_title ? "--title \"$params.multiqc_title\"" : '' }
         publishDir = [

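To see which arguments reach DADA2's `mergePairs` for each `--mergepairs_strategy` value, the `ext.args2` expression above can be mirrored in a small shell function. This is a hypothetical helper (`mergepairs_args` and the `CONSENSUS_*` environment variables are made up for illustration), not part of the pipeline; defaults follow the consensus parameter table in the CHANGELOG. Unlike the Groovy expression, which sends any non-`consensus` value down the merge/concatenate branch, the sketch rejects unknown strategies. `ext.quantile` is set separately and is not reproduced here.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the ext.args2 expression in conf/modules.config:
# prints the comma-joined mergePairs arguments for a given strategy.
mergepairs_args() {
    local strategy=$1
    # Arguments shared by every strategy (first element of the Groovy list)
    local common="homo_gap = NULL, endsfree = TRUE, vec = FALSE, propagateCol = character(0), trimOverhang = FALSE"
    case "$strategy" in
        consensus)
            # Scoring comes from the mergepairs_consensus_* parameters
            echo "${common},returnRejects = TRUE, match = ${CONSENSUS_MATCH:-1}, mismatch = ${CONSENSUS_MISMATCH:--2}, minOverlap = ${CONSENSUS_MINOVERLAP:-12}, maxMismatch = ${CONSENSUS_MAXMISMATCH:-0}, gap = ${CONSENSUS_GAP:--4}"
            ;;
        merge|concatenate)
            # Fixed scoring; only justConcatenate differs between the two
            local concat=FALSE
            if [ "$strategy" = concatenate ]; then
                concat=TRUE
            fi
            echo "${common},justConcatenate = ${concat}, returnRejects = FALSE, match = 1, mismatch = -64, gap = -64, minOverlap = 12, maxMismatch = 0"
            ;;
        *)
            echo "unsupported --mergepairs_strategy: ${strategy}" >&2
            return 1
            ;;
    esac
}
```

For instance, `mergepairs_args concatenate` yields the same fixed scoring as `merge`, but with `justConcatenate = TRUE`.
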
diff --git a/conf/ref_databases.config b/conf/ref_databases.config
@@ -178,11 +178,18 @@ params {
             taxlevels = "Domain,Kingdom,Phylum,Class,Order,Family,Genus,Species"
         }
         'silva' {
-            title = "Silva 138.1 prokaryotic SSU"
-            file = [ "https://zenodo.org/record/4587955/files/silva_nr99_v138.1_wSpecies_train_set.fa.gz", "https://zenodo.org/record/4587955/files/silva_species_assignment_v138.1.fa.gz" ]
+            title = "Silva 138.2 prokaryotic SSU"
+            file = [ "https://zenodo.org/records/14169026/files/silva_nr99_v138.2_toSpecies_trainset.fa.gz", "https://zenodo.org/records/14169026/files/silva_v138.2_assignSpecies.fa.gz" ]
             citation = "Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Glöckner FO. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 2013 Jan;41(Database issue):D590-6. doi: 10.1093/nar/gks1219. Epub 2012 Nov 28. PMID: 23193283; PMCID: PMC3531112."
             fmtscript = "taxref_reformat_standard.sh"
-            dbversion = "SILVA v138.1 (https://zenodo.org/record/4587955)"
+            dbversion = "SILVA v138.2 (https://zenodo.org/records/14169026)"
+        }
+        'silva=138.2' {
+            title = "Silva 138.2 prokaryotic SSU"
+            file = [ "https://zenodo.org/records/14169026/files/silva_nr99_v138.2_toSpecies_trainset.fa.gz", "https://zenodo.org/records/14169026/files/silva_v138.2_assignSpecies.fa.gz" ]
+            citation = "Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Glöckner FO. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 2013 Jan;41(Database issue):D590-6. doi: 10.1093/nar/gks1219. Epub 2012 Nov 28. PMID: 23193283; PMCID: PMC3531112."
+            fmtscript = "taxref_reformat_standard.sh"
+            dbversion = "SILVA v138.2 (https://zenodo.org/records/14169026)"
         }
         'silva=138' {
             title = "Silva 138.1 prokaryotic SSU"
@@ -199,20 +206,28 @@ params {
             dbversion = "SILVA v132 (https://zenodo.org/record/1172783)"
         }
         'unite-fungi' {
-            title = "UNITE general FASTA release for Fungi - Version 9.0"
-            file = [ "https://s3.hpc.ut.ee/plutof-public/original/fa1038da-d18d-46b7-88a9-c21bcf38c43d.tgz" ]
-            citation = "Abarenkov, Kessy; Zirk, Allan; Piirmann, Timo; Pöhönen, Raivo; Ivanov, Filipp; Nilsson, R. Henrik; Kõljalg, Urmas (2023): UNITE general FASTA release for Fungi. Version 18.07.2023. UNITE Community. https://doi.org/10.15156/BIO/2938067"
+            title = "UNITE general FASTA release for Fungi - Version 10.0"
+            file = [ "https://s3.hpc.ut.ee/plutof-public/original/d18aa648-3f4c-4f46-84d4-c8c5d48439ba.tgz" ]
+            citation = "Abarenkov, Kessy; Zirk, Allan; Piirmann, Timo; Pöhönen, Raivo; Ivanov, Filipp; Nilsson, R. Henrik; Kõljalg, Urmas (2024): UNITE general FASTA release for Fungi. Version 04.04.2024. UNITE Community. https://doi.org/10.15156/BIO/2959332"
             fmtscript = "taxref_reformat_unite.sh"
-            dbversion = "UNITE-fungi v9.0 (https://doi.org/10.15156/BIO/2938067)"
-            shfile = [ "https://figshare.scilifelab.se/ndownloader/files/40788767", "https://figshare.scilifelab.se/ndownloader/files/40788770"]
+            dbversion = "UNITE-fungi v10.0 (https://doi.org/10.15156/BIO/2959332)"
+            shfile = [ "https://figshare.scilifelab.se/ndownloader/files/50595459", "https://figshare.scilifelab.se/ndownloader/files/50595462"]
+        }
+        'unite-fungi=10.0' {
+            title = "UNITE general FASTA release for Fungi - Version 10.0"
+            file = [ "https://s3.hpc.ut.ee/plutof-public/original/d18aa648-3f4c-4f46-84d4-c8c5d48439ba.tgz" ]
+            citation = "Abarenkov, Kessy; Zirk, Allan; Piirmann, Timo; Pöhönen, Raivo; Ivanov, Filipp; Nilsson, R. Henrik; Kõljalg, Urmas (2024): UNITE general FASTA release for Fungi. Version 04.04.2024. UNITE Community. https://doi.org/10.15156/BIO/2959332"
+            fmtscript = "taxref_reformat_unite.sh"
+            dbversion = "UNITE-fungi v10.0 (https://doi.org/10.15156/BIO/2959332)"
+            shfile = [ "https://figshare.scilifelab.se/ndownloader/files/50595459", "https://figshare.scilifelab.se/ndownloader/files/50595462"]
         }
         'unite-fungi=9.0' {
             title = "UNITE general FASTA release for Fungi - Version 9.0"
             file = [ "https://s3.hpc.ut.ee/plutof-public/original/fa1038da-d18d-46b7-88a9-c21bcf38c43d.tgz" ]
             citation = "Abarenkov, Kessy; Zirk, Allan; Piirmann, Timo; Pöhönen, Raivo; Ivanov, Filipp; Nilsson, R. Henrik; Kõljalg, Urmas (2023): UNITE general FASTA release for Fungi. Version 18.07.2023. UNITE Community. https://doi.org/10.15156/BIO/2938067"
             fmtscript = "taxref_reformat_unite.sh"
             dbversion = "UNITE-fungi v9.0 (https://doi.org/10.15156/BIO/2938067)"
-            shfile = [ "https://figshare.scilifelab.se/ndownloader/files/40788767", "https://figshare.scilifelab.se/ndownloader/files/40788770"]
+            shfile = [ "https://figshare.scilifelab.se/ndownloader/files/50055762", "https://figshare.scilifelab.se/ndownloader/files/50055765"]
         }
         'unite-fungi=8.3' {
             title = "UNITE general FASTA release for Fungi - Version 8.3"
@@ -231,20 +246,28 @@ params {
             shfile = [ "https://scilifelab.figshare.com/ndownloader/files/34497971", "https://scilifelab.figshare.com/ndownloader/files/34497974"]
         }
         'unite-alleuk' {
-            title = "UNITE general FASTA release for eukaryotes - Version 9.0"
-            file = [ "https://s3.hpc.ut.ee/plutof-public/original/e318f5fd-1ef4-40fd-9e77-1b94d91b3858.tgz" ]
-            citation = "Abarenkov, Kessy; Zirk, Allan; Piirmann, Timo; Pöhönen, Raivo; Ivanov, Filipp; Nilsson, R. Henrik; Kõljalg, Urmas (2023): UNITE general FASTA release for eukaryotes. Version 18.07.2023. UNITE Community. https://doi.org/10.15156/BIO/2938069"
+            title = "UNITE general FASTA release for eukaryotes - Version 10.0"
+            file = [ "https://s3.hpc.ut.ee/plutof-public/original/1dda2021-4893-4f2f-b50e-87bfea795267.tgz" ]
+            citation = "Abarenkov, Kessy; Zirk, Allan; Piirmann, Timo; Pöhönen, Raivo; Ivanov, Filipp; Nilsson, R. Henrik; Kõljalg, Urmas (2024): UNITE general FASTA release for eukaryotes. Version 04.04.2024. UNITE Community. https://doi.org/10.15156/BIO/2959334"
             fmtscript = "taxref_reformat_unite.sh"
-            dbversion = "UNITE-alleuk v9.0 (https://doi.org/10.15156/BIO/2938069)"
-            shfile = [ "https://figshare.scilifelab.se/ndownloader/files/40788773", "https://figshare.scilifelab.se/ndownloader/files/40788776"]
+            dbversion = "UNITE-alleuk v10.0 (https://doi.org/10.15156/BIO/2959334)"
+            shfile = [ "https://figshare.scilifelab.se/ndownloader/files/50595465", "https://figshare.scilifelab.se/ndownloader/files/50595471"]
+        }
+        'unite-alleuk=10.0' {
+            title = "UNITE general FASTA release for eukaryotes - Version 10.0"
+            file = [ "https://s3.hpc.ut.ee/plutof-public/original/1dda2021-4893-4f2f-b50e-87bfea795267.tgz" ]
+            citation = "Abarenkov, Kessy; Zirk, Allan; Piirmann, Timo; Pöhönen, Raivo; Ivanov, Filipp; Nilsson, R. Henrik; Kõljalg, Urmas (2024): UNITE general FASTA release for eukaryotes. Version 04.04.2024. UNITE Community. https://doi.org/10.15156/BIO/2959334"
+            fmtscript = "taxref_reformat_unite.sh"
+            dbversion = "UNITE-alleuk v10.0 (https://doi.org/10.15156/BIO/2959334)"
+            shfile = [ "https://figshare.scilifelab.se/ndownloader/files/50595465", "https://figshare.scilifelab.se/ndownloader/files/50595471"]
         }
         'unite-alleuk=9.0' {
             title = "UNITE general FASTA release for eukaryotes - Version 9.0"
             file = [ "https://s3.hpc.ut.ee/plutof-public/original/e318f5fd-1ef4-40fd-9e77-1b94d91b3858.tgz" ]
             citation = "Abarenkov, Kessy; Zirk, Allan; Piirmann, Timo; Pöhönen, Raivo; Ivanov, Filipp; Nilsson, R. Henrik; Kõljalg, Urmas (2023): UNITE general FASTA release for eukaryotes. Version 18.07.2023. UNITE Community. https://doi.org/10.15156/BIO/2938069"
             fmtscript = "taxref_reformat_unite.sh"
             dbversion = "UNITE-alleuk v9.0 (https://doi.org/10.15156/BIO/2938069)"
-            shfile = [ "https://figshare.scilifelab.se/ndownloader/files/40788773", "https://figshare.scilifelab.se/ndownloader/files/40788776"]
+            shfile = [ "https://figshare.scilifelab.se/ndownloader/files/50055768", "https://figshare.scilifelab.se/ndownloader/files/50055771"]
         }
         'unite-alleuk=8.3' {
             title = "UNITE general FASTA release for eukaryotes - Version 8.3"
@@ -282,6 +305,7 @@ params {
     //QIIME2 taxonomic reference databases
     qiime_ref_databases {
         //SILVA for QIIME2 v2023.7, see https://docs.qiime2.org/2023.7/data-resources/#silva-16s-18s-rrna
+        //SILVA files for QIIME2 v2023.7 have md5sums identical to those for QIIME2 v2024.10, but the links on https://docs.qiime2.org/2024.10/data-resources/ are failing
         'silva=138' {
             title = "QIIME2 pre-formatted SILVA dereplicated at 99% similarity - Version 138"
             file = [ "https://data.qiime2.org/2023.7/common/silva-138-99-seqs.qza", "https://data.qiime2.org/2023.7/common/silva-138-99-tax.qza" ]

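The comment above relies on the SILVA `.qza` files for QIIME2 2023.7 and 2024.10 having identical md5sums. A minimal sketch for re-checking such a claim on local copies, assuming GNU coreutils `md5sum` is available (on macOS the equivalent tool is `md5`); the helper name `same_md5` and the file names in the usage line are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if both files have the same MD5 checksum,
# 1 if they differ, 2 if either file cannot be read.
same_md5() {
    local a b
    # Reading via stdin makes md5sum print "<hash>  -" for both files,
    # so the two outputs are directly comparable.
    a=$(md5sum < "$1") || return 2
    b=$(md5sum < "$2") || return 2
    [ "$a" = "$b" ] || return 1
}

# Usage (file names are placeholders for locally downloaded copies):
# same_md5 silva-138-99-seqs_2023.7.qza silva-138-99-seqs_2024.10.qza && echo "identical"
```
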
diff --git a/conf/test_multi.config b/conf/test_multi.config
@@ -30,4 +30,7 @@ params {
     dada_ref_taxonomy = "rdp=18"
     skip_dada_addspecies = true
     input = params.pipelines_testdata_base_path + "ampliseq/samplesheets/Samplesheet_multi.tsv"
+
+    skip_phyloseq = true
+    skip_tse = true
 }
diff --git a/conf/test_pacbio_its.config b/conf/test_pacbio_its.config
@@ -37,4 +37,6 @@ params {
 
     // Prevent default taxonomic classification
     skip_dada_taxonomy = true
+
+    skip_phyloseq = true
 }
diff --git a/conf/test_reftaxcustom.config b/conf/test_reftaxcustom.config
@@ -37,4 +37,6 @@ params {
 
     // Skip downstream analysis with QIIME2
     skip_qiime_downstream = true
+
+    skip_tse = true
 }
diff --git a/docs/output.md b/docs/output.md
@@ -46,7 +46,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
   - [Differential abundance analysis](#differential-abundance-analysis) - Calling differentially abundant features with ANCOM or ANCOM-BC
 - [PICRUSt2](#picrust2) - Predict the functional potential of a bacterial community
 - [SBDI export](#sbdi-export) - Swedish Biodiversity Infrastructure (SBDI) submission file
-- [Phyloseq](#phyloseq) - Phyloseq R objects
+- [R objects](#r-objects) - Phyloseq and TreeSummarizedExperiment R objects
 - [Read count report](#read-count-report) - Report of read counts during various steps of the pipeline
 - [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution
 
@@ -629,15 +629,17 @@ Most of the fields in the template will not be populated by the export process,
 
 </details>
 
-### Phyloseq
+### R objects
 
-This directory will hold phyloseq objects for each taxonomy table produced by this pipeline. The objects will contain an ASV abundance table and a taxonomy table. If the pipeline is provided with metadata, that metadata will also be included in the phyloseq object. A phylogenetic tree will also be included if the pipeline produces a tree.
+Pipeline results are stored in phyloseq and TreeSummarizedExperiment R objects for each taxonomy table produced by this pipeline. The R objects will contain an ASV abundance table and a taxonomy table, and optionally sequences, metadata and a phylogenetic tree.
 
 <details markdown="1">
 <summary>Output files</summary>
 
 - `phyloseq/`
   - `<taxonomy>_phyloseq.rds`: Phyloseq R object.
+- `treesummarizedexperiment/`
+  - `<taxonomy>_TreeSummarizedExperiment.rds`: TreeSummarizedExperiment R object.
 
 </details>