Merge branch 'dev' into add-param-for-phred-encoding
d4straub authored Jan 27, 2025
2 parents 0256b37 + e5e627d commit 8f91ef6
Showing 58 changed files with 509 additions and 203 deletions.
25 changes: 25 additions & 0 deletions CHANGELOG.md
@@ -7,14 +7,39 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### `Added`

- [#798](https://github.com/nf-core/ampliseq/pull/798) - Added SILVA version 138.2 of DADA2 taxonomy database: `silva=138.2` or `silva` as parameter to `--dada_ref_taxonomy`
- [#801](https://github.com/nf-core/ampliseq/pull/801) - Parameter `--quality_type` allows specifying the type of quality scores in raw read data, by default `Auto` (i.e. default behavior did not change)
- [#804](https://github.com/nf-core/ampliseq/pull/804) - Added version 10 of Unite as parameter for `--dada_ref_taxonomy` (issue [#768](https://github.com/nf-core/ampliseq/issues/768))
- [#803](https://github.com/nf-core/ampliseq/pull/803) - New parameters introduced related to `--mergepairs_strategy`. These parameters take effect only if `--mergepairs_strategy consensus` is set.
- [#807](https://github.com/nf-core/ampliseq/pull/807) - Export of TreeSummarizedExperiment R object by default, can be omitted with `--skip_tse`, also added ability to skip phyloseq R object generation with `--skip_phyloseq`

| **Parameter** | **Description** | **Default Value** |
| ------------------------------------------ | ----------------------------------------------------------------------------------------- | ----------------- |
| **mergepairs_consensus_match** | The score assigned for each matching base pair during sequence alignment. | 1 |
| **mergepairs_consensus_mismatch** | The penalty score assigned for each mismatched base pair during sequence alignment. | -2 |
| **mergepairs_consensus_gap** | The penalty score assigned for each gap introduced during sequence alignment. | -4 |
| **mergepairs_consensus_minoverlap** | The minimum number of overlapping base pairs required to merge forward and reverse reads. | 12 |
| **mergepairs_consensus_maxmismatch** | The maximum number of mismatches allowed within the overlapping region for merging reads. | 0 |
| **mergepairs_consensus_percentile_cutoff** | The percentile cutoff determining the minimum observed overlap in the dataset. | 0.001 |
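
To make the scoring parameters in the table concrete, here is a minimal Python sketch that scores an already-aligned overlap using the default values listed above. It is an illustration only, not DADA2's or the pipeline's implementation: the names `ConsensusParams` and `score_overlap` are hypothetical, the alignment step itself is skipped, the threshold check is a simplified reading of the table descriptions, and `mergepairs_consensus_percentile_cutoff` is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsensusParams:
    # Defaults copied from the parameter table above (hypothetical container).
    match: int = 1
    mismatch: int = -2
    gap: int = -4
    minoverlap: int = 12
    maxmismatch: int = 0


def score_overlap(fwd: str, rev: str, p: ConsensusParams = ConsensusParams()) -> dict:
    """Score two pre-aligned overlap strings of equal length ('-' marks a gap)."""
    if len(fwd) != len(rev):
        raise ValueError("aligned overlap strings must have equal length")
    matches = mismatches = gaps = 0
    for a, b in zip(fwd, rev):
        if a == "-" or b == "-":
            gaps += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    score = matches * p.match + mismatches * p.mismatch + gaps * p.gap
    overlap = matches + mismatches  # aligned positions without a gap
    return {
        "score": score,
        "overlap": overlap,
        "mismatches": mismatches,
        "gaps": gaps,
        # Assumption: simplified reading of the two thresholds in the table.
        "meets_thresholds": overlap >= p.minoverlap and mismatches <= p.maxmismatch,
    }


perfect = score_overlap("ACGTACGTACGT", "ACGTACGTACGT")
one_off = score_overlap("ACGTACGTACGT", "ACGTACGTACGA")
print(perfect["score"], perfect["meets_thresholds"])
print(one_off["score"], one_off["meets_thresholds"])
```

With the defaults, a single mismatch in a 12-base overlap lowers the score by 3 (one lost match plus the mismatch penalty) and already exceeds `mergepairs_consensus_maxmismatch = 0`.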

### `Changed`

- [#803](https://github.com/nf-core/ampliseq/pull/803) - Changed DADA2_DENOISING: renamed `--concatenate_reads` to `--mergepairs_strategy`; added support for a new method named "consensus", selected with `--mergepairs_strategy consensus`; changed the options of `--mergepairs_strategy` from TRUE/FALSE (boolean) to ["merge", "concatenate", "consensus"].
- [#818](https://github.com/nf-core/ampliseq/pull/818) - Provide users the ability to not bump stack size in vsearch clustering.

### `Fixed`

- [#800](https://github.com/nf-core/ampliseq/pull/800) - Fixed SH files for UNITE9.0, they were missing some entries due to a bug caused by API update in PlutoF
- [#808](https://github.com/nf-core/ampliseq/pull/808) - Add missing library declaration in R script.

### `Dependencies`

- [#797](https://github.com/nf-core/ampliseq/pull/797) - Update QIIME2

| software | previously | now |
| -------- | ---------- | ------- |
| QIIME2 | 2023.7 | 2024.10 |

### `Removed`

## nf-core/ampliseq version 2.12.0 - 2024-11-14
4 changes: 4 additions & 0 deletions CITATIONS.md
@@ -149,6 +149,10 @@

> McMurdie PJ, Holmes S (2013). “phyloseq: An R package for reproducible interactive analysis and graphics of microbiome census data.” PLoS ONE, 8(4), e61217.
- [TreeSummarizedExperiment](https://doi.org/10.12688/f1000research.26669.2)

> Huang R, Soneson C, Ernst FGM et al. TreeSummarizedExperiment: a S4 class for data with hierarchical structure [version 2; peer review: 3 approved]. F1000Research 2021, 9:1246.
### Non-default tools

- [ITSx](https://besjournals.onlinelibrary.wiley.com/doi/10.1111/2041-210X.12073)
33 changes: 26 additions & 7 deletions assets/report_template.Rmd
@@ -107,6 +107,7 @@ params:
picrust_pathways: FALSE
sbdi: FALSE
phyloseq: FALSE
tse: FALSE
---

<!-- Load libraries -->
@@ -1615,19 +1616,37 @@ but if you run nf-core/ampliseq with a sample metadata table (`--metadata`) any
"))
```

<!-- Section on PHYLOSEQ results -->
<!-- Section on exported R objects -->

```{r, eval = !isFALSE(params$phyloseq), results='asis'}
```{r, results='asis'}
any_robject <- !isFALSE(params$phyloseq) || !isFALSE(params$tse)
```

```{r, eval = !isFALSE(any_robject), results='asis'}
cat(paste0("
# Phyloseq
# R objects
[Phyloseq](https://doi.org/10.1371/journal.pone.0061217)
is a popular R package to analyse and visualize microbiome data.
The produced RDS files contain phyloseq objects and can be loaded directly into R and phyloseq.
Microbiome data can be analysed and visualized with certain R packages. For convenience, R objects in RDS format are provided.
"))
if ( !isFALSE(params$phyloseq) ) {
cat(paste0("
[Phyloseq](https://doi.org/10.1371/journal.pone.0061217) objects can be loaded directly into R with the package 'phyloseq'.
The objects contain an ASV abundance table and a taxonomy table.
If available, metadata and phylogenetic tree will also be included in the phyloseq object.
The files can be found in folder [phyloseq](../phyloseq/).
"))
"))
}
if ( !isFALSE(params$tse) ) {
cat(paste0("
[TreeSummarizedExperiment](https://doi.org/10.12688/f1000research.26669.2) (TreeSE, TSE)
objects can be loaded into R with the package 'TreeSummarizedExperiment' and contain an ASV abundance table,
a taxonomy table, and sequences.
If available, metadata and phylogenetic tree will also be included in the object.
The files can be found in folder [treesummarizedexperiment](../treesummarizedexperiment/).
"))
}
```

<!-- Section on methods -->
2 changes: 1 addition & 1 deletion bin/taxref_reformat_standard.sh
@@ -5,4 +5,4 @@
gunzip -c *train*gz > assignTaxonomy.fna

# and the file for add species, identified by containing "species" in the name, is renamed
mv *species*gz addSpecies.fna.gz
mv *assign*gz addSpecies.fna.gz
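
The effect of switching the glob from `*species*gz` to `*assign*gz` can be checked against the SILVA file names that appear in `conf/ref_databases.config`. The sketch below uses Python's `fnmatch.fnmatchcase` as a stand-in for case-sensitive shell globbing, which is an approximation of what the shell does; the helper name `glob_hits` is hypothetical.

```python
from fnmatch import fnmatchcase

# File names taken from the SILVA 138.1 and 138.2 URLs in conf/ref_databases.config.
FILES = [
    "silva_nr99_v138.2_toSpecies_trainset.fa.gz",
    "silva_v138.2_assignSpecies.fa.gz",
    "silva_nr99_v138.1_wSpecies_train_set.fa.gz",
    "silva_species_assignment_v138.1.fa.gz",
]


def glob_hits(pattern: str) -> list:
    """Return the file names matching a case-sensitive glob pattern."""
    return [name for name in FILES if fnmatchcase(name, pattern)]


for pattern in ("*train*gz", "*species*gz", "*assign*gz"):
    print(pattern, glob_hits(pattern))
```

Under this approximation, the old pattern matches only the 138.1 species file because the 138.2 names spell `Species` with a capital letter, while `*assign*gz` matches the species-assignment file of both releases and never the training sets.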
16 changes: 14 additions & 2 deletions conf/modules.config
@@ -235,9 +235,12 @@ process {
].join(',').replaceAll('(,)*$', "")
// setting from https://rdrr.io/bioc/dada2/man/mergePairs.html & https://rdrr.io/bioc/dada2/man/nwalign.html & match = getDadaOpt("MATCH"), mismatch = getDadaOpt("MISMATCH"), gap = getDadaOpt("GAP_PENALTY"), missing from the list below is: 'band = -1'
ext.args2 = [
'minOverlap = 12, maxMismatch = 0, returnRejects = FALSE, propagateCol = character(0), trimOverhang = FALSE, match = 1, mismatch = -64, gap = -64, homo_gap = NULL, endsfree = TRUE, vec = FALSE',
params.concatenate_reads ? "justConcatenate = TRUE" : "justConcatenate = FALSE"
"homo_gap = NULL, endsfree = TRUE, vec = FALSE, propagateCol = character(0), trimOverhang = FALSE",
params.mergepairs_strategy == "consensus" ?
"returnRejects = TRUE, match = ${params.mergepairs_consensus_match}, mismatch = ${params.mergepairs_consensus_mismatch}, minOverlap = ${params.mergepairs_consensus_minoverlap}, maxMismatch = ${params.mergepairs_consensus_maxmismatch}, gap = ${params.mergepairs_consensus_gap}" :
"justConcatenate = ${params.mergepairs_strategy == 'concatenate' ? 'TRUE' : 'FALSE'}, returnRejects = FALSE, match = 1, mismatch = -64, gap = -64, minOverlap = 12, maxMismatch = 0"
].join(',').replaceAll('(,)*$', "")
ext.quantile = "${params.mergepairs_consensus_percentile_cutoff}"
publishDir = [
[
path: { "${params.outdir}/dada2/args" },
@@ -1060,6 +1063,15 @@ process {
pattern: "*.rds"
]
}

withName: TREESUMMARIZEDEXPERIMENT {
publishDir = [
path: { "${params.outdir}/treesummarizedexperiment" },
mode: params.publish_dir_mode,
pattern: "*.rds"
]
}

withName: 'MULTIQC' {
ext.args = { params.multiqc_title ? "--title \"$params.multiqc_title\"" : '' }
publishDir = [
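
The new `ext.args2` assembly in `conf/modules.config` picks one of two argument strings depending on `--mergepairs_strategy`. The following Python transliteration of that Groovy expression is a sketch for reading the logic, not code used by the pipeline; the function name `build_mergepairs_args` and the plain `dict` standing in for Nextflow's `params` are assumptions.

```python
def build_mergepairs_args(params: dict) -> str:
    """Mirror the ext.args2 list in conf/modules.config as a single string."""
    fixed = (
        "homo_gap = NULL, endsfree = TRUE, vec = FALSE, "
        "propagateCol = character(0), trimOverhang = FALSE"
    )
    if params["mergepairs_strategy"] == "consensus":
        # User-tunable alignment settings; rejected pairs are kept (returnRejects = TRUE).
        variable = (
            "returnRejects = TRUE, "
            f"match = {params['mergepairs_consensus_match']}, "
            f"mismatch = {params['mergepairs_consensus_mismatch']}, "
            f"minOverlap = {params['mergepairs_consensus_minoverlap']}, "
            f"maxMismatch = {params['mergepairs_consensus_maxmismatch']}, "
            f"gap = {params['mergepairs_consensus_gap']}"
        )
    else:
        # "merge" and "concatenate" keep the previous hard-coded values.
        concat = "TRUE" if params["mergepairs_strategy"] == "concatenate" else "FALSE"
        variable = (
            f"justConcatenate = {concat}, returnRejects = FALSE, "
            "match = 1, mismatch = -64, gap = -64, minOverlap = 12, maxMismatch = 0"
        )
    # Groovy: [...].join(',').replaceAll('(,)*$', "") joins and strips trailing commas.
    return ",".join([fixed, variable]).rstrip(",")


consensus = {
    "mergepairs_strategy": "consensus",
    "mergepairs_consensus_match": 1,
    "mergepairs_consensus_mismatch": -2,
    "mergepairs_consensus_gap": -4,
    "mergepairs_consensus_minoverlap": 12,
    "mergepairs_consensus_maxmismatch": 0,
}
print(build_mergepairs_args(consensus))
print(build_mergepairs_args({"mergepairs_strategy": "concatenate"}))
```

In this reading, `justConcatenate` is only set for the "merge" and "concatenate" strategies, and the `mergepairs_consensus_*` values are interpolated only when the strategy is "consensus".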
54 changes: 39 additions & 15 deletions conf/ref_databases.config
@@ -178,11 +178,18 @@ params {
taxlevels = "Domain,Kingdom,Phylum,Class,Order,Family,Genus,Species"
}
'silva' {
title = "Silva 138.1 prokaryotic SSU"
file = [ "https://zenodo.org/record/4587955/files/silva_nr99_v138.1_wSpecies_train_set.fa.gz", "https://zenodo.org/record/4587955/files/silva_species_assignment_v138.1.fa.gz" ]
title = "Silva 138.2 prokaryotic SSU"
file = [ "https://zenodo.org/records/14169026/files/silva_nr99_v138.2_toSpecies_trainset.fa.gz", "https://zenodo.org/records/14169026/files/silva_v138.2_assignSpecies.fa.gz" ]
citation = "Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Glöckner FO. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 2013 Jan;41(Database issue):D590-6. doi: 10.1093/nar/gks1219. Epub 2012 Nov 28. PMID: 23193283; PMCID: PMC3531112."
fmtscript = "taxref_reformat_standard.sh"
dbversion = "SILVA v138.1 (https://zenodo.org/record/4587955)"
dbversion = "SILVA v138.2 (https://zenodo.org/records/14169026)"
}
'silva=138.2' {
title = "Silva 138.2 prokaryotic SSU"
file = [ "https://zenodo.org/records/14169026/files/silva_nr99_v138.2_toSpecies_trainset.fa.gz", "https://zenodo.org/records/14169026/files/silva_v138.2_assignSpecies.fa.gz" ]
citation = "Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Glöckner FO. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 2013 Jan;41(Database issue):D590-6. doi: 10.1093/nar/gks1219. Epub 2012 Nov 28. PMID: 23193283; PMCID: PMC3531112."
fmtscript = "taxref_reformat_standard.sh"
dbversion = "SILVA v138.2 (https://zenodo.org/records/14169026)"
}
'silva=138' {
title = "Silva 138.1 prokaryotic SSU"
@@ -199,20 +206,28 @@ params {
dbversion = "SILVA v132 (https://zenodo.org/record/1172783)"
}
'unite-fungi' {
title = "UNITE general FASTA release for Fungi - Version 9.0"
file = [ "https://s3.hpc.ut.ee/plutof-public/original/fa1038da-d18d-46b7-88a9-c21bcf38c43d.tgz" ]
citation = "Abarenkov, Kessy; Zirk, Allan; Piirmann, Timo; Pöhönen, Raivo; Ivanov, Filipp; Nilsson, R. Henrik; Kõljalg, Urmas (2023): UNITE general FASTA release for Fungi. Version 18.07.2023. UNITE Community. https://doi.org/10.15156/BIO/2938067"
title = "UNITE general FASTA release for Fungi - Version 10.0"
file = [ "https://s3.hpc.ut.ee/plutof-public/original/d18aa648-3f4c-4f46-84d4-c8c5d48439ba.tgz" ]
citation = "Abarenkov, Kessy; Zirk, Allan; Piirmann, Timo; Pöhönen, Raivo; Ivanov, Filipp; Nilsson, R. Henrik; Kõljalg, Urmas (2024): UNITE general FASTA release for Fungi. Version 04.04.2024. UNITE Community. https://doi.org/10.15156/BIO/2959332"
fmtscript = "taxref_reformat_unite.sh"
dbversion = "UNITE-fungi v9.0 (https://doi.org/10.15156/BIO/2938067)"
shfile = [ "https://figshare.scilifelab.se/ndownloader/files/40788767", "https://figshare.scilifelab.se/ndownloader/files/40788770"]
dbversion = "UNITE-fungi v10.0 (https://doi.org/10.15156/BIO/2959332)"
shfile = [ "https://figshare.scilifelab.se/ndownloader/files/50595459", "https://figshare.scilifelab.se/ndownloader/files/50595462"]
}
'unite-fungi=10.0' {
title = "UNITE general FASTA release for Fungi - Version 10.0"
file = [ "https://s3.hpc.ut.ee/plutof-public/original/d18aa648-3f4c-4f46-84d4-c8c5d48439ba.tgz" ]
citation = "Abarenkov, Kessy; Zirk, Allan; Piirmann, Timo; Pöhönen, Raivo; Ivanov, Filipp; Nilsson, R. Henrik; Kõljalg, Urmas (2024): UNITE general FASTA release for Fungi. Version 04.04.2024. UNITE Community. https://doi.org/10.15156/BIO/2959332"
fmtscript = "taxref_reformat_unite.sh"
dbversion = "UNITE-fungi v10.0 (https://doi.org/10.15156/BIO/2959332)"
shfile = [ "https://figshare.scilifelab.se/ndownloader/files/50595459", "https://figshare.scilifelab.se/ndownloader/files/50595462"]
}
'unite-fungi=9.0' {
title = "UNITE general FASTA release for Fungi - Version 9.0"
file = [ "https://s3.hpc.ut.ee/plutof-public/original/fa1038da-d18d-46b7-88a9-c21bcf38c43d.tgz" ]
citation = "Abarenkov, Kessy; Zirk, Allan; Piirmann, Timo; Pöhönen, Raivo; Ivanov, Filipp; Nilsson, R. Henrik; Kõljalg, Urmas (2023): UNITE general FASTA release for Fungi. Version 18.07.2023. UNITE Community. https://doi.org/10.15156/BIO/2938067"
fmtscript = "taxref_reformat_unite.sh"
dbversion = "UNITE-fungi v9.0 (https://doi.org/10.15156/BIO/2938067)"
shfile = [ "https://figshare.scilifelab.se/ndownloader/files/40788767", "https://figshare.scilifelab.se/ndownloader/files/40788770"]
shfile = [ "https://figshare.scilifelab.se/ndownloader/files/50055762", "https://figshare.scilifelab.se/ndownloader/files/50055765"]
}
'unite-fungi=8.3' {
title = "UNITE general FASTA release for Fungi - Version 8.3"
@@ -231,20 +246,28 @@ params {
shfile = [ "https://scilifelab.figshare.com/ndownloader/files/34497971", "https://scilifelab.figshare.com/ndownloader/files/34497974"]
}
'unite-alleuk' {
title = "UNITE general FASTA release for eukaryotes - Version 9.0"
file = [ "https://s3.hpc.ut.ee/plutof-public/original/e318f5fd-1ef4-40fd-9e77-1b94d91b3858.tgz" ]
citation = "Abarenkov, Kessy; Zirk, Allan; Piirmann, Timo; Pöhönen, Raivo; Ivanov, Filipp; Nilsson, R. Henrik; Kõljalg, Urmas (2023): UNITE general FASTA release for eukaryotes. Version 18.07.2023. UNITE Community. https://doi.org/10.15156/BIO/2938069"
title = "UNITE general FASTA release for eukaryotes - Version 10.0"
file = [ "https://s3.hpc.ut.ee/plutof-public/original/1dda2021-4893-4f2f-b50e-87bfea795267.tgz" ]
citation = "Abarenkov, Kessy; Zirk, Allan; Piirmann, Timo; Pöhönen, Raivo; Ivanov, Filipp; Nilsson, R. Henrik; Kõljalg, Urmas (2024): UNITE general FASTA release for eukaryotes. Version 04.04.2024. UNITE Community. https://doi.org/10.15156/BIO/2959334"
fmtscript = "taxref_reformat_unite.sh"
dbversion = "UNITE-alleuk v9.0 (https://doi.org/10.15156/BIO/2938069)"
shfile = [ "https://figshare.scilifelab.se/ndownloader/files/40788773", "https://figshare.scilifelab.se/ndownloader/files/40788776"]
dbversion = "UNITE-alleuk v10.0 (https://doi.org/10.15156/BIO/2959334)"
shfile = [ "https://figshare.scilifelab.se/ndownloader/files/50595465", "https://figshare.scilifelab.se/ndownloader/files/50595471"]
}
'unite-alleuk=10.0' {
title = "UNITE general FASTA release for eukaryotes - Version 10.0"
file = [ "https://s3.hpc.ut.ee/plutof-public/original/1dda2021-4893-4f2f-b50e-87bfea795267.tgz" ]
citation = "Abarenkov, Kessy; Zirk, Allan; Piirmann, Timo; Pöhönen, Raivo; Ivanov, Filipp; Nilsson, R. Henrik; Kõljalg, Urmas (2024): UNITE general FASTA release for eukaryotes. Version 04.04.2024. UNITE Community. https://doi.org/10.15156/BIO/2959334"
fmtscript = "taxref_reformat_unite.sh"
dbversion = "UNITE-alleuk v10.0 (https://doi.org/10.15156/BIO/2959334)"
shfile = [ "https://figshare.scilifelab.se/ndownloader/files/50595465", "https://figshare.scilifelab.se/ndownloader/files/50595471"]
}
'unite-alleuk=9.0' {
title = "UNITE general FASTA release for eukaryotes - Version 9.0"
file = [ "https://s3.hpc.ut.ee/plutof-public/original/e318f5fd-1ef4-40fd-9e77-1b94d91b3858.tgz" ]
citation = "Abarenkov, Kessy; Zirk, Allan; Piirmann, Timo; Pöhönen, Raivo; Ivanov, Filipp; Nilsson, R. Henrik; Kõljalg, Urmas (2023): UNITE general FASTA release for eukaryotes. Version 18.07.2023. UNITE Community. https://doi.org/10.15156/BIO/2938069"
fmtscript = "taxref_reformat_unite.sh"
dbversion = "UNITE-alleuk v9.0 (https://doi.org/10.15156/BIO/2938069)"
shfile = [ "https://figshare.scilifelab.se/ndownloader/files/40788773", "https://figshare.scilifelab.se/ndownloader/files/40788776"]
shfile = [ "https://figshare.scilifelab.se/ndownloader/files/50055768", "https://figshare.scilifelab.se/ndownloader/files/50055771"]
}
'unite-alleuk=8.3' {
title = "UNITE general FASTA release for eukaryotes - Version 8.3"
@@ -282,6 +305,7 @@ params {
//QIIME2 taxonomic reference databases
qiime_ref_databases {
//SILVA for QIIME2 v2023.7, see https://docs.qiime2.org/2023.7/data-resources/#silva-16s-18s-rrna
//SILVA for QIIME2 v2023.7 md5sums identical to QIIME2 v2024.10, but links on https://docs.qiime2.org/2024.10/data-resources/ are failing
'silva=138' {
title = "QIIME2 pre-formatted SILVA dereplicated at 99% similarity - Version 138"
file = [ "https://data.qiime2.org/2023.7/common/silva-138-99-seqs.qza", "https://data.qiime2.org/2023.7/common/silva-138-99-tax.qza" ]
3 changes: 3 additions & 0 deletions conf/test_multi.config
@@ -30,4 +30,7 @@ params {
dada_ref_taxonomy = "rdp=18"
skip_dada_addspecies = true
input = params.pipelines_testdata_base_path + "ampliseq/samplesheets/Samplesheet_multi.tsv"

skip_phyloseq = true
skip_tse = true
}
2 changes: 2 additions & 0 deletions conf/test_pacbio_its.config
@@ -37,4 +37,6 @@ params {

// Prevent default taxonomic classification
skip_dada_taxonomy = true

skip_phyloseq = true
}
2 changes: 2 additions & 0 deletions conf/test_reftaxcustom.config
@@ -37,4 +37,6 @@ params {

// Skip downstream analysis with QIIME2
skip_qiime_downstream = true

skip_tse = true
}
8 changes: 5 additions & 3 deletions docs/output.md
@@ -46,7 +46,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [Differential abundance analysis](#differential-abundance-analysis) - Calling differentially abundant features with ANCOM or ANCOM-BC
- [PICRUSt2](#picrust2) - Predict the functional potential of a bacterial community
- [SBDI export](#sbdi-export) - Swedish Biodiversity Infrastructure (SBDI) submission file
- [Phyloseq](#phyloseq) - Phyloseq R objects
- [R objects](#r-objects) - Phyloseq and TreeSummarizedExperiment R objects
- [Read count report](#read-count-report) - Report of read counts during various steps of the pipeline
- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution

@@ -629,15 +629,17 @@ Most of the fields in the template will not be populated by the export process,

</details>

### Phyloseq
### R objects

This directory will hold phyloseq objects for each taxonomy table produced by this pipeline. The objects will contain an ASV abundance table and a taxonomy table. If the pipeline is provided with metadata, that metadata will also be included in the phyloseq object. A phylogenetic tree will also be included if the pipeline produces a tree.
Pipeline results are stored in phyloseq and TreeSummarizedExperiment R objects for each taxonomy table produced by this pipeline. The R objects will contain an ASV abundance table and a taxonomy table, and optionally sequences, metadata and a phylogenetic tree.

<details markdown="1">
<summary>Output files</summary>

- `phyloseq/`
- `<taxonomy>_phyloseq.rds`: Phyloseq R object.
- `treesummarizedexperiment/`
- `<taxonomy>_TreeSummarizedExperiment.rds`: TreeSummarizedExperiment R object.

</details>
