Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dada2 MergePairs : consensus between merging & concatenating reads #803

Open
wants to merge 15 commits into
base: dev
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 8 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 13 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,9 +8,22 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### `Added`

- [#804](https://github.com/nf-core/ampliseq/pull/804) - Added version 10 of Unite as parameter for `--dada_ref_taxonomy` (issue [#768](https://github.com/nf-core/ampliseq/issues/768))
- [#803](https://github.com/nf-core/ampliseq/pull/803) - New parameters introduced related to `--mergepairs_strategy`. These parameters would only be effective if `--mergepairs_strategy consensus` is set.

| **Parameter** | **Description** | **Default Value** |
| ------------------------------------------ | ----------------------------------------------------------------------------------------- | ----------------- |
| **mergepairs_consensus_match** | The score assigned for each matching base pair during sequence alignment. | 5 |
| **mergepairs_consensus_mismatch** | The penalty score assigned for each mismatched base pair during sequence alignment. | -6 |
| **mergepairs_consensus_gap** | The penalty score assigned for each gap introduced during sequence alignment. | -64 |
| **mergepairs_consensus_minoverlap** | The minimum number of overlapping base pairs required to merge forward and reverse reads. | 12 |
| **mergepairs_consensus_maxmismatch** | The maximum number of mismatches allowed within the overlapping region for merging reads. | 0 |
| **mergepairs_consensus_percentile_cutoff** | The percentile cutoff determining the minimum observed overlap in the dataset. | 0.001 |
=======

### `Changed`

- [#803](https://github.com/nf-core/ampliseq/pull/803) - Changed DADA2_DENOISING : `--concatenate_reads` renaming to `--mergepairs_strategy` ; support new method named "consensus" by setting `--mergepairs_strategy consensus` ; changed options of `--mergepairs_strategy` from TRUE/FALSE (boolean) to ["merge", "concatenate", "consensus"].

### `Fixed`

- [#800](https://github.com/nf-core/ampliseq/pull/800) - Fixed SH files for UNITE9.0, they were missing some entries due to a bug caused by API update in PlutoF
Expand Down
7 changes: 5 additions & 2 deletions conf/modules.config
Original file line number Diff line number Diff line change
Expand Up @@ -232,9 +232,12 @@ process {
].join(',').replaceAll('(,)*$', "")
// setting from https://rdrr.io/bioc/dada2/man/mergePairs.html & https://rdrr.io/bioc/dada2/man/nwalign.html & match = getDadaOpt("MATCH"), mismatch = getDadaOpt("MISMATCH"), gap = getDadaOpt("GAP_PENALTY"), missing from the list below is: 'band = -1'
ext.args2 = [
'minOverlap = 12, maxMismatch = 0, returnRejects = FALSE, propagateCol = character(0), trimOverhang = FALSE, match = 1, mismatch = -64, gap = -64, homo_gap = NULL, endsfree = TRUE, vec = FALSE',
params.concatenate_reads ? "justConcatenate = TRUE" : "justConcatenate = FALSE"
"homo_gap = NULL, endsfree = TRUE, vec = FALSE, propagateCol = character(0)",
d4straub marked this conversation as resolved.
Show resolved Hide resolved
params.mergepairs_strategy == "consensus" ?
"returnRejects = TRUE, match = ${params.mergepairs_consensus_match}, mismatch = ${params.mergepairs_consensus_mismatch}, minOverlap = ${params.mergepairs_consensus_minoverlap}, maxMismatch = ${params.mergepairs_consensus_maxmismatch}, gap = ${params.mergepairs_consensus_gap}" :
"justConcatenate = ${params.mergepairs_strategy == 'concatenate' ? 'TRUE' : 'FALSE'}, returnRejects = FALSE, match = 1, mismatch = -64, gap = -64, minOverlap = 12, maxMismatch = 0"
].join(',').replaceAll('(,)*$', "")
ext.quantile = "${params.mergepairs_consensus_percentile_cutoff}"
publishDir = [
[
path: { "${params.outdir}/dada2/args" },
Expand Down
52 changes: 47 additions & 5 deletions modules/local/dada2_denoising.nf
Original file line number Diff line number Diff line change
Expand Up @@ -26,13 +26,14 @@ process DADA2_DENOISING {
def prefix = task.ext.prefix ?: "prefix"
def args = task.ext.args ?: ''
def args2 = task.ext.args2 ?: ''
def quantile = task.ext.quantile ?: 0.001
if (!meta.single_end) {
"""
#!/usr/bin/env Rscript
suppressPackageStartupMessages(library(dada2))

errF = readRDS("${errormodel[0]}")
errR = readRDS("${errormodel[1]}")
errF <- readRDS("${errormodel[0]}")
errR <- readRDS("${errormodel[1]}")

filtFs <- sort(list.files("./filtered/", pattern = "_1.filt.fastq.gz", full.names = TRUE), method = "radix")
filtRs <- sort(list.files("./filtered/", pattern = "_2.filt.fastq.gz", full.names = TRUE), method = "radix")
Expand All @@ -45,9 +46,50 @@ process DADA2_DENOISING {
saveRDS(dadaRs, "${prefix}_2.dada.rds")
sink(file = NULL)

#make table
mergers <- mergePairs(dadaFs, filtFs, dadaRs, filtRs, $args2, verbose=TRUE)
saveRDS(mergers, "${prefix}.mergers.rds")
# merge
if ("${params.asv_concatenate_reads}" == "consensus") {
d4straub marked this conversation as resolved.
Show resolved Hide resolved
mergers <- mergePairs(dadaFs, filtFs, dadaRs, filtRs, $args2, justConcatenate = FALSE, verbose=TRUE)
concats <- mergePairs(dadaFs, filtFs, dadaRs, filtRs, $args2, justConcatenate = TRUE, verbose=TRUE)

# in case there is only one sample in the entire run
if (is.data.frame(mergers)) {
mergers <- list(sample = mergers)
concats <- list(sample = concats)
}

# define the overlap threshold to decide if concatenation or not
min_overlap_obs <- lapply(mergers, function(X) {
mergers_accepted <- X[["accept"]]
if (sum(mergers_accepted) > 0) {
min_overlap_obs <- X[["nmatch"]][mergers_accepted] + X[["nmismatch"]][mergers_accepted]
rep(min_overlap_obs, X[["abundance"]][mergers_accepted])
} else {
NA
}
})

min_overlap_obs <- Reduce(c, min_overlap_obs)
min_overlap_obs <- min_overlap_obs[!is.na(min_overlap_obs)]
min_overlap_obs <- quantile(min_overlap_obs, $quantile)

for (x in names(mergers)) {
to_concat <- !mergers[[x]][["accept"]] & (mergers[[x]][["nmismatch"]] + mergers[[x]][["nmatch"]]) < min_overlap_obs

if (sum(to_concat) > 0) {
mergers[[x]][to_concat, ] <- concats[[x]][to_concat, ]
# filter out unaccepted non concatenated sequences
mergers[[x]] <- mergers[[x]][mergers[[x]][["accept"]], ]
}

}

} else {
mergers <- mergePairs(dadaFs, filtFs, dadaRs, filtRs, $args2, verbose=TRUE)
}

saveRDS(mergers, "${meta.run}.mergers.rds")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
saveRDS(mergers, "${meta.run}.mergers.rds")
saveRDS(mergers, "${prefix}.mergers.rds")

better late than never, this seems to me as a breaking change under certain circumstances, was this intentional?


# make table
seqtab <- makeSequenceTable(mergers)
saveRDS(seqtab, "${prefix}.seqtab.rds")

Expand Down
124 changes: 65 additions & 59 deletions nextflow.config
Original file line number Diff line number Diff line change
Expand Up @@ -23,65 +23,71 @@ params {
metadata = null

// Other options
save_intermediates = false
trunc_qmin = 25
trunc_rmin = 0.75
trunclenf = null
trunclenr = null
max_ee = 2
max_len = null
ignore_failed_filtering = false
min_len = 50
metadata_category = null
metadata_category_barplot = null
double_primer = false
retain_untrimmed = false
cutadapt_min_overlap = 3
cutadapt_max_error_rate = 0.1
exclude_taxa = "mitochondria,chloroplast"
min_frequency = 1
min_samples = 1
multiple_sequencing_runs = false
single_end = false
sample_inference = "independent"
illumina_novaseq = false
illumina_pe_its = false
concatenate_reads = false
cut_its = "none"
its_partial = 0
picrust = false
sbdiexport = false
addsh = false
tax_agglom_min = 2
tax_agglom_max = 6
min_read_counts = 1
ignore_failed_trimming = false
ignore_empty_input_files = false
qiime_adonis_formula = null
seed = 100
filter_ssu = null
min_len_asv = null
max_len_asv = null
filter_codons = null
orf_start = 1
orf_end = null
stop_codons = "TAA,TAG"
pplace_tree = null
pplace_aln = null
pplace_model = null
pplace_alnmethod = 'hmmer'
pplace_taxonomy = null
pplace_name = null
diversity_rarefaction_depth= 500
ancom_sample_min_count = 1
vsearch_cluster = null
vsearch_cluster_id = 0.97
ancom = false
ancombc = false
ancombc_effect_size = 1
ancombc_significance = 0.05
ancombc_formula = null
ancombc_formula_reflvl = null
save_intermediates = false
trunc_qmin = 25
trunc_rmin = 0.75
trunclenf = null
trunclenr = null
max_ee = 2
max_len = null
ignore_failed_filtering = false
min_len = 50
metadata_category = null
metadata_category_barplot = null
double_primer = false
retain_untrimmed = false
cutadapt_min_overlap = 3
cutadapt_max_error_rate = 0.1
exclude_taxa = "mitochondria,chloroplast"
min_frequency = 1
min_samples = 1
multiple_sequencing_runs = false
single_end = false
sample_inference = "independent"
illumina_novaseq = false
illumina_pe_its = false
mergepairs_strategy = "merge"
mergepairs_consensus_minoverlap = 12
mergepairs_consensus_maxmismatch = 0
mergepairs_consensus_gap = -64
mergepairs_consensus_match = 5
mergepairs_consensus_mismatch = -6
mergepairs_consensus_percentile_cutoff = 0.001
cut_its = "none"
its_partial = 0
picrust = false
sbdiexport = false
addsh = false
tax_agglom_min = 2
tax_agglom_max = 6
min_read_counts = 1
ignore_failed_trimming = false
ignore_empty_input_files = false
qiime_adonis_formula = null
seed = 100
filter_ssu = null
min_len_asv = null
max_len_asv = null
filter_codons = null
orf_start = 1
orf_end = null
stop_codons = "TAA,TAG"
pplace_tree = null
pplace_aln = null
pplace_model = null
pplace_alnmethod = 'hmmer'
pplace_taxonomy = null
pplace_name = null
diversity_rarefaction_depth = 500
ancom_sample_min_count = 1
vsearch_cluster = null
vsearch_cluster_id = 0.97
ancom = false
ancombc = false
ancombc_effect_size = 1
ancombc_significance = 0.05
ancombc_formula = null
ancombc_formula_reflvl = null

// Report options
report_template = "${projectDir}/assets/report_template.Rmd"
Expand Down
Loading
Loading