Add phylogenetic placement #564

Merged
merged 6 commits into from
Mar 29, 2023
1 change: 1 addition & 0 deletions .github/workflows/ci.yml
@@ -57,6 +57,7 @@ jobs:
test_fasta,
test_reftaxcustom,
test_novaseq,
test_pplace,
]
steps:
- name: Check out pipeline code
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### `Added`

- [#564](https://github.com/nf-core/ampliseq/pull/564) - Added phylogenetic placement

### `Changed`

- [#563](https://github.com/nf-core/ampliseq/pull/563) - Renamed DADA2 taxonomic classification files to include the chosen reference taxonomy abbreviation.
20 changes: 20 additions & 0 deletions CITATIONS.md
@@ -67,6 +67,26 @@

> Sundh J, Manoharan L, Iwaszkiewicz-Eggebrecht E, Miraldo A, Andersson A, Ronquist F. COI reference sequences from BOLD DB. doi: https://doi.org/10.17044/scilifelab.20514192.v2.

### Phylogenetic placement

- nf-core/phyloplace (https://github.com/nf-core/phyloplace, https://nf-co.re/phyloplace) was originally written by Daniel Lundin.

- [HMMER](https://pubmed.ncbi.nlm.nih.gov/22039361/)

> Eddy, Sean R. “Accelerated Profile HMM Searches.” PLoS Comput Biol 7, no. 10 (October 20, 2011): e1002195. https://doi.org/10.1371/journal.pcbi.1002195.

- [MAFFT](https://pubmed.ncbi.nlm.nih.gov/12136088/)

> Katoh, Kazutaka, Kazuharu Misawa, Kei‐ichi Kuma, and Takashi Miyata. “MAFFT: A Novel Method for Rapid Multiple Sequence Alignment Based on Fast Fourier Transform.” Nucleic Acids Research 30, no. 14 (July 15, 2002): 3059–66. https://doi.org/10.1093/nar/gkf436.

- [EPA-NG](https://pubmed.ncbi.nlm.nih.gov/30165689/)

> Barbera, Pierre, Alexey M Kozlov, Lucas Czech, Benoit Morel, Diego Darriba, Tomáš Flouri, and Alexandros Stamatakis. “EPA-Ng: Massively Parallel Evolutionary Placement of Genetic Sequences.” Systematic Biology 68, no. 2 (March 1, 2019): 365–69. https://doi.org/10.1093/sysbio/syy054.

- [Gappa](https://pubmed.ncbi.nlm.nih.gov/32016344/)

> Czech, Lucas, Pierre Barbera, and Alexandros Stamatakis. “Genesis and Gappa: Processing, Analyzing and Visualizing Phylogenetic (Placement) Data.” Bioinformatics 36, no. 10 (May 1, 2020): 3263–65. https://doi.org/10.1093/bioinformatics/btaa070.

### Downstream analysis

- [QIIME2](https://pubmed.ncbi.nlm.nih.gov/31341288/)
3 changes: 2 additions & 1 deletion README.md
@@ -19,7 +19,7 @@

## Introduction

**nfcore/ampliseq** is a bioinformatics analysis pipeline used for amplicon sequencing, supporting denoising of any amplicon and, currently, taxonomic assignment of 16S, ITS, CO1 and 18S amplicons. Supported is paired-end Illumina or single-end Illumina, PacBio and IonTorrent data. Default is the analysis of 16S rRNA gene amplicons sequenced paired-end with Illumina.
**nfcore/ampliseq** is a bioinformatics analysis pipeline used for amplicon sequencing, supporting denoising of any amplicon and, currently, taxonomic assignment of 16S, ITS, CO1 and 18S amplicons. Phylogenetic placement is also possible. Supported is paired-end Illumina or single-end Illumina, PacBio and IonTorrent data. Default is the analysis of 16S rRNA gene amplicons sequenced paired-end with Illumina.

<p align="center">
<img src="docs/images/ampliseq_workflow.png" alt="nf-core/ampliseq workflow overview" width="60%">
@@ -37,6 +37,7 @@ By default, the pipeline currently performs the following:
- Trimming of reads ([Cutadapt](https://journal.embnet.org/index.php/embnetjournal/article/view/200))
- Infer Amplicon Sequence Variants (ASVs) ([DADA2](https://doi.org/10.1038/nmeth.3869))
- Predict whether ASVs are ribosomal RNA sequences ([Barrnap](https://github.com/tseemann/barrnap))
- Phylogenetic placement ([EPA-NG](https://github.com/Pbdas/epa-ng))
- Taxonomical classification using DADA2 or [QIIME2](https://www.nature.com/articles/s41587-019-0209-9)
- Excludes unwanted taxa, produces absolute and relative feature/taxa count tables and plots, plots alpha rarefaction curves, computes alpha and beta diversity indices and plots thereof ([QIIME2](https://www.nature.com/articles/s41587-019-0209-9))
- Calls differentially abundant taxa ([ANCOM](https://www.ncbi.nlm.nih.gov/pubmed/26028277))
155 changes: 155 additions & 0 deletions conf/modules.config
@@ -418,6 +418,161 @@ process {
]
}

withName: HMMER_HMMBUILD {
ext.prefix = { "${meta.id}.ref" }
publishDir = [
path: { "${params.outdir}/pplace/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: HMMER_UNALIGNREF {
ext.prefix = { "${meta.id}.ref.unaligned" }
ext.args = "--gapsym=- afa"
ext.postprocessing = '| sed "/^>/!s/-//g"'
publishDir = [
path: { "${params.outdir}/pplace/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: HMMER_HMMALIGNREF {
ext.prefix = { "${meta.id}.ref.hmmalign" }
publishDir = [
path: { "${params.outdir}/pplace/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: HMMER_HMMALIGNQUERY {
ext.prefix = { "${meta.id}.query.hmmalign" }
publishDir = [
path: { "${params.outdir}/pplace/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: 'HMMER_MASK.*' {
ext.args = '--rf-is-mask'
publishDir = [
path: { "${params.outdir}/pplace/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: 'HMMER_MASKQUERY.*' {
ext.prefix = { "${meta.id}.query.hmmalign" }
publishDir = [
path: { "${params.outdir}/pplace/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: 'HMMER_MASKREF.*' {
ext.prefix = { "${meta.id}.ref.hmmalign" }
publishDir = [
path: { "${params.outdir}/pplace/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: 'HMMER_AFAFORMATQUERY.*' {
ext.prefix = { "${meta.id}.query.hmmalign.masked" }
ext.args = 'afa'
publishDir = [
path: { "${params.outdir}/pplace/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: 'HMMER_AFAFORMATREF.*' {
ext.prefix = { "${meta.id}.ref.hmmalign.masked" }
ext.args = 'afa'
publishDir = [
path: { "${params.outdir}/pplace/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: 'MAFFT' {
ext.args = '--keeplength'
publishDir = [
path: { "${params.outdir}/pplace/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: 'EPANG_PLACE' {
ext.args = { "--model ${meta.model}" }
publishDir = [
path: { "${params.outdir}/pplace/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: 'GAPPA_GRAFT' {
ext.prefix = { "${meta.id}.graft" }
//test_pplace.graft.test_pplace.epa_result.newick
publishDir = [
[
path: { "${params.outdir}/pplace" },
mode: params.publish_dir_mode,
pattern: "*.newick"
],
[
path: { "${params.outdir}/pplace/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
]
}

withName: 'GAPPA_ASSIGN' {
ext.prefix = { "${meta.id}.taxonomy" }
ext.args = "--per-query-results --krona --sativa"
ext.when = { taxonomy }
publishDir = [
[
path: { "${params.outdir}/pplace" },
mode: params.publish_dir_mode,
pattern: "*.taxonomy.per_query.tsv"
],
[
path: { "${params.outdir}/pplace/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
]
}

withName: 'GAPPA_HEATTREE' {
ext.prefix = { "${meta.id}.heattree" }
ext.args = "--write-nexus-tree --write-phyloxml-tree --write-svg-tree"
publishDir = [
[
path: { "${params.outdir}/pplace" },
mode: params.publish_dir_mode,
pattern: "*.tree.svg"
],
[
path: { "${params.outdir}/pplace/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
]
}

withName: 'QIIME2_INASV|QIIME2_INSEQ|QIIME2_INTAX' {
publishDir = [
path: { "${params.outdir}/qiime2/input" },
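
The Groovy expressions repeated throughout the new `conf/modules.config` blocks can be read as three small string operations: derive the output subfolder from the process name, drop `versions.yml` from publishing, and (for `HMMER_UNALIGNREF`) strip gap characters from sequence lines. Below is a minimal Python sketch of what they compute. It assumes `task.process` holds a colon-separated, fully qualified process name; the example name and all function names are hypothetical and not part of the pipeline.

```python
from typing import Optional


def publish_subdir(process_name: str) -> str:
    """Mirror task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()."""
    # Groovy's tokenize() drops empty tokens, so filter them out after split().
    simple_name = [t for t in process_name.split(":") if t][-1]
    tool = [t for t in simple_name.split("_") if t][0]
    return tool.lower()


def save_as(filename: str) -> Optional[str]:
    """Mirror the saveAs closure: a null (None) result means 'do not publish'."""
    return None if filename == "versions.yml" else filename


def strip_gaps(fasta_text: str) -> str:
    """Mirror sed '/^>/!s/-//g': remove '-' from every line that is not a header."""
    lines = []
    for line in fasta_text.splitlines():
        lines.append(line if line.startswith(">") else line.replace("-", ""))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical fully qualified process name, for illustration only.
    name = "NFCORE_AMPLISEQ:AMPLISEQ:FASTA_NEWICK_EPANG_GAPPA:HMMER_HMMBUILD"
    print(publish_subdir(name))
    print(save_as("versions.yml"), save_as("ref.hmm"))
    print(strip_gaps(">seq-1\nAC--GT"))
```

Under these assumptions, process names starting with `HMMER_`, `MAFFT`, `EPANG_` and `GAPPA_` map to the `pplace/hmmer/`, `pplace/mafft/`, `pplace/epang/` and `pplace/gappa/` folders.
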
46 changes: 46 additions & 0 deletions conf/test_pplace.config
@@ -0,0 +1,46 @@
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Nextflow config file for running minimal tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Defines input files and everything required to run a fast and simple pipeline test.

Use as follows:
nextflow run nf-core/ampliseq -profile test_pplace,<docker/singularity> --outdir <OUTDIR>

----------------------------------------------------------------------------------------
*/

params {
config_profile_name = 'Test profile'
config_profile_description = 'Minimal test dataset to check pipeline function'

// Limit resources so that this can run on GitHub Actions
max_cpus = 2
max_memory = '6.GB'
max_time = '6.h'

// Input data
FW_primer = "GTGYCAGCMGCCGCGGTAA"
RV_primer = "GGACTACNVGGGTWTCTAAT"
input = "https://raw.githubusercontent.com/nf-core/test-datasets/ampliseq/samplesheets/Samplesheet.tsv"
metadata = "https://raw.githubusercontent.com/nf-core/test-datasets/ampliseq/samplesheets/Metadata.tsv"
dada_ref_taxonomy = false
qiime_ref_taxonomy = "greengenes85"
filter_ssu = "bac"

//this is to remove low abundance ASVs to reduce runtime of downstream processes
min_samples = 2
min_frequency = 10

//pplace
pplace_tree = "https://github.com/nf-core/test-datasets/raw/phyloplace/testdata/cyanos_16s.newick"
pplace_aln = "https://github.com/nf-core/test-datasets/raw/phyloplace/testdata/cyanos_16s.alnfna"
pplace_model = "GTR+F+I+I+R3"
pplace_taxonomy = "https://github.com/nf-core/test-datasets/raw/phyloplace/testdata/cyanos_16s.taxonomy.tsv"
pplace_name = "test_pplace"


//Skip some steps to reduce runtime
skip_alpha_rarefaction = true
skip_fastqc = true
}
19 changes: 19 additions & 0 deletions docs/output.md
@@ -22,6 +22,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [ITSx](#itsx) - Optionally, the ITS region can be extracted
- [Taxonomic classification with DADA2](#taxonomic-classification-with-dada2) - Taxonomic classification of (filtered) ASVs
- [assignSH](#assignsh) - Optionally, a UNITE species hypothesis (SH) can be added to the taxonomy
- [Phylogenetic placement and taxonomic classification](#phylogenetic-placement-and-taxonomic-classification) - Placing ASVs into a phylogenetic tree
- [QIIME2](#qiime2) - Secondary analysis
- [Taxonomic classification](#taxonomic-classification) - Taxonomical classification of ASVs
- [Abundance tables](#abundance-tables) - Exported abundance tables
@@ -231,6 +232,24 @@ Optionally, a UNITE species hypothesis (SH) can be added to the taxonomy. In sho

</details>

### Phylogenetic placement and taxonomic classification

Phylogenetic placement grafts sequences onto a phylogenetic reference tree and optionally outputs taxonomic annotations. The reference tree is ideally built from full-length, high-quality sequences, which carry more evolutionary signal than short amplicons. Placement is hence superior to estimating de novo phylogenetic trees from short amplicon sequences. When the required reference files are provided, ASV sequences are aligned to the reference alignment with either [HMMER](http://hmmer.org/) (default) or [MAFFT](https://mafft.cbrc.jp/alignment/software/). Phylogenetic placement of the query sequences is then performed with [EPA-NG](https://github.com/Pbdas/epa-ng), and finally a number of summary operations are performed with [Gappa](https://github.com/lczech/gappa). This uses code from [nf-core/phyloplace](https://nf-co.re/phyloplace) in the form of its main [subworkflow](https://github.com/nf-core/modules/tree/master/subworkflows/nf-core/fasta_newick_epang_gappa); its detailed documentation therefore also applies here.

<details markdown="1">
<summary>Output files</summary>

- `pplace/`
- `*.graft.*.epa_result.newick`: Full phylogeny with query sequences grafted on to the reference phylogeny, in newick format.
- `*.taxonomy.per_query.tsv`: Tab-separated file with taxonomy information per query from classification by `gappa examine assign`.
- `*.heattree.tree.svg`: Heattree in SVG format from calling `gappa examine heattree`, see [Gappa documentation](https://github.com/lczech/gappa) for details.
- `pplace/hmmer/`: Contains intermediate files if HMMER is used.
- `pplace/mafft/`: Contains intermediate files if MAFFT is used.
- `pplace/epang/`: Output files from EPA-NG.
- `pplace/gappa/`: Gappa output described in the [Gappa documentation](https://github.com/lczech/gappa).

</details>
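
To pick the three headline results out of a published `pplace/` folder, the glob patterns listed above can be matched directly. The following is a small Python sketch, not part of the pipeline. The function name is hypothetical, and the taxonomy and heattree file names in the example are inferred from the documented patterns rather than taken from a real run.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Glob patterns as documented for the top level of the pplace/ output folder.
PPLACE_OUTPUTS = {
    "*.graft.*.epa_result.newick": "grafted tree",
    "*.taxonomy.per_query.tsv": "per-query taxonomy",
    "*.heattree.tree.svg": "heat tree",
}


def classify(filename: str) -> Optional[str]:
    """Return a label for a documented pplace output, or None if it is not one."""
    for pattern, label in PPLACE_OUTPUTS.items():
        if fnmatchcase(filename, pattern):
            return label
    return None


if __name__ == "__main__":
    for name in (
        "test_pplace.graft.test_pplace.epa_result.newick",
        "test_pplace.taxonomy.per_query.tsv",  # assumed name
        "test_pplace.heattree.tree.svg",  # assumed name
    ):
        print(name, "->", classify(name))
```
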

### QIIME2

**Quantitative Insights Into Microbial Ecology 2** ([QIIME2](https://qiime2.org/)) is a next-generation microbiome bioinformatics platform and the successor of the widely used [QIIME1](https://www.nature.com/articles/nmeth.f.303).
11 changes: 11 additions & 0 deletions lib/WorkflowAmpliseq.groovy
@@ -63,6 +63,17 @@ class WorkflowAmpliseq {
System.exit(1)
}

if (params.pplace_tree) {
if (!params.pplace_aln) {
log.error "Missing parameter: Phylogenetic placement requires in addition to `--pplace_tree` also `--pplace_aln`."
System.exit(1)
}
if (!params.pplace_model) {
log.error "Missing parameter: Phylogenetic placement requires in addition to `--pplace_tree` also `--pplace_model`."
System.exit(1)
}
}

if (params.dada_assign_taxlevels && params.sbdiexport) {
log.error "Incompatible parameters: `--sbdiexport` expects specific taxonomics ranks (default) and therefore excludes modifying those using `--dada_assign_taxlevels`."
System.exit(1)
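
The parameter check added in `lib/WorkflowAmpliseq.groovy` is a simple dependency rule: `--pplace_tree` switches phylogenetic placement on, and once it is set both `--pplace_aln` and `--pplace_model` must be set too. A Python sketch of the same rule follows. It is illustrative only: the function name is hypothetical, and it returns the error messages instead of logging them and exiting as the Groovy code does.

```python
from typing import Dict, List


def check_pplace_params(params: Dict[str, object]) -> List[str]:
    """Return error messages for missing phylogenetic placement parameters."""
    errors = []
    # Placement is only requested when a reference tree is given.
    if params.get("pplace_tree"):
        for required in ("pplace_aln", "pplace_model"):
            if not params.get(required):
                errors.append(
                    "Missing parameter: Phylogenetic placement requires in "
                    f"addition to `--pplace_tree` also `--{required}`."
                )
    return errors


if __name__ == "__main__":
    # Tree given without alignment or model: two errors are reported.
    for message in check_pplace_params({"pplace_tree": "cyanos_16s.newick"}):
        print(message)
```

Note that `--pplace_taxonomy` is not checked; in the config above, taxonomic assignment with `GAPPA_ASSIGN` is guarded by `ext.when = { taxonomy }`.
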
59 changes: 59 additions & 0 deletions modules.json
@@ -15,11 +15,61 @@
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"epang/place": {
"branch": "master",
"git_sha": "0f8a77ff00e65eaeebc509b8156eaa983192474b",
"installed_by": ["fasta_newick_epang_gappa"]
},
"epang/split": {
"branch": "master",
"git_sha": "0f8a77ff00e65eaeebc509b8156eaa983192474b",
"installed_by": ["fasta_newick_epang_gappa"]
},
"fastqc": {
"branch": "master",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"gappa/examineassign": {
"branch": "master",
"git_sha": "0f8a77ff00e65eaeebc509b8156eaa983192474b",
"installed_by": ["fasta_newick_epang_gappa"]
},
"gappa/examinegraft": {
"branch": "master",
"git_sha": "0f8a77ff00e65eaeebc509b8156eaa983192474b",
"installed_by": ["fasta_newick_epang_gappa"]
},
"gappa/examineheattree": {
"branch": "master",
"git_sha": "0f8a77ff00e65eaeebc509b8156eaa983192474b",
"installed_by": ["fasta_newick_epang_gappa"]
},
"hmmer/eslalimask": {
"branch": "master",
"git_sha": "0f8a77ff00e65eaeebc509b8156eaa983192474b",
"installed_by": ["fasta_newick_epang_gappa"]
},
"hmmer/eslreformat": {
"branch": "master",
"git_sha": "0f8a77ff00e65eaeebc509b8156eaa983192474b",
"installed_by": ["fasta_newick_epang_gappa"]
},
"hmmer/hmmalign": {
"branch": "master",
"git_sha": "0f8a77ff00e65eaeebc509b8156eaa983192474b",
"installed_by": ["fasta_newick_epang_gappa"]
},
"hmmer/hmmbuild": {
"branch": "master",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["fasta_newick_epang_gappa"]
},
"mafft": {
"branch": "master",
"git_sha": "b265b4ff6a35b133b963b4eaddfca0ffb3395236",
"installed_by": ["fasta_newick_epang_gappa"]
},
"multiqc": {
"branch": "master",
"git_sha": "ee80d14721e76e2e079103b8dcd5d57129e584ba",
@@ -31,6 +81,15 @@
"installed_by": ["modules"]
}
}
},
"subworkflows": {
"nf-core": {
"fasta_newick_epang_gappa": {
"branch": "master",
"git_sha": "6ad90f5583fb375c60a913a24ed1c79339efc019",
"installed_by": ["subworkflows"]
}
}
}
}
}
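
In `modules.json`, the `installed_by` field records whether a module was installed directly (`"modules"`) or pulled in by a subworkflow. The Python sketch below lists the modules that a given subworkflow brought in. It works on a hand-copied fragment of the entries shown in this diff; the surrounding file structure is not reproduced and the function name is hypothetical.

```python
import json
from typing import Dict, List

# A few entries copied from the diff above; the real file nests these deeper.
MODULES_FRAGMENT = json.loads("""
{
  "epang/place": {
    "branch": "master",
    "git_sha": "0f8a77ff00e65eaeebc509b8156eaa983192474b",
    "installed_by": ["fasta_newick_epang_gappa"]
  },
  "fastqc": {
    "branch": "master",
    "git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
    "installed_by": ["modules"]
  },
  "hmmer/hmmbuild": {
    "branch": "master",
    "git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
    "installed_by": ["fasta_newick_epang_gappa"]
  },
  "mafft": {
    "branch": "master",
    "git_sha": "b265b4ff6a35b133b963b4eaddfca0ffb3395236",
    "installed_by": ["fasta_newick_epang_gappa"]
  }
}
""")


def installed_by(modules: Dict[str, dict], owner: str) -> List[str]:
    """Return the sorted names of modules whose installed_by list contains owner."""
    return sorted(
        name for name, info in modules.items()
        if owner in info.get("installed_by", [])
    )


if __name__ == "__main__":
    print(installed_by(MODULES_FRAGMENT, "fasta_newick_epang_gappa"))
    print(installed_by(MODULES_FRAGMENT, "modules"))
```
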