nf-core · d4straub · Mar 29, 2023 · Mar 28, 2023
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -57,6 +57,7 @@ jobs:
             test_fasta,
             test_reftaxcustom,
             test_novaseq,
+            test_pplace,
           ]
     steps:
       - name: Check out pipeline code

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -7,6 +7,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### `Added`
 
+- [#564](https://github.com/nf-core/ampliseq/pull/564) - Added phylogenetic placement
+
 ### `Changed`
 
 - [#563](https://github.com/nf-core/ampliseq/pull/563) - Renamed DADA2 taxonomic classification files to include the chosen reference taxonomy abbreviation.

diff --git a/CITATIONS.md b/CITATIONS.md
@@ -67,6 +67,26 @@
 
     > Sundh J, Manoharan L, Iwaszkiewicz-Eggebrecht E, Miraldo A, Andersson A, Ronquist F. COI reference sequences from BOLD DB. doi: https://doi.org/10.17044/scilifelab.20514192.v2.
 
+### Phylogenetic placement
+
+- nf-core/phyloplace (https://github.com/nf-core/phyloplace, https://nf-co.re/phyloplace) was originally written by Daniel Lundin.
+
+- [HMMER](https://pubmed.ncbi.nlm.nih.gov/22039361/)
+
+  > Eddy, Sean R. “Accelerated Profile HMM Searches.” PLoS Comput Biol 7, no. 10 (October 20, 2011): e1002195. https://doi.org/10.1371/journal.pcbi.1002195.
+
+- [MAFFT](https://pubmed.ncbi.nlm.nih.gov/12136088/)
+
+  > Katoh, Kazutaka, Kazuharu Misawa, Kei‐ichi Kuma, and Takashi Miyata. “MAFFT: A Novel Method for Rapid Multiple Sequence Alignment Based on Fast Fourier Transform.” Nucleic Acids Research 30, no. 14 (July 15, 2002): 3059–66. https://doi.org/10.1093/nar/gkf436.
+
+- [EPA-NG](https://pubmed.ncbi.nlm.nih.gov/30165689/)
+
+  > Barbera, Pierre, Alexey M Kozlov, Lucas Czech, Benoit Morel, Diego Darriba, Tomáš Flouri, and Alexandros Stamatakis. “EPA-Ng: Massively Parallel Evolutionary Placement of Genetic Sequences.” Systematic Biology 68, no. 2 (March 1, 2019): 365–69. https://doi.org/10.1093/sysbio/syy054.
+
+- [Gappa](https://pubmed.ncbi.nlm.nih.gov/32016344/)
+
+  > Czech, Lucas, Pierre Barbera, and Alexandros Stamatakis. “Genesis and Gappa: Processing, Analyzing and Visualizing Phylogenetic (Placement) Data.” Bioinformatics 36, no. 10 (May 1, 2020): 3263–65. https://doi.org/10.1093/bioinformatics/btaa070.
+
 ### Downstream analysis
 
 - [QIIME2](https://pubmed.ncbi.nlm.nih.gov/31341288/)

diff --git a/README.md b/README.md
@@ -19,7 +19,7 @@
 
 ## Introduction
 
-**nfcore/ampliseq** is a bioinformatics analysis pipeline used for amplicon sequencing, supporting denoising of any amplicon and, currently, taxonomic assignment of 16S, ITS, CO1 and 18S amplicons. Supported is paired-end Illumina or single-end Illumina, PacBio and IonTorrent data. Default is the analysis of 16S rRNA gene amplicons sequenced paired-end with Illumina.
+**nfcore/ampliseq** is a bioinformatics analysis pipeline used for amplicon sequencing, supporting denoising of any amplicon and, currently, taxonomic assignment of 16S, ITS, CO1 and 18S amplicons. Phylogenetic placement of ASVs onto a reference tree is also possible. Supported is paired-end Illumina or single-end Illumina, PacBio and IonTorrent data. Default is the analysis of 16S rRNA gene amplicons sequenced paired-end with Illumina.
 
 <p align="center">
     <img src="docs/images/ampliseq_workflow.png" alt="nf-core/ampliseq workflow overview" width="60%">
@@ -37,6 +37,7 @@ By default, the pipeline currently performs the following:
 - Trimming of reads ([Cutadapt](https://journal.embnet.org/index.php/embnetjournal/article/view/200))
 - Infer Amplicon Sequence Variants (ASVs) ([DADA2](https://doi.org/10.1038/nmeth.3869))
 - Predict whether ASVs are ribosomal RNA sequences ([Barrnap](https://github.com/tseemann/barrnap))
+- Phylogenetic placement ([EPA-NG](https://github.com/Pbdas/epa-ng))
 - Taxonomical classification using DADA2 or [QIIME2](https://www.nature.com/articles/s41587-019-0209-9)
 - Excludes unwanted taxa, produces absolute and relative feature/taxa count tables and plots, plots alpha rarefaction curves, computes alpha and beta diversity indices and plots thereof ([QIIME2](https://www.nature.com/articles/s41587-019-0209-9))
 - Calls differentially abundant taxa ([ANCOM](https://www.ncbi.nlm.nih.gov/pubmed/26028277))

diff --git a/conf/modules.config b/conf/modules.config
@@ -418,6 +418,161 @@ process {
         ]
     }
 
+    withName: HMMER_HMMBUILD {
+        ext.prefix = { "${meta.id}.ref" }
+        publishDir = [
+            path: { "${params.outdir}/pplace/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
+            mode: params.publish_dir_mode,
+            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+        ]
+    }
+
+    withName: HMMER_UNALIGNREF {
+        ext.prefix = { "${meta.id}.ref.unaligned" }
+        ext.args   = "--gapsym=- afa"
+        ext.postprocessing = '| sed "/^>/!s/-//g"'
+        publishDir = [
+            path: { "${params.outdir}/pplace/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
+            mode: params.publish_dir_mode,
+            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+        ]
+    }
+
+    withName: HMMER_HMMALIGNREF {
+        ext.prefix = { "${meta.id}.ref.hmmalign" }
+        publishDir = [
+            path: { "${params.outdir}/pplace/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
+            mode: params.publish_dir_mode,
+            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+        ]
+    }
+
+    withName: HMMER_HMMALIGNQUERY {
+        ext.prefix = { "${meta.id}.query.hmmalign" }
+        publishDir = [
+            path: { "${params.outdir}/pplace/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
+            mode: params.publish_dir_mode,
+            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+        ]
+    }
+
+    withName: 'HMMER_MASK.*' {
+        ext.args   = '--rf-is-mask'
+        publishDir = [
+            path: { "${params.outdir}/pplace/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
+            mode: params.publish_dir_mode,
+            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+        ]
+    }
+
+    withName: 'HMMER_MASKQUERY.*' {
+        ext.prefix = { "${meta.id}.query.hmmalign" }
+        publishDir = [
+            path: { "${params.outdir}/pplace/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
+            mode: params.publish_dir_mode,
+            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+        ]
+    }
+
+    withName: 'HMMER_MASKREF.*' {
+        ext.prefix = { "${meta.id}.ref.hmmalign" }
+        publishDir = [
+            path: { "${params.outdir}/pplace/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
+            mode: params.publish_dir_mode,
+            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+        ]
+    }
+
+    withName: 'HMMER_AFAFORMATQUERY.*' {
+        ext.prefix = { "${meta.id}.query.hmmalign.masked" }
+        ext.args   = 'afa'
+        publishDir = [
+            path: { "${params.outdir}/pplace/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
+            mode: params.publish_dir_mode,
+            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+        ]
+    }
+
+    withName: 'HMMER_AFAFORMATREF.*' {
+        ext.prefix = { "${meta.id}.ref.hmmalign.masked" }
+        ext.args   = 'afa'
+        publishDir = [
+            path: { "${params.outdir}/pplace/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
+            mode: params.publish_dir_mode,
+            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+        ]
+    }
+
+    withName: 'MAFFT' {
+        ext.args = '--keeplength'
+        publishDir = [
+            path: { "${params.outdir}/pplace/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
+            mode: params.publish_dir_mode,
+            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+        ]
+    }
+
+    withName: 'EPANG_PLACE' {
+        ext.args   = { "--model ${meta.model}" }
+        publishDir = [
+            path: { "${params.outdir}/pplace/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
+            mode: params.publish_dir_mode,
+            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+        ]
+    }
+
+    withName: 'GAPPA_GRAFT' {
+        ext.prefix = { "${meta.id}.graft" }
+        // Example output file name: test_pplace.graft.test_pplace.epa_result.newick
+        publishDir = [
+            [
+                path: { "${params.outdir}/pplace" },
+                mode: params.publish_dir_mode,
+                pattern: "*.newick"
+            ],
+            [
+                path: { "${params.outdir}/pplace/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
+                mode: params.publish_dir_mode,
+                saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+            ]
+        ]
+    }
+
+    withName: 'GAPPA_ASSIGN' {
+        ext.prefix = { "${meta.id}.taxonomy" }
+        ext.args   = "--per-query-results --krona --sativa"
+        ext.when   = { taxonomy }
+        publishDir = [
+            [
+                path: { "${params.outdir}/pplace" },
+                mode: params.publish_dir_mode,
+                pattern: "*.taxonomy.per_query.tsv"
+            ],
+            [
+                path: { "${params.outdir}/pplace/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
+                mode: params.publish_dir_mode,
+                saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+            ]
+        ]
+    }
+
+    withName: 'GAPPA_HEATTREE' {
+        ext.prefix = { "${meta.id}.heattree" }
+        ext.args = "--write-nexus-tree --write-phyloxml-tree --write-svg-tree"
+        publishDir = [
+            [
+                path: { "${params.outdir}/pplace" },
+                mode: params.publish_dir_mode,
+                pattern: "*.tree.svg"
+            ],
+            [
+                path: { "${params.outdir}/pplace/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
+                mode: params.publish_dir_mode,
+                saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+            ]
+        ]
+    }
+
     withName: 'QIIME2_INASV|QIIME2_INSEQ|QIIME2_INTAX' {
         publishDir = [
             path: { "${params.outdir}/qiime2/input" },

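The `publishDir` paths above derive the tool subfolder from the process name, and `HMMER_UNALIGNREF` strips alignment gaps with a `sed` post-processing step. The following is a minimal Python sketch of both string transformations, useful for predicting where files land; it mirrors the Groovy and `sed` expressions as written and is not code from the pipeline. The function names are hypothetical.

```python
def publish_subdir(process_name: str) -> str:
    """Mirror task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()."""
    # Groovy's tokenize() discards empty tokens, so filter them out here as well.
    last_part = [t for t in process_name.split(":") if t][-1]
    tool = [t for t in last_part.split("_") if t][0]
    return tool.lower()


def ungap_fasta(text: str) -> str:
    """Mirror sed "/^>/!s/-//g": drop every '-' on lines that are not FASTA headers."""
    lines = [
        line if line.startswith(">") else line.replace("-", "")
        for line in text.splitlines()
    ]
    # Keep a trailing newline if the input had one, as sed would.
    return "\n".join(lines) + ("\n" if text.endswith("\n") else "")
```

For a hypothetical fully qualified name such as `NFCORE_AMPLISEQ:AMPLISEQ:FASTA_NEWICK_EPANG_GAPPA:HMMER_HMMBUILD`, `publish_subdir` returns `"hmmer"`, and `EPANG_PLACE` gives `"epang"`, which matches the `pplace/hmmer/` and `pplace/epang/` folders documented in `docs/output.md`. Because `--gapsym=-` asks `esl-reformat` to write every gap as `-`, removing `-` from sequence lines should be enough to recover the unaligned reference.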
diff --git a/conf/test_pplace.config b/conf/test_pplace.config
@@ -0,0 +1,46 @@
+/*
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    Nextflow config file for running minimal tests
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    Defines input files and everything required to run a fast and simple pipeline test.
+
+    Use as follows:
+        nextflow run nf-core/ampliseq -profile test_pplace,<docker/singularity> --outdir <OUTDIR>
+
+----------------------------------------------------------------------------------------
+*/
+
+params {
+    config_profile_name = 'Test profile'
+    config_profile_description = 'Minimal test dataset to check pipeline function'
+
+    // Limit resources so that this can run on GitHub Actions
+    max_cpus   = 2
+    max_memory = '6.GB'
+    max_time   = '6.h'
+
+    // Input data
+    FW_primer = "GTGYCAGCMGCCGCGGTAA"
+    RV_primer = "GGACTACNVGGGTWTCTAAT"
+    input = "https://raw.githubusercontent.com/nf-core/test-datasets/ampliseq/samplesheets/Samplesheet.tsv"
+    metadata = "https://raw.githubusercontent.com/nf-core/test-datasets/ampliseq/samplesheets/Metadata.tsv"
+    dada_ref_taxonomy = false
+    qiime_ref_taxonomy = "greengenes85"
+    filter_ssu = "bac"
+
+    // Remove low-abundance ASVs to reduce the runtime of downstream processes
+    min_samples = 2
+    min_frequency = 10
+
+    // Phylogenetic placement
+    pplace_tree = "https://github.com/nf-core/test-datasets/raw/phyloplace/testdata/cyanos_16s.newick"
+    pplace_aln = "https://github.com/nf-core/test-datasets/raw/phyloplace/testdata/cyanos_16s.alnfna"
+    pplace_model = "GTR+F+I+I+R3"
+    pplace_taxonomy = "https://github.com/nf-core/test-datasets/raw/phyloplace/testdata/cyanos_16s.taxonomy.tsv"
+    pplace_name = "test_pplace"
+
+
+    // Skip some steps to reduce runtime
+    skip_alpha_rarefaction = true
+    skip_fastqc = true
+}
diff --git a/docs/output.md b/docs/output.md
@@ -22,6 +22,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
   - [ITSx](#itsx) - Optionally, the ITS region can be extracted
 - [Taxonomic classification with DADA2](#taxonomic-classification-with-dada2) - Taxonomic classification of (filtered) ASVs
   - [assignSH](#assignsh) - Optionally, a UNITE species hypothesis (SH) can be added to the taxonomy
+- [Phylogenetic placement and taxonomic classification](#phylogenetic-placement-and-taxonomic-classification) - Placing ASVs into a phylogenetic tree
 - [QIIME2](#qiime2) - Secondary analysis
   - [Taxonomic classification](#taxonomic-classification) - Taxonomical classification of ASVs
   - [Abundance tables](#abundance-tables) - Exported abundance tables
@@ -231,6 +232,24 @@ Optionally, a UNITE species hypothesis (SH) can be added to the taxonomy. In sho
 
 </details>
 
+### Phylogenetic placement and taxonomic classification
+
+Phylogenetic placement grafts sequences onto a phylogenetic reference tree and optionally outputs taxonomic annotations. Ideally, the reference tree is built from full-length, high-quality sequences, which carry more evolutionary signal than short amplicons; placing ASVs onto such a tree is therefore preferable to inferring a de novo phylogeny from the short amplicon sequences alone. When a reference tree (`--pplace_tree`), the matching reference alignment (`--pplace_aln`) and an evolutionary model (`--pplace_model`) are provided, ASV sequences are aligned to the reference alignment with either [HMMER](http://hmmer.org/) (default) or [MAFFT](https://mafft.cbrc.jp/alignment/software/). The aligned query sequences are then placed onto the reference tree with [EPA-NG](https://github.com/Pbdas/epa-ng), and finally several summary operations are performed with [Gappa](https://github.com/lczech/gappa). This reuses code from [nf-core/phyloplace](https://nf-co.re/phyloplace) in the form of its main [subworkflow](https://github.com/nf-core/modules/tree/master/subworkflows/nf-core/fasta_newick_epang_gappa), so its detailed documentation also applies here.
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `pplace/`
+  - `*.graft.*.epa_result.newick`: Full phylogeny with query sequences grafted onto the reference phylogeny, in Newick format.
+  - `*.taxonomy.per_query.tsv`: Tab-separated file with taxonomy information per query, from classification with `gappa examine assign`.
+  - `*.heattree.tree.svg`: Heat tree in SVG format from `gappa examine heattree`; see the [Gappa documentation](https://github.com/lczech/gappa) for details.
+  - `pplace/hmmer/`: Intermediary files, if HMMER is used.
+  - `pplace/mafft/`: Intermediary files, if MAFFT is used.
+  - `pplace/epang/`: Output files from EPA-NG.
+  - `pplace/gappa/`: Gappa output, described in the [Gappa documentation](https://github.com/lczech/gappa).
+
+</details>
+
 ### QIIME2
 
 **Quantitative Insights Into Microbial Ecology 2** ([QIIME2](https://qiime2.org/)) is a next-generation microbiome bioinformatics platform and the successor of the widely used [QIIME1](https://www.nature.com/articles/nmeth.f.303).

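To pick up the main placement results in a downstream script, the three top-level filename patterns listed in the output section can be globbed directly. A minimal Python sketch, assuming the layout `<outdir>/pplace/` described there; the function name and dictionary keys are hypothetical.

```python
from pathlib import Path

# Filename patterns copied from the documented top-level pplace/ outputs.
PATTERNS = {
    "grafted_tree": "*.graft.*.epa_result.newick",
    "per_query_taxonomy": "*.taxonomy.per_query.tsv",
    "heattree_svg": "*.heattree.tree.svg",
}


def find_pplace_outputs(outdir: str) -> dict[str, list[Path]]:
    """Return the top-level placement outputs below <outdir>/pplace, grouped by kind."""
    pplace = Path(outdir) / "pplace"
    # Path.glob without '**' does not descend, so the per-tool subfolders
    # (hmmer/, mafft/, epang/, gappa/) are deliberately ignored.
    return {key: sorted(pplace.glob(pattern)) for key, pattern in PATTERNS.items()}
```

With the `test_pplace` profile, the grafted tree would be expected under a name like `test_pplace.graft.test_pplace.epa_result.newick`, as noted in the `GAPPA_GRAFT` config comment.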
diff --git a/lib/WorkflowAmpliseq.groovy b/lib/WorkflowAmpliseq.groovy
@@ -63,6 +63,17 @@ class WorkflowAmpliseq {
             System.exit(1)
         }
 
+        if (params.pplace_tree) {
+            if (!params.pplace_aln) {
+                log.error "Missing parameter: Phylogenetic placement with `--pplace_tree` also requires `--pplace_aln`."
+                System.exit(1)
+            }
+            if (!params.pplace_model) {
+                log.error "Missing parameter: Phylogenetic placement with `--pplace_tree` also requires `--pplace_model`."
+                System.exit(1)
+            }
+        }
+
         if (params.dada_assign_taxlevels && params.sbdiexport) {
             log.error "Incompatible parameters: `--sbdiexport` expects specific taxonomics ranks (default) and therefore excludes modifying those using `--dada_assign_taxlevels`."
             System.exit(1)

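The new check in `WorkflowAmpliseq` can be expressed as a small pure function, which makes the rule easy to test outside Nextflow. This Python sketch is illustrative only and the function name is hypothetical; unlike the Groovy code, which logs and exits on the first missing parameter, it collects every message so a caller can report them together.

```python
def validate_pplace_params(params: dict) -> list[str]:
    """Return error messages for phylogenetic placement; an empty list means OK."""
    errors = []
    # Only enforce the extra requirements when a reference tree is requested.
    if params.get("pplace_tree"):
        if not params.get("pplace_aln"):
            errors.append(
                "Missing parameter: Phylogenetic placement with `--pplace_tree` "
                "also requires `--pplace_aln`."
            )
        if not params.get("pplace_model"):
            errors.append(
                "Missing parameter: Phylogenetic placement with `--pplace_tree` "
                "also requires `--pplace_model`."
            )
    return errors
```

For example, `validate_pplace_params({"pplace_tree": "cyanos_16s.newick"})` reports both missing parameters, while the values from `conf/test_pplace.config` pass.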
diff --git a/modules.json b/modules.json
@@ -15,11 +15,61 @@
                         "git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
                         "installed_by": ["modules"]
                     },
+                    "epang/place": {
+                        "branch": "master",
+                        "git_sha": "0f8a77ff00e65eaeebc509b8156eaa983192474b",
+                        "installed_by": ["fasta_newick_epang_gappa"]
+                    },
+                    "epang/split": {
+                        "branch": "master",
+                        "git_sha": "0f8a77ff00e65eaeebc509b8156eaa983192474b",
+                        "installed_by": ["fasta_newick_epang_gappa"]
+                    },
                     "fastqc": {
                         "branch": "master",
                         "git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
                         "installed_by": ["modules"]
                     },
+                    "gappa/examineassign": {
+                        "branch": "master",
+                        "git_sha": "0f8a77ff00e65eaeebc509b8156eaa983192474b",
+                        "installed_by": ["fasta_newick_epang_gappa"]
+                    },
+                    "gappa/examinegraft": {
+                        "branch": "master",
+                        "git_sha": "0f8a77ff00e65eaeebc509b8156eaa983192474b",
+                        "installed_by": ["fasta_newick_epang_gappa"]
+                    },
+                    "gappa/examineheattree": {
+                        "branch": "master",
+                        "git_sha": "0f8a77ff00e65eaeebc509b8156eaa983192474b",
+                        "installed_by": ["fasta_newick_epang_gappa"]
+                    },
+                    "hmmer/eslalimask": {
+                        "branch": "master",
+                        "git_sha": "0f8a77ff00e65eaeebc509b8156eaa983192474b",
+                        "installed_by": ["fasta_newick_epang_gappa"]
+                    },
+                    "hmmer/eslreformat": {
+                        "branch": "master",
+                        "git_sha": "0f8a77ff00e65eaeebc509b8156eaa983192474b",
+                        "installed_by": ["fasta_newick_epang_gappa"]
+                    },
+                    "hmmer/hmmalign": {
+                        "branch": "master",
+                        "git_sha": "0f8a77ff00e65eaeebc509b8156eaa983192474b",
+                        "installed_by": ["fasta_newick_epang_gappa"]
+                    },
+                    "hmmer/hmmbuild": {
+                        "branch": "master",
+                        "git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
+                        "installed_by": ["fasta_newick_epang_gappa"]
+                    },
+                    "mafft": {
+                        "branch": "master",
+                        "git_sha": "b265b4ff6a35b133b963b4eaddfca0ffb3395236",
+                        "installed_by": ["fasta_newick_epang_gappa"]
+                    },
                     "multiqc": {
                         "branch": "master",
                         "git_sha": "ee80d14721e76e2e079103b8dcd5d57129e584ba",
@@ -31,6 +81,15 @@
                         "installed_by": ["modules"]
                     }
                 }
+            },
+            "subworkflows": {
+                "nf-core": {
+                    "fasta_newick_epang_gappa": {
+                        "branch": "master",
+                        "git_sha": "6ad90f5583fb375c60a913a24ed1c79339efc019",
+                        "installed_by": ["subworkflows"]
+                    }
+                }
             }
         }
     }
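To see which modules a subworkflow pulled in, `modules.json` can be filtered by `installed_by`. A minimal Python sketch, assuming the usual nf-core layout in which module entries sit under `repos.<repository URL>.modules.<organisation>` (the outer `repos` key is not visible in the excerpt above); the function names are hypothetical.

```python
import json


def installed_by(modules_json: dict, installer: str) -> list[str]:
    """List module names whose 'installed_by' entry contains `installer`."""
    hits = []
    for repo in modules_json.get("repos", {}).values():
        for org_modules in repo.get("modules", {}).values():
            for name, entry in org_modules.items():
                if installer in entry.get("installed_by", []):
                    hits.append(name)
    return sorted(hits)


def installed_by_file(path: str, installer: str) -> list[str]:
    """Load a modules.json file from disk and apply installed_by()."""
    with open(path, encoding="utf-8") as handle:
        return installed_by(json.load(handle), installer)
```

Run against the `modules.json` from this change, `installed_by_file("modules.json", "fasta_newick_epang_gappa")` should include the ten modules added above, from `epang/place` to `mafft`.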