Add sintax #577

jtangrot · 2023-05-03T07:21:05Z

This PR adds the SINTAX algorithm as an alternative for taxonomy assignment, and addresses issue #471.
If `--sintax_ref_taxonomy` is given, SINTAX is used for taxonomic classification in addition to DADA2. The SINTAX taxonomy is then used instead of any other classification in downstream steps (qiime, sbdiexport). I also added the option `--skip_dada_taxonomy` to skip running DADA2 taxonomic classification.
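
The parameter logic described above can be sketched roughly as follows. This is an illustrative Python sketch, not the pipeline's Nextflow code: the function name `plan_taxonomy` and the returned keys are hypothetical, and it assumes DADA2 classification runs by default unless `--skip_dada_taxonomy` is set.

```python
from typing import Dict, Optional


def plan_taxonomy(sintax_ref_taxonomy: Optional[str] = None,
                  skip_dada_taxonomy: bool = False) -> Dict[str, object]:
    """Decide which classifiers run and which result feeds downstream steps.

    Hypothetical helper mirroring the PR description; the real pipeline may
    consider further parameters that are not modelled here.
    """
    # SINTAX only runs when a reference database was requested.
    run_sintax = bool(sintax_ref_taxonomy)
    # Assumption: DADA2 classification is on unless explicitly skipped.
    run_dada2 = not skip_dada_taxonomy

    # SINTAX results take precedence over any other classification downstream.
    if run_sintax:
        downstream = "sintax"
    elif run_dada2:
        downstream = "dada2"
    else:
        downstream = None

    return {"run_sintax": run_sintax, "run_dada2": run_dada2, "downstream": downstream}
```

For example, `plan_taxonomy("coidb", skip_dada_taxonomy=True)` would run only SINTAX and pass its taxonomy on to the downstream steps.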

PR checklist

This comment contains a description of changes (with reason).
If you've fixed a bug or added code that should be tested, add tests!
If you've added a new tool - have you followed the pipeline conventions in the contribution docs
If necessary, also make a PR on the nf-core/ampliseq branch on the nf-core/test-datasets repository.
Make sure your code lints (nf-core lint).
Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
Usage Documentation in docs/usage.md is updated.
Output Documentation in docs/output.md is updated.
CHANGELOG.md is updated.
README.md is updated (including new tool citations and authors/contributors).

github-actions · 2023-05-03T07:23:10Z

`nf-core lint` overall result: Passed ✅ ⚠️

Posted for pipeline commit e103718

✅ 157 tests passed
❔ 1 tests were ignored
❗ 2 tests had warnings

❗ Test warnings:

readme - README did not have a Nextflow minimum version badge.
schema_lint - Parameter input is not defined in the correct subschema (input_output_options)

❔ Tests ignored:

files_exist - File is ignored: conf/igenomes.config

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .editorconfig
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: assets/nf-core-ampliseq_logo_light.png
files_exist - File found: conf/modules.config
files_exist - File found: conf/test.config
files_exist - File found: conf/test_full.config
files_exist - File found: docs/images/nf-core-ampliseq_logo_light.png
files_exist - File found: docs/images/nf-core-ampliseq_logo_dark.png
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: lib/nfcore_external_java_deps.jar
files_exist - File found: lib/NfcoreSchema.groovy
files_exist - File found: lib/NfcoreTemplate.groovy
files_exist - File found: lib/Utils.groovy
files_exist - File found: lib/WorkflowMain.groovy
files_exist - File found: main.nf
files_exist - File found: assets/multiqc_config.yml
files_exist - File found: conf/base.config
files_exist - File found: .github/workflows/awstest.yml
files_exist - File found: .github/workflows/awsfulltest.yml
files_exist - File found: lib/WorkflowAmpliseq.groovy
files_exist - File found: modules.json
files_exist - File found: pyproject.toml
files_exist - File not found check: Singularity
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: docs/images/nf-core-ampliseq_logo.png
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: .travis.yml
nextflow_config - Config variable found: manifest.name
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: manifest.homePage
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: params.show_hidden_params
nextflow_config - Config variable found: params.schema_ignore_params
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config manifest.name began with nf-core/
nextflow_config - Config variable manifest.homePage began with https://github.com/nf-core/
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - Config manifest.version ends in dev: '2.6.0dev'
nextflow_config - Config params.custom_config_version is set to master
nextflow_config - Config params.custom_config_base is set to https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Lines for loading custom profiles found
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
files_unchanged - CODE_OF_CONDUCT.md matches the template
files_unchanged - LICENSE matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/CONTRIBUTING.md matches the template
files_unchanged - .github/ISSUE_TEMPLATE/bug_report.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/config.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
files_unchanged - .github/PULL_REQUEST_TEMPLATE.md matches the template
files_unchanged - .github/workflows/branch.yml matches the template
files_unchanged - .github/workflows/linting_comment.yml matches the template
files_unchanged - .github/workflows/linting.yml matches the template
files_unchanged - assets/email_template.html matches the template
files_unchanged - assets/email_template.txt matches the template
files_unchanged - assets/sendmail_template.txt matches the template
files_unchanged - assets/nf-core-ampliseq_logo_light.png matches the template
files_unchanged - docs/images/nf-core-ampliseq_logo_light.png matches the template
files_unchanged - docs/images/nf-core-ampliseq_logo_dark.png matches the template
files_unchanged - docs/README.md matches the template
files_unchanged - lib/nfcore_external_java_deps.jar matches the template
files_unchanged - lib/NfcoreSchema.groovy matches the template
files_unchanged - lib/NfcoreTemplate.groovy matches the template
files_unchanged - .gitignore matches the template
files_unchanged - .prettierignore matches the template
files_unchanged - pyproject.toml matches the template
actions_ci - '.github/workflows/ci.yml' is triggered on expected events
actions_ci - '.github/workflows/ci.yml' checks minimum NF version
actions_awstest - '.github/workflows/awstest.yml' is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml does not use -profile test
readme - README Zenodo placeholder was replaced with DOI.
pipeline_todos - No TODO strings found
pipeline_name_conventions - Name adheres to nf-core convention
template_strings - Did not find any Jinja template strings (208 files)
schema_lint - Schema lint passed
schema_lint - Schema title + description lint passed
schema_params - Schema matched params returned from nextflow config
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: clean-up.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
actions_schema_validation - Workflow validation passed: ci.yml
actions_schema_validation - Workflow validation passed: awstest.yml
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: fix-linting.yml
actions_schema_validation - Workflow validation passed: awsfulltest.yml
actions_schema_validation - Workflow validation passed: branch.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
multiqc_config - 'assets/multiqc_config.yml' follows the ordering scheme of the minimally required plugins.
multiqc_config - 'assets/multiqc_config.yml' contains a matching 'report_comment'.
multiqc_config - 'assets/multiqc_config.yml' contains 'export_plots: true'.
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'

Run details

nf-core/tools version 2.8
Run at 2023-05-11 19:23:16

workflows/ampliseq.nf

d4straub

Looks good to me (except that I am not confident in my review of ampliseq.nf, because it seems too convoluted now).
Docs need a bit of polishing before it can be merged, imho.

docs/output.md

modules/local/format_taxonomy_sintax.nf

workflows/ampliseq.nf

d4straub · 2023-05-04T15:35:52Z

Because I solved the merge conflict: please pull your branch before you continue working!

d4straub · 2023-05-05T07:00:17Z

I broke the test_sintax, sorry. The solution should be (predicted above) to change 'biocontainers/biocontainers:v1.2.0_cv1' to 'docker.io/biocontainers/biocontainers:v1.2.0_cv1' in format_taxonomy_sintax.nf. Because I broke it I should also fix it I think. I hope I am not too intrusive here.

modules/local/format_taxonomy_sintax.nf

d4straub · 2023-05-08T12:26:35Z

I think test_pplace fails because ampliseq.nf still contains FASTA_NEWICK_EPANG_GAPPA.out.grafted_phylogeny, which should be replaced by PPLACE_TAXONOMY_WF (which currently doesn't export the grafted_phylogeny).

d4straub

I think that looks great!
Moving dada2 & sintax taxonomy classification to their own subworkflows simplifies ampliseq.nf significantly, I think. But my recommendation to put the phylogenetic placement taxonomic assignment into its own subworkflow was a very bad idea, sorry, because that is already a subworkflow.

subworkflows/local/dada2_taxonomy_wf.nf

subworkflows/local/pplace_taxonomy_wf.nf

d4straub

Great simplification of the DADA2 taxonomic workflow, thanks! That's so much easier to read now, I think.
I tested a little (not everything) and all I saw was fine, except two tiny points:

* the sintax taxonomy table had NA while the others (DADA2, QIIME2) are blank when no info is available
* Processes SINTAX_TAXONOMY_WF:* and QIIME2_TAXONOMY:QIIME2_CLASSIFY are shown even when not executed, while DADA2_TAXONOMY_WF is not shown when not executed

The first point might be good to address; the second point doesn't matter, I think.

On an unrelated note, I noticed that NFCORE_AMPLISEQ:AMPLISEQ:QIIME2_DIVERSITY:QIIME2_TREE is not listed, and therefore no QIIME2_DIVERSITY:* process is executed with -profile test. That still works with 2.5.0 but seems already broken in dev, so it seems unrelated to your PR. I might have broken that in the phylogenetic placement PR, because I fiddled with the tree info. I will need to investigate at another time.

jtangrot · 2023-05-10T12:10:32Z

> Great simplification of the DADA2 taxonomic workflow, thanks! That's so much easier to read now, I think.

Thanks for the pointers!

> I tested a little (not everything) and all I saw was fine, except two tiny points:
> * the sintax taxonomy table had NA while the others (DADA2, QIIME2) are blank when no info is available

True, fixed it.
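
A minimal sketch of the kind of fix described here, assuming SINTAX-style taxonomy strings such as `d:Bacteria(1.00),p:Firmicutes(0.98)` and a fixed list of single-letter rank prefixes. The function name `sintax_to_row` and the rank list are hypothetical and not taken from the module in this PR. Ranks without an assignment become empty strings rather than `NA`, so the table matches the DADA2 and QIIME2 ones.

```python
import re
from typing import List, Sequence

# Assumed rank prefixes (domain ... species); the real module may use others.
DEFAULT_RANKS = ("d", "p", "c", "o", "f", "g", "s")

# One field looks like "g:Bacillus(0.87)"; the confidence part is optional.
FIELD_PATTERN = re.compile(r"([a-z]):(.*?)(?:\(([\d.]+)\))?")


def sintax_to_row(taxonomy: str, ranks: Sequence[str] = DEFAULT_RANKS) -> List[str]:
    """Turn one SINTAX taxonomy string into a fixed-width row of taxon names."""
    found = {}
    for field in taxonomy.split(","):
        match = FIELD_PATTERN.fullmatch(field.strip())
        if match:
            # Keep the taxon name only; the confidence value is dropped here.
            found[match.group(1)] = match.group(2)
    # Missing ranks are left blank instead of being filled with "NA".
    return [found.get(rank, "") for rank in ranks]
```

With this sketch, a sequence classified only down to phylum yields a row whose remaining cells are empty strings.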

> * Processes SINTAX_TAXONOMY_WF:* and QIIME2_TAXONOMY:QIIME2_CLASSIFY are shown even when not executed, while DADA2_TAXONOMY_WF is not shown when not executed

I think this is because there is a check whether DADA2_TAXONOMY_WF should run, while there is no such check for e.g. SINTAX_TAXONOMY_WF:*; instead, those processes are fed an empty input channel and therefore do not run. I added a check for sintax, but left the qiime things as they are. There are a lot of qiime modules that run or not depending on their input channels.

jtangrot · 2023-05-15T05:42:52Z

Thanks!

jtangrot added 27 commits March 7, 2023 13:16

Import vsearch/sintax module

7741fee

Add sintax databases to config

4ff980e

Include settings for vsearch sintax

dfbff72

Add sintax option to workflow

8b0c952

add sintax_ref_taxonomy to schema

957cc55

Set 1 cpu for reproducibility

9ada47a

Add coidb and unite 8.2

167e166

Use only --sintax_ref_taxonomy and remove --sintax parameter

0d3cc2c

Convert sintax output and enable input to qiime

46e2a8f

Enable cutits for sintax

bf5c4f2

Merge with current dev

649b251

Fix typo

c5e813f

Fix quotation marks

f5c855c

Add test profile for sintax

3fbdaf8

Fix typo

ba1e9eb

Fix test config

fd05b12

Fix test config

16bed62

Update help text

c690195

Use sintax results in sbdiexport

8764715

Add parameter skip_dada_taxonomy

2d7482d

Fix input to qiime when no dada2 taxonomy

5ae1f6b

Revert test config to pacbio data

9eb2717

Only run ITSx once

602590b

Merge with current dev

2afb7a0

Add docs and citation

4d7fa38

Prettier

25605bd

Revert test config to pacbio data, fix qiime test

3942988

jtangrot added 2 commits May 3, 2023 09:27

Update CHANGELOG

e1916e3

Fix spaces

e1d0f0e

d4straub reviewed May 3, 2023

workflows/ampliseq.nf

Fix bug when sintax_ref_taxonomy is not set

0f7b7fa

d4straub reviewed May 4, 2023

Merge branch 'dev' into add_sintax

b6b6b5c

d4straub reviewed May 5, 2023

modules/local/format_taxonomy_sintax.nf Outdated

d4straub and others added 6 commits May 5, 2023 09:01

Add docker.io to modules/local/format_taxonomy_sintax.nf

8956649

Move sintax taxonomy to subworkflow

d155d52

Move pplace taxonomy to subworkflow

afa4f9a

Move dada2 taxonomy to subworkflow

6a6310e

Fix docs

b34a187

Fix bug in sintax results conversion

528516e

d4straub reviewed May 9, 2023

subworkflows/local/dada2_taxonomy_wf.nf Outdated

subworkflows/local/pplace_taxonomy_wf.nf Outdated

jtangrot added 2 commits May 9, 2023 13:33

Tidy up dada2_taxonomy workflow

14229f3

Remove subworkflow for pplace

8a67f42

d4straub approved these changes May 10, 2023

jtangrot added 3 commits May 10, 2023 13:27

Change NA to blank in sintax table

ddfebe2

Add check if sintax should run

0135398

Remove comment

9b2688d

d4straub mentioned this pull request May 10, 2023

fix diversity analysis #582

Merged

9 tasks

Merge branch 'dev' into add_sintax

e103718

jtangrot merged commit 1e1d001 into nf-core:dev May 15, 2023

jtangrot deleted the add_sintax branch May 15, 2023 05:42

jtangrot mentioned this pull request Jun 15, 2023

Add VSEARCH/SINTAX taxonomy annotation #471

Closed
