Report shows wrong taxonomic classification stats for QIIME with UNITE #652

d4straub · 2023-10-23T14:08:20Z

Description of the bug

When using UNITE fungi with QIIME2 for taxonomic classification, the statistics in the summary report (results/summary_report/summary_report.html) show 100% classification for rank "Kingdom" and 0% for all other ranks.

This is because the UNITE database contains taxonomic strings such as

k__Fungi;p__Ascomycota;c__Eurotiomycetes;o__Eurotiales;f__Aspergillaceae;g__Aspergillus;s__Aspergillus_penicillioides
k__Fungi
k__Fungi;p__Ascomycota

while Greengenes 16S - Version 13_8 produces taxonomic strings such as

k__Bacteria; p__Proteobacteria; c__Betaproteobacteria
k__Bacteria; p__Proteobacteria; c__Betaproteobacteria; o__Gallionellales; f__Gallionellaceae
k__Bacteria; p__Bacteroidetes; c__Flavobacteriia; o__Flavobacteriales; f__Flavobacteriaceae; g__Flavobacterium; s__
k__Bacteria; p__Proteobacteria; c__Betaproteobacteria; o__Burkholderiales; f__Comamonadaceae; g__Rhodoferax; s__

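and the parsing for the report only takes the Greengenes format into account. It counts ranks by the number of "; " separators (semicolon plus space), which never occur in UNITE strings, so every UNITE ASV is counted as having a single rank:

ampliseq/assets/report_template.Rmd

Lines 991 to 999 in 4e48b71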

    
    # Remove greengenes85 ".__" placeholders
    df = as.data.frame(lapply(asv_tax, function(x) gsub(".__", "", x)))
    # remove all last, empty ;
    df = as.data.frame(lapply(df, function(x) gsub(" ;","",x)))
    # remove last remaining, empty ;
    df = as.data.frame(lapply(df, function(x) gsub("; $","",x)))
    # get maximum amount of taxa levels per ASV
    max_taxa <- lengths(regmatches(df$Taxon, gregexpr("; ", df$Taxon)))+1

The other taxonomic classifications I ran (DADA2 with UNITE-Fungi, Kraken2, and SINTAX with UNITE-Fungi; see the command below) were fine.
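
One way the rank count could be made independent of the separator style is sketched below. This is only an illustration, not the fix that went into the pipeline: `count_tax_levels` is a hypothetical helper name, and it assumes rank prefixes always have the form of one letter followed by two underscores (as in both the UNITE and Greengenes examples above). It splits on ";" with optional surrounding whitespace, strips the prefixes, and counts only non-empty ranks.

```r
# Hypothetical helper (not part of nf-core/ampliseq): count assigned ranks
# in QIIME2 taxonomy strings, whether ranks are separated by ";" (UNITE)
# or "; " (Greengenes).
count_tax_levels <- function(taxon) {
  # Split on ";" with optional whitespace on either side
  ranks <- strsplit(taxon, "[[:space:]]*;[[:space:]]*")
  vapply(ranks, function(r) {
    # Strip rank prefixes such as "k__" or "s__" (assumed format)
    r <- sub("^[A-Za-z]__", "", r)
    # Empty placeholders such as a bare "s__" do not count as a rank
    sum(nzchar(r))
  }, integer(1))
}

unite      <- "k__Fungi;p__Ascomycota"
greengenes <- "k__Bacteria; p__Proteobacteria; c__Betaproteobacteria"
count_tax_levels(c(unite, greengenes))
```

Counting non-empty ranks after splitting, rather than counting "; " matches, means a trailing empty placeholder such as "g__Flavobacterium; s__" is not counted as a classified species.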

Command used and terminal output

nextflow run nf-core/ampliseq -r 2.7.0 -profile cfc --FW_primer CTTGGTCATTTAGAGGAAGTAA --RV_primer GCTGCGTTCTTCATCGATGC --input_fasta "ASV_seqs.fasta" --min_len_asv 1 --dada_ref_taxonomy "unite-fungi=9.0" --sintax_ref_taxonomy "unite-fungi=9.0" --kraken2_ref_tax_custom "https://genome-idx.s3.amazonaws.com/kraken/k2_pluspf_20231009.tar.gz" --kraken2_assign_taxlevels "D,P,C,O,F,G,S" --qiime_ref_taxonomy "unite-fungi" --outdir reclassification

Relevant files

No response

System information

No response

d4straub · 2023-10-23T14:27:21Z

Doesn't really fit in here, but the stats of the length filter also seem off:
I used --min_len_asv 1 (as above, just to get the distribution figure in the report) and the report says

Filtering omitted all ASVs with length lower than 1 bp.

The number of ASVs was reduced by 27.5 ( 1.51 %), from 1817.5 to 1790 ASVs.

which isn't right, because there were already 1790 ASVs in the input file and no ASV was removed.
The figure itself seems to be fine.
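
A quick cross-check of the reported numbers against the input file could look like the sketch below. Both function names are hypothetical and not part of the pipeline; the first assumes a plain-text FASTA file in which each sequence has exactly one header line starting with ">".

```r
# Hypothetical helpers for cross-checking the report (not part of nf-core/ampliseq).

# Count sequences in a FASTA file by counting header lines (starting with ">").
count_fasta_records <- function(path) {
  sum(startsWith(readLines(path), ">"))
}

# Summarise how many ASVs a filter removed, as a count and a percentage.
filter_summary <- function(n_before, n_after) {
  removed <- n_before - n_after
  pct <- if (n_before > 0) 100 * removed / n_before else 0
  c(removed = removed, percent = pct)
}
```

With 1790 ASVs before and after filtering, `filter_summary(1790, 1790)` gives a reduction of 0 ASVs (0 %), which is what the report should state here.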

d4straub · 2023-10-25T08:41:05Z

Documentation issues:

https://nf-co.re/ampliseq/2.7.0/parameters#multiple_sequencing_runs and https://nf-co.re/ampliseq/2.7.0/parameters#extension should reference --input_folder instead of --input in the help text. The description should also make that clearer.
https://nf-co.re/ampliseq/2.7.0/parameters#asv-filtering should come before https://nf-co.re/ampliseq/2.7.0/parameters#taxonomic-database (--exclude_taxa is applied after taxonomic classification; the bottom 3 are technically all part of downstream analysis because they run in QIIME2).

d4straub · 2023-11-10T09:00:04Z

This is in dev now, closing the issue.

d4straub added the bug label · Oct 23, 2023

d4straub mentioned this issue in "Improve reporting #657" (merged) · Nov 9, 2023

d4straub closed this as completed · Nov 10, 2023
