25% of reads too short Unmapped in SmartSeq2 #2261

Open
Rohit-Satyam opened this issue Dec 28, 2024 · 1 comment

Rohit-Satyam commented Dec 28, 2024

Dear @alexdobin

Greetings!!

I am running STARsolo on Smart-seq2 data from my parasite samples (GSE145080); the reads are 71 bp paired-end. I removed rRNA reads with RiboDetector and aligned the remaining non-rRNA reads. I did not trim adapters. I used the following parameters:

## indexing
STAR-avx2 --runThreadN 10 --runMode genomeGenerate --genomeDir 00_index \
--genomeFastaFiles ToxoDB-68_TgondiiME49_Genome_withERCC.fasta --sjdbGTFfile ToxoDB-68_TgondiiME49_withERCC.gtf \
--genomeSAindexNbases 12 --genomeChrBinNbits 15 --sjdbOverhang 70

## alignment
STAR --genomeDir !{index.toRealPath()} --runThreadN !{task.cpus} \
--alignIntronMax 1000000 \
--alignIntronMin 20 \
--alignMatesGapMax 1000000 \
--alignSJDBoverhangMin 1 \
--alignSJoverhangMin 8 \
--genomeLoad NoSharedMemory \
--outBAMsortingThreadN !{task.cpus} \
--outFileNamePrefix !{sid}_ \
--outFilterMatchNminOverLread 0.3 \
--outFilterMismatchNmax 999 \
--outFilterMismatchNoverReadLmax 0.04 \
--outFilterMultimapNmax 20 \
--outFilterScoreMinOverLread 0.3 \
--outFilterType BySJout \
--outSAMattributes All \
--outSAMtype BAM SortedByCoordinate \
--outSAMunmapped Within \
--outReadsUnmapped Fastx \
--outSAMattrRGline ID:!{sid} SM:!{sid} PL:ILLUMINA \
--readFilesCommand zcat \
--readFilesIn !{reads[0]} !{reads[1]} \
!{params.staralign_ext} \
--outBAMsortingBinsN 200 \
--sjdbGTFfile !{params.gtf}  \
--sjdbScore 1 \
--soloType SmartSeq \
--outSAMstrandField intronMotif \
--soloUMIdedup Exact NoDedup \
--soloStrand Unstranded --soloFeatures Gene GeneFull SJ \
--soloCellFilter None \
--soloOutFileNames output/ features.tsv barcodes.tsv matrix.mtx \
2> !{sid}.stderr
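Because the command uses `--outSAMunmapped Within`, STAR keeps unmapped records in the BAM, and per the STAR manual each one carries a `uT` attribute giving the reason it was not mapped (1 corresponds to "too short"). A minimal sketch, assuming `samtools` is installed and the BAM uses STAR's default `<prefix>Aligned.sortedByCoord.out.bam` name; the helper `count_unmapped_by_uT` and the sample id `S1` are hypothetical:

```shell
#!/usr/bin/env bash
# Tally STAR's uT:A:<code> attribute on unmapped SAM records read from stdin.
# Codes (per the STAR manual): 0 = no acceptable seed/windows,
# 1 = best alignment shorter than min allowed mapped length ("too short"),
# 2 = too many mismatches, 3 = too many loci, 4 = unmapped mate of a mapped pair.
# count_unmapped_by_uT is a hypothetical helper name, not part of STAR.
count_unmapped_by_uT() {
  awk -F'\t' '
    {
      # Optional SAM tags start at column 12.
      for (i = 12; i <= NF; i++) {
        if ($i ~ /^uT:A:/) { n[substr($i, 6)]++; break }
      }
    }
    END { for (code in n) print code "\t" n[code] }
  ' | sort -k1,1n
}

# Usage (S1 is a hypothetical sample id; prefix follows --outFileNamePrefix S1_):
#   samtools view -f 4 S1_Aligned.sortedByCoord.out.bam | count_unmapped_by_uT
```

Counts are per SAM record, so a pair with both mates unmapped is counted twice.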

The too-short percentage is about the same in almost every cell. When I inspected these reads, most of them turned out to be 28S rRNA from Toxoplasma itself (the same organism), so they should map to this genome.

                                 Started job on |	Dec 28 16:14:34
                             Started mapping on |	Dec 28 16:14:40
                                    Finished on |	Dec 28 16:16:54
       Mapping speed, Million of reads per hour |	16.81

                          Number of input reads |	625651
                      Average input read length |	142
                                    UNIQUE READS:
                   Uniquely mapped reads number |	336871
                        Uniquely mapped reads % |	53.84%
                          Average mapped length |	124.96
                       Number of splices: Total |	35148
            Number of splices: Annotated (sjdb) |	34259
                       Number of splices: GT/AG |	34867
                       Number of splices: GC/AG |	268
                       Number of splices: AT/AC |	5
               Number of splices: Non-canonical |	8
                      Mismatch rate per base, % |	1.08%
                         Deletion rate per base |	0.02%
                        Deletion average length |	1.46
                        Insertion rate per base |	0.01%
                       Insertion average length |	1.45
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci |	62649
             % of reads mapped to multiple loci |	10.01%
        Number of reads mapped to too many loci |	2870
             % of reads mapped to too many loci |	0.46%
                                  UNMAPPED READS:
  Number of reads unmapped: too many mismatches |	0
       % of reads unmapped: too many mismatches |	0.00%
            Number of reads unmapped: too short |	181133
                 % of reads unmapped: too short |	28.95%
                Number of reads unmapped: other |	42128
                     % of reads unmapped: other |	6.73%
                                  CHIMERIC READS:
                       Number of chimeric reads |	0
                            % of chimeric reads |	0.00%
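`--outFilterScoreMinOverLread` and `--outFilterMatchNminOverLread` are fractions of the read length, which the STAR manual defines as the sum of both mates' lengths for paired-end data; this is why the log reports an average input read length of 142 for 71 bp pairs. A back-of-the-envelope sketch converting those fractions to bases (the helper `star_min_aligned` is hypothetical):

```shell
#!/usr/bin/env bash
# Convert a --outFilter*OverLread fraction into an approximate number of bases.
# For paired-end reads STAR normalizes by the sum of both mates' lengths.
# star_min_aligned is a hypothetical helper; STAR's internal rounding may differ slightly.
star_min_aligned() {
  local frac=$1 mate1_len=$2 mate2_len=$3
  awk -v f="$frac" -v m1="$mate1_len" -v m2="$mate2_len" \
    'BEGIN { printf "%.1f\n", f * (m1 + m2) }'
}

star_min_aligned 0.3 71 71    # value used in this run -> 42.6
star_min_aligned 0.66 71 71   # STAR default           -> 93.7
```

With the 0.3 settings used here, a pair needs roughly 43 matched bases (and a comparable alignment score) to be reported, so a single mate aligned end to end would in principle clear the bar; treat these numbers as approximate.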

Rohit-Satyam commented Dec 29, 2024

Here is the MultiQC report for the RH_96 well plate. The authors also mention this in their paper but don't offer an explanation:

We reported ‘% mapped’ based on the meta-alignment output from STAR aligner. We checked for some of the unmapped reads on BLASTn and found the majority of them to map to Toxoplasma 28S ribosomal RNA. 

multiqc_report.html.gz
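The BLASTn check described in the paper can be repeated on STAR's own output: with `--outReadsUnmapped Fastx`, STAR writes unmapped mates to `<prefix>Unmapped.out.mate1` and `<prefix>Unmapped.out.mate2`. A minimal sketch that turns the first N records into a FASTA query; the helper `fastq_to_fasta_head`, the sample id `S1` and the record count are illustrative assumptions, and taking the first N reads is not a random sample:

```shell
#!/usr/bin/env bash
# Write the first N records of a FASTQ file as FASTA (header = read name up to
# the first whitespace), e.g. to BLAST a handful of STAR's unmapped reads.
# fastq_to_fasta_head is a hypothetical helper, not part of STAR or BLAST+.
fastq_to_fasta_head() {
  local fastq=$1 max=$2
  awk -v max="$max" '
    NR % 4 == 1 { if (++n > max) exit; print ">" substr($1, 2) }
    NR % 4 == 2 { print }
  ' "$fastq"
}

# Usage (S1 is a hypothetical sample id):
#   fastq_to_fasta_head S1_Unmapped.out.mate1 200 > S1_unmapped_first200.fa
#   blastn -query S1_unmapped_first200.fa -db nt -remote -outfmt 6 -max_target_seqs 5
```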
