Nanopore Microbial Variant Calling Pipeline
This pipeline runs dorado for high-accuracy basecalling and alignment to reference genomes, producing a BAM file that can be used as input to bwa_pipeline_nanopore. It calls variants with samtools/bcftools, VarScan and GATK, annotates them with snpEff/SnpSift, and collates the calls so that variants can be filtered by the number of callers that report them.
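The caller-count filter can be sketched as a small awk function. This is a minimal sketch, not the pipeline's own collation code: it assumes a hypothetical tab-separated table with a header row and a fifth CALLERS column listing the callers that reported each variant (for example bcftools,varscan,gatk), with each caller listed once. The function name filter_by_callers is also hypothetical.

```shell
#!/bin/sh
# Sketch: keep collated variants reported by at least MIN callers.
# ASSUMPTION: hypothetical TSV layout CHROM POS REF ALT CALLERS, with a header
# row and CALLERS as a comma-separated list in which each caller appears once.
set -eu

filter_by_callers() {
    min_callers=$1
    table=$2
    awk -F '\t' -v min="$min_callers" '
        NR == 1 { print; next }            # always keep the header row
        {
            n = split($5, callers, ",")    # number of callers for this variant
            if (n >= min) print
        }
    ' "$table"
}

# Usage (hypothetical file names): filter_by_callers 2 collated_variants.tsv > consensus.tsv
```

Counting entries in one column keeps the filter independent of which callers were run; a stricter version could also require specific callers.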
Basecalling with dorado requires a GPU. Basecalling and alignment were run on a MacBook Pro (M1, 3.2 GHz, 10-core CPU, 16-core GPU, 32 GB memory).
With the high-accuracy basecalling model, basecalling and alignment to the merged DvH and Mmp genomes took about 13 hours.
Download the prebuilt binaries for your platform from the dorado GitHub repository.
Unpack the archive into the dorado directory.
Download the basecalling model. The available models are listed in the dorado GitHub repository; we used the [email protected] model for high-accuracy basecalling.
dorado/dorado-0.4.3-osx-arm64/bin/dorado download --model [email protected]
Move the downloaded model into the model directory.
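Install pod5 and merge the FAST5 files from all runs for barcode07 into a single POD5 file, the input format recommended for dorado:
pip install pod5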
pod5 convert fast5 -r Run1/barcode07/*.fast5 Run2/barcode07/*.fast5 Run3/barcode07/*.fast5 Run4/barcode07/*.fast5 --output barcode07_combined.pod5
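When the same barcode is spread over several runs, the input list can be generated instead of typed. A minimal sketch, assuming the RunN/barcodeNN/*.fast5 layout used in the command above; the helper name list_fast5 is hypothetical. Runs that lack the barcode are skipped rather than passing an unmatched glob pattern on to pod5.

```shell
#!/bin/sh
# Sketch: print every FAST5 file for one barcode across several run directories.
# ASSUMPTION: layout RUN/BARCODE/*.fast5, as in the pod5 command above.
set -eu

list_fast5() {
    barcode=$1
    shift
    for run in "$@"; do
        for f in "$run/$barcode"/*.fast5; do
            [ -e "$f" ] || continue   # unmatched glob: this run has no such files
            printf '%s\n' "$f"
        done
    done
}

# Usage (assumes no spaces in paths):
# pod5 convert fast5 $(list_fast5 barcode07 Run1 Run2 Run3 Run4) --output barcode07_combined.pod5
```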
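Basecall and align in one step. The --reference option aligns the reads to the merged DvH/Mmp genome, and the aligned BAM is written to standard output:
dorado/dorado-0.4.3-osx-arm64/bin/dorado basecaller model/[email protected] data/EPD9/barcode07/barcode07_combined.pod5 --reference reference/DvH_Mmp_merged_dna.genome.fasta > output/EPD9/barcode07/basecalls/barcode07.bam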
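Callers such as bcftools mpileup and GATK expect a coordinate-sorted, indexed BAM, and the BAM that dorado streams to standard output is not guaranteed to be sorted. A minimal sketch of that step with samtools; the .sorted.bam naming and the DRY_RUN switch (which prints the commands instead of running them) are illustrative assumptions, not part of the pipeline.

```shell
#!/bin/sh
# Sketch: coordinate-sort and index the dorado BAM before variant calling.
# ASSUMPTION: output naming (*.sorted.bam) and DRY_RUN are illustrative only.
set -eu

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

sort_and_index() {
    in_bam=$1
    out_bam=${in_bam%.bam}.sorted.bam
    run samtools sort -o "$out_bam" "$in_bam"   # sort reads by reference coordinate
    run samtools index "$out_bam"               # writes the .bai index next to the BAM
}

# Usage: sort_and_index output/EPD9/barcode07/basecalls/barcode07.bam
```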
The trio variant calling pipeline takes the BAM file from dorado as input and calls variants with samtools/bcftools, GATK and VarScan. Variants are annotated with snpEff and then collated.
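For the samtools/bcftools caller, a call on a haploid microbial genome could look roughly like the sketch below. It is not the pipeline's own script: the function name, output prefix and DRY_RUN switch are hypothetical, the input BAM is assumed to be coordinate-sorted and indexed, and --ploidy 1 is our assumption for haploid genomes. The GATK and VarScan steps are not shown.

```shell
#!/bin/sh
# Sketch: haploid bcftools calling on a sorted, indexed dorado BAM.
# ASSUMPTIONS: --ploidy 1 for microbial genomes; output names are hypothetical;
# DRY_RUN=1 prints the commands instead of running them.
set -eu

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

call_bcftools() {
    ref=$1      # reference FASTA used for alignment
    bam=$2      # coordinate-sorted, indexed BAM
    prefix=$3   # output prefix
    run samtools faidx "$ref"                                           # FASTA index for mpileup
    run bcftools mpileup -f "$ref" -Ou -o "$prefix.pileup.bcf" "$bam"   # per-site genotype likelihoods
    run bcftools call --ploidy 1 -mv -Oz -o "$prefix.bcftools.vcf.gz" "$prefix.pileup.bcf"  # variant sites only
    run bcftools index "$prefix.bcftools.vcf.gz"
}

# Usage (hypothetical paths):
# call_bcftools reference/DvH_Mmp_merged_dna.genome.fasta sample.sorted.bam output/sample
```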
Requirements:
- samtools
- bcftools
- varscan
- gatk
- snpEff
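A quick check that these tools are available before starting can save a failed run. A minimal sketch, assuming each tool is installed as a command under the name shown (for example through bioconda wrappers); if VarScan or snpEff are run as plain .jar files they will be reported as missing. The function name missing_tools is hypothetical.

```shell
#!/bin/sh
# Sketch: report which required tools are missing from PATH.
# ASSUMPTION: each tool is callable under this command name (e.g. bioconda
# wrappers); jar-only installs of VarScan or snpEff will be reported missing.
set -eu

missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf 'missing: %s\n' "$tool"
            status=1
        fi
    done
    return "$status"
}

# Usage: missing_tools samtools bcftools varscan gatk snpEff || exit 1
```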