Nanopore Microbial Variant Calling Pipeline
This pipeline runs dorado for high-accuracy basecalling and alignment to reference genomes, producing a BAM file that can be used as input to the bwa_pipeline_nanopore. The pipeline then calls variants with samtools/bcftools, varscan and gatk, annotates them with snpEff/SnpSift, and collates the calls so that they can be filtered by the number of callers that report them.
Basecalling with dorado requires a GPU. The basecalling and alignment steps were run on a MacBook Pro (M1, 3.2 GHz, 10 CPU/16 GPU cores, 32 GB memory).
Basecalling and alignment to the merged Dvh and Mmp genomes took about 13 hours with the high-accuracy basecalling model.
- Nanopore
    - code
    - data
        - EPD9
            - barcode07
                - Run1, Run2, Run3, Run5
    - output
        - EPD9
            - barcode07
                - alignment_results
                - samtools_results
                - gatk_results
                - varscan_results
                - combined_output
                    - dvh
                    - mmp
                - snpeff_results
                    - dvh
                    - mmp
                - basecalls
    - model
    - reference
    - dorado
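The layout above can be created in one step. The following is a minimal sketch, not part of the pipeline itself: the function name `make_layout` and its arguments are hypothetical, and the nesting is assumed from the paths used in the commands in this README (for example `output/EPD9/barcode07/basecalls`).

```shell
# Sketch: create the project layout for one sample/barcode pair.
# make_layout, and the argument names, are illustrative only.
make_layout() {
    root=$1; sample=$2; barcode=$3

    # Top-level working directories
    for d in code model reference dorado; do
        mkdir -p "$root/$d"
    done

    # Raw data for this sample/barcode
    mkdir -p "$root/data/$sample/$barcode"

    # Per-step result directories
    out="$root/output/$sample/$barcode"
    for d in alignment_results samtools_results gatk_results varscan_results basecalls; do
        mkdir -p "$out/$d"
    done

    # Result directories that are split by organism (assumed: dvh, mmp)
    for d in combined_output snpeff_results; do
        for org in dvh mmp; do
            mkdir -p "$out/$d/$org"
        done
    done
}
```

For the sample used here, this would be called as `make_layout Nanopore EPD9 barcode07`.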
Download the prebuilt binaries for your platform from the dorado GitHub repository.
Unpack dorado into the dorado directory.
Download the relevant model from the dorado GitHub repository. We used the [email protected] model for the latest high-accuracy basecalling.
dorado/dorado-0.4.3-osx-arm64/bin/dorado download --model [email protected]
Place the downloaded model under the model directory.
pip install pod5
pod5 convert fast5 -r Run1/barcode07/*.fast5 Run2/barcode07/*.fast5 Run3/barcode07/*.fast5 Run4/barcode07/*.fast5 --output barcode07_combined.pod5
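Before converting, it can help to confirm that every run folder actually contains fast5 files for the barcode, since a glob that matches nothing is passed to `pod5` as a literal string. The following is a minimal sketch under the assumption that the run folders sit side by side in one directory; `count_fast5` is a hypothetical helper, not part of pod5 or of this pipeline.

```shell
# Sketch: print "<run> <n>" for each run, where n is the number of
# .fast5 files directly under <data_dir>/<run>/<barcode>.
# A missing directory is reported with a count of 0.
count_fast5() {
    data_dir=$1; barcode=$2; shift 2
    for run in "$@"; do
        n=$(find "$data_dir/$run/$barcode" -maxdepth 1 -name '*.fast5' 2>/dev/null \
            | wc -l | tr -d ' ')
        printf '%s %s\n' "$run" "$n"
    done
}
```

Called from the directory that holds the run folders, for example `count_fast5 . barcode07 Run1 Run2 Run3`, any run listed with a count of 0 should be checked before running the conversion.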
dorado/dorado-0.4.3-osx-arm64/bin/dorado basecaller model/[email protected] data/EPD9/barcode07/barcode07_combined.pod5 --reference reference/DvH_Mmp_merged_dna.genome.fasta > output/EPD9/barcode07/basecalls/barcode07.bam
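The BAM written by dorado is not guaranteed to be coordinate-sorted or indexed, which many downstream tools expect. The following is a hedged sketch of a possible post-processing step; it is an assumption, not something this README states the pipeline requires, and the downstream pipeline may already do this itself. The helper `postprocess_cmds` is hypothetical and only prints the `samtools sort`, `samtools index` and `samtools flagstat` commands so they can be reviewed before running.

```shell
# Sketch: print (do not run) the samtools commands that would sort,
# index and summarise a BAM file. The ".sorted.bam" suffix is an
# illustrative naming choice.
postprocess_cmds() {
    bam=$1
    sorted="${bam%.bam}.sorted.bam"
    printf 'samtools sort -o %s %s\n' "$sorted" "$bam"
    printf 'samtools index %s\n' "$sorted"
    printf 'samtools flagstat %s\n' "$sorted"
}
```

For example, `postprocess_cmds output/EPD9/barcode07/basecalls/barcode07.bam` prints three commands; once they look right they can be pasted into the terminal. Paths containing spaces would need quoting by hand.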
The Trio Variant Calling Pipeline takes the BAM file produced by dorado as input and calls variants with samtools, gatk and varscan. The variants are annotated with snpEff and then collated.
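To illustrate what filtering by the number of callers means, here is a minimal sketch. It is not the pipeline's own collation code: `count_callers` is a hypothetical helper, it assumes one uncompressed VCF per caller, and it treats two records as the same variant only when CHROM, POS, REF and ALT match exactly (multi-allelic records are not split or normalised).

```shell
# Sketch: count how many caller VCFs report each variant and keep
# those reported by at least MIN callers.
# Usage: count_callers MIN caller1.vcf caller2.vcf ...
# Output columns (tab-separated): CHROM POS REF ALT N_CALLERS
count_callers() {
    min=$1; shift
    for f in "$@"; do
        # Drop header lines, keep CHROM/POS/REF/ALT, and de-duplicate
        # so that each caller contributes a variant at most once.
        grep -v '^#' "$f" | cut -f1,2,4,5 | sort -u
    done | sort | uniq -c \
         | awk -v min="$min" '$1 >= min { print $2 "\t" $3 "\t" $4 "\t" $5 "\t" $1 }'
}
```

With three caller outputs, `count_callers 2 samtools.vcf gatk.vcf varscan.vcf` would list only the variants found by at least two of them; the file names here are placeholders.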
- Requirements
- samtools
- bcftools
- varscan
- gatk
- snpEff
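A quick way to check the requirements is to test whether each tool is on the `PATH`. This is a minimal sketch; `missing_tools` is a hypothetical helper, and the executable names passed to it are assumptions, since varscan and snpEff are often installed as `.jar` files or under wrapper names that differ between systems.

```shell
# Sketch: print the name of every tool that cannot be found on PATH.
# Prints nothing when all tools are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

For example, `missing_tools samtools bcftools varscan gatk snpEff` lists whichever of these still need to be installed or added to the `PATH`.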