SVision

SVision is a deep learning-based structural variant caller that takes aligned reads or contigs as input. Specifically, SVision implements a targeted multi-object recognition framework that detects and characterizes both simple and complex structural variants from three-channel similarity images.

This file describes how SVision was installed on Polaris and how it was run there. We download the pre-trained models from the SVision repository and use them to predict structural variants (SVs) on our data; we do not build any models ourselves. HiFi sequence data is needed for building the models.

# build SVision on Polaris
module load conda/2022-09-08
mkdir -p /lus/grand/projects/covid-ct/arodriguez/tools/svision
cd /lus/grand/projects/covid-ct/arodriguez/tools/svision

# download latest from Git repo
git clone https://github.com/xjtu-omics/SVision.git
cd SVision

## Create conda environment and install SVision 
conda env create -f environment.yml
conda activate svisionenv
python setup.py install

#######
(/soft/datascience/conda/2022-09-08/svisionenv) arodriguez@polaris-login-02:/lus/grand/projects/covid-ct/arodriguez/tools/svision/SVision> SVision --help
usage: SVision [-h] -o OUT_PATH -b BAM_PATH -m MODEL_PATH -g GENOME -n SAMPLE
               [-t THREAD_NUM] [-s MIN_SUPPORT] [-c CHROM] [--hash] [--qname]
               [--graph] [--contig] [--debug] [--min_mapq MIN_MAPQ]
               [--min_sv_size MIN_SV_SIZE] [--max_sv_size MAX_SV_SIZE]
               [--window_size WINDOW_SIZE]
               [--patition_max_distance PATITION_MAX_DISTANCE]
               [--cluster_max_distance CLUSTER_MAX_DISTANCE]
               [--batch_size BATCH_SIZE] [--min_gt_depth MIN_GT_DEPTH]
               [--homo_thresh HOMO_THRESH] [--hete_thresh HETE_THRESH]
               [--k_size K_SIZE] [--min_accept MIN_ACCEPT]
               [--max_hash_len MAX_HASH_LEN]

SVision 1.3.8

Short Usage: SVision [parameters] -o <output path> -b <input bam path> -g <reference> -m <model path>

optional arguments:
  -h, --help            show this help message and exit

Input/Output parameters:
  -o OUT_PATH           Absolute path to output
  -b BAM_PATH           Absolute path to bam file
  -m MODEL_PATH         Absolute path to CNN predict model
  -g GENOME             Absolute path to your reference genome (.fai required
                        in the directory)
  -n SAMPLE             Name of the BAM sample name

Optional parameters:
  -t THREAD_NUM         Thread numbers (default: 1)
  -s MIN_SUPPORT        Minimum support read number required for SV calling
                        (default: 5)
  -c CHROM              Specific region (chr1:xxx-xxx) or chromosome (chr1) to
                        detect
  --hash                Activate local realignment for unmapped sequences
                        (default: False)
  --qname               Report support names for each events (default: False)
  --graph               Report graph for events (default: False)
  --contig              Activate contig mode (default: False)
  --debug               Activate debug mode and keep intermedia outputs
                        (default: False)

Collect parameters:
  --min_mapq MIN_MAPQ   Minimum mapping quality of reads to consider (default:
                        10)
  --min_sv_size MIN_SV_SIZE
                        Minimum SV size to detect (default: 50)
  --max_sv_size MAX_SV_SIZE
                        Maximum SV size to detect (default: 1000000)
  --window_size WINDOW_SIZE
                        The sliding window size in segment collection
                        (default: 10000000)

Cluster parameters:
  --patition_max_distance PATITION_MAX_DISTANCE
                        Maximum distance to partition signatures (default:
                        5000)
  --cluster_max_distance CLUSTER_MAX_DISTANCE
                        Clustering maximum distance for a partition (default:
                        0.3)

Predict parameters:
  --batch_size BATCH_SIZE
                        Batch size for the CNN prediction model (default: 128)

Genotype parameters:
  --min_gt_depth MIN_GT_DEPTH
                        Minimum reads required for genotyping (default: 4)
  --homo_thresh HOMO_THRESH
                        Minimum variant allele frequency to be called as
                        homozygous (default: 0.8)
  --hete_thresh HETE_THRESH
                        Minimum variant allele frequency to be called as
                        heterozygous (default: 0.2)

Hash table parameters:
  --k_size K_SIZE       Size of kmer (default: 10)
  --min_accept MIN_ACCEPT
                        Minimum match length for realignment (default: 50)
  --max_hash_len MAX_HASH_LEN
                        Maximum length of unmapped sequence length for
                        realignment (default: 1000)

#####

# the model files must be downloaded manually from https://drive.google.com/drive/folders/1j74IN6kPKEx9hy3aENx3zHYPUnyYWGvj
# download them to your computer and then transfer them to Polaris
# files:
# $ ls /lus/grand/projects/covid-ct/arodriguez/tools/svision/SVision/models
# -rw-r--r--  1 arodri7     9420131 Oct 13 17:22 svision-cnn-model.ckpt.meta
# -rw-r--r--  1 arodri7         713 Oct 14 10:27 svision-cnn-model.ckpt.index
# -rw-r--r--  1 arodri7   227554836 Oct 14 10:27 svision-cnn-model.ckpt.data-00000-of-00001
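
Because the model is transferred by hand, it may be worth confirming that all three checkpoint files arrived before starting a job. The following is a minimal sketch; `check_svision_model` is a hypothetical helper and not part of SVision. It assumes the three suffixes shown in the listing above and takes the same prefix that is later passed to `-m`.

```shell
# Hypothetical helper (not part of SVision): check that the three checkpoint
# files sharing the prefix passed to -m are present and non-empty.
check_svision_model() {
    prefix="$1"
    missing=0
    for suffix in meta index data-00000-of-00001; do
        if [ ! -s "${prefix}.${suffix}" ]; then
            echo "missing or empty: ${prefix}.${suffix}" >&2
            missing=1
        fi
    done
    # Returns 0 only if every file was found.
    return "$missing"
}
```

For example, `check_svision_model /lus/grand/projects/covid-ct/arodriguez/tools/svision/SVision/models/svision-cnn-model.ckpt && echo "model files OK"` would report any file that is absent or was truncated during transfer.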

We can now run SVision on the 30X HG00138 sample BAM file that was generated with Parabricks. First request an interactive job:

qsub -A covid-ct -I -l select=1 -l walltime=1:00:00 -l filesystems=home:eagle -q debug

module load conda/2022-09-08
conda activate svisionenv

# modify the paths in the commands below for your own data
mkdir -p /lus/grand/projects/covid-ct/arodriguez/wgs_test/HG00138/output/30x/svision_out
cd /lus/grand/projects/covid-ct/arodriguez/wgs_test/HG00138/output/30x/svision_out
SVision \
    -o /lus/grand/projects/covid-ct/arodriguez/wgs_test/HG00138/output/30x/svision_out \
    -b /lus/grand/projects/covid-ct/arodriguez/wgs_test/HG00138/output/30x/HG00138.bam \
    -m /lus/grand/projects/covid-ct/arodriguez/tools/svision/SVision/models/svision-cnn-model.ckpt \
    -g /lus/grand/projects/covid-ct/arodriguez/wgs_test/reference/GRCh38_CRAM/GRCh38_full_analysis_set_plus_decoy_hla.fa \
    -n HG00138 -s 5 --graph --qname -t 32
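
The help output states that `-g` requires a `.fai` index in the same directory as the reference. The sketch below shows one way to check the inputs before spending walltime on a run; `check_svision_inputs` is a hypothetical helper, not an SVision command, and it only tests that the BAM file, the reference, and the reference index exist and are non-empty.

```shell
# Hypothetical pre-flight check (not part of SVision): verify that the BAM
# file, the reference FASTA, and its .fai index exist and are non-empty.
check_svision_inputs() {
    bam="$1"
    ref="$2"
    status=0
    for f in "$bam" "$ref" "${ref}.fai"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    # Returns 0 only if all three files were found.
    return "$status"
}
```

Called with the `-b` and `-g` paths from the command above, it returns non-zero if anything is missing. If only the `.fai` file is absent and samtools happens to be available in the environment (an assumption; it is not installed by the steps in this file), `samtools faidx` on the reference would create it.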

Results will need to be evaluated similar to this.