VEP annotates SNVs with wrong gene #1814

ytc0413 · 2024-12-13T07:11:33Z

Describe the issue

VEP annotates five GJB4 missense variants (SNVs on chromosome 1) in the attached input VCF file as downstream SNVs of the GJB5 gene. These five SNVs were confirmed to be GJB4 missense variants using ANNOVAR, ClinVar/dbSNP webpage, and UCSC Genome Browser. However, in the VEP output, the SYMBOL, Consequence, and HGNC ID columns indicate they are downstream variants of the GJB5 gene, while the Existing_variation column correctly displays the GJB4 variant rsIDs (rs776245625, rs146378222, rs200602523, rs375702737, rs373126632).
The attached input VCF file was modified from a DRAGEN-generated SNV VCF file. VEP version 110 was used for annotation with the cache file homo_sapiens_vep_110_GRCh38.tar.gz downloaded from https://ftp.ensembl.org/pub/release-110/variation/indexed_vep_cache/. I assume this cache contains Ensembl transcripts only, as its file name lacks an origin label. I also annotated the same input VCF file with version 108 and its Homo sapiens merged cache. The annotation results of the two versions are identical.

Additional information

System

VEP version: v110 and v108
VEP Cache version: v110 and v108 (merged), both GRCh38
Perl version: 5.28.1
OS: Ubuntu
tabix installed? Yes

Full VEP command line (for v110)

$VEP_PATH --cache --offline \
    -i $INPUT_VCF_PATH \
    --format vcf \
    --fork 4 \
    --check_existing \
    --assembly GRCh38 \
    --e \
    --pick \
    --sift \
    --polyphen \
    --regulatory \
    --force_overwrite \
    --dir_cache $VEP_CACHE_DIR \
    --fasta $VEP_FASTA \
    --vcf \
    -o ${SAMPLE_ID}_snv_vep.vcf

Full error message

No warnings or error messages.

Data files (if applicable)

They include:

The input file: input.vcf.gz
The output files: v108_merged_output.vcf.gz, v110_output.vcf.gz

nakib103 · 2024-12-13T18:23:21Z

Hi @ytc0413,

Thanks for your query!

The consequence reported for GJB5 for those variants is downstream_gene_variant, which is correct because the variants lie downstream of that gene.

You are not seeing the line with the missense_variant effect on GJB4 because you are using the --pick option, which tells VEP to pick only one consequence per variant, and it is picking the line with GJB5. You can either remove the --pick option or provide criteria that work for you (see --pick_order and this doc for more info).
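As a rough sketch of the alternatives mentioned above, the helper below only prints a VEP command for each strategy instead of running it: no --pick (every transcript is reported), --flag_pick (all lines are kept and the chosen one is flagged), or --pick with a custom --pick_order. The function name vep_cmd, the strategy labels, and the file names input.vcf and output.vcf are assumptions made for illustration, and the remaining options from the original command are left out for brevity.

```shell
# Dry-run helper (hypothetical): prints a VEP command for a pick strategy
# instead of running it. File names are placeholders.
vep_cmd() {
    strategy="$1"      # all | flag_pick | pick_order
    order="${2:-}"     # comma-separated criteria, used only with pick_order
    case "$strategy" in
        all)        pick_opts="" ;;                                # report every transcript
        flag_pick)  pick_opts="--flag_pick" ;;                     # keep all, flag the picked one
        pick_order) pick_opts="--pick --pick_order $order" ;;      # one line, custom ranking
        *) echo "unknown strategy: $strategy" >&2; return 1 ;;
    esac
    echo "vep --cache --offline -i input.vcf --vcf -o output.vcf${pick_opts:+ $pick_opts}"
}

vep_cmd all
vep_cmd flag_pick
vep_cmd pick_order mane_select,canonical,rank
```

Printing the command first makes it easy to compare the variants before committing to a full annotation run.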

Best regards,
Nakib

ytc0413 · 2024-12-15T14:01:32Z

Hi @nakib103

Thank you for your reply!

The GJB4 annotations appear after I removed the --pick option.
I found that the GJB5 annotations were chosen because of the tsl criterion. I also tried the following two --pick_order settings, both of which picked the GJB4 annotations for the variants on chromosome 1.

Order1
--pick --pick_order mane_select,mane_plus_clinical,canonical,appris,biotype,rank

Order2
--pick --pick_order biotype,rank,mane_select,mane_plus_clinical,canonical,appris,tsl,ccds,length
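One way to check which gene each run reports is to list the SYMBOL and Consequence of every CSQ entry in the output VCF. The sketch below assumes the standard VEP VCF layout, in which the ##INFO=<ID=CSQ ...> header line names the fields after "Format: "; the function name csq_genes is hypothetical, and the example record uses a made-up position rather than a real coordinate.

```shell
# Print CHROM, POS, SYMBOL and Consequence for every CSQ entry of a VEP VCF read from stdin.
csq_genes() {
    awk -F'\t' '
        /^##INFO=<ID=CSQ/ {
            # The header lists the CSQ field order after "Format: ".
            fmt = $0
            sub(/.*Format: /, "", fmt)
            sub(/".*/, "", fmt)
            n = split(fmt, names, "[|]")
            for (i = 1; i <= n; i++) idx[names[i]] = i
            next
        }
        /^#/ { next }
        {
            # INFO is column 8; find the CSQ key and split it into per-transcript entries.
            m = split($8, info, ";")
            for (j = 1; j <= m; j++) {
                if (info[j] ~ /^CSQ=/) {
                    k = split(substr(info[j], 5), entries, ",")
                    for (e = 1; e <= k; e++) {
                        split(entries[e], f, "[|]")
                        print $1 "\t" $2 "\t" f[idx["SYMBOL"]] "\t" f[idx["Consequence"]]
                    }
                }
            }
        }
    '
}

# Example with a synthetic record (position 100 is a placeholder):
{
    printf '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL">\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf '1\t100\t.\tG\tA\t.\tPASS\tCSQ=A|missense_variant|MODERATE|GJB4,A|downstream_gene_variant|MODIFIER|GJB5\n'
} | csq_genes
```

For a compressed output file, the same function can be fed with something like `zcat v110_output.vcf.gz | csq_genes`.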

Will Order2 output the non-canonical transcript annotation if its consequence is more severe than that of the canonical transcript?
If I want to prioritize the gene that the variant is located within and its MANE Select/canonical transcript, which ordered set would you recommend?

Thank you.

Best regards,
Ashley

nakib103 · 2024-12-16T10:35:03Z

Hi @ytc0413,

Great that you were able to see the desired gene in the output!

Yes, Order2 would output the non-canonical transcript annotation if it has the most severe consequence, because rank is evaluated before canonical in that order.

Given your criteria, Order1 is better suited than Order2.
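The effect of the criteria order can be illustrated with a toy model, not VEP's actual implementation: each candidate transcript gets a score per criterion (lower is better), and sorting by the criteria in the given order mimics filtering the candidates criterion by criterion. The function name pick_one, the transcript labels, and the scores are all hypothetical, and only the canonical and rank criteria are modelled.

```shell
# Toy model of an ordered pick. Input on stdin, one candidate per line:
#   name<TAB>canonical_score<TAB>rank_score   (lower score = preferred)
pick_one() {
    order="$1"          # comma-separated criteria, e.g. canonical,rank
    keys=""
    old_ifs=$IFS
    IFS=,
    for crit in $order; do
        case "$crit" in
            canonical) col=2 ;;
            rank)      col=3 ;;
            *) IFS=$old_ifs; echo "unsupported criterion: $crit" >&2; return 1 ;;
        esac
        keys="$keys -k${col},${col}n"
    done
    IFS=$old_ifs
    # Sorting by the keys in order and taking the first line is equivalent
    # to keeping the best candidates for each criterion in turn.
    sort -t "$(printf '\t')" $keys | head -n 1 | cut -f 1
}

# Hypothetical candidates: the canonical transcript has the less severe consequence.
candidates=$(printf 'ENST_canonical\t0\t2\nENST_alternative\t1\t1')
printf '%s\n' "$candidates" | pick_one canonical,rank   # prints ENST_canonical
printf '%s\n' "$candidates" | pick_one rank,canonical   # prints ENST_alternative
```

With canonical ahead of rank, as in Order1, the canonical transcript wins even though its consequence is less severe; with rank first, as in Order2, the more severe non-canonical annotation is chosen.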

Best regards,
Nakib

ytc0413 · 2024-12-21T14:20:47Z

Hi @nakib103

Thank you for your reply!

I still have a few questions about the --pick criteria listed on this page: https://asia.ensembl.org/info/docs/tools/vep/script/vep_other.html.

What does "canonical status of transcript" refer to? Does it mean the Ensembl Canonical transcript described here: https://asia.ensembl.org/info/genome/genebuild/canonical.html?
What is the CCDS status of a transcript?
If I use the RefSeq cache, will --pick apply the APPRIS, TSL, and CCDS criteria?

Best regards,
Ashley

nakib103 · 2025-01-02T11:52:52Z

Hi @ytc0413,

Yes, canonical transcripts are the ones described in that document. A transcript annotated as Canonical ranks higher than one that is not.
Similarly, the CCDS status indicates whether a transcript has a CCDS annotation; a transcript with one ranks higher than a transcript without.
With the RefSeq cache, you cannot use those criteria.

Best regards,
Nakib

nakib103 self-assigned this Dec 13, 2024
