VEP annotates SNVs with wrong gene #1814

Open
ytc0413 opened this issue Dec 13, 2024 · 5 comments

@ytc0413

ytc0413 commented Dec 13, 2024

Describe the issue

VEP annotates five GJB4 missense variants (SNVs on chromosome 1) in the attached input VCF file as downstream variants of the GJB5 gene. These five SNVs were confirmed to be GJB4 missense variants using ANNOVAR, the ClinVar/dbSNP web pages, and the UCSC Genome Browser. However, in the VEP output, the SYMBOL, Consequence, and HGNC ID columns indicate that they are downstream variants of GJB5, while the Existing_variation column correctly shows the GJB4 variant rsIDs (rs776245625, rs146378222, rs200602523, rs375702737, rs373126632).
The attached input VCF file was modified from a DRAGEN-generated SNV VCF file. VEP version 110 was used for annotation with the cache file homo_sapiens_vep_110_GRCh38.tar.gz downloaded from https://ftp.ensembl.org/pub/release-110/variation/indexed_vep_cache/. I assume this cache is Ensembl-only, as its name lacks an origin label. I also annotated the same input VCF file with version 108 and its Homo sapiens merged cache file; the results from the two versions are identical.

Additional information

System

  • VEP version: v110 and v108
  • VEP cache version: v110 (presumably Ensembl-only) and v108 merged (GRCh38)
  • Perl version: 5.28.1
  • OS: Ubuntu
  • tabix installed? Yes

Full VEP command line (for v110)

$VEP_PATH --cache --offline \
    -i $INPUT_VCF_PATH \
    --format vcf \
    --fork 4 \
    --check_existing \
    --assembly GRCh38 \
    --e \
    --pick \
    --sift \
    --polyphen \
    --regulatory \
    --force_overwrite \
    --dir_cache $VEP_CACHE_DIR \
    --fasta $VEP_FASTA \
    --vcf \
    -o ${SAMPLE_ID}_snv_vep.vcf

Full error message

No warnings or error messages.

Data files (if applicable)

They include:

@nakib103 nakib103 self-assigned this Dec 13, 2024
@nakib103
Contributor

Hi @ytc0413,

Thanks for your query!

The consequence for gene GJB5 at those variants is downstream_gene_variant, which is correct because the variants lie downstream of that gene.

You are not seeing the line with the missense_variant effect on GJB4 because you are using the --pick option, which tells VEP to report only one consequence per variant, and here it picks the GJB5 line. You can either remove the --pick option or provide criteria that work for you (see --pick_order and this doc for more info).
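After dropping --pick (or switching to VEP's --flag_pick, which keeps every consequence block and adds a PICK flag to the block --pick would have chosen), it can help to list all blocks reported for a single position. Below is a minimal sketch, assuming the output was written with --vcf so that the annotations sit in the CSQ INFO field and their order is declared in the ##INFO=<ID=CSQ,...Format: ...> header line. The helper name list_csq and its argument order are hypothetical, not part of VEP.

```shell
#!/usr/bin/env bash
# list_csq (hypothetical helper, not part of VEP):
# print the requested CSQ fields for every consequence block VEP wrote
# for the VCF record at CHROM:POS, one tab-separated line per block.
# Usage: list_csq FILE.vcf CHROM POS FIELD [FIELD...]
# Assumes VEP output produced with --vcf; fields missing from the
# CSQ header are printed as empty strings.
list_csq() {
  local vcf="$1" chrom="$2" pos="$3"
  shift 3
  local IFS='|'
  local wanted="$*"   # requested field names joined with "|"
  awk -v chrom="$chrom" -v pos="$pos" -v wanted="$wanted" '
    BEGIN { FS = "\t"; nw = split(wanted, want, "|") }
    /^##INFO=<ID=CSQ/ {
      # Map each CSQ field name (the text after "Format: ") to its position.
      fmt = $0
      sub(/.*Format: /, "", fmt)
      sub(/">.*/, "", fmt)
      n = split(fmt, names, "|")
      for (i = 1; i <= n; i++) col[names[i]] = i
      next
    }
    /^#/ { next }
    $1 == chrom && $2 == pos {
      m = split($8, info, ";")
      for (k = 1; k <= m; k++) {
        if (substr(info[k], 1, 4) != "CSQ=") continue
        # Blocks are comma-separated; fields inside a block are "|"-separated.
        nb = split(substr(info[k], 5), blocks, ",")
        for (j = 1; j <= nb; j++) {
          split(blocks[j], f, "|")
          line = ""
          for (w = 1; w <= nw; w++) {
            v = (want[w] in col) ? f[col[want[w]]] : ""
            sep = (w > 1) ? "\t" : ""
            line = line sep v
          }
          print line
        }
      }
    }
  ' "$vcf"
}
```

For example, list_csq "${SAMPLE_ID}_snv_vep.vcf" "$CHROM" "$POS" SYMBOL Consequence Feature CANONICAL PICK would print one line per consequence block, where CHROM and POS are placeholders for one of the GJB4 variants. Because the original command uses --e (--everything), fields such as CANONICAL and CCDS should already be present in the CSQ header; a requested field that is absent simply comes out as an empty column.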

Best regards,
Nakib

@ytc0413
Author

ytc0413 commented Dec 15, 2024

Hi @nakib103

Thank you for your reply!

The GJB4 annotations appear after I removed the --pick option.
I found that the GJB5 annotations were chosen because of the tsl criterion. I also tried the following two --pick_order settings, both of which picked the GJB4 annotations for the variants on chromosome 1.

Order1
--pick --pick_order mane_select,mane_plus_clinical,canonical,appris,biotype,rank

Order2
--pick --pick_order biotype,rank,mane_select,mane_plus_clinical,canonical,appris,tsl,ccds,length

Will Order2 output the non-canonical transcript annotation if its consequence is more severe than that of the canonical transcript?
If I want to prioritize the gene that the variant is located within and its MANE Select/canonical transcript, which ordered set would you recommend?

Thank you.

Best regards,
Ashley

@nakib103
Contributor

Hi @ytc0413,

Great that you were able to see the desired gene in the output!

Yes, Order2 would output the non-canonical transcript annotation if it has the most severe consequence.

Given your criteria, Order1 is better suited than Order2.
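A toy model of why Order2 can return a non-canonical transcript: --pick_order behaves like a lexicographic multi-key sort, in which a later criterion only breaks ties left by the earlier ones. The sketch below is a simplification for illustration, not VEP's implementation; the function pick_first, the transcript names TX_MANE and TX_ALT, and all scores are hypothetical. Each input line has one numeric column per criterion, lower being preferred: column 2 is MANE Select status (0 = yes), column 3 is canonical status (0 = yes), and column 4 is a made-up consequence rank (lower = more severe).

```shell
#!/usr/bin/env bash
# pick_first (hypothetical helper): toy model of an ordered pick.
# $1    : comma-separated column numbers, highest-priority criterion first
# stdin : lines "ID<TAB>score<TAB>score...", lower score = preferred
# Prints the ID of the line that sorts first under that key order.
pick_first() {
  local keys="" c
  for c in $(printf '%s' "$1" | tr ',' ' '); do
    keys="$keys -k${c},${c}n"   # numeric sort on exactly this column
  done
  # $keys is unquoted on purpose: each -k option must be its own argument.
  sort -t "$(printf '\t')" $keys | head -n 1 | cut -f1
}

# Columns: 2 = MANE Select (0 = yes), 3 = canonical (0 = yes),
#          4 = made-up consequence rank (lower = more severe).
candidates() { printf 'TX_MANE\t0\t0\t12\nTX_ALT\t1\t1\t5\n'; }

candidates | pick_first 2,3,4   # Order1-like (mane, canonical, rank): prints TX_MANE
candidates | pick_first 4,2,3   # Order2-like (rank first): prints TX_ALT
```

Putting rank near the front, as biotype,rank,... does in Order2, lets a more severe consequence on a non-canonical transcript win, whereas putting mane_select and canonical first, as in Order1, keeps MANE Select or canonical transcripts ahead of the rest and only then compares severity.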

Best regards,
Nakib

@ytc0413
Author

ytc0413 commented Dec 21, 2024

Hi @nakib103

Thank you for your reply!

I still have a few questions about the --pick criteria described on this page: https://asia.ensembl.org/info/docs/tools/vep/script/vep_other.html.

  1. What does "canonical status of transcript" refer to? Does it mean the Ensembl Canonical transcript described here: https://asia.ensembl.org/info/genome/genebuild/canonical.html?

  2. What is the CCDS status of a transcript?

  3. If I use the RefSeq cache, will --pick apply the APPRIS, TSL, and CCDS criteria?

Best regards,
Ashley

@nakib103
Contributor

nakib103 commented Jan 2, 2025

Hi @ytc0413,

  1. Yes, canonical transcripts are the ones described in that document. A transcript annotated as canonical ranks higher than one that is not.
  2. Similarly, CCDS status means that the transcript has a CCDS annotation.
  3. With the RefSeq cache, you cannot use those criteria.

Best regards,
Nakib
