Variants missing from VEP cache #1821

Open
suzyhh opened this issue Dec 30, 2024 · 1 comment

suzyhh commented Dec 30, 2024

Hello,

I've been running VEP v113 using the cache and docker image, and I've found some variants that are present in gnomAD v4.1 but are absent from the cache.

Example 1: 20:45891598:CCTG:C. The co-located variant is flagged in Ensembl. I have now added --failed 1 to my VEP command, and I expect this variant will now have AFs.

Example 2: 8:144392334:T:TGGGGGTGCAAGGTGA. This variant is present in gnomAD but is not in dbSNP.

Example 3: 1:9244919:CCCCAGGCA:C. This variant is present in gnomAD and ClinVar, but is not in dbSNP.
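
For anyone wanting to reproduce this, the three examples above can be turned into a minimal VCF for testing. This is only a sketch: the function name variants_to_vcf is made up for illustration and is not part of VEP, and it assumes the CHROM:POS:REF:ALT notation used in the examples.

```shell
#!/bin/sh
# Convert CHROM:POS:REF:ALT strings into a minimal, header-only VCF on stdout.
# variants_to_vcf is a hypothetical helper, not a VEP tool.
variants_to_vcf() {
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    for v in "$@"; do
        # Split on ":" and fill the unused VCF columns with "."
        printf '%s\n' "$v" |
            awk -F: -v OFS='\t' '{ print $1, $2, ".", $3, $4, ".", ".", "." }'
    done
}

# The three example variants from this issue, in the order given
variants_to_vcf \
    "20:45891598:CCTG:C" \
    "8:144392334:T:TGGGGGTGCAAGGTGA" \
    "1:9244919:CCCCAGGCA:C"
```

The records are emitted in the order given, not sorted by position, so redirect the output to a file and sort it first if your tooling requires that.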

The "Data in the cache" section of https://useast.ensembl.org/info/docs/tools/vep/script/vep_cache.html#cache lists the sources of variants in the VEP cache, including dbSNP, ClinVar, and gnomAD v4.1. I would therefore assume that any variant in any of these sources should be present in the cache. However, what I've observed is that a variant has to be present in dbSNP to be in the cache. Is this the expected behaviour?

System

  • VEP version: 113.2 docker
  • VEP Cache version: 113

Full VEP command line

        vep --cache --offline \
        -i ~{vcf} \
        --dir_cache /opt/vep/.vep \
        --dir_plugins /opt/vep/.vep/Plugins/ \
        --vcf --compress_output bgzip \
        --merged \
        --fasta $refFasta \
        --assembly GRCh38 \
        --no_stats \
        --fork ~{fork} \
        --buffer_size ~{buffer} \
        --no_escape \
        --check_existing --failed 1 \
        --hgvs --hgvsg \
        --af --af_gnomadg \
        --protein --uniprot \
        --symbol \
        --numbers \
        --allele_number \
        --sift b --polyphen b \
        --pubmed \
        --show_ref_allele \
        --variant_class \
        --transcript_version \
        --mane \
        --flag_pick_allele_gene --pick_order rank,mane_select,mane_plus_clinical,canonical,appris,tsl,biotype,ccds,length \
        --flag_gencode_primary \
        --exclude_predicted \
        --use_given_ref \
        --plugin MaxEntScan,/opt/vep/.vep/fordownload,NCSS,SWA \
        --plugin SpliceAI,snv=~{spliceaiSnv},indel=~{spliceaiIndel} \
        --plugin SpliceRegion \
        --plugin NearestExonJB,max_range=100 \
        --plugin REVEL,file=/opt/vep/.vep/new_tabbed_revel_grch38.tsv.gz,no_match=1 \
        --plugin SpliceDistance \
        --plugin UTRAnnotator,file=/opt/vep/.vep/fordownload/uORF_5UTR_GRCh38_PUBLIC.txt \
        --plugin GnomadPli,file=~{gnomadv4Pli} \
        --plugin CADD,snv=~{caddSnv},indels=~{caddIndel} \
        --plugin AlphaMissense,file=~{alphaMissense} \
        --custom file=${hgmdZip},short_name=HGMD,format=vcf,type=exact,coords=0,fields=CLASS%PHEN \
        --custom file=~{clinVarVcf},short_name=ClinVar,format=vcf,type=exact,coords=0,fields=CLNSIG%CLNREVSTAT%CLNDN \
        --force_overwrite \
        -o ~{outName}_snvs.vep.vcf.gz

I would really like to avoid adding the gnomAD AFs as a custom annotation, as I imagine this will add quite a bit of processing time to the annotation. However, we're missing AFs for quite a few variants, which is having a knock-on effect on our variant filtering processes.

Thanks!
Suzy

dglemos self-assigned this Jan 6, 2025

dglemos (Contributor) commented Jan 6, 2025

Hi @suzyhh,
That is correct: variants must be present in dbSNP to be included in the cache. For VEP cache version 113, the corresponding dbSNP version is 156.
Unfortunately, for variants that are not in dbSNP, the only way to obtain the gnomAD annotation is to add gnomAD as a custom annotation.
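
As a rough sketch of what that could look like, a gnomAD sites VCF can be passed with --custom in the same key=value style as the HGMD and ClinVar lines in the command above. The helper name gnomad_custom_arg, the short name gnomADg, and the file path are assumptions for illustration. AF is used as the default field; check the INFO header of your gnomAD file for the other fields you need.

```shell
#!/bin/sh
# Build the value of a --custom argument for a gnomAD sites VCF.
# gnomad_custom_arg, the short name and the example path are hypothetical;
# the key=value layout mirrors the --custom lines in the original command.
gnomad_custom_arg() {
    gnomad_vcf=$1        # bgzipped, tabix-indexed gnomAD sites VCF
    fields=${2:-AF}      # INFO fields to report, separated by "%"
    printf 'file=%s,short_name=gnomADg,format=vcf,type=exact,coords=0,fields=%s\n' \
        "$gnomad_vcf" "$fields"
}

# Print the extra argument that would be appended to the vep command line
echo "--custom $(gnomad_custom_arg /path/to/gnomad.genomes.sites.vcf.bgz)"
```

With type=exact, annotations are only reported for variants whose coordinates match the input exactly, which is the same setting already used for the ClinVar custom file in the original command.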

Testing the example variants with custom annotation, I can find the following matches:

However, I cannot find gnomAD data for 1:9244919:CCCCAGGCA:C. Could you please send the data you expect for this variant?
