forked from zwdzwd/transvar
Commit
added option --print-protein and --print-protein-pretty
Wanding Zhou committed Apr 27, 2016
1 parent ed5b837 commit 8834d01
Showing 13 changed files with 330 additions and 115 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,165 @@ | ||
*************************** | ||
Inspect variant sequences | ||
*************************** | ||
|
||
The `--print-protein` and `--print-protein-pretty` options display the full variant protein sequence in the `variant_protein_seq` field of the info column when the variant falls in a protein-coding transcript. | ||
|
||
Missense substitution | ||
####################### | ||
|
||
.. code:: bash | ||
$ transvar ganno -i 'chr1:g.115256530G>A' --ensembl --print-protein | ||
:: | ||
|
||
chr1:g.115256530G>A ENST00000369535 (protein_coding) NRAS - | ||
chr1:g.115256530G>A/c.181C>T/p.Q61* inside_[cds_in_exon_3] | ||
CSQN=Nonsense;variant_protein_seq=MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQ | ||
VVIDGETCLLDILDTAG*;codon_pos=115256528-115256529-115256530;ref_codon_seq=CAA; | ||
aliases=ENSP00000358548;source=Ensembl | ||
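For downstream processing, the info column can be split into its fields to pull out `variant_protein_seq`. The following is a minimal sketch, assuming the semicolon-delimited `key=value` layout shown in the output above and assuming the wrapped lines have already been joined into one string. The helper name `parse_info` is hypothetical and not part of TransVar.

```python
def parse_info(info):
    """Split a semicolon-delimited info string into a dict of key -> value."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


# Info column of the NRAS example above, with the line wrapping removed.
info = (
    "CSQN=Nonsense;variant_protein_seq=MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQ"
    "VVIDGETCLLDILDTAG*;codon_pos=115256528-115256529-115256530;ref_codon_seq=CAA;"
    "aliases=ENSP00000358548;source=Ensembl"
)
fields = parse_info(info)
print(fields["variant_protein_seq"])
```

The truncated sequence holds 60 residues followed by `*`, which agrees with the reported `p.Q61*`.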
|
||
The `--print-protein-pretty` output is more human-readable and highlights the mutation in brackets. | ||
|
||
.. code:: bash | ||
$ transvar ganno --ccds -i 'chr3:g.178936091G>A' --print-protein-pretty | ||
:: | ||
|
||
chr3:g.178936091G>A CCDS43171 (protein_coding) PIK3CA + | ||
chr3:g.178936091G>A/c.1633G>A/p.E545K inside_[cds_in_exon_9] | ||
CSQN=Missense;dbsnp=rs104886003(chr3:178936091G>A);variant_protein_seq=MPPRPS | ||
SGELWGIHLMPPRILVECLLPNGMIVTLECLREATLITIKHELFKEARKYPLHQLLQDESSYIFVSVTQEAEREEFF | ||
DETRRLCDLRLFQPFLKVIEPVGNREEKILNREIGFAIGMPVCEFDMVKDPEVQDFRRNILNVCKEAVDLRDLNSPH | ||
SRAMYVYPPNVESSPELPKHIYNKLDKGQIIVVIWVIVSPNNDKQKYTLKINHDCVPEQVIAEAIRKKTRSMLLSSE | ||
QLKLCVLEYQGKYILKVCGCDEYFLEKYPLSQYKYIRSCIMLGRMPNLMLMAKESLYSQLPMDCFTMPSYSRRISTA | ||
TPYMNGETSTKSLWVINSALRIKILCATYVNVNIRDIDKIYVRTGIYHGGEPLCDNVNTQRVPCSNPRWNEWLNYDI | ||
YIPDLPRAARLCLSICSVKGRKGAKEEHCPLAWGNINLFDYTDTLVSGKMALNLWPVPHGLEDLLNPIGVTGSNPNK | ||
ETPCLELEFDWFSSVVKFPDMSVIEEHANWSVSREAGFSYSHAGLSNRLARDNELRENDKEQLKAISTRDPLSEIT_ | ||
_[E>K]__QEKDFLWSHRHYCVTIPEILPKLLLSVKWNSRDEVAQMYCLVKDWPPIKPEQAMELLDCNYPDPMVRGF | ||
AVRCLEKYLTDDKLSQYLIQLVQVLKYEQYLDNLLVRFLLKKALTNQRIGHFFFWHLKSEMHNKTVSQRFGLLLESY | ||
CRACGMYLKHLNRQVEAMEKLINLTDILKQEKKDETQKVQMKFLVEQMRRPDFMDALQGFLSPLNPAHQLGNLRLEE | ||
CRIMSSAKRPLWLNWENPDIMSELLFQNNEIIFKNGDDLRQDMLTLQIIRIMENIWQNQGLDLRMLPYGCLSIGDCV | ||
GLIEVVRNSHTIMQIQCKGGLKGALQFNSHTLHQWLKDKNKGEIYDAAIDLFTRSCAGYCVATFILGIGDRHNSNIM | ||
VKDDGQLFHIDFGHFLDHKKKKFGYKRERVPFVLTQDFLIVISKGAQECTKTREFERFQEMCYKAYLAIRQHANLFI | ||
NLFSMMLGSGMPELQSFDDIAYIRKTLALDKTEQEALEYFMKQMNDAHHGGWTTKMDWIFHTIKQHALN*;codon_ | ||
pos=178936091-178936092-178936093;ref_codon_seq=GAG;source=CCDS | ||
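The bracketed marker can be located programmatically. The sketch below assumes that a substitution is always written as `__[REF>ALT]__` with single-letter residues, as in the PIK3CA output above; the function name `parse_substitution` is hypothetical, and the pattern is inferred from this example rather than from a documented grammar.

```python
import re

# Marker form inferred from the example output: __[E>K]__
MARKER = re.compile(r"__\[([A-Z*])>([A-Z*])\]__")


def parse_substitution(pretty_seq):
    """Return (1-based position, ref, alt, plain variant sequence)."""
    m = MARKER.search(pretty_seq)
    if m is None:
        raise ValueError("no substitution marker found")
    ref, alt = m.groups()
    # Every character before the marker is one residue, so the mutated
    # residue sits one past that count.
    position = m.start() + 1
    variant = pretty_seq[:m.start()] + alt + pretty_seq[m.end():]
    return position, ref, alt, variant


# Short excerpt around the marker; the position is relative to this excerpt.
excerpt = "DPLSEIT__[E>K]__QEKDFLW"
print(parse_substitution(excerpt))
```

Applied to the full, unwrapped sequence, the returned position is the residue number in the protein.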
|
||
|
||
The `--aa3` option, which switches the output to the three-letter amino acid alphabet, applies here as well. | ||
|
||
.. code:: bash | ||
$ transvar ganno -i 'chr1:g.115256530G>A' --ensembl --print-protein-pretty --aa3 | ||
:: | ||
|
||
chr1:g.115256530G>A ENST00000369535 (protein_coding) NRAS - | ||
chr1:g.115256530G>A/c.181C>T/p.Gln61X inside_[cds_in_exon_3] | ||
CSQN=Missense;variant_protein_seq=MetThrGluTyrLysLeuValValValGlyAlaGlyGlyValG | ||
lyLysSerAlaLeuThrIleGlnLeuIleGlnAsnHisPheValAspGluTyrAspProThrIleGluAspSerTyr | ||
ArgLysGlnValValIleAspGlyGluThrCysLeuLeuAspIleLeuAspThrAlaGly__[GluGluTyrSerAl | ||
aMetArgAspGlnTyrMetArgThrGlyGluGlyPheLeuCysValPheAlaIleAsnAsnSerLysSerPheAlaA | ||
spIleAsnLeuTyrArgGluGlnIleLysArgValLysAspSerAspAspValProMetValLeuValGlyAsnLys | ||
CysAspLeuProThrArgThrValAspThrLysGlnAlaHisGluLeuAlaLysSerTyrGlyIleProPheIleGl | ||
uThrSerAlaLysThrArgGlnGlyValGluAspAlaPheTyrThrLeuValArgGluIleArgGlnTyrArgMetL | ||
ysLysLeuAsnSerSerAspAspGlyThrGlnGlyCysMetGlyLeuProCysValValMet>X];codon_pos=1 | ||
15256528-115256529-115256530;ref_codon_seq=CAA;aliases=ENSP00000358548;source | ||
=Ensembl | ||
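To compare three-letter output against one-letter output, the sequence can be converted back. This is a minimal sketch using the standard amino acid codes; the helper `to_one_letter` is hypothetical and only handles a plain run of three-letter codes, so markers such as `__[` and the stop symbol must be stripped first.

```python
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(seq3):
    """Convert a run of three-letter residue codes to one-letter codes."""
    if len(seq3) % 3 != 0:
        raise ValueError("length is not a multiple of three")
    return "".join(THREE_TO_ONE[seq3[i:i + 3]] for i in range(0, len(seq3), 3))


# First 15 residues of the NRAS output above.
prefix3 = "MetThrGluTyrLysLeuValValValGlyAlaGlyGlyValGly"
print(to_one_letter(prefix3))
```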
|
||
Deletion | ||
############ | ||
|
||
.. code:: bash | ||
$ transvar canno --ccds -i 'CCDS8856:c.769_771delGGG' --print-protein-pretty | ||
:: | ||
|
||
CCDS8856:c.769_771delGGG CCDS8856 (protein_coding) AAAS - | ||
chr12:g.53703427_53703429delCCC/c.769_771delGGG/p.G257delG inside_[cds_in_exon_8] | ||
CSQN=InFrameDeletion;left_align_gDNA=g.53703424_53703426delCCC;unaligned_gDNA | ||
=g.53703424_53703426delCCC;left_align_cDNA=c.766_768delGGG;unalign_cDNA=c.769 | ||
_771delGGG;left_align_protein=p.G256delG;unalign_protein=p.G257delG;variant_p | ||
rotein_seq=MCSLGLFPPPPPRGQVTLYEHNNELVTGSSYESPPPDFRGQWINLPVLQLTKDPLKTPGRLDHGTR | ||
TAFIHHREQVWKRCINIWRDVGLFGVLNEIANSEEEVFEWVKTASGWALALCRWASSLHGSLFPHLSLRSEDLIAEF | ||
AQVTNWSSCCLRVFAWHPHTNKFAVALLDDSVRVYNASSTIVPSLKHRLQRNVASLAWKPLSASVLAVACQSCILIW | ||
TLDPTSLSTRPSSGCAQVLSHPGHTPVTSLAWAPSG__[G_deletion]__RLLSASPVDAAIRVWDVSTETCVPL | ||
PWFRGGGVTNLLWSPDGSKILATTPSAVFRVWEAQMWTCERWPTLSGRCQTGCWSPDGSRLLFTVLGEPLIYSLSFP | ||
ERCGEGKGCVGGAKSATIVADLSETTIQTPDGEERLGGEAHSMVWDPSGERLAVLMKGKPRVQDGKPVILLFRTRNS | ||
PVFELLPCGIIQGEPGAQPQLITFHPSFNKGALLSVGWSTGRIAHIPLYFVNAQFPRFSPVLGRAQEPPAGGGGSIH | ||
DLPLFTETSPTSAPWDPLPGPPPVLPHSPHSHL*;source=CCDS | ||
|
||
Insertion | ||
############ | ||
|
||
.. code:: bash | ||
$ transvar ganno -i 'chr2:g.69741762_69741763insTGC' --ccds --print-protein-pretty | ||
:: | ||
|
||
chr2:g.69741762_69741763insTGC CCDS1893 (protein_coding) AAK1 - | ||
chr2:g.69741780_69741782dupCTG/c.1614_1616dupGCA/p.Q546dupQ inside_[cds_in_exon_12] | ||
CSQN=InFrameInsertion;left_align_gDNA=g.69741762_69741763insTGC;unalign_gDNA= | ||
g.69741762_69741763insTGC;left_align_cDNA=c.1596_1597insCAG;unalign_cDNA=c.16 | ||
14_1616dupGCA;left_align_protein=p.Y532_Q533insQ;unalign_protein=p.Q539dupQ;v | ||
ariant_protein_seq=MKKFFDSRREQGGSGLGSGSSGGGGSTSGLGSGYIGRVFGIGRQQVTVDEVLAEGGFA | ||
IVFLVRTSNGMKCALKRMFVNNEHDLQVCKREIQIMRDLSGHKNIVGYIDSSINNVSSGDVWEVLILMDFCRGGQVV | ||
NLMNQRLQTGFTENEVLQIFCDTCEAVARLHQCKTPIIHRDLKVENILLHDRGHYVLCDFGSATNKFQNPQTEGVNA | ||
VEDEIKKYTTLSYRAPEMVNLYSGKIITTKADIWALGCLLYKLCYFTLPFGESQVAICDGNFTIPDNSRYSQDMHCL | ||
IRYMLEPDPDKRPDIYQVSYFSFKLLKKECPIPNVQNSPIPAKLPEPVKASEAAAKKTQPKARLTDPIPTTETSIAP | ||
RQRPKAGQTQPNPGILPIQPALTPRKRATVQPPPQAAGSSNQPGLLASVPQPKPQAPPSQPLPQTQAKQPQAPPTPQ | ||
QTPSTQAQGLPAQAQATPQHQQQLFLKQQQQQQQPPPAQQQPAGTFYQQQQAQTQQFQAVHPATQKPAIAQFPVVSQ | ||
GGSQQQLMQNFYQQQQQQQQQQQQQQ__[insert_Q]__LATALHQQQLMTQQAALQQKPTMAAGQQPQPQPAAAP | ||
QPAPAQEPAIQAPVRQQPKVQTTPPPAVQGQKVGSLTPPSSPKTQRAGHRRILSDVTHSAVFGVPASKSTQLLQAAA | ||
AEASLNKSKSATTTPSGSPRTSQQNVYNPSEGSTWNPFDDDNFSKLTAEELLNKDFAKLGEGKHPEKLGGSAESLIP | ||
GFQSTQGDAFATTSFSAGTAEKRKGGQTVDSGLPLLSVSDPFIPLQVPDAPEKLIEGLKSPDTSLLLPDLLPMTDPF | ||
GSTSDAVIEKADVAVESLIPGLEPPVPQRLPSQTESVTSNRTDSLTGEDSLLDCSLLSNPTTDLLEEFAPTAISAPV | ||
HKAAEDSNLISGFDVPEGSDKVAEDEFDPIPVLITKNPQGGHSRNSSGSSESSLPNLARSLLLVDQLIDL*;phase | ||
=2;source=CCDS | ||
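The in-frame markers in the deletion and insertion examples can be expanded into plain reference and variant sequences. The sketch below assumes the two marker forms seen above, `__[X_deletion]__` and `__[insert_X]__`, where `X` is the run of affected residues; the function name `expand_indel` is hypothetical, and the patterns are inferred from these examples only.

```python
import re

# Marker forms inferred from the examples: __[G_deletion]__ and __[insert_Q]__
INDEL = re.compile(
    r"__\[(?:(?P<deleted>[A-Z]+)_deletion|insert_(?P<inserted>[A-Z]+))\]__"
)


def expand_indel(pretty_seq):
    """Return (reference sequence, variant sequence) for an in-frame indel."""
    m = INDEL.search(pretty_seq)
    if m is None:
        raise ValueError("no in-frame indel marker found")
    before, after = pretty_seq[:m.start()], pretty_seq[m.end():]
    if m.group("deleted") is not None:
        # The deleted residues exist only in the reference.
        return before + m.group("deleted") + after, before + after
    # The inserted residues exist only in the variant.
    return before + after, before + m.group("inserted") + after


# Short excerpts around the markers in the AAAS and AAK1 examples.
print(expand_indel("LAWAPSG__[G_deletion]__RLLSASP"))
print(expand_indel("QQQQ__[insert_Q]__LATALH"))
```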
|
||
|
||
Frameshift sequence | ||
###################### | ||
|
||
.. code:: bash | ||
$ transvar canno --ccds -i 'CCDS8856:c.769_770delGG' --print-protein-pretty | ||
:: | ||
|
||
CCDS8856:c.769_770delGG CCDS8856 (protein_coding) AAAS - | ||
chr12:g.53703428_53703429delCC/c.770_771delGG/p.G257Afs*65 inside_[cds_in_exon_8] | ||
CSQN=Frameshift;left_align_gDNA=g.53703424_53703425delCC;unaligned_gDNA=g.537 | ||
03425_53703426delCC;left_align_cDNA=c.766_767delGG;unalign_cDNA=c.769_770delG | ||
G;variant_protein_seq=MCSLGLFPPPPPRGQVTLYEHNNELVTGSSYESPPPDFRGQWINLPVLQLTKDPL | ||
KTPGRLDHGTRTAFIHHREQVWKRCINIWRDVGLFGVLNEIANSEEEVFEWVKTASGWALALCRWASSLHGSLFPHL | ||
SLRSEDLIAEFAQVTNWSSCCLRVFAWHPHTNKFAVALLDDSVRVYNASSTIVPSLKHRLQRNVASLAWKPLSASVL | ||
AVACQSCILIWTLDPTSLSTRPSSGCAQVLSHPGHTPVTSLAWAPSG__[frameshift_GRLLSASPVDAAIRVW | ||
DVSTETCVPLPWFRGGGVTNLLWSPDGSKILATTPSAVFRVWEAQMWTCERWPTLSGRCQTGCWSPDGSRLLFTVLG | ||
EPLIYSLSFPERCGEGKGCVGGAKSATIVADLSETTIQTPDGEERLGGEAHSMVWDPSGERLAVLMKGKPRVQDGKP | ||
VILLFRTRNSPVFELLPCGIIQGEPGAQPQLITFHPSFNKGALLSVGWSTGRIAHIPLYFVNAQFPRFSPVLGRAQE | ||
PPAGGGGSIHDLPLFTETSPTSAPWDPLPGPPPVLPHSPHSHL*>AAALSFTRGCCYPGMGCLNRDLCPPSLVPRRW | ||
GDQPALVPRRQQNPGYHSFSCLSSLGGPDVDL*];source=CCDS | ||
|
||
|
||
.. code:: bash | ||
$ transvar canno -i 'CCDS54438:c.409_421del' --ccds --print-protein-pretty | ||
:: | ||
|
||
CCDS54438:c.409_421del CCDS54438 (protein_coding) ATG16L1 + | ||
chr2:g.234183368_234183380del13/c.409_421del13/p.T137Lfs*5 inside_[cds_in_exon_5] | ||
CSQN=Frameshift;left_align_gDNA=g.234183367_234183379del13;unaligned_gDNA=g.2 | ||
34183368_234183380del13;left_align_cDNA=c.408_420del13;unalign_cDNA=c.409_421 | ||
del13;variant_protein_seq=MSSGLRAADFPRWKRHISEQLRRRDRLQRQAFEEIILQYNKLLEKSDLHSV | ||
LAQKLQAEKHDVPNRHEIRRRQARLQKELAEAAKEPLPVEQDDDIEVIVDETSDHTEETSPVRAISRAATRRSVSSF | ||
PVPQDNVD__[frameshift_THPGSGKEVRVPATALCVFDAHDGEVNAVQFSPGSRLLATGGMDRRVKLWEVFGE | ||
KCEFKGSLSGSNAGITSIEFDSAGSYLLAASNDFASRIWTVDDYRLRHTLTGHSGKVLSAKFLLDNARIVSGSHDRT | ||
LKLWDLRSKVCIKTVFAGSSCNDIVCTEQCVMSGHFDKKIRFWDIRSESIVREMELLGKITALDLNPERTELLSCSR | ||
DDLLKVIDLRTNAIKQTFSAPGFKCGSDWTRVVFSPDGSYVAAGSAEGSLYIWSVLTGKVEKVLSKQHSSSINAVAW | ||
SPSGSHVVSVDKGCKAVLWAQY*>LVKK*];source=CCDS | ||
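In the frameshift examples the marker carries both tails: the reference sequence from the first affected residue onward, then `>`, then the shifted sequence up to the new stop. The sketch below assumes that layout, `__[frameshift_REF>ALT]`, as inferred from the two outputs above; the function name `expand_frameshift` is hypothetical.

```python
import re

# Marker form inferred from the examples: __[frameshift_<ref tail>><alt tail>]
FRAMESHIFT = re.compile(r"__\[frameshift_([A-Z*]+)>([A-Z*]+)\]")


def expand_frameshift(pretty_seq):
    """Return (reference sequence, variant sequence) for a frameshift."""
    m = FRAMESHIFT.search(pretty_seq)
    if m is None:
        raise ValueError("no frameshift marker found")
    before = pretty_seq[:m.start()]
    ref_tail, alt_tail = m.groups()
    return before + ref_tail, before + alt_tail


# Toy input in the same notation: the reference tail is shortened from the
# ATG16L1 example, the variant tail LVKK* is taken from it unchanged.
toy = "PVPQDNVD__[frameshift_THPGSGKEV*>LVKK*]"
reference, variant = expand_frameshift(toy)
print(reference)
print(variant)
```

The variant tail `LVKK*` is five symbols long including the stop, which agrees with the `fs*5` suffix in `p.T137Lfs*5`.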
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,9 @@ | ||
#!/usr/bin/env python | ||
|
||
# The MIT License | ||
# | ||
+# Copyright (c) 2016 | ||
+# Wanding Zhou | ||
+# | ||
# Copyright (c) 2014, 2015 The University of Texas MD Anderson Cancer Center | ||
# Wanding Zhou, Tenghui Chen and Ken Chen | ||
|
@@ -25,7 +28,7 @@ | |
# CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | ||
# SOFTWARE. | ||
# | ||
-# Contact: Ken Chen <[email protected]> | ||
+# Contact: Wanding Zhou <[email protected]> | ||
|
||
import os | ||
import sys | ||
|