GeneBed
Wang Yunfei edited this page Feb 8, 2017
- GeneBed is a class for the UCSC GenePred format.
- GeneBed is the class for handling gene coordinates that have multiple exons.
- Other gene formats, such as GFF and GTF, can be converted to GeneBed format.
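To illustrate what a GenePred record carries, here is a minimal sketch that parses one line and derives its exon and intron intervals. `GenePredRecord`, `from_line`, `exons` and `introns` here are hypothetical names, not ngslib's API. The sketch assumes the ten standard UCSC genePred columns (no leading `bin` column), whitespace-separated, in the same order as the UTR output shown on this page.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical stand-in for a GenePred record; this is NOT ngslib's GeneBed.
@dataclass
class GenePredRecord:
    name: str
    chrom: str
    strand: str
    txStart: int
    txEnd: int
    cdsStart: int
    cdsEnd: int
    exonStarts: List[int]
    exonEnds: List[int]

    @classmethod
    def from_line(cls, line: str) -> "GenePredRecord":
        # Assumed column order: name chrom strand txStart txEnd cdsStart cdsEnd
        # exonCount exonStarts exonEnds (the exon lists end with a comma)
        f = line.split()
        starts = [int(x) for x in f[8].rstrip(",").split(",")]
        ends = [int(x) for x in f[9].rstrip(",").split(",")]
        if len(starts) != int(f[7]) or len(ends) != int(f[7]):
            raise ValueError("exonCount does not match the exon lists")
        return cls(f[0], f[1], f[2], int(f[3]), int(f[4]),
                   int(f[5]), int(f[6]), starts, ends)

    def exons(self) -> List[Tuple[int, int]]:
        # Half-open [start, end) intervals, left to right on the chromosome
        return list(zip(self.exonStarts, self.exonEnds))

    def introns(self) -> List[Tuple[int, int]]:
        # Each intron runs from one exon's end to the next exon's start
        ex = self.exons()
        return [(ex[i][1], ex[i + 1][0]) for i in range(len(ex) - 1)]
```

For example, the two-exon `NM_008922:UTR5` record from the output further down yields two exons and one intron between them.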
Definition of GeneBed (method bodies omitted; each comment gives the body's length in lines):
class GeneBed(Bed):
    '''UCSC GenePred format.'''
    def __init__(self,x,description=None): ...  # 39 lines
    def __str__(self): ...  # 3 lines
    def toBed(self): ...  # 3 lines
    def getExon(self,i): ...  # 6 lines
    def getIntron(self,i): ...  # 9 lines
    def __getUTRs(self,end='left'): ...  # 28 lines
    def getUTR5(self): ...  # 9 lines
    def getUTR3(self): ...  # 9 lines
    def overlapLength(self,B): ...  # 16 lines
    def exons(self): ...  # 4 lines
    def introns(self): ...  # 4 lines
    def getCDS(self): ...  # 18 lines
    def getcDNALength(self): ...  # 6 lines
    def getSeq(self,fn=USERHOME+"/Data/hg19/hg19.2bit"): ...  # 6 lines
    def getWig(self,fn): ...  # 9 lines
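`getcDNALength` and `overlapLength` are listed above without their bodies. The sketch below shows one plausible way to compute a spliced (cDNA) length and an exon-level overlap between two genes from half-open exon intervals. The function names are hypothetical, and ngslib's `overlapLength` may define overlap differently (for example over the whole transcript span, or with chromosome and strand checks, which this sketch omits).

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) genomic interval


def cdna_length(exons: List[Interval]) -> int:
    # Hypothetical helper: spliced length is the total exon length; introns are skipped
    return sum(end - start for start, end in exons)


def exon_overlap_length(a: List[Interval], b: List[Interval]) -> int:
    # Hypothetical helper: bases shared by the exons of two genes on one chromosome.
    # Assumes each list is sorted and its exons do not overlap one another.
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        # Advance whichever interval ends first (a two-pointer sweep)
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total
```

The two-pointer sweep visits each exon once, so the cost is linear in the total number of exons rather than quadratic.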
Example: Get the exons of genes
from ngslib import IO
for gene in IO.BioReader("test/test.tab","genepred"):
    for exon in gene.exons():
        print(exon)
Output:
chr1 134212701 134213049 NM_001195025:exon_1 0.00 +
chr1 134221529 134221650 NM_001195025:exon_2 0.00 +
chr1 134222782 134222806 NM_001195025:exon_3 0.00 +
chr1 134224273 134224425 NM_001195025:exon_4 0.00 +
chr1 134224707 134224773 NM_001195025:exon_5 0.00 +
chr1 134226534 134226654 NM_001195025:exon_6 0.00 +
chr1 134227135 134227268 NM_001195025:exon_7 0.00 +
chr1 134227897 134230065 NM_001195025:exon_8 0.00 +
chr1 134212701 134213049 NM_028778:exon_1 0.00 +
......
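Each exon above is printed as a BED6 line (chrom, start, end, name, score, strand) whose name is the transcript ID plus `:exon_N`. A hypothetical helper that produces the same layout could look like this. It assumes tab-separated columns and left-to-right exon numbering, which matches the `+`-strand genes shown but is unverified for `-`-strand genes.

```python
from typing import List, Tuple


def exons_to_bed6(chrom: str, name: str, strand: str,
                  exons: List[Tuple[int, int]]) -> List[str]:
    # Hypothetical helper, not ngslib code: one BED6 line per exon.
    # Exons are numbered from 1, left to right (assumed; see lead-in).
    return [
        f"{chrom}\t{start}\t{end}\t{name}:exon_{i}\t0.00\t{strand}"
        for i, (start, end) in enumerate(exons, start=1)
    ]
```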
Example: Get the 5' UTRs
from ngslib import IO
for gene in IO.BioReader("test/test.tab","genepred"):
    utr5 = gene.getUTR5()
    if utr5:  # some genes don't have a 5' UTR
        print(utr5)
Output (note that UTRs are still GeneBed instances, because some UTRs span more than one exon):
NM_001195025:UTR5 chr1 + 134212701 134212806 134212701 134212806 1 134212701, 134212806,
NM_028778:UTR5 chr1 + 134212701 134212806 134212701 134212806 1 134212701, 134212806,
NM_008922:UTR5 chr1 - 33725856 33726603 33725856 33726603 2 33725856,33726468, 33725865,33726603,
NM_027671:UTR5 chr1 - 8794024 9289811 8794024 9289811 5 8794024,8872241,8989266,9193259,9288213, 8794051,8872295,8989330,9193341,9289811,
NM_175370:UTR5 chr1 - 58749257 58752833 58749257 58752833 2 58749257,58752745, 58749289,58752833,
NM_175642:UTR5 chr1 - 25883605 25886552 25883605 25886552 2 25883605,25886306, 25883620,25886552,
NM_178884:UTR5 chr1 - 75502799 75503027 75502799 75503027 1 75502799, 75503027,
NM_198680:UTR5 chr1 - 109056354 109057691 109056354 109057691 2 109056354,109057656, 109056379,109057691,
NM_199021:UTR5 chr1 - 125942020 125942136 125942020 125942136 1 125942020, 125942136,
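The private `__getUTRs(self,end='left')` signature suggests that UTRs are computed as the exon pieces to the left or right of the CDS and then assigned to 5' or 3' by strand. The sketch below follows that idea, but it is not ngslib's code: `utr_side`, `utr5` and `utr3` are hypothetical names. It assumes half-open `[start, end)` exon intervals and the UCSC convention that non-coding transcripts have `cdsStart == cdsEnd`. Because the result is a list of intervals, a UTR can span several exons, as in the `NM_027671` output above.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) genomic interval


def utr_side(exons: List[Interval], cds_start: int, cds_end: int,
             side: str) -> List[Interval]:
    # Hypothetical helper: exon pieces left of cds_start or right of cds_end
    if side == "left":
        return [(s, min(e, cds_start)) for s, e in exons if s < cds_start]
    if side == "right":
        return [(max(s, cds_end), e) for s, e in exons if e > cds_end]
    raise ValueError("side must be 'left' or 'right'")


def utr5(exons: List[Interval], cds_start: int, cds_end: int,
         strand: str) -> List[Interval]:
    # Assumed UCSC convention: cdsStart == cdsEnd marks a non-coding transcript
    if cds_start == cds_end:
        return []
    # The 5' end is on the left for "+" genes and on the right for "-" genes
    return utr_side(exons, cds_start, cds_end, "left" if strand == "+" else "right")


def utr3(exons: List[Interval], cds_start: int, cds_end: int,
         strand: str) -> List[Interval]:
    if cds_start == cds_end:
        return []
    # The 3' end is the mirror image of the 5' end
    return utr_side(exons, cds_start, cds_end, "right" if strand == "+" else "left")
```

An empty list is falsy, so under this sketch the `if utr5:` check in the example above would also skip genes without a 5' UTR.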