Wang Yunfei edited this page Feb 8, 2017 · 2 revisions

GeneBed

  • GeneBed is a class for the UCSC GenePred format.
  • It represents gene coordinates for genes that have multiple exons.
  • Other gene formats, such as GFF and GTF, can be converted to the GeneBed format.
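To make the format concrete, here is a minimal sketch of how one GenePred line maps to exon intervals. It does not use ngslib; `GenePredRecord` and `parse_genepred_line` are hypothetical names introduced for illustration, and the column order (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds) is assumed to match the lines printed in the 5' UTR output on this page.

```python
from collections import namedtuple

# Hypothetical container; not part of ngslib.
GenePredRecord = namedtuple(
    "GenePredRecord",
    "name chrom strand tx_start tx_end cds_start cds_end exons",
)


def parse_genepred_line(line):
    """Parse one tab-separated GenePred line into a GenePredRecord."""
    fields = line.rstrip("\n").split("\t")
    name, chrom, strand = fields[0:3]
    tx_start, tx_end, cds_start, cds_end, exon_count = (int(v) for v in fields[3:8])
    # exonStarts and exonEnds are comma-separated lists with a trailing comma.
    starts = [int(v) for v in fields[8].rstrip(",").split(",")]
    ends = [int(v) for v in fields[9].rstrip(",").split(",")]
    if not (len(starts) == len(ends) == exon_count):
        raise ValueError("exon count does not match exonStarts/exonEnds")
    exons = list(zip(starts, ends))
    return GenePredRecord(name, chrom, strand, tx_start, tx_end,
                          cds_start, cds_end, exons)


# Sample line copied from the 5' UTR output on this page.
line = ("NM_008922:UTR5\tchr1\t-\t33725856\t33726603\t33725856\t33726603\t2\t"
        "33725856,33726468,\t33725865,33726603,")
record = parse_genepred_line(line)
print(record.exons)
```

Keeping the exons as a list of (start, end) pairs makes per-exon operations such as listing introns or summing lengths simple loops.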

Definition of GeneBed:

class GeneBed(Bed):
    '''UCSC GenePred format.'''
    def __init__(self,x,description=None):                # 39 lines folded
    def __str__(self):                                    # 3 lines folded
    def toBed(self):                                      # 3 lines folded
    def getExon(self,i):                                  # 6 lines folded
    def getIntron(self,i):                                # 9 lines folded
    def __getUTRs(self,end='left'):                       # 28 lines folded
    def getUTR5(self):                                    # 9 lines folded
    def getUTR3(self):                                    # 9 lines folded
    def overlapLength(self,B):                            # 16 lines folded
    def exons(self):                                      # 4 lines folded
    def introns(self):                                    # 4 lines folded
    def getCDS(self):                                     # 18 lines folded
    def getcDNALength(self):                              # 6 lines folded
    def getSeq(self,fn=USERHOME+"/Data/hg19/hg19.2bit"):  # 6 lines folded
    def getWig(self,fn):                                  # 9 lines folded

Example: Get the exons of genes

from ngslib import IO

# Read each gene from a GenePred file and print its exons one by one.
for gene in IO.BioReader("test/test.tab","genepred"):
    for exon in gene.exons():
        print(exon)

Output:

chr1	134212701	134213049	NM_001195025:exon_1	0.00 	+
chr1	134221529	134221650	NM_001195025:exon_2	0.00 	+
chr1	134222782	134222806	NM_001195025:exon_3	0.00 	+
chr1	134224273	134224425	NM_001195025:exon_4	0.00 	+
chr1	134224707	134224773	NM_001195025:exon_5	0.00 	+
chr1	134226534	134226654	NM_001195025:exon_6	0.00 	+
chr1	134227135	134227268	NM_001195025:exon_7	0.00 	+
chr1	134227897	134230065	NM_001195025:exon_8	0.00 	+
chr1	134212701	134213049	NM_028778:exon_1	0.00 	+
......
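The exon intervals printed above are enough to derive introns and the spliced length. The following is a rough sketch that does not call ngslib: `introns_between` and `spliced_length` are hypothetical helpers, coordinates are assumed to be 0-based and half-open as in BED, and whether `introns()` and `getcDNALength()` compute exactly this is an assumption, since their bodies are not shown here.

```python
# The eight exons of NM_001195025, copied from the output above.
exons = [
    (134212701, 134213049),
    (134221529, 134221650),
    (134222782, 134222806),
    (134224273, 134224425),
    (134224707, 134224773),
    (134226534, 134226654),
    (134227135, 134227268),
    (134227897, 134230065),
]


def introns_between(exons):
    """Return the gaps between consecutive exons as (start, end) pairs."""
    ordered = sorted(exons)
    return [
        (left_end, right_start)
        for (_, left_end), (right_start, _) in zip(ordered, ordered[1:])
    ]


def spliced_length(exons):
    """Sum of exon lengths, i.e. the transcript length after splicing."""
    return sum(end - start for start, end in exons)


introns = introns_between(exons)
print(len(introns))
print(introns[0])
print(spliced_length(exons))
```

A gene with n exons yields n - 1 introns under this definition, so a single-exon gene yields an empty list.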

Example: Get the 5' UTRs

from ngslib import IO

for gene in IO.BioReader("test/test.tab","genepred"):
    utr5 = gene.getUTR5()
    if utr5: # some genes don't have a 5' UTR
        print(utr5)

Output (note that UTRs are still GeneBed instances, because a UTR can span more than one exon):

NM_001195025:UTR5	chr1	+	134212701	134212806	134212701	134212806	1	134212701,	134212806,
NM_028778:UTR5	chr1	+	134212701	134212806	134212701	134212806	1	134212701,	134212806,
NM_008922:UTR5	chr1	-	33725856	33726603	33725856	33726603	2	33725856,33726468,	33725865,33726603,
NM_027671:UTR5	chr1	-	8794024	9289811	8794024	9289811	5	8794024,8872241,8989266,9193259,9288213,	8794051,8872295,8989330,9193341,9289811,
NM_175370:UTR5	chr1	-	58749257	58752833	58749257	58752833	2	58749257,58752745,	58749289,58752833,
NM_175642:UTR5	chr1	-	25883605	25886552	25883605	25886552	2	25883605,25886306,	25883620,25886552,
NM_178884:UTR5	chr1	-	75502799	75503027	75502799	75503027	1	75502799,	75503027,
NM_198680:UTR5	chr1	-	109056354	109057691	109056354	109057691	2	109056354,109057656,	109056379,109057691,
NM_199021:UTR5	chr1	-	125942020	125942136	125942020	125942136	1	125942020,	125942136,
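One way to see why a UTR can contain several blocks is to clip the exons to the region between the transcript boundary and the coding region. The sketch below illustrates that idea under stated assumptions and is not the code behind `getUTR5()`: `clip_exons` and `utr5_blocks` are hypothetical helpers, the CDS start 134212806 is inferred from the first output line above rather than read from test/test.tab, and the minus-strand gene uses made-up coordinates.

```python
def clip_exons(exons, lo, hi):
    """Return the parts of each exon that fall inside [lo, hi)."""
    clipped = []
    for start, end in exons:
        s, e = max(start, lo), min(end, hi)
        if s < e:  # drop exons that do not overlap the window
            clipped.append((s, e))
    return clipped


def utr5_blocks(exons, strand, tx_start, tx_end, cds_start, cds_end):
    """Exon parts upstream of the CDS in transcript orientation."""
    if strand == "+":
        return clip_exons(exons, tx_start, cds_start)
    # On the minus strand the 5' end is at the high-coordinate side.
    return clip_exons(exons, cds_end, tx_end)


# First two exons of NM_001195025; the window end (134212806) is the
# CDS start inferred from the UTR5 output line, an assumption.
plus_exons = [(134212701, 134213049), (134221529, 134221650)]
plus_utr5 = clip_exons(plus_exons, 134212701, 134212806)

# Hypothetical minus-strand gene with made-up coordinates.
toy_exons = [(100, 200), (300, 400), (500, 600)]
minus_utr5 = utr5_blocks(toy_exons, "-", 100, 600, 150, 350)

print(plus_utr5)
print(minus_utr5)
```

In the toy minus-strand case the clipped region keeps parts of two exons, which is why a single interval would not be enough to describe the UTR.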