Wang Yunfei edited this page Feb 9, 2017 · 1 revision

DB class (To be continued)

The DB class is used to fetch data from various data formats. It currently provides two uniform interfaces for all supported data formats:

  • fetch: Fetch the elements in a specific genomic region. It returns a list of objects (for sequence files, the sequence as a string).
  • pileup: Pile up all the elements in a specific genomic region. It returns an array or a string.
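To illustrate what "elements in a specific genomic region" means for fetch, here is a minimal pure-Python sketch on toy data. It is not ngslib code: `RECORDS` and `toy_fetch` are hypothetical, each record is assumed to be a `(chrom, start, stop, strand)` tuple in 0-based, half-open coordinates, and "in the region" is assumed to mean "overlaps the region".

```python
# Hypothetical illustration of fetch semantics; not ngslib code.
# Toy records: (chrom, start, stop, strand), 0-based and half-open.
RECORDS = [
    ("chr1", 100, 200, "+"),
    ("chr1", 150, 300, "-"),
    ("chr1", 400, 500, "+"),
    ("chr2", 100, 200, "+"),
]


def toy_fetch(records, chrom=None, start=None, stop=None, strand="."):
    """Return the records overlapping [start, stop), optionally filtered by strand."""
    hits = []
    for rec_chrom, rec_start, rec_stop, rec_strand in records:
        if chrom is not None and rec_chrom != chrom:
            continue  # wrong chromosome (None means all chromosomes)
        if start is not None and rec_stop <= start:
            continue  # record ends before the query starts
        if stop is not None and rec_start >= stop:
            continue  # record starts at or after the excluded query stop
        if strand != "." and rec_strand != strand:
            continue  # "." keeps both strands
        hits.append((rec_chrom, rec_start, rec_stop, rec_strand))
    return hits


print(toy_fetch(RECORDS, "chr1", 200, 400))  # prints [('chr1', 150, 300, '-')]
```

Because the query stop is excluded and records are half-open, the records ending exactly at 200 and starting exactly at 400 are not returned.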

Supported formats

  1. Annotation files: Any text file with chromosome, start and end information on each line, such as bed, genepred, gff and bedgraph files.
  2. Sequence files: Sequence files in Fasta or 2bit format.
  3. Bam files: Sorted bam files.
  4. BigWig files: BigWig files created from wig files.

Uniform interfaces

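from ngslib import DB
# infile, dbtype and kwargs are placeholders: infile is the data file path and
# dbtype names its format (e.g. 'fasta', '2bit', 'bam'). The keyword values shown are the defaults.
with DB(infile, dbtype) as tdb:
    for elem in tdb.fetch(chrom=None, start=None, stop=None, strand=".", zerobased=True, **kwargs):
        print elem
    pileuped = tdb.pileup(chrom=None, start=None, stop=None, strand=".", zerobased=True, **kwargs)
    print pileuped[0:10]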
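One way such a uniform interface can be organized is a base class that fixes the fetch()/pileup() signatures and provides the context-manager hooks that make `with DB(...) as tdb` work, plus one backend class per format chosen by `dbtype`. The sketch below is hypothetical and does not describe ngslib's internals: `BaseDB`, `InMemorySequenceDB`, `open_db` and the `'toyseq'` type are invented names, and the toy backend ignores `strand` and `zerobased` to stay short.

```python
# Hypothetical sketch of a uniform-interface design; not ngslib's actual code.


class BaseDB(object):
    """Every format backend exposes the same fetch()/pileup() signature."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # never swallow exceptions raised inside the with block

    def close(self):
        pass  # real backends would release file handles here

    def fetch(self, chrom=None, start=None, stop=None, strand=".", zerobased=True, **kwargs):
        raise NotImplementedError

    def pileup(self, chrom=None, start=None, stop=None, strand=".", zerobased=True, **kwargs):
        raise NotImplementedError


class InMemorySequenceDB(BaseDB):
    """Toy sequence backend: a {chrom: sequence} dict stands in for a Fasta file."""

    def __init__(self, sequences):
        self.sequences = sequences
        self.closed = False

    def close(self):
        self.closed = True

    def fetch(self, chrom=None, start=None, stop=None, strand=".", zerobased=True, **kwargs):
        # Simplification: forward strand, 0-based coordinates, one named chromosome.
        seq = self.sequences.get(chrom, "")
        return seq[start:stop]  # None slices to the chromosome start/end

    def pileup(self, chrom=None, start=None, stop=None, strand=".", zerobased=True, **kwargs):
        # For sequences, pileup returns the same thing as fetch.
        return self.fetch(chrom, start, stop, strand, zerobased, **kwargs)


BACKENDS = {"toyseq": InMemorySequenceDB}


def open_db(source, dbtype):
    """Choose a backend from dbtype, mirroring the DB(infile, dbtype) call."""
    if dbtype not in BACKENDS:
        raise ValueError("unsupported dbtype: %r" % (dbtype,))
    return BACKENDS[dbtype](source)


with open_db({"chr1": "ACGTACGTAC"}, "toyseq") as db:
    print(db.fetch("chr1", 2, 6))  # prints GTAC
```

Keeping the signature identical across backends is what lets calling code switch formats by changing only `dbtype`.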

Parameters:

  • chrom: None means fetch all chromosomes. Otherwise, specify the chromosome name as a string. An empty list is returned if the chromosome is not found.
  • start: None means the start of the chromosome. Otherwise, specify the start as an integer. The start position is included.
  • stop: None means the end of the chromosome. Otherwise, specify the stop as an integer. The stop position is not included.
  • strand: "." means fetch or pile up on both strands. Otherwise, set the strand to either '+' or '-'.
  • zerobased: Defaults to True, which means the index of the first base is 0. If False, the index of the first base is 1.
  • **kwargs: Reserved for future use, for additional parameters needed by specific data formats.
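The coordinate rules above can be summarized as a conversion to a 0-based, half-open interval. This helper is hypothetical (not an ngslib function), takes the chromosome length as an extra argument, and assumes that with `zerobased=False` both `start` and `stop` shift down by one while keeping the included-start, excluded-stop convention.

```python
# Hypothetical helper illustrating the parameter rules; not part of ngslib.
def normalize_region(chrom_length, start=None, stop=None, strand=".", zerobased=True):
    """Return (start, stop) as a 0-based, half-open interval."""
    if strand not in (".", "+", "-"):
        raise ValueError("strand must be '.', '+' or '-'")
    # Assumption: 1-based input shifts both ends down by one.
    offset = 0 if zerobased else 1
    start = 0 if start is None else start - offset  # None -> chromosome start
    stop = chrom_length if stop is None else stop - offset  # None -> chromosome end
    return start, stop


region = normalize_region(1000, 101, 201, zerobased=False)
print(region)  # prints (100, 200)
print(region[1] - region[0])  # prints 100: stop is excluded, so length is stop - start
```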

Examples

1. Fetch sequences from Fasta or 2bit files

from ngslib import DB
with DB('hg19.fa', 'fasta') as fadb:  # or: with DB('hg19.2bit', '2bit') as fadb:
    seq = fadb.fetch('chr1', 100, 200, '-')  # pileup returns the same sequence in this case
    print seq.upper()

Output:

TTAGGGTTAGGGTTAGGGTTAGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTA
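For strand '-', sequence fetchers conventionally return the reverse complement of the forward-strand sequence. Assuming ngslib follows that convention, the operation can be sketched as below; `reverse_complement` is a hypothetical helper, not an ngslib function. It preserves lowercase (soft-masked) bases, which a call such as `seq.upper()` would then normalize.

```python
# Hypothetical illustration of reverse complementing; ngslib's own code may differ.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N",
              "a": "t", "c": "g", "g": "c", "t": "a", "n": "n"}


def reverse_complement(seq):
    """Reverse the sequence and complement each base, preserving case."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


print(reverse_complement("CCCTAACCCTAA"))  # prints TTAGGGTTAGGG
```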

2. Fetch and pile up reads from a Bam file

from ngslib import DB
with DB('test.bam', 'bam') as bamfh:
    # fetch: iterate over the reads in the region
    for read in bamfh.fetch('chr2', 10000, 12000):
        print read
    # pileup: per-base read depth over the same region
    depth = bamfh.pileup('chr2', 10000, 12000)
    print depth.shape, max(depth)
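A depth pileup like the one above can be computed in linear time with a difference array: add 1 where each aligned block enters the region, subtract 1 where it leaves, then take a running sum. This is a hypothetical sketch, not ngslib's implementation: reads are assumed to be already reduced to 0-based, half-open aligned blocks, the block coordinates are made-up sample data, and it returns a plain list where ngslib appears to return an array (the example reads `depth.shape`).

```python
# Hypothetical depth-pileup sketch; not ngslib's implementation.
def depth_pileup(blocks, start, stop):
    """Per-base depth over [start, stop) from 0-based, half-open aligned blocks."""
    diff = [0] * (stop - start + 1)
    for block_start, block_stop in blocks:
        lo = max(block_start, start)  # clip the block to the region
        hi = min(block_stop, stop)
        if lo >= hi:
            continue  # block lies entirely outside the region
        diff[lo - start] += 1
        diff[hi - start] -= 1
    depth = []
    running = 0
    for delta in diff[:-1]:  # the last slot only absorbs decrements at stop
        running += delta
        depth.append(running)
    return depth


blocks = [(10000, 10050), (10020, 10070), (11990, 12100)]
depth = depth_pileup(blocks, 10000, 12000)
print("%d %d" % (len(depth), max(depth)))  # prints 2000 2
```

The result has one entry per base in [start, stop), so its length is stop - start.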