Wang Yunfei edited this page Feb 9, 2017
The DB class fetches data from a variety of file formats. It currently provides two uniform interfaces that work across all supported data formats:
- fetch: Fetch the elements in a specific genomic region. It returns a list of objects.
- pileup: Pile up all the elements in a specific genomic region. It returns an array or a string.
Supported data formats:
- Annotation files: Any text file with chromosome, start and end information on each line, such as bed, genepred, gff and bedgraph.
- Sequence files: Sequence files in Fasta or 2bit format.
- Bam files: Sorted bam files.
- BigWig files: BigWig files created from wig files.
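from ngslib import DB

# infile is the path to the data file; dbtype names its format (e.g. 'fasta' or 'bam')
with DB(infile, dbtype) as tdb:
    # fetch returns the elements in the region
    for elem in tdb.fetch(chrom=None, start=None, stop=None, strand=".", zerobased=True, **kwargs):
        print(elem)
    # pileup returns an array or a string covering the region
    pileuped = tdb.pileup(chrom=None, start=None, stop=None, strand=".", zerobased=True, **kwargs)
    print(pileuped[0:10])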
Both methods take the same parameters:
- chrom: None means fetch from all chromosomes. Otherwise specify the chromosome name as a string. An empty list is returned if the chromosome is not found.
- start: None means the start of the chromosome. Otherwise specify the start as an integer. The start position is included.
- stop: None means the end of the chromosome. Otherwise specify the stop as an integer. The stop position is not included.
- strand: "." means fetch or pileup on both strands. Otherwise set the strand to either '+' or '-'.
- zerobased: True by default, which means the index of the first base is 0. If False, the index of the first base is 1.
- **kwargs: Reserved for future use. Additional parameters for the special needs of some data formats.
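The coordinate rules above (start included, stop excluded, None meaning the start or end of the chromosome) match Python slice semantics. The following is a minimal sketch on a plain string, not ngslib code: `slice_region` is a hypothetical helper, and the handling of `zerobased=False` (shifting both coordinates down by one) is an assumption, since the page does not spell out how 1-based input is converted.

```python
def slice_region(seq, start=None, stop=None, zerobased=True):
    """Return the bases of seq in [start, stop).

    Hypothetical helper for illustration only; not part of ngslib.
    """
    if not zerobased:
        # Assumption: 1-based coordinates are shifted down by one so that
        # the first base has index 0; the stop stays exclusive.
        start = None if start is None else start - 1
        stop = None if stop is None else stop - 1
    # None falls back to the start or end of the sequence; start is
    # included and stop is excluded, exactly like a Python slice.
    return seq[start:stop]


chrom = "ACGTACGTAC"
print(slice_region(chrom, 2, 5))                   # bases at indices 2, 3, 4
print(slice_region(chrom, 3, 6, zerobased=False))  # same bases, 1-based input
print(slice_region(chrom))                         # the whole sequence
```

A half-open interval has the convenient property that its length is simply `stop - start`, and adjacent regions such as [0, 100) and [100, 200) never overlap.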
from ngslib import DB

with DB('hg19.fa', 'fasta') as fadb:  # or fadb = DB('hg19.2bit','2bit')
    # fetch the '-' strand sequence of the region on chr1
    seq = fadb.fetch('chr1', 100, 200, '-')  # pileup will be the same in this case
    print(seq.upper())
Output:
TTAGGGTTAGGGTTAGGGTTAGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTA
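The page does not state what a '-' strand fetch does to a sequence. Assuming it returns the reverse complement of the forward-strand bases, which is the usual convention, the transformation can be sketched as below. `reverse_complement` and `COMPLEMENT` are hypothetical names for illustration, not part of ngslib.

```python
# Hypothetical helper, not part of ngslib: complement table for DNA bases.
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A", "N": "N",
    "a": "t", "c": "g", "g": "c", "t": "a", "n": "n",
}


def reverse_complement(seq):
    """Reverse the sequence and complement each base.

    Characters missing from the table are passed through unchanged.
    """
    return "".join(COMPLEMENT.get(base, base) for base in reversed(seq))


plus_strand = "TAACCCTAACCC"
print(reverse_complement(plus_strand))  # GGGTTAGGGTTA
```

Applying the function twice returns the original sequence, which is a quick sanity check for any strand-aware code.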
from ngslib import DB

with DB('test.bam', 'bam') as bamfh:
    # fetch: iterate over the reads in the region
    for read in bamfh.fetch('chr2', 10000, 12000):
        print(read)
    # pileup: per-base depth over the region
    depth = bamfh.pileup('chr2', 10000, 12000)
    print("%s %s" % (depth.shape, max(depth)))
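To make the idea of a depth pileup concrete, here is a minimal sketch that computes per-base coverage from read intervals with a difference array. It is not how ngslib implements pileup. `pileup_depth` is a hypothetical helper, reads are assumed to be simple half-open `(start, stop)` tuples, and the result is a plain list rather than the array returned by the library.

```python
def pileup_depth(intervals, start, stop):
    """Per-base coverage of [start, stop) from half-open (start, stop) intervals.

    Hypothetical helper for illustration only; not part of ngslib.
    """
    # diff[i] holds the change in depth at position start + i.
    diff = [0] * (stop - start + 1)
    for read_start, read_stop in intervals:
        lo = max(read_start, start)  # clip the read to the window
        hi = min(read_stop, stop)
        if lo < hi:
            diff[lo - start] += 1
            diff[hi - start] -= 1
    # A running sum of the changes gives the depth at each position.
    depth = []
    running = 0
    for delta in diff[:-1]:
        running += delta
        depth.append(running)
    return depth


reads = [(10000, 10005), (10003, 10008), (9998, 10002)]
depth = pileup_depth(reads, 10000, 10010)
print(depth)  # [2, 2, 1, 2, 2, 1, 1, 1, 0, 0]
```

The difference array needs one pass over the reads and one pass over the window, so the cost does not grow with read length.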