FastaFile
Wang Yunfei edited this page Feb 9, 2017 · 2 revisions
- FastaFile is an alternative to TwoBitFile for reading huge genome files. It is included in the pysam package.
- Pros: The FASTA file does not need to be converted into another format; only a simple index file created by samtools (`samtools faidx`) is required.
- Cons: It takes much more disk space (about 4 times the size of a 2bit file); within a record, all sequence lines must have the same length except the last one; and there is no interface for getting chromosome sizes.
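Since FastaFile does not expose chromosome sizes, one possible workaround is to read them straight from the samtools `.fai` index, whose tab-separated columns are NAME, LENGTH, OFFSET, LINEBASES and LINEWIDTH. The function `read_chrom_sizes` below is a hypothetical helper, not part of ngslib or pysam; it is only a minimal sketch assuming a standard samtools-style index already exists next to the FASTA file.

```python
def read_chrom_sizes(fai_path):
    """Hypothetical helper: return [(chrom, length), ...] from a .fai index.

    Each .fai line is tab-separated: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH.
    Only the first two columns are needed for chromosome sizes.
    """
    sizes = []
    with open(fai_path) as fh:
        for line in fh:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines defensively
            fields = line.split("\t")
            sizes.append((fields[0], int(fields[1])))  # keep file order
    return sizes
```

For example, `dict(read_chrom_sizes("test.fa.fai"))` would give a name-to-length mapping once FastaFile (or `samtools faidx`) has created the index.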
Definition of FastaFile class:
```python
class FastaFile(object):
    '''
    Fasta file fast reader. Usually used for huge genome fasta files.
    Usage:
        Open file:
            fio=FastaFile("K12.fa")
        Get Sequence:
            fio.getSeq(chrom="K12",start=100,stop=200,strand="+")
        Close file:
            fio.close()
    Parameters:
        chrom=None: return empty string.
        start=None: start at first position
        stop=None: stop at the end of record.
        strand: default "+"
    '''
    def __init__(self,fname=USERHOME+"/Data/hg19/hg19.fa"):
        ...  # body folded in the original listing (6 lines)

    def getSeq(self,chrom,start=None,stop=None,strand="+"):
        ...  # body folded in the original listing (6 lines)

    def close(self):
        ...  # body folded in the original listing (5 lines)

    def __del__(self):
        ...  # body folded in the original listing (4 lines)
```
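The docstring above describes how `getSeq` treats its parameters: `chrom=None` returns an empty string, `start=None` and `stop=None` fall back to the record boundaries, and `strand` defaults to `"+"`. The hypothetical `get_seq` function below sketches those rules on an in-memory dict of sequences; it is not ngslib's implementation. It assumes 0-based, half-open coordinates as in pysam (consistent with the example output, where `start=100, stop=200` yields 100 bases) and assumes that `strand="-"` returns the reverse complement, which the docstring does not state explicitly.

```python
# Complement table for upper- and lower-case bases, so soft-masked
# (lower-case) regions keep their case after reverse-complementing.
_COMPLEMENT = dict(zip("ACGTNacgtn", "TGCANtgcan"))


def get_seq(records, chrom=None, start=None, stop=None, strand="+"):
    """Hypothetical sketch of FastaFile.getSeq parameter handling.

    records: dict mapping chromosome name -> sequence string (assumption).
    start/stop: 0-based, half-open coordinates (assumption, as in pysam).
    """
    if chrom is None:
        return ""  # docstring: chrom=None returns an empty string
    seq = records[chrom]
    if start is None:
        start = 0  # start at the first position
    if stop is None:
        stop = len(seq)  # stop at the end of the record
    sub = seq[start:stop]
    if strand == "-":
        # Assumed behaviour: the minus strand gives the reverse complement.
        sub = "".join(_COMPLEMENT.get(base, base) for base in reversed(sub))
    return sub
```

For instance, `get_seq({"toy": "AACGTTgcaN"}, "toy", 2, 8)` returns `"CGTTgc"`, and the same call with `strand="-"` returns `"gcAACG"`.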
Example: get sequences from a FASTA file.
Note: FastaFile creates an index for the FASTA file (e.g. *.fa.fai) the first time the file is loaded. This step may take a while if the FASTA file is huge.
```python
from ngslib import FastaFile

fio = FastaFile("test.fa")
seq = fio.getSeq(chrom="K12", start=100, stop=200, strand="+")
print(seq)
fio.close()  # the file is also closed automatically (via __del__) if you forget to close it here
```
Output:
AATATGAAGTTCTTTAGCATAACAAGGATCTGCCTTTGTAAAAGAAaaagaaagaaagagcgaaagaaagaaaAGAACTGAGGACAGCATTCTTTTCTCT
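The `.fai` index mentioned in the note is what makes random access fast, and it also explains why all sequence lines except the last must have the same length: the byte position of any base is computed arithmetically from the record's OFFSET, LINEBASES and LINEWIDTH. The sketch below is an illustration of that idea, not the code used by samtools, pysam or ngslib; `build_fai` and `fetch` are hypothetical names, and it assumes an uncompressed FASTA file and 0-based, half-open coordinates.

```python
def build_fai(fasta_path):
    """Hypothetical sketch: build samtools-style index entries.

    Returns {name: {"length", "offset", "linebases", "linewidth"}}, mirroring
    the LENGTH, OFFSET, LINEBASES and LINEWIDTH columns of a .fai file.
    """
    index = {}
    name, entry, ended = None, None, False
    pos = 0  # byte offset of the line currently being read
    with open(fasta_path, "rb") as fh:
        for raw in fh:
            if raw.startswith(b">"):
                # Record name = header text up to the first whitespace.
                name = raw[1:].split()[0].decode("ascii")
                entry = {"length": 0, "offset": pos + len(raw),
                         "linebases": None, "linewidth": None}
                index[name] = entry
                ended = False
            elif entry is not None:
                bases = len(raw.rstrip(b"\r\n"))
                if ended:
                    # A line after a short line breaks the offset arithmetic.
                    raise ValueError("unequal line lengths in record %s" % name)
                if entry["linebases"] is None:
                    entry["linebases"], entry["linewidth"] = bases, len(raw)
                elif bases > entry["linebases"]:
                    raise ValueError("unequal line lengths in record %s" % name)
                elif bases < entry["linebases"]:
                    ended = True  # only the last line may be shorter
                entry["length"] += bases
            pos += len(raw)
    return index


def fetch(fasta_path, entry, start, stop):
    """Read bases [start, stop) of one record with a single seek and read."""
    start, stop = max(0, start), min(stop, entry["length"])
    if start >= stop:
        return ""
    lb, lw = entry["linebases"], entry["linewidth"]
    # Byte of base i = offset + (i // linebases) * linewidth + i % linebases
    first = entry["offset"] + (start // lb) * lw + start % lb
    last = entry["offset"] + ((stop - 1) // lb) * lw + (stop - 1) % lb
    with open(fasta_path, "rb") as fh:
        fh.seek(first)
        chunk = fh.read(last - first + 1)
    # Drop the line breaks that fall inside the requested region.
    return chunk.replace(b"\r", b"").replace(b"\n", b"").decode("ascii")
```

Building the index requires one full pass over the file, which is why the first load of a huge FASTA file is slow; afterwards each query costs only one seek and one read.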