SeqUtils
Wang Yunfei edited this page Feb 9, 2017 · 1 revision
- wFormatFasta.py
- wGCContent.py
- wGetSeqByName.py
- wGetSeqByPosition.py
- In a Fasta file, wrap all sequence lines to the same fixed length, shorter than 120 characters. (We use 100 bp for fast calculation.)
- Sequence names containing spaces or tabs are not recommended, because most programs drop everything after the first space or tab. (Replace spaces and tabs with "_" or "|" if possible.)
- Spaces and tabs are not allowed inside sequences.
- Coordinates in Fasta files are 1-based.
- Input can be read from standard input, for example through a pipe, by setting the input file name to "stdin". By default, output is written to "stdout".
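The naming and coordinate notes above can be illustrated with a short Python sketch. The helper names `sanitize_name` and `fetch_1based` are hypothetical and are not part of these scripts or of ngslib; treating the end coordinate as inclusive is also an assumption.

```python
import re


def sanitize_name(header: str) -> str:
    """Replace each run of spaces or tabs in a sequence name with '_'."""
    return re.sub(r"[ \t]+", "_", header.strip())


def fetch_1based(seq: str, start: int, end: int) -> str:
    """Return bases start..end, 1-based and (by assumption) end-inclusive."""
    # Python slices are 0-based and end-exclusive, so only start shifts by one.
    return seq[start - 1:end]


print(sanitize_name("chr1 Homo sapiens\tassembly"))  # chr1_Homo_sapiens_assembly
print(fetch_1based("ACGTACGT", 2, 4))                # CGT
```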
Example: python wFormatFasta.py -i test.fa -l 100 -o test_formated.fa
    usage: wFormatFasta.py [-h] -i input.fa [-l length] [-o output.fa]

    Format sequences in Fasta file to fixed length.

    Options:
      -h, --help                        show this help message and exit
      -i input.fa, --input input.fa     Fasta file name. Can be "stdin".
      -l length, --length length        Length of each line. Default is 100.
      -o output.fa, --output output.fa  Output file name. Default is stdout.

Dependency: ngslib
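For readers without ngslib, the idea behind fixed-length formatting can be sketched in plain Python. This is a minimal illustration and not the wFormatFasta.py implementation; `read_fasta` and `format_fasta` are hypothetical names.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from an open Fasta handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def format_fasta(handle, out, length=100):
    """Rewrite every record so that sequence lines hold `length` characters."""
    for name, seq in read_fasta(handle):
        out.write(f">{name}\n")
        for i in range(0, len(seq), length):
            out.write(seq[i:i + length] + "\n")


# Small in-memory demo; a real run would pass open files or sys.stdin/sys.stdout.
src = io.StringIO(">s1\nACGTAC\nGT\n>s2\nAAA\n")
dst = io.StringIO()
format_fasta(src, dst, length=4)
print(dst.getvalue(), end="")
```

Only the last line of each record may be shorter than the chosen length.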
Example: python wGCContent.py -i test.fa -o test.gc
    usage: wGCContent.py [-h] -i input.fa [-o output.gc]

    Calculate GC content of Fasta file.

    Options:
      -h, --help                        show this help message and exit
      -i input.fa, --input input.fa     Fasta file name. Can be "stdin".
      -o output.gc, --output output.gc  GC content file name. Default = stdout.

Dependency: ngslib
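As a rough illustration of the calculation, GC content is the fraction of G and C bases in a sequence. The sketch below is not the wGCContent.py code, and its output format is not reproduced here. The function name `gc_content` is hypothetical, and excluding ambiguous bases such as N from the denominator is an assumption.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G/C among the A, C, G and T bases of seq."""
    seq = seq.upper()
    # Assumption: ambiguous bases (e.g. N) are ignored in the denominator.
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


print(gc_content("ACGT"))    # 0.5
print(gc_content("GGCCNN"))  # 1.0
```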
Example: python wGetSeqByName.py -i test.fa -n names.lst -o test_with_names.fa
    usage: wGetSeqByName.py [-h] -i input.fa -n names.lst [-o output.fa]

    Get sequences by a list of names.

    Options:
      -h, --help                        show this help message and exit
      -i input.fa, --input input.fa     Fasta file name.
      -n names.lst, --names names.lst   A file with sequence names. Can be "stdin".
      -o output.fa, --output output.fa  Output file name. Default is "stdout".

Dependency: ngslib
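Selecting sequences by name can be sketched as a set lookup over parsed records. This is an illustration only: `select_by_name` is a hypothetical helper, and matching on the part of the name before the first space or tab, as well as keeping the Fasta file's record order, are assumptions about behaviour that the help text does not specify.

```python
def select_by_name(records, names):
    """Yield (name, seq) records whose ID appears in the iterable `names`."""
    wanted = {n.strip() for n in names if n.strip()}
    for name, seq in records:
        # Assumption: the ID is the text before the first whitespace.
        seq_id = name.split(maxsplit=1)[0] if name.strip() else ""
        if seq_id in wanted:
            yield name, seq


records = [("chr1 first", "ACGT"), ("chr2", "GGCC"), ("chr3", "TTAA")]
names = ["chr3\n", "chr1\n", "\n"]  # lines as read from a names file
for name, seq in select_by_name(records, names):
    print(f">{name}\n{seq}")
```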
Example: python wGetSeqByCoordinates.py -i test.fa -r 'chr1:-:-'
    usage: wGetSeqByCoordinates.py [-h] -i input.fa [-r chr1:100-200:+] [-c chrom]
                                   [-s start] [-e end] [-t strand] [-o output.fa]

    Get a fragment from a Fasta file.

    Options:
      -h, --help                        show this help message and exit
      -i input.fa, --input input.fa     Fasta file name.
      -r chr1:100-200:+, --region chr1:100-200:+
                                        Chromosome region. Leave a field empty if not
                                        applicable, e.g. "chr1:100-:-".
      -c chrom, --chrom chrom           Chromosome name.
      -s start, --start start           Start coordinate. Default: beginning of the chromosome.
      -e end, --end end                 End coordinate. Default: end of the chromosome.
      -t strand, --strand strand        Strand. Default: "+"
      -o output.fa, --output output.fa  Output file name. Default: stdout

Dependency: ngslib
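The region string format can be illustrated with a small parser. This is a sketch under stated assumptions, not the script's own code: `parse_region` and `fetch` are hypothetical names, coordinates are taken as 1-based and end-inclusive, and a "-" strand is assumed to mean the reverse complement. The help text does not say whether a trailing "-" in a region such as "chr1:100-:-" selects the minus strand or just marks an unset field; the sketch returns the field as written and falls back to "+" only when it is empty.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def parse_region(region: str):
    """Split 'chrom:start-end:strand' into (chrom, start, end, strand)."""
    chrom, span, strand = region.split(":")  # ValueError unless three fields
    start_s, _, end_s = span.partition("-")
    start = int(start_s) if start_s else None  # None: beginning of chromosome
    end = int(end_s) if end_s else None        # None: end of chromosome
    return chrom, start, end, strand or "+"


def fetch(seq: str, start=None, end=None, strand="+") -> str:
    """Extract a 1-based, end-inclusive fragment from seq."""
    start = 1 if start is None else start
    end = len(seq) if end is None else end
    fragment = seq[start - 1:end]
    if strand == "-":
        # Assumption: the minus strand is returned as the reverse complement.
        fragment = fragment.translate(COMPLEMENT)[::-1]
    return fragment


print(parse_region("chr1:100-200:+"))  # ('chr1', 100, 200, '+')
print(fetch("AACCGGTT", 3, 6))         # CCGG
```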