Sequence tools

Some useful sequence tools written in Python, now updated for Python 3:

replace_heads.py

Given a txt file of new headers, replaces the existing headers in a multi-fasta (mf) file.
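A minimal sketch of the idea, not the script itself. It assumes the new headers are listed one per line in the same order as the sequences; the function name `replace_headers` is hypothetical.

```python
def replace_headers(fasta_lines, new_headers):
    """Swap each '>' line for the next entry in new_headers, in order.

    Assumption: headers are matched by position, not by name.
    """
    headers = iter(new_headers)
    out = []
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            try:
                new = next(headers)
            except StopIteration:
                raise ValueError("fewer new headers than sequences") from None
            # Accept new headers written with or without a leading '>'.
            out.append(">" + new.strip().lstrip(">"))
        else:
            out.append(line)
    return out
```

Sequence lines pass through unchanged, so line wrapping in the input file is preserved.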

heditor.py

Edits GenBank headers to include the species name and accession number only.

find_seq_overlap.py

Gets position of overlapping sequences on a given target sequence.

gbk_to_proteome.py

Gets all protein sequences from a gbk file.

splice.py

Gets a sequence region based on positions.
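As a rough illustration, extracting a region by position can look like the sketch below. It assumes 1-based, inclusive coordinates, which is common in sequence annotation but is not confirmed for this script; the function name `splice` is reused here only for illustration.

```python
def splice(seq, start, end):
    """Return the region from start to end, 1-based and inclusive (assumed)."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"positions {start}-{end} fall outside 1-{len(seq)}")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return seq[start - 1:end]
```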

longest_orf.py

Finds the longest ORF across all frames and identifies the presence of in-frame stop codons.
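The sketch below shows one way such a search can work. It assumes an ORF runs from an ATG to the first in-frame stop codon (stop included), scans all six frames, and uses the standard stop codons TAA, TAG and TGA; the script may define an ORF differently. All function names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}

def _revcomp(seq):
    # Plain A/C/G/T only; ambiguity codes are left untouched.
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def orfs_in_frame(seq, frame):
    """Yield ATG..stop ORFs (stop included) in one forward frame (0, 1 or 2)."""
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOPS:
            yield seq[start:i + 3]
            start = None

def longest_orf(seq):
    """Return the longest ORF over three forward and three reverse frames."""
    seq = seq.upper()
    best = ""
    for strand in (seq, _revcomp(seq)):
        for frame in range(3):
            for orf in orfs_in_frame(strand, frame):
                if len(orf) > len(best):
                    best = orf
    return best

def has_inframe_stop(seq, frame=0):
    """True if a stop codon occurs in this frame before the final codon."""
    codons = [seq[i:i + 3].upper() for i in range(frame, len(seq) - 2, 3)]
    return any(codon in STOPS for codon in codons[:-1])
```

An ATG with no downstream stop in its frame is not reported here; whether open-ended ORFs should count is a design choice.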

get_seq_go.py

Gets sequences from a multi-fasta file based on ID.

reverse_complement.py

Reverse complements a sequence.
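For reference, a compact sketch of reverse complementation that also handles IUPAC ambiguity codes and preserves case. Whether the script supports ambiguity codes is not stated, so treat that part as an assumption.

```python
# Complement table for DNA, including IUPAC ambiguity codes.
# S and W are their own complements, so they need no entry.
_COMPLEMENT = str.maketrans(
    "ACGTURYKMBVDHNacgturykmbvdhn",
    "TGCAAYRMKVBHDNtgcaayrmkvbhdn",
)

def reverse_complement(seq):
    """Complement every base, then reverse the result."""
    return seq.translate(_COMPLEMENT)[::-1]
```

For example, `reverse_complement("ATGC")` returns `"GCAT"`.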

phd_to_fasta.py

Gets the fasta sequence from a phd file.

multif_to_singlef.py

Separates a multi-fasta file into single fasta files.

remove_duplicate_fasta.py

Removes identical sequences (with identical headers).
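A minimal sketch of this kind of de-duplication, assuming a record counts as a duplicate only when both its header and its sequence match an earlier record exactly. The parser and function names are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def remove_duplicates(records):
    """Keep the first occurrence of each (header, sequence) pair."""
    seen = set()
    for header, seq in records:
        key = (header, seq)
        if key not in seen:
            seen.add(key)
            yield header, seq
```

Because wrapped sequence lines are joined before comparison, two copies of a record that differ only in line width are still treated as identical.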

extract_exonerate_output.py

Separates exonerate output into gff, orf and cds.

exonerate_highest_score.py

Gets highest scoring sequence alignment from exonerate output (when you don't want to use bestn).

nhmmer_or_hmmsearch_to_fasta.py

Gets fasta sequences for protein hits from hmmsearch or hit regions (DNA) from nhmmer.

unique_headers.py

Removes the characters ":,.()%*" from fasta file headers, truncates each header after the first whitespace and adds a unique id.
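The header clean-up can be sketched as follows. How the script attaches the unique id is not documented here, so appending `_<index>` is purely an assumption, as is the function name `clean_header`.

```python
# Deletion table for the characters listed above.
_STRIP = str.maketrans("", "", ":,.()%*")

def clean_header(header, index):
    """Truncate at the first whitespace, drop unwanted characters, add an id.

    Assumption: the unique id is appended as '_<index>'.
    """
    parts = header.lstrip(">").split(None, 1)
    first = parts[0] if parts else ""
    return f">{first.translate(_STRIP)}_{index}"
```

Calling `clean_header(">sp|P1.2 (human) protein", 1)` gives `">sp|P12_1"`.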

unique_ids.py

Removes existing headers in a fasta file and replaces them with unique ids for each sequence (>1, >2, >3 ...).

split_fasta_ntimes.py

Splits a multi-fasta file into smaller multi-fasta files of N sequences each.
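One way to batch records into groups of N is shown below. It is a sketch that works on already-parsed `(header, sequence)` tuples; the output naming scheme (`part_1.fasta`, `part_2.fasta`, ...) and both function names are assumptions.

```python
from itertools import islice

def chunked(records, n):
    """Yield successive lists of at most n records."""
    if n < 1:
        raise ValueError("n must be at least 1")
    it = iter(records)
    while True:
        batch = list(islice(it, n))
        if not batch:
            return
        yield batch

def write_chunks(records, n, prefix="part"):
    """Write each batch to '<prefix>_<i>.fasta' and return the paths."""
    paths = []
    for i, batch in enumerate(chunked(records, n), start=1):
        path = f"{prefix}_{i}.fasta"
        with open(path, "w") as handle:
            for header, seq in batch:
                handle.write(f">{header}\n{seq}\n")
        paths.append(path)
    return paths
```

The last file holds the remainder when the number of sequences is not a multiple of N.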

get_by_size.py

Extracts all sequences from an mf file that are >= a desired length.
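In outline, the length filter reduces to a single comparison per record. This sketch assumes records are `(header, sequence)` tuples and that length means the number of residues in the sequence; the name `filter_by_length` is hypothetical.

```python
def filter_by_length(records, min_length):
    """Return the records whose sequence length is >= min_length."""
    return [(header, seq) for header, seq in records if len(seq) >= min_length]
```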

compare_txt.py

Compares two text files (e.g. lists of accession ids) and prints any ids that are not present in both files.
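The comparison amounts to a set difference in each direction. The sketch below assumes one id per line and ignores blank lines and surrounding whitespace; the function name `ids_not_shared` is hypothetical.

```python
def ids_not_shared(lines_a, lines_b):
    """Return (ids only in A, ids only in B), each sorted."""
    a = {line.strip() for line in lines_a if line.strip()}
    b = {line.strip() for line in lines_b if line.strip()}
    return sorted(a - b), sorted(b - a)
```

Open file handles can be passed directly, for example `ids_not_shared(open("a.txt"), open("b.txt"))`, where the file names are placeholders.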
