Some useful sequence tools written in Python, now updated for Python 3:
replace_heads.py
Given a text file of new headers, replaces the existing headers in a multi-FASTA (mf) file.
heditor.py
Edits GenBank headers so that they contain only the species name and accession number.
find_seq_overlap.py
Gets the positions of overlapping sequences on a given target sequence.
gbk_to_proteome.py
Gets all protein sequences from a GenBank (.gbk) file.
splice.py
Extracts a subsequence based on start and end positions.
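A minimal sketch of position-based extraction, assuming 1-based, inclusive start and end coordinates (the convention used by GFF). The function name subseq is hypothetical; this is not the code of splice.py.

```python
def subseq(seq: str, start: int, end: int) -> str:
    """Return positions start..end of seq, 1-based and inclusive (assumed convention)."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"invalid range {start}-{end} for a sequence of length {len(seq)}")
    # Python slices are 0-based and end-exclusive, so only the start needs shifting.
    return seq[start - 1:end]


print(subseq("ACGTACGT", 2, 5))  # CGTA
```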
longest_orf.py
Finds the longest ORF across all reading frames and identifies the presence of in-frame stop codons.
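The idea can be sketched as follows, assuming ATG as the only start codon, TAA/TAG/TGA as stop codons (standard genetic code), and "all frames" meaning the three forward plus three reverse-complement frames. ORFs that run off the end without a stop codon are ignored. The names longest_orf and internal_stops are hypothetical, not taken from longest_orf.py.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def _revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def _orfs_in_frame(seq: str, frame: int):
    """Yield 0-based, end-exclusive (start, end) spans from an ATG through the next stop codon."""
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOP_CODONS:
            yield start, i + 3
            start = None


def longest_orf(seq: str) -> str:
    """Return the longest ATG-to-stop ORF over three forward and three reverse frames."""
    seq = seq.upper()
    best = ""
    for strand in (seq, _revcomp(seq)):
        for frame in range(3):
            for start, end in _orfs_in_frame(strand, frame):
                if end - start > len(best):
                    best = strand[start:end]
    return best


def internal_stops(cds: str) -> list:
    """Return 0-based offsets of stop codons before the final codon (length assumed divisible by 3)."""
    cds = cds.upper()
    return [i for i in range(0, len(cds) - 3, 3) if cds[i:i + 3] in STOP_CODONS]


print(longest_orf("CCATGAAATAGCC"))   # ATGAAATAG
print(internal_stops("ATGTAAGGGTAA"))  # [3]
```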
get_seq_go.py
Gets a sequence from a multi-FASTA file by its ID.
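A minimal sketch of ID-based lookup, assuming the ID is the first whitespace-delimited word of the header. The helpers read_fasta and get_seq are hypothetical and operate on FASTA text already read into a string.

```python
def read_fasta(text: str):
    """Yield (header, sequence) pairs from FASTA-formatted text; multi-line sequences are joined."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def get_seq(text: str, seq_id: str):
    """Return the sequence whose header starts with seq_id as its first word, or None."""
    for header, seq in read_fasta(text):
        if header.split()[:1] == [seq_id]:
            return seq
    return None


fasta = ">seq1 some description\nACGT\nGG\n>seq2\nTTTT\n"
print(get_seq(fasta, "seq1"))  # ACGTGG
```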
reverse_complement.py
Reverse complements a sequence.
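A reverse complement can be computed with a translation table followed by reversal. This sketch assumes IUPAC ambiguity codes should be complemented too (S and W are self-complementary, so they pass through unchanged) and that lowercase letters keep their case; reverse_complement is a hypothetical name.

```python
# IUPAC complements: A<->T, C<->G, R<->Y, K<->M, B<->V, D<->H, N<->N.
_COMPLEMENT = str.maketrans("ACGTRYKMBVDHNacgtrykmbvdhn",
                            "TGCAYRMKVBHDNtgcayrmkvbhdn")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGCn"))  # nGCAT
```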
phd_to_fasta.py
Gets the FASTA sequence from a Phred .phd file.
mulif_to_singlef.py
Splits a multi-FASTA file into single-sequence FASTA files.
remove_duplicate_fasta.py
Removes duplicate records (identical sequences with identical headers).
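One way to express this, sketched under the assumption that records are already parsed into (header, sequence) tuples: a record counts as a duplicate only when both header and sequence match an earlier one, and the first occurrence is kept. remove_duplicates is a hypothetical name.

```python
def remove_duplicates(records):
    """Keep the first occurrence of each (header, sequence) pair, preserving input order."""
    seen = set()
    unique = []
    for header, seq in records:
        if (header, seq) not in seen:
            seen.add((header, seq))
            unique.append((header, seq))
    return unique


records = [("a", "ACGT"), ("b", "ACGT"), ("a", "ACGT"), ("a", "TTTT")]
print(remove_duplicates(records))  # [('a', 'ACGT'), ('b', 'ACGT'), ('a', 'TTTT')]
```

A set lookup keeps the check constant-time per record, so the whole pass is linear in the number of records.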
extract_exonerate_output.py
Separates Exonerate output into GFF, ORF and CDS.
exonerate_highest_score.py
Gets the highest-scoring alignment from Exonerate output (useful when you don't want to use Exonerate's bestn option).
nhmmer_or_hmmsearch_to_fasta.py
Gets FASTA sequences for protein hits from hmmsearch, or for hit regions (DNA) from nhmmer.
unique_headers.py
Removes the characters ":,.()%*" from FASTA headers, truncates each header at the first whitespace and adds a unique ID.
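A per-header sketch of this cleanup. The characters removed come from the description above; the format of the unique ID (appended as "_<n>") is an assumption, and clean_header is a hypothetical name. Calling it with enumerate(headers, start=1) would number a whole file.

```python
# Deletion table for the characters named in the tool description.
_STRIP = str.maketrans("", "", ":,.()%*")


def clean_header(header: str, uid: int) -> str:
    """Remove unwanted characters, keep text up to the first whitespace, append _<uid>."""
    cleaned = header.lstrip(">").translate(_STRIP)
    words = cleaned.split()
    first_word = words[0] if words else ""
    return f">{first_word}_{uid}"  # hypothetical ID format


print(clean_header(">sp|P1.2 (human) x", 1))  # >sp|P12_1
```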
unique_ids.py
Removes the existing headers in a FASTA file and replaces them with a unique ID for each sequence (>1, >2, >3, ...).
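The renumbering can be sketched line by line: every header line is replaced by a running counter and sequence lines pass through unchanged. This assumes the file has already been read into a list of lines without trailing newlines; renumber_headers is a hypothetical name.

```python
def renumber_headers(lines):
    """Replace each header line with >1, >2, >3, ...; leave sequence lines as they are."""
    counter = 0
    out = []
    for line in lines:
        if line.startswith(">"):
            counter += 1
            out.append(f">{counter}")
        else:
            out.append(line)
    return out


print(renumber_headers([">abc desc", "ACG", ">xyz", "TT"]))  # ['>1', 'ACG', '>2', 'TT']
```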
split_fasta_ntimes.py
Splits a multi-FASTA file into smaller multi-FASTA files of N sequences each.
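The core of the split is chunking the records into consecutive groups of N; each group can then be written to its own file. A minimal sketch, assuming the records are already parsed into a list; chunk_records is a hypothetical name.

```python
def chunk_records(records, n):
    """Split a list of records into consecutive groups of at most n (the last may be shorter)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return [records[i:i + n] for i in range(0, len(records), n)]


print(chunk_records(["s1", "s2", "s3", "s4", "s5"], 2))  # [['s1', 's2'], ['s3', 's4'], ['s5']]
```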
get_by_size.py
Extracts all sequences from a multi-FASTA file whose length is at least a given minimum.
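The length filter amounts to a single comprehension, sketched here assuming the records are (header, sequence) tuples and the threshold is inclusive (>=), as described above; filter_by_length is a hypothetical name.

```python
def filter_by_length(records, min_len):
    """Return the (header, sequence) records whose sequence length is >= min_len."""
    return [(header, seq) for header, seq in records if len(seq) >= min_len]


print(filter_by_length([("a", "ACGT"), ("b", "AC"), ("c", "ACG")], 3))  # [('a', 'ACGT'), ('c', 'ACG')]
```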
compare_txt.py
Compares two text files (e.g. lists of accession IDs) and prints any IDs that are not present in both files.
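IDs that are not in both files are the symmetric difference of the two ID sets. A sketch assuming one ID per line, with blank lines and surrounding whitespace ignored; ids_not_in_both is a hypothetical name, and its arguments can be open file objects or lists of lines.

```python
def ids_not_in_both(lines_a, lines_b):
    """Return, sorted, the IDs that appear in exactly one of the two inputs."""
    a = {line.strip() for line in lines_a if line.strip()}
    b = {line.strip() for line in lines_b if line.strip()}
    return sorted(a ^ b)  # symmetric difference


print(ids_not_in_both(["A1\n", "B2\n", "C3"], ["B2", "D4\n", ""]))  # ['A1', 'C3', 'D4']
```

With real files, passing two objects from open() works directly, since iterating a text file yields its lines.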