Instructions for the sequence phylogeny activity:
-
Take some time to get to know your data.
-
Count how many fasta files (DNA seqs from different organisms) you have in each file.
-
Combine all of the sequences into one large fasta file
-
Make a subdirectory in the parent directory called “raw data”
-
Move your original fasts into the raw data folder (do this so you keep raw data safe while running other commands)
-
Use the combined fasta file to grab only the headers of each fasta into a file that will be a list of the fasta headers (names). Name this file “headers.txt”
-
Now, count how many sequences made it into your combined fasta to make sure all the sequences did in fact get added. (You should get the same # you got in part 2)
- Add this count to the headers.txt file. Make sure you dont wipe all the headers out of that file.
- Go to this website https://www.genome.jp/tools-bin/mafft Use the alignment tool to align your fasta files (the all_seqs file) and generate a phylogenetic tree.
Challenge: Use the challenge seqs folder and from that parent directory use one bash script to go into each subfolder and extract the # of fastas in each file in each folder. Then save the output from each into a file in the parent directory. (Hint - use a for loop).