Using arlecore_Fst_batch.pl script #1
An example of the count file format can be found at: http://www.mothur.org/wiki/Count_File.
Great, thanks for the information. To clarify with a simplified example, let's say I'm interested in two microbial populations that have 2 'core' genes (the first 2 strains are from Pop1 and the remaining 3 strains are from Pop2):

arlecore_Fst_batch.pl -count mothur_countfile.txt -min 2 1 nt_alignment1_pal2nal.fasta nt_alignment2_pal2nal.fasta > Fst_res.txt

Representative_Sequence total Pop1 Pop2
fig|395491.1.peg.6517 1 1 0
You almost have the count file correct. The 'total' and 'Pop[12]' columns look good; however, the 'Representative_Sequence' values should be the taxon names (e.g., 'fig|395491.1' or 'Methanosarcina_mazei_Go1'). Including the PEG IDs makes the count file specific to a single gene cluster, but the point of the script is to run it on many gene clusters. This is where the '-delimiter' flag comes in: to remove the PEG IDs from the sequence names in the gene cluster fasta files, you can use '-delimiter ".peg."'. This strips the PEG ID off each sequence name so that it can be mapped to the taxon names in the count file.
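A minimal Python sketch of the two pieces described above, assuming each strain contributes exactly one sequence per gene cluster: writing a tab-delimited mothur-style count file keyed by taxon name, and stripping the PEG ID from a fasta sequence name with the '.peg.' delimiter. The function names and every taxon ID other than fig|395491.1 are hypothetical, and this is only an illustration of the idea, not how arlecore_Fst_batch.pl itself is implemented.

```python
import csv
import io


def write_count_file(populations, handle):
    """Write a mothur-style count table with one row per taxon.

    populations: dict mapping taxon name -> population (group) name.
    Assumes each taxon contributes exactly one sequence per gene cluster,
    so 'total' is 1 and only the taxon's own population column is 1.
    """
    groups = sorted(set(populations.values()))
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Representative_Sequence", "total", *groups])
    for taxon, pop in populations.items():
        writer.writerow([taxon, 1, *(1 if g == pop else 0 for g in groups)])


def taxon_from_header(header, delimiter=".peg."):
    """Drop a leading '>' and everything from the delimiter onward,
    e.g. '>fig|395491.1.peg.6517' -> 'fig|395491.1'."""
    return header.lstrip(">").strip().split(delimiter, 1)[0]


if __name__ == "__main__":
    # Hypothetical strain assignments (only fig|395491.1 appears in the thread).
    populations = {
        "fig|395491.1": "Pop1",
        "fig|100002.1": "Pop1",
        "fig|100003.1": "Pop2",
        "fig|100004.1": "Pop2",
        "fig|100005.1": "Pop2",
    }
    buf = io.StringIO()
    write_count_file(populations, buf)
    print(buf.getvalue(), end="")
    print(taxon_from_header(">fig|395491.1.peg.6517"))
```

Because the rows are keyed by taxon rather than by PEG ID, the same count file can be reused for every gene cluster alignment passed to the script.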
Ah OK, makes perfect sense! After adjusting my files and running the script as follows, I'm getting an error (maybe something is wrong with the delimiter flag?):

arlecore_Fst_batch.pl -count mothur_countfile_1.txt -min 2 1 -delimiter ".peg." 1387_nt_alignment_pal2nal.fasta 3357_nt_alignment_pal2nal.fasta > Fst_res_test.txt

/home/gtl-shared-2/ckling2-big/ITEP/clusterDbAnalysis/1387_nt_alignment_pal2nal.fasta Did_not_pass_-min
Removing group: Nstrains because all sequences have been removed.
mothur > quit()

Here are truncated versions of my count file and first fasta alignment:

Representative_Sequence total Cstrains Nstrains
Are you missing the '>' from the sequence names in your fasta files? That would explain the error, since the script wouldn't be able to find any sequences in the fasta files.
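One way to check for the problem suggested above is to count how many fasta headers, after the PEG ID is stripped, map to each group in the count file. A group that ends up at zero (or below whatever threshold -min sets; its exact semantics are not spelled out in this thread) would be consistent with the "Removing group ... because all sequences have been removed" message. This is a hypothetical diagnostic helper written for illustration, not part of arlecore_Fst_batch.pl or mothur.

```python
def sequences_per_group(fasta_text, taxon_to_group, delimiter=".peg."):
    """Count fasta sequences per group after stripping the PEG ID.

    fasta_text: contents of one alignment file.
    taxon_to_group: dict taxon name -> group, mirroring the count file.
    Returns (dict group -> count, list of header names with no match).
    Lines without a leading '>' are treated as sequence data, so headers
    that lost their '>' are silently not counted.
    """
    # dict.fromkeys keeps first-seen group order and removes duplicates.
    counts = dict.fromkeys(taxon_to_group.values(), 0)
    unmatched = []
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue
        taxon = line[1:].strip().split(delimiter, 1)[0]
        group = taxon_to_group.get(taxon)
        if group is None:
            unmatched.append(taxon)
        else:
            counts[group] += 1
    return counts, unmatched


if __name__ == "__main__":
    # Hypothetical two-sequence alignment and count-file mapping.
    fasta = ">fig|395491.1.peg.6517\nATGAAA\n>fig|100003.1.peg.42\nATGAAG\n"
    groups = {"fig|395491.1": "Cstrains", "fig|100003.1": "Nstrains"}
    print(sequences_per_group(fasta, groups))
```

A non-empty unmatched list points at a naming mismatch between the fasta headers and the count file (for example, a wrong -delimiter), while all-zero counts with no unmatched names points at headers that are not being recognized at all.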
Nope, the '>' symbols are there; they just got removed when I posted them on GitHub. (This may or may not be useful, but the script seems to work fine for a single gene cluster when the PEG IDs are included in the count file.)
OK. The bug should be fixed now. Keep in mind that the script was tested with ARLECORE v3.5.1.3 (17.09.11), and other versions may not work with the default *.ars and *.arp files produced by the script.
Everything seems to be working now, thanks for all the assistance!
Hi Nick, the help file of the script says that a custom .ars file can be provided with the -ars flag. I created my own .ars file in the Windows version of Arlequin, but even with different settings the script only prints out pairwise Fst estimates. Is there a straightforward way to run additional Arlecore tests besides Fst (e.g., Tajima's D) on multiple gene alignments using the arlecore_Fst_batch.pl script?
For my requirements, I only needed the script to parse the Fst values (and P values) out of the arlecore output. Making the parser handle all of the info in the arlecore output would be a lot more involved. However, if you need just one item such as Tajima's D, I could probably code that without too much trouble.
It would be great to get estimates of Tajima's D (and P values), so if it's not a big project, I would definitely get some good use out of that code.
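For reference, Tajima's D compares the mean number of pairwise differences per sequence pair (π) with the number of segregating sites (S) divided by a1 = sum of 1/i for i = 1..n-1, normalized by an estimate of the variance of that difference. Below is a minimal Python sketch of the standard formula for a single population's alignment, intended only as an independent cross-check and not as the script's or Arlequin's implementation. It assumes columns containing gaps or ambiguous bases are simply dropped (complete deletion), returns None when D is undefined (fewer than 4 sequences or no segregating sites), and does not compute a P value; the function name is hypothetical.

```python
from collections import Counter
from math import comb, sqrt


def tajimas_d(seqs):
    """Tajima's D for one population's aligned nucleotide sequences."""
    n = len(seqs)
    if n < 4:
        return None  # the variance term is zero for n < 4
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    # Keep only columns where every sequence has an unambiguous base.
    columns = [col for col in zip(*(s.upper() for s in seqs))
               if set(col) <= set("ACGT")]
    pairs = comb(n, 2)
    seg_sites = 0
    diffs = 0
    for col in columns:
        allele_counts = Counter(col)
        if len(allele_counts) > 1:
            seg_sites += 1
            # Differing pairs = all pairs minus pairs sharing an allele.
            diffs += pairs - sum(comb(k, 2) for k in allele_counts.values())
    if seg_sites == 0:
        return None
    pi = diffs / pairs  # mean pairwise differences
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    s = seg_sites
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    # Toy 4-sequence alignment; the third column is dropped (gap).
    print(tajimas_d(["AA-", "AAC", "ATC", "TAC"]))
```

Running it per population on each gene cluster alignment would give one D value per cluster; significance testing would still have to come from elsewhere.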
I am attempting to use the arlecore_Fst_batch.pl script. I am, however, getting stuck trying to figure out how to make an appropriate count file in the Mothur format. If using ITEP to find 'core' genes and make alignments for input into arlecore, what do the rows and the "total" column represent in the Mothur count file?
Thanks in advance!