Using arlecore_Fst_batch.pl script #1

cklinge2 · 2014-08-08T19:17:12Z

I am attempting to use the arlecore_Fst_batch.pl script. I am, however, getting stuck trying to figure out how to make an appropriate count file in the Mothur format. If using ITEP to find 'core' genes and make alignments for input into arlecore, what do the rows and the "total" column represent in the Mothur count file?

Thanks in advance!

nick-youngblut · 2014-08-08T19:30:40Z

An example of the count file format can be found at: http://www.mothur.org/wiki/Count_File.
Each column except for 'total' will be considered a different population by arlecore. The 'total' column is just the sum of each taxon's abundances across all populations. Each row is a different taxon (genome), so the first value in each row should be a taxon name (the FIG ID if using ITEP). The '-delimiter' flag in arlecore_Fst_batch.pl can be used to remove any additional info from gene names (e.g., PEG IDs or annotations).
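A minimal Python sketch of building such a count file follows. It assumes each taxon contributes one sequence (abundance 1) to exactly one population, as in the truncated count file posted later in this thread. The function name `write_count_file` and the taxon-to-population mapping are hypothetical; the tab-delimited layout follows the Mothur count file format linked above.

```python
import csv
import io


def write_count_file(taxon_to_pop, populations, handle):
    """Write a tab-delimited count table: taxon, total, one column per population."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Representative_Sequence", "total", *populations])
    for taxon, pop in taxon_to_pop.items():
        if pop not in populations:
            raise ValueError(f"unknown population {pop!r} for {taxon}")
        # Assumption: each taxon has abundance 1 in its own population, 0 elsewhere
        counts = [1 if p == pop else 0 for p in populations]
        # 'total' is the sum of the taxon's abundances across all populations
        writer.writerow([taxon, sum(counts), *counts])


# Hypothetical assignment of taxa (FIG IDs) to two populations
taxa = {"fig|395491.21": "Nstrains", "fig|395491.17": "Cstrains"}
buf = io.StringIO()
write_count_file(taxa, ["Cstrains", "Nstrains"], buf)
print(buf.getvalue(), end="")
```

With this mapping the sketch writes the same three rows as the count file shown further down the thread.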

cklinge2 · 2014-08-12T16:55:37Z

Great, thanks for the information.

To clarify with a simplified example, let’s say I’m interested in two microbial populations that have 2 ‘core’ genes (with the first 2 strains from Pop1 and the remaining 3 strains from Pop2).
Would the count file look something like this, with identical columns between the 2 gene clusters?

arlecore_Fst_batch.pl -count mothur_countfile.txt -min 2 1 nt_alignment1_pal2nal.fasta nt_alignment2_pal2nal.fasta > Fst_res.txt

nick-youngblut · 2014-08-12T17:35:00Z

You almost have the count file correct. The 'total' and 'Pop[12]' columns look good; however, the 'Representative_Sequence' values should be the taxon names (e.g., 'fig|395491.1' or 'Methanosarcina_mazei_Go1'). Including the PEG IDs makes the count file specific to one gene cluster, but the point of the script is to run it on many gene clusters. This is where the '-delimiter' flag comes in: to remove the PEG IDs from the gene cluster fasta sequence names, you can use '-delimiter ".peg."'. This strips the PEG ID off the sequence names so that they can be mapped to the taxon names in the count file.
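The intent of '-delimiter ".peg."' can be illustrated with a short Python sketch: everything from the first occurrence of the delimiter onward is dropped, and the remainder is looked up in the count file. This is an assumption about the behaviour, not the script's own code. In particular, the sketch treats the delimiter as a literal string; whether the Perl script interprets it as a regular expression is not stated in this thread. `strip_peg` is a hypothetical helper.

```python
def strip_peg(seq_name, delimiter=".peg."):
    """Return the part of seq_name before the first literal occurrence of delimiter."""
    return seq_name.split(delimiter, 1)[0]


# Count file rows as posted in this thread: taxon name -> population with a count of 1
count_rows = {"fig|395491.21": "Nstrains", "fig|395491.17": "Cstrains"}
for name in ["fig|395491.21.peg.1816", "fig|395491.17.peg.10165"]:
    taxon = strip_peg(name)
    print(f"{name} -> {taxon} -> {count_rows[taxon]}")
```

A name without the delimiter is returned unchanged, so taxon names that are already bare still match.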

cklinge2 · 2014-08-12T18:35:01Z

Ah ok, makes perfect sense!

After adjusting my files and running the script as follows, I'm getting an error (maybe something wrong with delimiter flag?):

arlecore_Fst_batch.pl -count mothur_countfile_1.txt -min 2 1 -delimiter ".peg." 1387_nt_alignment_pal2nal.fasta 3357_nt_alignment_pal2nal.fasta > Fst_res_test.txt

/home/gtl-shared-2/ckling2-big/ITEP/clusterDbAnalysis/1387_nt_alignment_pal2nal.fasta Did_not_pass_-min
Mothur error: '
Removing group: Cstrains because all sequences have been removed.

Removing group: Nstrains because all sequences have been removed.
[ERROR]: fig|395491.21.peg.1816__1 is not in your count table. Please correct.

mothur > quit()
' at /usr/local/bin/arlecore_Fst_batch.pl line 387.

Here are truncated versions of my count file and first fasta alignment:

Representative_Sequence total Cstrains Nstrains
fig|395491.21 1 0 1
fig|395491.17 1 1 0

fig|395491.21.peg.1816
GTGCAGCAGAACATCGCCCATCTGCCGGCCGCCGACCGCGAGGCGATCGCAGCCTATCTG
AAGGCGGTGCCGGGCCAT---------------
fig|395491.17.peg.10165
GTGCAGCAGAACATCGCCCATCTGCCGACCGCCGACCGCGAGGCGATCGCCGCCTATCTG
AAGGCCGTGCCGGGACGC---------------
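Mismatches like the "[ERROR]: ... is not in your count table" message above can be caught before running arlecore with a quick consistency check. The Python sketch below is hypothetical and is not part of arlecore_Fst_batch.pl: it reads sequence names from '>' header lines, strips the delimiter as a literal string, and lists names that have no matching row in a tab-delimited count file. The taxon fig|395491.99 in the usage example is made up to show a miss.

```python
def fasta_names(fasta_text):
    """Yield sequence names from '>' header lines (first whitespace-delimited token)."""
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                yield parts[0]


def count_file_taxa(count_text):
    """Return first-column values of a tab-delimited count file, skipping the header row."""
    rows = [line.split("\t") for line in count_text.splitlines() if line.strip()]
    return {row[0] for row in rows[1:]}


def missing_taxa(fasta_text, count_text, delimiter=".peg."):
    """List sequence names whose stripped form has no row in the count file."""
    known = count_file_taxa(count_text)
    return [name for name in fasta_names(fasta_text)
            if name.split(delimiter, 1)[0] not in known]


count_text = (
    "Representative_Sequence\ttotal\tCstrains\tNstrains\n"
    "fig|395491.21\t1\t0\t1\n"
    "fig|395491.17\t1\t1\t0\n"
)
# Hypothetical alignment: the third taxon is deliberately absent from the count file
fasta_text = (
    ">fig|395491.21.peg.1816\nGTGCAGCAG\n"
    ">fig|395491.17.peg.10165\nGTGCAGCAG\n"
    ">fig|395491.99.peg.1\nGTGCAGCAG\n"
)
print(missing_taxa(fasta_text, count_text))
```

Note that a file with no '>' headers yields no names, so an empty result is only meaningful if `fasta_names` actually found sequences.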

nick-youngblut · 2014-08-12T18:44:00Z

Are you missing the '>' from the sequence names in your fasta files? That would explain the error, since the script wouldn't be able to find any sequences in the fasta files.

cklinge2 · 2014-08-12T18:56:20Z

Nope, the '>' symbols are there; they just got removed when I posted them on GitHub. (This may or may not be useful, but the script seems to work fine for a single gene cluster with PEG IDs included in the count file.)

nick-youngblut · 2014-08-12T23:58:13Z

OK. The bug should be fixed. Keep in mind that the script was tested with ARLECORE v3.5.1.3 (17.09.11), and other versions may not work with the default *.ars and *.arp files produced by the script.

cklinge2 · 2014-08-13T15:17:20Z

Everything seems to be working now, thanks for all the assistance!

cklinge2 · 2014-10-27T16:03:59Z

Hi Nick,

In the help file of the script it says that a custom .ars file can be provided with the -ars flag. I have created my own .ars file in the Windows version of Arlequin, but even with different settings the script only prints out pairwise Fst estimates. Is there a straightforward way to perform additional Arlecore tests (aside from Fst), e.g. Tajima's D, on multiple gene alignments using the arlecore_Fst_batch.pl script?

nick-youngblut · 2014-11-06T18:48:38Z

For my requirements, I only needed the script to parse out the Fst values (and P values) from the arlecore output. Making the parser handle all of the info in the arlecore output would be a lot more involved. However, if you need just one item such as Tajima's D, I could probably code that without too much trouble.

cklinge2 · 2014-11-06T19:14:03Z

It would be great to get estimates of Tajima's D (and P values), so if it's not a big project, I would definitely get some good use out of that code.
Many thanks!
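As an independent cross-check of any Tajima's D values parsed from arlecore, the statistic for a single alignment can be computed directly. The Python sketch below implements the standard Tajima (1989) formula; it is not Arlequin's implementation and computes no P value. As an assumption, any column containing a character other than A, C, G or T is dropped from all sequences (complete deletion), which may not match how Arlequin treats gaps, so values can differ from arlecore's output. `tajimas_d` is a hypothetical helper.

```python
import math
from itertools import combinations

VALID_BASES = set("ACGT")


def tajimas_d(seqs):
    """Tajima's D for aligned nucleotide sequences; None if there are no segregating sites."""
    n = len(seqs)
    if n < 4:  # the variance term is zero for n < 4
        raise ValueError("Tajima's D needs at least 4 sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned (equal length)")
    # Assumption: keep only columns where every sequence has an unambiguous base
    cols = [c for c in zip(*(s.upper() for s in seqs)) if set(c) <= VALID_BASES]
    seg = sum(1 for c in cols if len(set(c)) > 1)  # S, number of segregating sites
    if seg == 0:
        return None  # D is undefined without polymorphism
    trimmed = ["".join(c[i] for c in cols) for i in range(n)]
    # pi: mean number of pairwise differences over the retained columns
    diffs = sum(sum(x != y for x, y in zip(a, b)) for a, b in combinations(trimmed, 2))
    pi = diffs / (n * (n - 1) / 2)
    # Standard constants from Tajima (1989)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - seg / a1) / math.sqrt(e1 * seg + e2 * seg * (seg - 1))


print(round(tajimas_d(["AAAA", "AAAT", "AATT", "ATTT"]), 4))
```

Running it per gene cluster alignment would give one D per cluster, which could then be compared against whatever arlecore reports for the same alignments.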

cklinge2 closed this as completed Aug 13, 2014

cklinge2 reopened this Oct 27, 2014
