Samuel Hamann edited this page May 5, 2021 · 4 revisions

This method estimates theta values and other neutrality statistics. Please see the ANGSD documentation for full details on this method.

Basic Usage

To run this method, use the following command:

angsd-wrapper Thetas Thetas_Config

where Thetas_Config is the full path to the configuration file for the theta calculations. Depending on the values assigned within Thetas_Config, this wrapper will either use the user-specified site allele frequency likelihood and site frequency spectrum files, or generate new files by calling the SFS wrapper. For instance, a user who has already generated these files with the SFS wrapper can supply their filepaths in Thetas_Config and avoid recomputing them.

Without supplied filepaths, the OVERRIDE flag (set to true by default) and the presence of existing data within the SFS directory determine whether new data are generated. If overriding is allowed, or the project's SFS directory doesn't contain files worth keeping, the wrapper will call the SFS wrapper to calculate new SFS values and then continue with its diversity calculations. Otherwise, it will exit rather than risk overwriting the previous data.
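The decision described above can be sketched as a small shell function. This is an illustration only, not the wrapper's actual code: the function name decide_sfs_action is hypothetical, and "doesn't contain valuable files" is assumed here to mean that the SFS directory is missing or empty.

```shell
# Sketch of the reuse/generate/exit decision; NOT the wrapper's real implementation.
# Args: $1 = SFS path, $2 = SAF path, $3 = OVERRIDE (true/false), $4 = project SFS directory
# Prints one of: reuse, generate, exit
decide_sfs_action() {
    local sfs="$1" saf="$2" override="$3" sfs_dir="$4"
    if [ -n "$sfs" ] && [ -f "$sfs" ] && [ -n "$saf" ] && [ -f "$saf" ]; then
        # Both files were supplied and exist: use them as they are
        echo reuse
    elif [ "$override" = "true" ] || [ -z "$(ls -A "$sfs_dir" 2>/dev/null)" ]; then
        # Overriding is allowed, or the directory is missing/empty (assumption)
        echo generate
    else
        # Existing data would be at risk, so stop
        echo exit
    fi
}
```

For example, `decide_sfs_action "" "" false "$SCRATCH/$PROJECT/SFS"` would print `exit` if that directory already holds files; the directory layout used in this call is itself an assumption.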

In all cases, FOLD should be set to match the expected SFS format: either 0 (the default, for unfolded spectra) or 1 (for folded spectra).
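As a rough illustration of the settings discussed in this section, the relevant lines of a Thetas_Config might look like the following. Only the variable names SFS, SAF, FOLD, and OVERRIDE come from this page; every path is a hypothetical placeholder, and the exact syntax should be checked against the Thetas_Config template that ships with ANGSD-wrapper.

```shell
# Hypothetical excerpt from a Thetas_Config; every path below is a placeholder.

# Option 1: reuse files from an earlier SFS run
SFS=/path/to/SFS_output/DerivedSFS
SAF=/path/to/SFS_output/SFSOut.saf.idx

# Option 2: leave both empty so the wrapper calls the SFS wrapper itself
# SFS=
# SAF=

# 0 = unfolded spectrum (default), 1 = folded spectrum
FOLD=0

# Allow existing files in the project's SFS directory to be recalculated
OVERRIDE=true
```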

Input files

All inputs should be specified in Thetas_Config.

Common Variables

This method makes use of Common_Config. The variables it uses are listed below:

Variable Function
SAMPLE_LIST (GROUP_SAMPLES on dev) A list of samples to be used in calculations
SAMPLE_INBREEDING (GROUP_INBREEDING on dev) A list of inbreeding coefficients, where each line corresponds to a line in SAMPLE_LIST (GROUP_SAMPLES on dev)
ANC_SEQ Path to ancestral sequence
PROJECT Name given to all outputs in ANGSD-wrapper
SCRATCH Place to store files; the full path is SCRATCH/PROJECT/Thetas
REGIONS Limit the scope of ANGSD-wrapper to certain regions
UNIQUE_ONLY Use uniquely mapped reads only
MIN_BASEQUAL Minimum base quality score
BAQ Adjust Q scores around indels
MIN_IND Minimum number of individuals needed to use this site
GT_LIKELIHOOD Estimates genotype likelihoods
MIN_MAPQ Minimum mapping quality
N_CORES Number of cores to use; do not set this above the limits of your system
DO_MAJORMINOR Estimate major/minor alleles
DO_MAF Calculate per-site frequencies

Method-Specific Variables

These variables are specific to this method:

Variable Function
SFS The site frequency spectrum file. Will be auto-generated if the filepath is empty or the file doesn't exist; otherwise, supply the DerivedSFS file from the SFS wrapper's output
SAF The site allele frequency likelihood index file. Will be auto-generated if the filepath is empty or the file doesn't exist; otherwise, supply the 'SFSOut.saf.idx' file from the SFS wrapper's output
FOLD Determines whether the thetas are calculated assuming a folded (1) or unfolded (0) SFS. Defaults to 0

Method Parameters

The parameters for this method can be tweaked as necessary. The defaults have been chosen to work well in general:

Parameter Function
DO_SAF Calculates the site allele frequency likelihoods used to estimate the site frequency spectrum
OVERRIDE If true, will recalculate files that already exist
SLIDING_WINDOW Enable sliding window analysis
WIN Window size for sliding window analysis
STEP Step size for sliding window analysis
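To make the relationship between WIN and STEP concrete, the sketch below lists the windows a generic sliding-window scan would cover. The function name list_windows and the 1-based, inclusive coordinates are assumptions made for illustration; ANGSD's own handling of window boundaries and partial windows may differ.

```shell
# Print "start end" for each full window across a region of the given length.
# Generic illustration only; ANGSD's exact boundary handling may differ.
# Args: $1 = region length, $2 = window size (WIN), $3 = step size (STEP)
list_windows() {
    local region_len="$1" win="$2" step="$3"
    local start=1
    while [ $((start + win - 1)) -le "$region_len" ]; do
        echo "$start $((start + win - 1))"
        start=$((start + step))
    done
}

# A 100 bp region with WIN=50 and STEP=25 yields three overlapping windows:
# 1 50
# 26 75
# 51 100
list_windows 100 50 25
```

When STEP is smaller than WIN, consecutive windows overlap; setting STEP equal to WIN gives non-overlapping windows.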

Output files

Naming Scheme Contents
PROJECT_Diversity.thetas.gz Diversity statistics
PROJECT_Diversity.thetas.idx Index of diversity statistics
PROJECT_Diversity.thetas.idx.pestPG Final Thetas estimations
PROJECT_Thetas.graph.me Final Thetas visualization for our Shiny interface
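The .pestPG file can also be inspected on the command line. The sketch below assumes the tab-separated, 14-column layout described in the ANGSD documentation (column 2 = chromosome, 3 = window centre, 4 = Watterson's theta, 5 = pairwise theta, 14 = number of sites, with a header line starting with `#`); please confirm this against the header of your own file. The function name summarize_pestpg is hypothetical.

```shell
# Print chromosome, window centre, and per-site Watterson's and pairwise theta.
# Assumed columns: 2=Chr 3=WinCenter 4=tW 5=tP 14=nSites (verify against your header line).
# Args: $1 = path to a .pestPG file
summarize_pestpg() {
    awk -F '\t' '
        # Skip the header and any window without data
        !/^#/ && $14 > 0 {
            # Window totals divided by site count give per-site values
            printf "%s\t%s\t%.6f\t%.6f\n", $2, $3, $4 / $14, $5 / $14
        }
    ' "$1"
}
```

It could be called as `summarize_pestpg PROJECT_Diversity.thetas.idx.pestPG`. Dividing by the number of sites is used here because the window values are taken to be sums over sites, which is also an assumption to verify.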

Visualization

PROJECT_Thetas.graph.me can be visualized with the Shiny graphing interface. A web browser with a graphical user interface is required.