Thetas
This method estimates theta values and other neutrality statistics. Please see the ANGSD documentation for full details on this method.
To run this method, use the following command:

`angsd-wrapper Thetas Thetas_Config`

where `Thetas_Config` is the full path to the configuration file for the theta calculations. Depending on the values assigned within the `Thetas_Config` file, this wrapper will either use the user-specified site allele frequency likelihood and site frequency spectrum files, or generate new files by calling the SFS wrapper. For instance, a user who has already generated these files with the SFS wrapper can supply their filepaths in the Thetas config file and avoid recomputing them.
Without supplied filepaths, the `OVERRIDE` flag (set to `true` by default) and the presence of existing data in the `SFS` directory determine whether new data are generated. If overriding is allowed, or if the project's `SFS` directory contains no previous results, the wrapper calls the SFS wrapper to calculate new SFS values and then continues with its diversity calculations. Otherwise, it exits rather than risk overwriting the previous data.
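The decision described above can be sketched as a small shell function. This is an illustration only: the function name `should_generate_sfs` is hypothetical, and treating any file in the directory as previous data is an assumption, not necessarily how the wrapper itself checks.

```shell
# Hypothetical sketch of the OVERRIDE decision; not the wrapper's actual code.
# Usage: should_generate_sfs OVERRIDE SFS_DIR
# Returns 0 if new SFS data may be generated, 1 if the wrapper should exit.
should_generate_sfs() {
    # $1: value of OVERRIDE ("true" or anything else)
    # $2: path to the project's SFS directory

    # Overriding is allowed: always regenerate.
    if [ "$1" = "true" ]; then
        return 0
    fi

    # Directory missing or empty: nothing can be overwritten, so generate.
    # (Assumption: any file in the directory counts as previous data.)
    if [ ! -d "$2" ] || [ -z "$(ls -A "$2")" ]; then
        return 0
    fi

    # Previous data exist and overriding is not allowed: refuse.
    return 1
}
```

A caller would pass its own `OVERRIDE` value and SFS directory, for example `should_generate_sfs "${OVERRIDE}" "${SFS_DIR}" || exit 1`, where `SFS_DIR` is a placeholder for wherever the project's SFS output lives.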
In all cases, `FOLD` should be set according to the expected SFS format: either `0` (the default, for unfolded spectra) or `1` (for folded spectra).
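Since only two values are meaningful, a quick check before launching a run can catch typos. The helper below is a hypothetical sketch (`validate_fold` is not part of the wrapper) that maps the flag to the spectrum type it selects.

```shell
# Hypothetical helper: confirm FOLD is one of the two documented values.
# Usage: validate_fold FOLD
# Prints the spectrum type on success; returns 1 with a message otherwise.
validate_fold() {
    case "$1" in
        0) echo "unfolded" ;;
        1) echo "folded" ;;
        *) echo "FOLD must be 0 (unfolded) or 1 (folded), got: '$1'" >&2
           return 1 ;;
    esac
}
```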
All inputs should be specified in `Thetas_Config`.
This method makes use of `Common_Config`; the variables that are used are listed below:
Variable | Function |
---|---|
`SAMPLE_LIST` (`GROUP_SAMPLES` on dev) | A list of samples to be used in calculations |
`SAMPLE_INBREEDING` (`GROUP_INBREEDING` on dev) | A list of inbreeding coefficients, where each line corresponds to a line in `SAMPLE_LIST` (`GROUP_SAMPLES` on dev) |
`ANC_SEQ` | Path to ancestral sequence |
`PROJECT` | Name given to all outputs in ANGSD-wrapper |
`SCRATCH` | Place to store files, the full path is `SCRATCH/PROJECT/Thetas` |
`REGIONS` | Limit the scope of ANGSD-wrapper to certain regions |
`UNIQUE_ONLY` | Use uniquely mapped reads only |
`MIN_BASEQUAL` | Minimum base quality score |
`BAQ` | Adjust Q scores around indels |
`MIN_IND` | Minimum number of individuals needed to use this site |
`GT_LIKELIHOOD` | Estimates genotype likelihoods |
`MIN_MAPQ` | Minimum base mapping quality |
`N_CORES` | Number of cores to use, please do not set above the limits of your system |
`DO_MAJORMINOR` | Estimate major/minor alleles |
`DO_MAF` | Calculate per-site frequencies |
These variables are specific to this method:
Variable | Function |
---|---|
`SFS` | The site frequency spectrum file. Will be auto-generated if the filepath is empty or doesn't exist, or can be supplied by the `DerivedSFS` file from the SFS wrapper's output |
`SAF` | The site allele frequency likelihood index file. Will be auto-generated if the filepath is empty or doesn't exist, or can be supplied by the `SFSOut.saf.idx` file from the SFS wrapper's output |
`FOLD` | This flag determines if the thetas are calculated assuming a folded or unfolded SFS, and is set to `0` as a default. |
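
As a rough illustration, the method-specific variables might look like this in `Thetas_Config`. This sketch assumes the configuration file uses shell-style `NAME=value` assignments, and every path shown is a placeholder rather than a real location.

```shell
# Illustrative fragment of a Thetas_Config file; all paths are placeholders.
# Leave SFS and SAF empty to have the wrapper generate them via the SFS wrapper.

# Site frequency spectrum from a previous SFS run
SFS=/path/to/previous/SFS/output/DerivedSFS

# Site allele frequency likelihood index from the same SFS run
SAF=/path/to/previous/SFS/output/SFSOut.saf.idx

# 0 for an unfolded spectrum (the default), 1 for a folded spectrum
FOLD=0
```
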
The parameters for this method can be adjusted as necessary; the defaults have been chosen to work well in general use:
Parameter | Function |
---|---|
`DO_SAF` | Calculates site allele frequency likelihoods, from which the site frequency spectrum is estimated |
`OVERRIDE` | If `true`, will recalculate files that already exist |
`SLIDING_WINDOW` | Enable sliding window analysis |
`WIN` | Window size for sliding window analysis |
`STEP` | Step size for sliding window analysis |
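
To illustrate how `WIN` and `STEP` interact, here is a generic sliding-window layout. It is a sketch under stated assumptions: the function name `list_windows` is hypothetical, windows are half-open intervals starting at position 0, and only full windows are printed. ANGSD's own window boundaries may follow a different convention.

```shell
# Generic sliding-window layout (illustration only; ANGSD's own window
# boundaries may follow a different convention).
# Usage: list_windows REGION_LENGTH WIN STEP
# Prints "start end" for every full window as a half-open interval [start, end).
# The body runs in a subshell, ( ... ), so its variables do not leak out.
list_windows() (
    length="$1"
    win="$2"
    step="$3"

    # A non-positive step would never advance, so reject it.
    if [ "${step}" -le 0 ]; then
        echo "STEP must be a positive integer" >&2
        exit 1
    fi

    start=0
    while [ $((start + win)) -le "${length}" ]; do
        echo "${start} $((start + win))"
        start=$((start + step))
    done
)

# A 100-base region, 50-base windows, advancing 25 bases at a time.
list_windows 100 50 25
```

The call above prints three windows: `0 50`, `25 75` and `50 100`. When `STEP` equals `WIN` the windows tile the region without overlap; a smaller `STEP` makes consecutive windows overlap.
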
The output files are named as follows:

Naming Scheme | Contents |
---|---|
`PROJECT_Diversity.thetas.gz` | Diversity statistics |
`PROJECT_Diversity.thetas.idx` | Index of diversity statistics |
`PROJECT_Diversity.thetas.idx.pestPG` | Final Thetas estimations |
`PROJECT_Thetas.graph.me` | Final Thetas visualization for our Shiny interface |
`PROJECT_Thetas.graph.me` can be visualized with the Shiny graphing interface. A web browser with a graphical user interface is required.
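
To locate the results after a run, the naming scheme can be combined with the `SCRATCH/PROJECT/Thetas` output location. The helper below is a hypothetical sketch: `thetas_outputs` is not part of the wrapper, the example values are placeholders, and it assumes that `PROJECT` in each file name is replaced by the project name.

```shell
# Hypothetical helper: print the expected output paths for a project.
# Usage: thetas_outputs SCRATCH PROJECT
# The body runs in a subshell, ( ... ), so its variables do not leak out.
thetas_outputs() (
    scratch="$1"
    project="$2"
    out_dir="${scratch}/${project}/Thetas"

    # File name suffixes taken from the naming scheme table.
    for suffix in Diversity.thetas.gz \
                  Diversity.thetas.idx \
                  Diversity.thetas.idx.pestPG \
                  Thetas.graph.me; do
        echo "${out_dir}/${project}_${suffix}"
    done
)

# Placeholder values; substitute the SCRATCH and PROJECT from Common_Config.
thetas_outputs /scratch/user MyProject
```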