EIAV Genotyping Tools
The EIAV extension of Lentivirus-GLUE provides functionality for genotyping EIAV sequences via maximum likelihood. Genotyping can be performed on any sequence of adequate length (typically >300 nucleotides are required for confident assignment). Any genomic region can be genotyped using the approach implemented in EIAV-GLUE.
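The length guideline above can be checked before submitting a batch of sequences. The sketch below is a hypothetical pre-filter and is not part of GLUE. It assumes that "length" means the number of unambiguous bases (A, C, G, T or U), ignoring gaps and IUPAC ambiguity codes, and it uses the 300-nucleotide figure as a fixed cutoff. The function names are illustrative.

```python
import re

# Assumption: the ~300 nt guideline is applied to unambiguous bases only
# (A, C, G, T/U); gaps and IUPAC ambiguity codes such as N are not counted.
MIN_INFORMATIVE_NT = 300


def informative_length(seq: str) -> int:
    """Count unambiguous nucleotides in a raw sequence string."""
    return len(re.findall(r"[ACGTU]", seq.upper()))


def long_enough(seq: str, minimum: int = MIN_INFORMATIVE_NT) -> bool:
    """Return True if the sequence meets the (assumed) minimum length."""
    return informative_length(seq) >= minimum


def split_by_length(records: dict[str, str], minimum: int = MIN_INFORMATIVE_NT):
    """Partition {id: sequence} into (to_genotype, too_short) lists of IDs."""
    ok, short = [], []
    for seq_id, seq in records.items():
        (ok if long_enough(seq, minimum) else short).append(seq_id)
    return ok, short


if __name__ == "__main__":
    demo = {"q1": "ACGT" * 100, "q2": "ACGTN-" * 40}
    print(split_by_length(demo))  # (['q1'], ['q2'])
```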
Classification uses Maximum Likelihood Clade Assignment (MLCA), the genotyping method implemented in GLUE. Sequences are assigned to genotypes and lineages that have been defined by phylogenetic analysis of full-length reference genome sequences.
MLCA is based on the Evolutionary Placement Algorithm (EPA), a feature of the highly optimized RAxML software. RAxML is typically used to infer complete phylogenetic trees from multiple sequence alignments. EPA instead places new sequences onto an existing reference tree without re-inferring the entire phylogeny. This efficiency makes EPA well suited to virus sequence clade assignment, and it forms the foundation of the MLCA method integrated into GLUE.
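A rough count shows why placement on a fixed tree is cheap. An unrooted binary tree with n leaves has 2n − 3 branches, so there are at most 2n − 3 candidate insertion points for each query. By contrast, the number of distinct unrooted topologies for the n references plus the query grows super-exponentially. This is only an illustration of scale and not a runtime comparison, because practical ML tree searches are heuristic and never enumerate every topology.

```latex
\begin{aligned}
\text{candidate placements on a fixed tree with } n \text{ leaves} &= 2n - 3 \\
\text{unrooted topologies with } n + 1 \text{ leaves} &= \bigl(2(n+1) - 5\bigr)!! = (2n - 3)!! \\
n = 10:\quad 2n - 3 = 17 \quad &\text{vs.} \quad 17!! = 34\,459\,425
\end{aligned}
```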
In GLUE, the MLCA process is implemented using the maxLikelihoodGenotyper and maxLikelihoodPlacer modules.
The genotyping process in Lentivirus-GLUE can be executed through the command-line interface. Below is an example of using the MLCA genotyping module:
Mode path: /project/lentivirus
GLUE> module eiavMaxLikelihoodGenotyper genotype sequence -w "sequenceID = 'AF170362'"
This command genotypes the sequences selected by the where clause (-w), in this case the stored sequence with ID AF170362, and outputs the assigned genotype for each matching sequence:
+==============================+====================+
| queryName                    | genotypeFinalClade |
+==============================+====================+
| ncbi-nuccore-equine/AF170362 | AL_TREE_EIAV_Am    |
+==============================+====================+
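When genotyping many sequences, you may want to post-process this console table, for example to collect results in a spreadsheet. The sketch below is a hypothetical helper and not a GLUE API. It assumes the layout shown above: border lines start with '+', data lines are delimited by '|', and the first data line is the header.

```python
def parse_glue_table(text: str) -> list[dict[str, str]]:
    """Parse a GLUE-style ASCII table into a list of row dicts.

    Assumes border lines start with '+', data lines start and end with '|',
    and the first data line holds the column names.
    """
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")
    ]
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


EXAMPLE = """\
+==============================+====================+
| queryName                    | genotypeFinalClade |
+==============================+====================+
| ncbi-nuccore-equine/AF170362 | AL_TREE_EIAV_Am    |
+==============================+====================+
"""

if __name__ == "__main__":
    for row in parse_glue_table(EXAMPLE):
        # queryName is "<source>/<sequenceID>"; keep only the sequence ID.
        seq_id = row["queryName"].split("/", 1)[-1]
        print(seq_id, row["genotypeFinalClade"])
```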
The MLCA algorithm operates in three stages: alignment, placement, and neighbor-weighting. Each stage is described below.
Alignment Stage:
The first step aligns the query sequences to a reference set of EIAV sequences. This is done with MAFFT using the --add and --keeplength options, which add query sequences to the existing multiple sequence alignment without altering its structure: --keeplength keeps the alignment length unchanged by deleting any insertions in the query relative to the reference alignment. Each query sequence is aligned independently, so the alignment of one query never influences that of another.
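The sketch below shows one way to reproduce this stage outside GLUE. It assumes mafft is on the PATH and that the reference alignment is a FASTA file. The --add and --keeplength options are MAFFT's documented options, but the function names, temporary-file handling and per-query loop are hypothetical and do not represent GLUE's internal code. The last function is a toy model of what --keeplength does when the query is aligned against a single reference row. Columns where the reference has a gap are insertions in the query, so they are dropped and the query fits the reference coordinates.

```python
import subprocess
import tempfile
from pathlib import Path


def mafft_add_command(query_fasta: str, reference_alignment: str) -> list[str]:
    """Build a MAFFT command that adds a query to an existing alignment
    while keeping the reference alignment length fixed."""
    return ["mafft", "--add", query_fasta, "--keeplength", reference_alignment]


def align_each_query(queries: dict[str, str], reference_alignment: str) -> dict[str, str]:
    """Run MAFFT once per query so no query influences another's alignment.

    Returns {query_id: MAFFT FASTA output}. Requires `mafft` on PATH.
    """
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for qid, seq in queries.items():
            qpath = Path(tmp) / f"{qid}.fasta"
            qpath.write_text(f">{qid}\n{seq}\n")
            proc = subprocess.run(
                mafft_add_command(str(qpath), reference_alignment),
                capture_output=True, text=True, check=True,
            )
            results[qid] = proc.stdout  # references plus this single query
    return results


def keep_reference_columns(ref_row: str, query_row: str) -> str:
    """Toy model of --keeplength for one pairwise alignment: drop columns
    where the reference has a gap, i.e. insertions in the query."""
    return "".join(q for r, q in zip(ref_row, query_row) if r != "-")


if __name__ == "__main__":
    print(keep_reference_columns("AC--GT", "ACTTGT"))  # ACGT
```

Running MAFFT once per query costs more process start-ups than a single batched call, but it matches the independence property described above.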
Placement Stage:
In the placement stage, the extended alignment from the previous step is combined with a fixed reference tree. For each query sequence, the algorithm searches for placements on the tree that maximize the likelihood of the extended tree. Using RAxML's EPA subsystem, it inserts the query sequence at candidate points (branches) on the tree and optimizes the branch lengths around each insertion point to find the most likely placements. A small set of high-likelihood placements is retained for the next stage.
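Placement results are commonly summarised as likelihood weight ratios (LWRs). Each candidate placement's likelihood is divided by the sum over all candidates, so the weights add up to 1. The sketch below computes LWRs from placement log-likelihoods that have already been calculated; it does not compute likelihoods itself. It then applies a hypothetical retention rule: keep at most max_keep placements, and stop once the retained weights reach a cumulative total. The branch names, numbers and default thresholds are illustrative and are not GLUE's or RAxML's actual settings.

```python
import math


def likelihood_weight_ratios(log_likelihoods: list[float]) -> list[float]:
    """Convert per-placement log-likelihoods into weights that sum to 1.

    Subtracting the maximum before exponentiating avoids underflow, since
    tree log-likelihoods are large negative numbers.
    """
    best = max(log_likelihoods)
    scaled = [math.exp(ll - best) for ll in log_likelihoods]
    total = sum(scaled)
    return [s / total for s in scaled]


def retain_placements(placements: dict[str, float], max_keep: int = 5,
                      cumulative: float = 0.99) -> list[tuple[str, float]]:
    """Keep the highest-LWR placements from {branch: log-likelihood}.

    Hypothetical rule: stop after max_keep placements, or once the retained
    LWRs sum to at least `cumulative`, whichever comes first.
    """
    branches = list(placements)
    lwrs = likelihood_weight_ratios([placements[b] for b in branches])
    ranked = sorted(zip(branches, lwrs), key=lambda item: item[1], reverse=True)
    kept, total = [], 0.0
    for branch, lwr in ranked:
        if len(kept) >= max_keep or total >= cumulative:
            break
        kept.append((branch, lwr))
        total += lwr
    return kept


if __name__ == "__main__":
    demo = {"b1": -1000.0, "b2": -1001.0, "b3": -1010.0}
    for branch, lwr in retain_placements(demo):
        print(branch, round(lwr, 3))
```

Working with weights instead of raw log-likelihoods makes placements comparable across queries. A log-likelihood difference of 1 unit translates into roughly a 73/27 split between two placements.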
Neighbor-Weighting Stage:
The final stage of the MLCA algorithm is neighbor-weighting, which summarizes the placement results by calculating clade weightings for each query sequence. The algorithm evaluates the evolutionary distance between the query sequence and its closest neighboring reference sequences, where shorter branch lengths in the phylogenetic tree indicate closer genetic relationships. Because these neighbors are already assigned to specific clades, their proximity provides evidence for the query sequence's clade assignment: the closer the neighbor, the stronger the evidence. The query sequence is then assigned to a clade if that clade's calculated weight exceeds a predefined threshold.
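The exact weighting function GLUE uses is not stated here, so the sketch below assumes a simple scheme that follows the principles described above. Each retained placement carries a weight (for example its LWR) and a list of nearby reference sequences with known clades and distances. Neighbors beyond a hypothetical max_distance are ignored. Each remaining neighbor receives a share of the placement's weight proportional to 1/distance, so closer neighbors count more. The query is assigned to the best clade only if that clade's total weight reaches a hypothetical threshold. All names and default values are illustrative.

```python
from collections import defaultdict


def clade_weights(placements, max_distance=0.5, eps=1e-6):
    """Sum evidence per clade across placements.

    placements: list of (placement_weight, neighbours), where neighbours is a
    list of (reference_id, clade, distance). Hypothetical scheme: each
    neighbour within max_distance gets a share of the placement weight
    proportional to 1 / distance (eps avoids division by zero).
    """
    weights = defaultdict(float)
    for placement_weight, neighbours in placements:
        close = [(clade, 1.0 / (dist + eps))
                 for _, clade, dist in neighbours if dist <= max_distance]
        total = sum(w for _, w in close)
        for clade, w in close:
            weights[clade] += placement_weight * w / total
    return dict(weights)


def assign_clade(placements, threshold=0.8, **kwargs):
    """Return (clade, weight) if the best clade reaches threshold,
    otherwise (None, weight)."""
    weights = clade_weights(placements, **kwargs)
    if not weights:
        return None, 0.0
    clade, weight = max(weights.items(), key=lambda item: item[1])
    return (clade if weight >= threshold else None), weight


if __name__ == "__main__":
    example = [
        (0.9, [("refA1", "A", 0.01), ("refA2", "A", 0.02), ("refB1", "B", 0.30)]),
        (0.1, [("refB1", "B", 0.02), ("refA1", "A", 0.10)]),
    ]
    print(clade_weights(example))
    print(assign_clade(example))
```

In this example most of the evidence comes from the high-weight placement sitting close to two clade A references, so clade A clears the threshold. Raising the threshold to 0.95 would leave the query unassigned.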
The integration of MLCA within Lentivirus-GLUE provides an efficient tool for EIAV genotyping. Because EPA places each query onto a fixed reference tree instead of re-inferring the phylogeny, the method remains computationally tractable for large-scale sequence analysis, while clade assignments rest on an explicit maximum likelihood phylogenetic model. This makes it well suited to both research and clinical settings.