update mutation model to mimic an HKY85 model #21
Comments
Here is a simplified version of the above idea, including a new term that accounts for functional constraint. It is inspired by equations 1-3 from a recent where:
Specifically, we define where:
One way that the above idea violates the HKY model is that viruses can only get a maximum of one mutation per gene per day, since there is a maximum of one mutation event per day. Though, the mutation rate in antigen is quite low ( Erick pointed out that a more accurate approach would be to take a gene, cycle over each nucleotide, and apply the rate matrix (e.g.,
I think this all looks good. The only thing that I can add is that in general, if one has a continuous-time Markov chain, there is a notion of a "jump chain", which is the discrete transition story after a single event. See, e.g., section 4 of this. The definition is quite simple: the probability of moving to another state is proportional to the rate of moving to that state. I believe that's exactly the statement being made here... if not, let's discuss more. If we're on the right track, what is the right computing formalism that will allow us to use these complex models without too much overhead?
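For reference, here is a sketch of how the jump-chain statement lines up with the two-step proposal in the issue. The symbol $q_{ij}$ (rate from state $i$ to state $j$) and the shorthand $Q_x = \sum_{y \neq x} Q_{xy}$ are notation assumed here for illustration; $Q_{xy}$, $P_x$, and $n_x$ are as in the issue. For a continuous-time Markov chain, the jump chain moves from state $i$ to state $j \neq i$ with probability:

$$P(i \to j) = \frac{q_{ij}}{\sum_{k \neq i} q_{ik}}$$

If each site of a gene mutates independently, a single event changes one specific site $s$ with wildtype $x$ to $y$ at rate $Q_{xy}$, and the total rate out of the current sequence is $\sum_{x'} n_{x'} Q_{x'}$. Assuming $P_x = n_x Q_x / \sum_{x'} n_{x'} Q_{x'}$, choosing $x$ with probability $P_x$, then one of the $n_x$ matching sites uniformly, then $y$ in proportion to $Q_{xy}$, gives:

$$P(s,\, x \to y) = \frac{n_x Q_x}{\sum_{x'} n_{x'} Q_{x'}} \cdot \frac{1}{n_x} \cdot \frac{Q_{xy}}{Q_x} = \frac{Q_{xy}}{\sum_{x'} n_{x'} Q_{x'}}$$

which has the same form as the jump-chain probability for that event, so under these assumptions the two descriptions agree.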
Thanks, Erick. Good to know about "jump chain". I also believe that's the statement being made here. Thien and I just discussed an idea for how to implement this in the code without too much overhead. Either she or I will post something soon.
In `antigen`, there is a mutation rate $\mu$. In the original version of `antigen`, which did not model viral sequences, mutations changed a virus's antigenic phenotype. In our updated version, which does model sequences, each mutation also results in a single nucleotide change.

When a mutation event occurs in `antigen`, we need to decide how to mutate the sequence. Our current strategy involves two basic steps. First, we randomly choose a site to mutate in the gene (e.g., C12). Second, we randomly choose a nucleotide-level mutation given the wildtype nucleotide and a pre-specified transition/transversion ratio $\kappa$. This approach is similar to nucleotide-mutation models used in phylogenetics, but differs in a few ways.

The goal of this issue is to update the mutation model in `antigen` so that mutations occur at rates proportional to ones from the HKY85 mutation model. In this model, the rate at which nucleotide $x$ mutates to nucleotide $y$ is:

$$Q_{xy} = \begin{cases} \kappa \pi_y & \text{if } x \to y \text{ is a transition} \\ \pi_y & \text{if } x \to y \text{ is a transversion} \end{cases}$$

where $\pi_y$ gives the expected frequency of nucleotide $y$ in the absence of selection. Our current mutation model in `antigen` differs from the HKY85 model in two ways. First, it does not account for $\pi_y$ parameters in choosing $y$. Second, by choosing a random site to mutate in the first step from above, it ignores that a site's mutation rate depends on the wildtype nucleotide $x$.

Below is an idea for how to update the mutation model in `antigen`. A simple idea is to replace the single $\mu$ parameter with $Q_{xy}$ parameters. However, in `antigen`, $\mu$ accounts for both mutation and selection. The virus population in a host is represented by a single virus with a defined antigenic phenotype and gene sequence. According to the literature, mutations seem to rarely fix in hosts. Thus, $\mu$ could be interpreted as the rate at which a mutation first occurs, then increases to an appreciable frequency, and then is involved in a transmission event. Since it is difficult to model each component of this process, the below idea retains the concept of $\mu$, but ensures that mutations occur at rates proportional to the HKY85 model. Specifically, when a mutation event occurs, the new model uses the following workflow to determine how to mutate the nucleotide sequence.

Step 1: choose a site to mutate: In the HKY85 model, the probability that a site mutates depends on its wildtype nucleotide. Specifically, the rate at which a nucleotide $x$ mutates to any of the other three nucleotides is:

$$Q_x = \sum_{y \neq x} Q_{xy}$$

where the expression sums over each possible mutant nucleotide. Under this model, if there is a single nucleotide mutation somewhere in a gene sequence, then the probability that the mutation involves a wildtype nucleotide of $x$ is:

$$P_x = \frac{n_x Q_x}{\sum_{x'} n_{x'} Q_{x'}}$$

where $n_x$ is the number of counts of nucleotide $x$ in the gene sequence, and the expression in the denominator sums over each of the four nucleotides.

We will mimic this process in the following way. When there is a mutation event in `antigen`, we will first randomly select one of the four nucleotides to be the wildtype nucleotide in the mutation, with selection probabilities of $P_x$. Then, we will make a list of all sites in the gene sequence with the randomly selected wildtype nucleotide (e.g., C12, C24, C28, etc.), and randomly choose one of those sites to mutate (e.g., C12). This list could be stored in a dictionary and updated for computational efficiency.

Step 2: choose a mutant nucleotide: Once we have chosen a site to mutate, we will randomly select a mutant nucleotide $y$ with probabilities proportional to mutation rates $Q_{xy}$.

Using these two steps, $Q_{xy}$ values in `antigen` should be proportional to $Q_{xy}$ values in HKY85, unless I am mistaken. What do you all think? Any other ideas?
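The two-step workflow can be sketched as follows. This is a minimal illustration in Python, not code from `antigen` itself. All function names (`hky85_rate`, `total_rate`, `index_sites`, `wildtype_probabilities`, `mutate`) are hypothetical. It assumes the standard HKY85 form in which the relative rate is $\kappa \pi_y$ for transitions and $\pi_y$ for transversions. For simplicity it rebuilds the site index on every call rather than updating a stored dictionary.

```python
import random

NUCLEOTIDES = ("A", "C", "G", "T")
PURINES = {"A", "G"}


def hky85_rate(x, y, kappa, pi):
    """Relative HKY85 rate Q_xy (up to an overall scale); assumes x != y."""
    # A transition stays within purines (A, G) or within pyrimidines (C, T).
    is_transition = (x in PURINES) == (y in PURINES)
    return (kappa if is_transition else 1.0) * pi[y]


def total_rate(x, kappa, pi):
    """Rate at which x mutates to any of the other three nucleotides."""
    return sum(hky85_rate(x, y, kappa, pi) for y in NUCLEOTIDES if y != x)


def index_sites(sequence):
    """Map each nucleotide to the list of positions where it occurs."""
    sites = {n: [] for n in NUCLEOTIDES}
    for position, nucleotide in enumerate(sequence):
        sites[nucleotide].append(position)
    return sites


def wildtype_probabilities(sites, kappa, pi):
    """P_x: n_x times the total rate of x, normalized over the four nucleotides."""
    weights = {x: len(sites[x]) * total_rate(x, kappa, pi) for x in NUCLEOTIDES}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}


def mutate(sequence, kappa, pi, rng=random):
    """Apply one mutation event; return (new_sequence, site, wildtype, mutant)."""
    sites = index_sites(sequence)
    p_x = wildtype_probabilities(sites, kappa, pi)
    # Step 1: pick the wildtype nucleotide with probability P_x,
    # then pick one of its sites uniformly at random.
    x = rng.choices(NUCLEOTIDES, weights=[p_x[n] for n in NUCLEOTIDES])[0]
    site = rng.choice(sites[x])
    # Step 2: pick the mutant nucleotide with probability proportional to Q_xy.
    mutants = [y for y in NUCLEOTIDES if y != x]
    y = rng.choices(mutants, weights=[hky85_rate(x, m, kappa, pi) for m in mutants])[0]
    return sequence[:site] + y + sequence[site + 1:], site, x, y
```

A call such as `mutate("ACGTACGT", kappa=2.0, pi={"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25})` returns the mutated sequence together with the chosen site and nucleotides. A nucleotide that is absent from the sequence gets weight zero and is never chosen as the wildtype.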