Diagonal elements of the transition matrix are non-zero and not-equal #3

ellisonbg · 2017-03-17T04:24:25Z

Hi, I love the overall approach of this work and how you have written it up as a blog post and posted the code on GitHub.

Your treatment sets the diagonal elements of the transition matrix to zero. You got questions about this assumption and replied in the blog comments, but I think you are still missing an important aspect of this.

The (omitted) diagonal components of the transition matrix give the probability of sticking with that language. In a given row, the sum of the non-diagonal components gives the total probability of leaving that language. The point that I want to make is that the total rate at which people leave is not the same for every language. Thus, the omitted diagonal components are not simply a scalar times the identity matrix.

The impact that this has on the conclusions is likely dramatic.

The row-wise normalizations you have done are wrt the total number of transitions in that row. For a popular language such as C/C++/Java/Python, those raw numbers represent an extremely small percentage of the overall language population. Thus for those rows, the omitted normalized diagonal element would be $1-\epsilon$, with the other elements of the row summing to a very small number $\epsilon$.

Likewise, for less popular languages such as Go (I am not picking on Go here, I love it as a language!) the quoted transition counts would represent a more significant fraction of the existing population. This would cause the diagonal component to be significantly smaller than 1.0 after normalization.
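
To make the argument concrete, here is one way to write it down. The symbols are notation introduced only for illustration and do not come from the blog post: $n_{ij}$ is the observed number of switches from language $i$ to language $j$, and $N_i$ is the (unobserved) number of users of language $i$ over the same period.

$$
T_{ij} = \frac{n_{ij}}{N_i} \quad (i \neq j), \qquad
T_{ii} = 1 - \frac{1}{N_i} \sum_{j \neq i} n_{ij} = 1 - \epsilon_i
$$

Under this sketch, $\epsilon_i$ is tiny when $N_i$ is large relative to the switch counts and noticeably larger when $N_i$ is small, so the diagonal differs from row to row.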

I want to emphasize that I love the application of this type of model to these questions. It would be great to see it applied in a context where both the existing population of users (diagonal components) and all of the transitions were available. Cheers, Brian.

vmarkovtsev · 2017-03-17T07:59:26Z

@ellisonbg What do you think about #4 ?

erikbern · 2017-03-17T13:23:44Z

yeah you could probably improve the analysis – feel free to give it a shot :)

frobnitzem · 2017-03-18T03:22:32Z

I agree with @ellisonbg. The fix is not easy because it requires computing the rates of switching rather than the total number of switches. Say there's a 2x2 matrix:

(stay with Java) (Java -> C)
(C -> Java) (stay with C)

Then, if there were 500 switches from Java to C per year and 5000 Java programmers who were also bloggers that year, then the first row of the rate matrix should be [ 9/10, 1/10 ].
Now, suppose there were 100 switches from C to Java per year and 400 C programmers who were also bloggers that year, then the second row should be [ 1/4, 3/4 ].

The left eigenvector of the matrix with eigenvalue 1 (just use numpy.linalg.eig on the transpose) will then tell you the future language distribution of those bloggers. Different total numbers may distort those calculations. The rates can be estimated by limiting the counts to the past year or two.
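
A minimal sketch of that calculation for the 2x2 example above, assuming NumPy. The helper names `rate_matrix` and `stationary` are made up for this illustration, and the counts and populations are just the made-up numbers from the example, not real data.

```python
import numpy as np

# Switches per year, rows = from, columns = to, order [Java, C].
# Illustrative numbers from the 2x2 example, not measured data.
switches = np.array([[0.0, 500.0],
                     [100.0, 0.0]])

# Bloggers per language in the same year (also from the example).
population = np.array([5000.0, 400.0])


def rate_matrix(switches, population):
    """Build a row-stochastic matrix from switch counts and populations."""
    T = switches / population[:, None]        # off-diagonal leaving rates
    np.fill_diagonal(T, 0.0)                  # ignore any self-switch counts
    np.fill_diagonal(T, 1.0 - T.sum(axis=1))  # diagonal = probability of staying
    return T


def stationary(T):
    """Left eigenvector of T for the eigenvalue closest to 1, scaled to sum to 1."""
    vals, vecs = np.linalg.eig(T.T)           # left eigenvectors of T
    k = np.argmin(np.abs(vals - 1.0))
    v = np.real(vecs[:, k])
    return v / v.sum()


T = rate_matrix(switches, population)
pi = stationary(T)
print(T)   # rows [0.9, 0.1] and [0.25, 0.75]
print(pi)  # approximately [0.714, 0.286], i.e. [5/7, 2/7]
```

With these numbers the long-run split is 5/7 Java and 2/7 C, which differs from what the raw switch counts alone would suggest.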

One way to "guesstimate" the proper diagonal elements could be to assume that the total number of bloggers is proportional to the total number of "I'm leaving this language" posts (with constant P across languages).

For the test data, this would give:

500/(500 + 500P) 500P/(500 + 500P)
100/(100 + 100P) 100P/(100 + 100P)

ellisonbg changed the title ~~Diagonal elements of the transition matrix non-zero and not-equal~~ Diagonal elements of the transition matrix are non-zero and not-equal Mar 17, 2017
