Clarification on the calculation of ranks #57
I'm guessing this does random resolution of ties, and that's fine. I just ran into an example where this leads to essentially random rank histograms, though; we can talk about that later.
I guess this is related to breaking ties for ranks. This is especially crucial for discrete parameters. See: https://github.com/hyunjimoon/SBC/wiki/SBC-FAQ#rank-smoothing
@hyunjimoon is right. Also, it appears you are working with a somewhat old version of the code - the function (and especially the tie breaking) is documented in more recent versions. Could you be more specific about the use case where this causes problems? My understanding was that the tie-breaking is pretty safe, as ties just imply lack of information on ordering (and hence that randomizing cannot hurt), but I might easily be mistaken. Would the other approach to tie-breaking (link from the FAQ) make more sense for your use case? Additionally, I just finished a vignette on how to connect new algorithms into the SBC package framework (https://hyunjimoon.github.io/SBC/articles/implementing_backends.html). Maybe trees require some additional support that is currently hard to achieve, but I wanted to show that it IMHO should not be impossible for you to work completely within the framework of this package (if you want to).
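For intuition, here is a minimal sketch of rank computation with random tie-breaking, as discussed above (an illustrative Python toy, not the SBC package's actual R implementation; the function name is mine):

```python
import numpy as np

def rank_with_random_ties(draws, value, rng):
    """Rank of `value` among posterior `draws`: the count of draws strictly
    below `value`, plus a uniform random offset over the tied positions."""
    less = np.sum(draws < value)
    ties = np.sum(draws == value)
    # Place `value` uniformly at random among the draws tied with it.
    return less + rng.integers(0, ties + 1)

rng = np.random.default_rng(42)
draws = np.array([0, 0, 1, 1, 1, 2])
r = rank_with_random_ties(draws, 1, rng)
# 2 draws strictly below and 3 ties, so the rank is in {2, 3, 4, 5}.
assert 2 <= r <= 5
```

The point of the randomization is that when the simulated value and the posterior draws are exchangeable, this rank is exactly uniform even in the presence of ties; a deterministic rule (e.g. always counting only strictly smaller draws) would not be.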
Weird. I just installed from source and the code on my machine remains the same. But you are right that the version in this repo has more information.
Sure, but I wouldn't call it 'causing problems' so much as 'I'm not sure I understand what this means'. The situation is that I have a discrete functional which has very limited variation in the MCMC draws. This ends up leading to loads of ties when one computes the ranks. I attach a file that contains 100 runs with the posterior draws and the simulated (prior) draws of the Robinson-Foulds distance to an anchor tree, which is the functional in question. To be clear, I think this is evidence that RF is a poor metric for SBC. But it might be interesting for you to have a look and think about what this means for discrete functionals in general. We can continue this discussion via email/Discourse too, if you want.
Completely agree - since the RF in this case is almost a binary variable, it is close to the least informative variable one can use (not only for SBC). I think there are potential improvements for SBC with discrete variables if you can either:
Because if both hold, then you could probably use the probabilities for individual categories as continuous values for SBC and thus increase information content. And I think one can do something useful even if just one is true. Both 1) and 2) commonly hold when you marginalize out discrete variables in Stan programs, but from quickly skimming the Wiki page for RF distance, I would be surprised if you could get 2)... Maybe 1) would be possible with a carefully chosen base tree? In any case, this is just an unconfirmed hunch and I didn't test whether the ideas actually work better than ranks in practice. Thinking more thoroughly about this and doing some experiments is on my SBC TODO list, but I admit it is currently in a relatively low position...

If you don't get the probabilities as continuous, but the range of the discrete outcomes is bounded, one idea is that one could use something like the chi-squared test to check that the prior and all the posteriors bundled together are the same. I have no idea/data whether this would have more "power" than the rank approach.

Another idea would be to convert the discrete values into numerical probabilities + uncertainty for each fit and somehow aggregate those and compare to the prior. @TeemuSailynoja might have more to say about this, but I'm not sure if he's working on discrete variables as well.
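The chi-squared idea mentioned above could look something like this (a rough sketch of an untested suggestion from the comment, not part of the SBC package; the helper name and counts are mine):

```python
import numpy as np

def chi2_stat(prior_counts, pooled_counts):
    """Chi-squared statistic comparing pooled posterior draws against the
    prior over a bounded set of K discrete outcomes. Under calibration it
    should be roughly chi-squared with K - 1 degrees of freedom."""
    prior_probs = prior_counts / prior_counts.sum()
    # Expected pooled counts if every posterior matched the prior.
    expected = prior_probs * pooled_counts.sum()
    return np.sum((pooled_counts - expected) ** 2 / expected)

# Toy example: pooled posteriors that exactly match the prior proportions.
prior_counts = np.array([50, 30, 20])
pooled_counts = np.array([500, 300, 200])
stat = chi2_stat(prior_counts, pooled_counts)
assert stat == 0.0
```

As the comment notes, whether this has more power than rank histograms for heavily tied discrete functionals is an open question; pooling also discards per-fit information that ranks retain.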
This is why I've been interested in metrics that compare the prior and posterior samples as a whole; much information gets lost in ranks. Another way could be approximating the parameters of interest with continuous hyperparameters (e.g. a Gaussian mixture) and comparing their ranks. I wonder whether the hyperparameters' ranks can substitute for the ranks of a parameter in this case. The analogy: compare p instead of y for y ~ Bin(n, p); similarly, if theta is discrete, compare mu1, mu2 instead of theta for a Gaussian mixture (mu1, mu2, sigma). This PR is pushing forward in this aspect (especially this file). I welcome all forms of feedback and collaboration here/on Discourse/by mail!
I'm adapting some of your (awesome!) code for use with trees and I stumbled upon something I don't quite understand. What is
doing, exactly?
This seems to imply that ranks would be random, which I don't understand.