Samples as a parameterization #170
Sorry, it's not clear to me what you want here. Specifically, are you thinking of storing the samples as the persistent representation? Sure, you can do that, but then you need to specify how to extract the pdf() and cdf() from the samples, e.g., by doing something like a kernel density estimate. But if you are going to do that, why not use the KDE parameters as the persistent representation? So there are probably a couple of things we need to clarify here. What we need to decide to implement this is:
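To make the concern concrete, here is a minimal sketch of extracting pdf() and cdf() from stored samples with a Gaussian KDE, using only the Python standard library. The function names and the use of Scott's Rule for the bandwidth are illustrative assumptions, not qp's API. Note that the result depends on the bandwidth as well as on the samples, which is why one might store the KDE parameters instead.

```python
import math
import statistics


def scott_bandwidth(samples):
    """Scott's Rule for 1-D data: h = sample std * n**(-1/5)."""
    return statistics.stdev(samples) * len(samples) ** (-1.0 / 5.0)


def kde_pdf(x, samples, bandwidth):
    """Gaussian KDE density at x: average of normal pdfs centred on each sample."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)


def kde_cdf(x, samples, bandwidth):
    """Gaussian KDE CDF at x: average of normal CDFs centred on each sample."""
    total = sum(
        0.5 * (1.0 + math.erf((x - s) / (bandwidth * math.sqrt(2.0))))
        for s in samples
    )
    return total / len(samples)
```

With `samples = [-1.0, 0.0, 1.0]`, Scott's Rule gives a bandwidth of about 0.80, and `kde_cdf(0.0, samples, h)` is 0.5 by symmetry. Changing the bandwidth changes pdf() and cdf() even though the stored samples are identical.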
This was discussed in a recent RAIL TT tag-up, and the conclusion was that there could be reconstruction options, as there are for quantiles. One possibility is to go all-in on treating it as essentially a discrete distribution; though not very attractive, this is trivially self-consistent and would meet the needs of real estimators that output samples (e.g. SOM). Another is to make a KDE (with a bandwidth determined by Scott's Rule or another algorithm), which corresponds to what more users will expect when interpreting or propagating the samples from such a method. However, as Eric noted above, that might be tricky to make self-consistent, so it would probably be safer to treat a KDE parameterization as distinct from a samples parameterization.
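For comparison, the "essentially discrete" option can be sketched as an empirical distribution: the CDF is the fraction of samples at or below x, and the PPF returns the smallest sample whose CDF reaches the requested quantile. The class name `EmpiricalSamples` is hypothetical and not part of qp. The round trip `ppf(cdf(s)) == s` for every stored sample is what makes this choice trivially self-consistent.

```python
import bisect


class EmpiricalSamples:
    """Hypothetical discrete (empirical) distribution defined by its samples."""

    def __init__(self, samples):
        self.samples = sorted(samples)
        n = len(self.samples)
        # CDF step height at each sorted sample; tied samples share a height.
        self._levels = [bisect.bisect_right(self.samples, s) / n for s in self.samples]

    def cdf(self, x):
        # Fraction of samples <= x: a right-continuous step function.
        return bisect.bisect_right(self.samples, x) / len(self.samples)

    def ppf(self, q):
        # Smallest sample s with cdf(s) >= q (generalized inverse of the step CDF).
        if not 0.0 < q <= 1.0:
            raise ValueError("q must lie in (0, 1]")
        return self.samples[bisect.bisect_left(self._levels, q)]
```

For example, `EmpiricalSamples([0.4, 0.1, 0.3, 0.3])` has `cdf(0.3) == 0.75` and `ppf(0.5) == 0.3`. The price is that pdf() is a set of point masses rather than a smooth density.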
This duplicates #33, but the conversation there had gone stale, so I'll keep this issue open and close that one.
We also require the samples parameterization in one of the Bayesian Pipelines Topical Team cosmology projects: LSSTDESC/bayesian-pipelines-cosmology#5
Sometimes a distribution is genuinely defined by a set of samples, which notably changes how the CDF/PPF would be calculated. It is also relevant to converting to many other parameterizations that could logically be instantiated by fitting to samples but currently are not (spline being a notable exception).
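As one illustration of instantiating another parameterization from samples, here is a minimal sketch that bins samples into histogram densities on given bin edges. The helper name `samples_to_histogram` and its normalization convention are assumptions for illustration; qp's own conversion routines may work differently.

```python
import bisect


def samples_to_histogram(samples, edges):
    """Hypothetical helper: bin samples into histogram densities on the given edges.

    Densities are normalized by the total number of samples, so they integrate
    to 1 only when every sample falls inside [edges[0], edges[-1]].
    """
    nbins = len(edges) - 1
    counts = [0] * nbins
    for s in samples:
        if s == edges[-1]:
            i = nbins - 1  # include the right-most edge in the last bin
        else:
            i = bisect.bisect_right(edges, s) - 1
        if 0 <= i < nbins:  # samples outside the edges are dropped
            counts[i] += 1
    n = len(samples)
    return [c / (n * (edges[i + 1] - edges[i])) for i, c in enumerate(counts)]
```

For example, `samples_to_histogram([0.1, 0.2, 0.6, 0.9], [0.0, 0.5, 1.0])` returns `[1.0, 1.0]`. Unlike the KDE, this conversion depends on a choice of bin edges, another instance of the reconstruction choices discussed above.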
Once this is ready, it should be immediately propagated to the PIT metric, RAIL's trainZ estimator, and the SOM summarizer/estimator/classifier, among others.
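For the PIT metric specifically, a samples parameterization means each PIT value can be read directly off the empirical CDF: the fraction of an object's samples at or below its true redshift. A minimal sketch with a hypothetical function name and toy inputs; it is not RAIL's implementation.

```python
def pit_values(sample_sets, true_z):
    """PIT_i = empirical CDF of object i's samples evaluated at its true redshift."""
    if len(sample_sets) != len(true_z):
        raise ValueError("need one true redshift per set of samples")
    pits = []
    for samples, z in zip(sample_sets, true_z):
        # Count samples at or below the true value, then normalize.
        pits.append(sum(1 for s in samples if s <= z) / len(samples))
    return pits
```

For example, `pit_values([[0.1, 0.2, 0.3, 0.4], [1.0, 1.1]], [0.25, 2.0])` gives `[0.5, 1.0]`. With N samples per object, each PIT value can only take the values k/N, which is worth keeping in mind when histogramming PIT values for small N.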
EDIT: Another important application of this is mentioned in #180: