Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added generalized gamma distribution #59
base: main
Are you sure you want to change the base?
added generalized gamma distribution #59
Changes from 6 commits
5943ada
fda72ee
0b50d88
76c23e5
7bedd6f
6532541
4aa1ade
50ca01f
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not very familiar with the math, but I am concerned the logcdf function may have some numerical instability. I attempted to fit a model to simulated data with the
pm.Censored
API and it was a bit slow, and had a few divergences, but it did correctly identify the true parameters(I tested this with the latest aeppl fix for the
pm.Censored
API. code is in the last section of this notebook).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see some obvious reparametrization. Unless there is some loggammaincc function we could use?
If you are concerned I would suggest you do a grid evaluation of the logcdf and evaluate against an arbitrary precision software like Mathematica.
That will confirm or dispell your concerns of stability, and then we can try to look for an alternative. It could also be the instability is not in the function itself but its gradients.
Do you have a minimum example where you found the sampling issues that you can share?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is Mathematica a paid software? I had trouble finding anything to test against that would guarantee numerical precision, I'm pretty new to this
here's a minimum (kind of) example where there are sampling issues (very slow sampling. The distribution is quick when not using the
pm.Censored
api)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you using the main version of Aeppl? We haven't done a new release yet, but there was a recent patch that makes the gradients of censored logps more stable: aesara-devs/aeppl#156
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might have jumbled the names. In the past I've used https://www.wolframcloud.com/ with a free account.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing to note is that everything is fine and samples quickly without censored data and without the
censored
api involved (i.e. just regular fitting on non-censored data)Also, sampling might've actually gotten slower with less censoring (when using the
censored
api)? I didnt sample to completion, I just looked at the initial guess after 1 minute of samplingFor testing other distributions:
The fact that the fitting gets slower with less censoring seems really off to me, and also surprised to see the Gamma distribution suffers from similar issues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update: just looked at the log-cdf of the gamma distribution and its implemented very similar to how I implemented the generalized gamma, so makes sense its slow - I guess this points to the gammainc_der op being an issue more than the parameterization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's quite possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that fitting with the distribution works well without the Censored API and the regular Gamma distribution has the same problem in PyMC anyway, is it fair to call this ok and proceed with the PR (once tests are passing) since its experimental anyway?
It'd atleast be a good starting point to be improved upon, and the current parameterization I chose is fairly interpretable, and I still have some use cases for it atleast (if it ever plays nicely with pm.censored I'll have a ton of use cases for it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely! I'll try to review it again soon.