Convergence criteria #137
The current approach doesn't deal with line search in a consistent way, because each call to …
This is great - yes, that makes sense to me. Luckily, we're actually deterministic otherwise.
This is amazing. It'll be until after resubmission that I'll be able to do this - I would happily assist if you had the bandwidth to PR.
I'll just note that this update will probably help with some issues I seem to be running into with the spike analysis. I suspect this has to do with the tolerance, learning rate, memory, and other things that may be easier to control with a custom … However, when we run those same models for a single round of 100K iterations (everything else the same), we can see that some of the models have certainly over-fit to their data. I wonder how exactly the tolerance works with penalties? Could it be the case that penalties added to the total cost are affecting the potential for the model to quit early? This could also be related to #133, as a ridge does again seem to stabilize things. More testing will be needed here.
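As a hedged illustration of that question: when the penalty is folded into the cost handed to the solver, the solver's internal optimality error (the quantity compared against `tol`) is computed on the penalized objective, not the data term alone. A minimal sketch, assuming `jaxopt`'s `GradientDescent` API; the data, loss, and ridge strength are invented placeholders, not names from the codebase:

```python
import jax
import jax.numpy as jnp
import jaxopt


def penalized_loss(params, X, y, ridge_strength):
    # Data term plus ridge penalty: the penalty is part of the total
    # cost, so the solver's optimality error -- the quantity compared
    # against `tol` -- is computed on this combined objective.
    data_term = jnp.mean((X @ params - y) ** 2)
    return data_term + ridge_strength * jnp.sum(params ** 2)


# Toy data, purely illustrative.
key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (50, 3))
y = X @ jnp.array([1.0, -2.0, 0.5])

solver = jaxopt.GradientDescent(fun=penalized_loss, tol=1e-6, maxiter=100_000)
params, state = solver.run(jnp.zeros(3), X, y, 0.1)

# `state.error` is what was tested against `tol`; changing the penalty
# changes the objective and can therefore change when this criterion
# fires, which is one way an early quit could happen.
print(state.iter_num, state.error)
```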
Relevant to this issue, it should be noted that the primary difference between single-step and multi-step model optimization is the FISTA acceleration. In the single-step models, the learning rate is reset at each step. When acceleration is turned off, these two approaches yield identical results, as sketched below.
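A minimal sketch of that equivalence, assuming the `jaxopt` API; the toy data are invented, and a fixed step size is used so that no line-search state is involved:

```python
import jax
import jax.numpy as jnp
import jaxopt


def loss(params, X, y):
    return jnp.mean((X @ params - y) ** 2)


key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (50, 3))
y = X @ jnp.array([1.0, -2.0, 0.5])


def make_solver(maxiter):
    # No acceleration and a fixed step size: the solver carries no
    # momentum or line-search state that a fresh `run` would reset.
    return jaxopt.GradientDescent(
        fun=loss, stepsize=1e-2, acceleration=False, tol=0.0, maxiter=maxiter
    )


# Multi-step: ten runs of 100 iterations, restarting the solver each time.
params_multi = jnp.zeros(3)
for _ in range(10):
    params_multi, _ = make_solver(100).run(params_multi, X, y)

# Single-step: one run of 1000 iterations.
params_single, _ = make_solver(1000).run(jnp.zeros(3), X, y)

# Without acceleration these agree; with acceleration=True the FISTA
# momentum sequence restarts on every `run`, and they generally differ.
print(jnp.allclose(params_multi, params_single))
```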
We still need to add some sort of check on the convergence. To do this, we'll add a …
I think what we want to do is simply check if the …
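The suggestion above is cut off, but one plausible shape for such a check, assuming `jaxopt`'s `init_state`/`update` pattern, is to step the solver manually and compare its internal `state.error` against a tolerance. The data, loss, and tolerance value below are invented placeholders:

```python
import jax
import jax.numpy as jnp
import jaxopt


def loss(params, X, y):
    return jnp.mean((X @ params - y) ** 2)


key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (20, 3))
y = X @ jnp.array([0.5, 1.0, -1.0])

solver = jaxopt.GradientDescent(fun=loss, stepsize=1e-2, acceleration=False)
params = jnp.zeros(3)
state = solver.init_state(params, X, y)

tol = 1e-6
for _ in range(100_000):
    params, state = solver.update(params, state, X, y)
    # `state.error` is the solver's own optimality measure; stop as
    # soon as it drops below the tolerance.
    if state.error < tol:
        break
```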
This issue is to note that `jaxopt`, the package we use for optimizing our model, is merging into `optax`. While this poses no problem for the software as it stands, it would certainly be desirable to make the switch once they merge in `ProximalGradient`. This would solve current problems with the way we run training steps just to get a convergence line, which leads to non-deterministic results (i.e. 1000 iterations for one step != 100 iterations for 10 steps), and would help future development more generally.
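For a sense of what the post-merge loop might look like: in `optax`, the optimizer state is threaded explicitly through every step, so a convergence line can be recorded at each iteration without restarting the solver, which is what currently makes k short runs differ from one long run. `ProximalGradient` had not yet merged into `optax` when this was written, so plain SGD stands in for it in this sketch; all names and data are illustrative.

```python
import jax
import jax.numpy as jnp
import optax


def loss(params, X, y):
    return jnp.mean((X @ params - y) ** 2)


key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (50, 3))
y = X @ jnp.array([1.0, -2.0, 0.5])

optimizer = optax.sgd(learning_rate=1e-2)
params = jnp.zeros(3)
opt_state = optimizer.init(params)

convergence_line = []
for step in range(1_000):
    value, grads = jax.value_and_grad(loss)(params, X, y)
    updates, opt_state = optimizer.update(grads, opt_state)
    params = optax.apply_updates(params, updates)
    # The loss is logged every iteration without resetting any solver
    # state, so 1000 iterations in one loop follow the same trajectory
    # as 10 loops of 100 iterations.
    convergence_line.append(value)
```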