
Convergence criteria #137

Closed
jgallowa07 opened this issue Feb 28, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@jgallowa07
Member

This issue is to note that jaxopt, the package we use for optimizing our model, is merging into optax. While this poses no problem for the software as it stands, it would certainly be desirable to make the switch once they merge in ProximalGradient.

This would solve current problems with the way we run multiple training steps just to get a convergence line, which leads to non-deterministic results (i.e., 1000 iterations in one step != 100 iterations for each of 10 steps). It would also help future development more generally.
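For illustration, here is a hedged sketch of the pattern described above (the toy data and loss are assumptions, not the package's real training code): one jaxopt `run` per step, warm-starting from the previous parameters, just to record one point of the convergence line per step.

```python
import jax.numpy as jnp
from jaxopt import ProximalGradient
from jaxopt.prox import prox_lasso

# Toy problem standing in for the real model (assumption).
X, y = jnp.ones((5, 3)), jnp.ones(5)

def loss(params, data):
    X, y = data
    return jnp.mean((X @ params - y) ** 2)

solver = ProximalGradient(fun=loss, prox=prox_lasso, maxiter=100)
params, losses = jnp.zeros(3), []
for _ in range(10):  # 10 steps of 100 iterations each
    params, state = solver.run(params, 0.1, (X, y))  # warm start
    losses.append(float(loss(params, (X, y))))       # one point per step
# Because each call to run re-initializes internal solver state, this
# need not match a single run with maxiter=1000.
```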

@jgallowa07 added the enhancement label on Feb 28, 2024
@wsdewitt
Contributor

wsdewitt commented Feb 28, 2024

The current approach doesn't deal with line search in a consistent way, because each call to `run` will re-initialize the step size. I wouldn't call this nondeterministic, but it is a deterministic function of how we partition iterations into calls to `run`. It would be preferable for our API to interface with the `update` method of `ProximalGradient`, rather than the `run` method, so we can record loss trajectories in a consistent way. We could define our own `run` command that iterates calls to `update` until `state.error <= tol` or `state.iter_num == maxiter`, and outputs loss trajectory data from each iterate.
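A minimal sketch of what that could look like, assuming jaxopt's `init_state`/`update` interface (the loop and names here are illustrative, not the project's actual API):

```python
import jax.numpy as jnp
from jaxopt import ProximalGradient
from jaxopt.prox import prox_lasso

def loss(params, data):
    X, y = data
    return jnp.mean((X @ params - y) ** 2)

def run_with_trajectory(solver, init_params, hyperparams_prox, data):
    """Iterate solver.update, recording the loss at every iterate,
    until the tolerance or the iteration budget is hit."""
    params = init_params
    state = solver.init_state(init_params, hyperparams_prox, data)
    trajectory = []
    while state.iter_num < solver.maxiter and state.error > solver.tol:
        params, state = solver.update(params, state, hyperparams_prox, data)
        trajectory.append(float(loss(params, data)))
    return params, state, trajectory

solver = ProximalGradient(fun=loss, prox=prox_lasso, maxiter=1000, tol=1e-4)
```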

@jgallowa07
Member Author

> I wouldn't call this nondeterministic, but it is a deterministic function of how we partition iterations into calls to `run`.

This is great; yes, that makes sense to me. Luckily, we're actually deterministic otherwise.

> We could define our own `run` command that iterates calls to `update` until `state.error <= tol` or `state.iter_num == maxiter`, and outputs loss trajectory data from each iterate.

This is amazing. I won't be able to work on this until after resubmission, but I would happily assist if you have the bandwidth to open a PR.

@jgallowa07
Member Author

I'll just note that this update will probably help with some issues I seem to be running into with the spike analysis.

I suspect this has to do with the tolerance, learning rate, memory, and other things that may be easier to control with a custom `update()` loop, but it seems the model fits are not so robust after many training iterations. For context, we run the spike models for 30 independent rounds of 1000 iterations each (30K total), with the default tolerance set to 1e-4.

[Screenshot from 2024-03-05 06-44-15]

However, when we run those same models for a single round of 100K iterations (everything else the same),

[Screenshot from 2024-03-05 06-57-43]

we can see some of the models have certainly over-fit to their data. I wonder how exactly the tolerance works with penalties. Could it be the case that penalties added to the total cost are affecting the model's potential to quit early?

This could also be related to #133, as a ridge penalty does again seem to stabilize things. More testing will be needed here.

@jgallowa07 changed the title from "jaxopt -> optax" to "Convergence criteria" on Mar 13, 2024
@jgallowa07
Member Author

Relevant to this issue, it should be noted that the primary difference between single-step and multi-step model optimization is the FISTA acceleration. In the multi-step models, the learning rate is reset at each step. When acceleration is turned off, these two approaches yield identical results.
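For reference, jaxopt's `ProximalGradient` exposes an `acceleration` flag; here is a hedged sketch of the comparison described above (toy data is an assumption, and the agreement between the two approaches reflects the observation in this comment):

```python
import jax.numpy as jnp
from jaxopt import ProximalGradient
from jaxopt.prox import prox_lasso

X, y = jnp.ones((5, 3)), jnp.ones(5)  # toy data (assumption)
loss = lambda p, data: jnp.mean((data[0] @ p - data[1]) ** 2)

# One run of 100 iterations, FISTA momentum disabled.
single = ProximalGradient(fun=loss, prox=prox_lasso,
                          acceleration=False, maxiter=100, tol=0.0)
p_single, _ = single.run(jnp.zeros(3), 0.1, (X, y))

# Ten warm-started runs of 10 iterations each; with acceleration off
# there is no momentum state to lose between calls to run.
multi = ProximalGradient(fun=loss, prox=prox_lasso,
                         acceleration=False, maxiter=10, tol=0.0)
p_multi = jnp.zeros(3)
for _ in range(10):
    p_multi, _ = multi.run(p_multi, 0.1, (X, y))
```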

@jgallowa07
Member Author

We still need to add some sort of check on convergence. To do this, we'll add a `state` property to the `Model` object. The state from jaxopt gives the following properties:

```python
class ProxGradState(NamedTuple):
  """Named tuple containing state information."""
  iter_num: int                    # number of iterations taken so far
  stepsize: float                  # current (line-searched) step size
  error: float                     # residual used for the stopping criterion
  aux: Optional[Any] = None        # auxiliary output of the objective, if any
  velocity: Optional[Any] = None   # FISTA momentum variable
  t: float = 1.0                   # FISTA acceleration parameter
```

I think what we want to do is simply check whether `iter_num` is less than the maximum number of iterations requested. If it is, the stopping condition was met and the model exited upon reaching the specified tolerance threshold.
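A minimal sketch of that check, assuming the `Model` object stores the final solver state and the requested `maxiter` (the names here are illustrative, not the package's actual API):

```python
class Model:
    # ... existing attributes, assumed to include self.state (the final
    # ProxGradState) and self.maxiter (the requested iteration budget) ...

    @property
    def converged(self) -> bool:
        """True if optimization exited before hitting maxiter, i.e.
        state.error dropped below the requested tolerance."""
        return self.state.iter_num < self.maxiter
```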
