soften monotonicity constraint #70
My brain's full right now, but I note that we do have access to gradients during training, so we could muck with them before doing a gradient step.
It might be easy to implement this in an architecture where there are distinct g_bind and g_fold networks, as proposed in #53. The intra-g weights could be clamped >= 0, but the sparse inter-g weights could be unconstrained.

Another thing to note is that, even for 1D monotonicity, clamping to non-negative weights is sufficient but not necessary, and limits expressiveness even within the space of monotonic functions. This paper proposes modeling the derivative of a monotonic function with a non-negative neural network, and using quadrature to evaluate the monotonic function (defined as the network's antiderivative). This seems quite analogous to how monotonic I-splines are defined as integrals in a non-negative M-spline basis. (As a cute technical detail, you get to use Feynman's trick for backprop.)
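To make the derivative-network-plus-quadrature idea concrete, here is a minimal PyTorch sketch (the class and parameter names are hypothetical, and this is not the paper's implementation): a small network models the derivative, softplus forces it non-negative, and the monotonic function is recovered as a numerical antiderivative via the trapezoid rule.

```python
import torch
import torch.nn as nn


class QuadratureMonotonic(nn.Module):
    """Monotonic 1D map: f(z) = f(0) + integral of g from 0 to z, with g >= 0.

    Sketch only: g is a small MLP made non-negative with softplus, and the
    integral is approximated by the trapezoid rule on a per-sample grid.
    """

    def __init__(self, hidden=25, n_quad=32):
        super().__init__()
        self.derivative_net = nn.Sequential(
            nn.Linear(1, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )
        self.bias = nn.Parameter(torch.zeros(1))  # f(0)
        self.n_quad = n_quad

    def forward(self, z):
        # z: (batch, 1). Build a quadrature grid from 0 to z for each sample.
        t = torch.linspace(0.0, 1.0, self.n_quad, device=z.device)
        grid = z * t  # (batch, n_quad)
        g = nn.functional.softplus(self.derivative_net(grid.unsqueeze(-1)))
        # Non-negative integrand, so the antiderivative is monotone in z.
        integral = torch.trapezoid(g.squeeze(-1), grid, dim=-1)
        return self.bias + integral.unsqueeze(-1)
```

Backprop here simply flows through the quadrature nodes via autograd, which acts as a discretized stand-in for the differentiation-under-the-integral trick mentioned above.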
Yes, you're totally right. That paper looked cool, but it's more than we need. Remember that we're getting quite nice performance in one dimension with 25 monotonic hardtanhs! But I like the idea of combining this with #53 the best.
In 1D GE models, a monotonic I-spline is used to map from the latent space to the output space. In a GGE model, the moral equivalent is to stipulate that each output dimension in y is monotonic in its corresponding latent-space dimension in z.
The current monotonicity implementation projects all the weights in g(z) to the non-negative orthant after each gradient step*, which is sufficient for monotonicity, but not necessary, and probably a stronger condition than we want. The downside is that it cannot accommodate tradeoffs: every directional derivative is positive, so there are no directions in feature space that increase binding at the expense of folding (or vice versa).
A weaker condition is to stipulate that the diagonal elements of the Jacobian are non-negative: ∂y_i/∂z_i ≥ 0. This keeps the biophysical interpretation of the latent space dimensions intact, but allows phenotype trade-offs.
It's not immediately obvious to me how to implement this; it's not a simple box constraint. Maybe it could be done as a soft penalty: relu(-∂y_1/∂z_1) + relu(-∂y_2/∂z_2).
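Here is a minimal sketch of that soft penalty using PyTorch autograd (the function and argument names are hypothetical, not torchdms API): compute the diagonal Jacobian entries ∂y_i/∂z_i on a batch of latent points and penalize their negative parts.

```python
import torch


def diagonal_monotonicity_penalty(g, z):
    """Soft penalty encouraging dy_i/dz_i >= 0 for each latent dimension i.

    Sketch only: `g` maps latent points z of shape (batch, d) to phenotypes
    y of shape (batch, d). Only the diagonal Jacobian entries are penalized,
    so off-diagonal trade-off directions stay unconstrained.
    """
    if not z.requires_grad:
        z = z.requires_grad_(True)
    y = g(z)
    penalty = 0.0
    for i in range(y.shape[1]):
        # Per-sample gradient of y_i w.r.t. z; keep the graph so the penalty
        # can be backpropagated as part of the training loss.
        grad_i = torch.autograd.grad(y[:, i].sum(), z, create_graph=True)[0]
        penalty = penalty + torch.relu(-grad_i[:, i]).mean()
    return penalty
```

The returned value would be added to the usual loss with some weight, so monotonicity along the diagonal is encouraged rather than enforced exactly.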
*As a side issue, Erick noticed that the projection happens before the gradient step, which is probably a bug:
torchdms/torchdms/analysis.py, lines 153 to 154 at commit e0bdb4c
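For reference, a hedged sketch of the intended ordering, not the actual analysis.py code (`model.monotonic_layers` is a hypothetical attribute): take the gradient step first, then project the weights back onto the non-negative orthant so the projection acts on the updated values.

```python
# Sketch of the intended order of operations per training batch.
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()              # gradient step first ...
with torch.no_grad():         # ... then project back to the feasible set
    for layer in model.monotonic_layers:  # hypothetical attribute
        layer.weight.clamp_(min=0.0)
```

Projecting before the step, as noted above, would let a single update leave the weights outside the non-negative orthant until the next batch's projection.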