soften monotonicity constraint #70
My brain's full right now, but I note that we do have access to gradients during training, so we could muck with them before doing a gradient step.
It might be easy to implement this in an architecture where there are distinct g_bind and g_fold networks, as proposed in #53. The intra-g weights could be clamped >= 0, but the sparse inter-g weights could be unconstrained.

Another thing to note is that, even for 1D monotonicity, clamping to non-negative weights is sufficient but not necessary, and limits expressiveness even within the space of monotonic functions. This paper proposes modeling the derivative of a monotonic function with a non-negative neural network, and using quadrature to evaluate the monotonic function (defined as the network's antiderivative). This seems quite analogous to how monotonic I-splines are defined as integrals in a non-negative M-spline basis. (As a cute technical detail, you get to use Feynman's trick for backprop.)
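To make the derivative-network-plus-quadrature idea concrete, here is a minimal PyTorch sketch (the class and parameter names are hypothetical, and this is not the paper's implementation): a small network models the derivative, softplus forces it non-negative, and the monotonic function is recovered as a numerical antiderivative via the trapezoid rule.

```python
import torch
import torch.nn as nn


class QuadratureMonotonic(nn.Module):
    """Monotonic 1D map: f(z) = f(0) + integral of g from 0 to z, with g >= 0.

    Sketch only: g is a small MLP made non-negative with softplus, and the
    integral is approximated by the trapezoid rule on a per-sample grid.
    """

    def __init__(self, hidden=25, n_quad=32):
        super().__init__()
        self.derivative_net = nn.Sequential(
            nn.Linear(1, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )
        self.bias = nn.Parameter(torch.zeros(1))  # f(0)
        self.n_quad = n_quad

    def forward(self, z):
        # z: (batch, 1). Build a quadrature grid from 0 to z for each sample.
        t = torch.linspace(0.0, 1.0, self.n_quad, device=z.device)
        grid = z * t  # (batch, n_quad)
        g = nn.functional.softplus(self.derivative_net(grid.unsqueeze(-1)))
        # Non-negative integrand, so the antiderivative is monotone in z.
        integral = torch.trapezoid(g.squeeze(-1), grid, dim=-1)
        return self.bias + integral.unsqueeze(-1)
```

Backprop here simply flows through the quadrature nodes via autograd, which acts as a discretized stand-in for the differentiation-under-the-integral trick mentioned above.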
Yes, you're totally right. That paper looked cool, but it's more than we need. Remember that we're getting quite nice performance in one dimension with 25 monotonic hardtanhs! But I like the idea of combining this with #53 the best.
In 1D GE models, a monotonic I-spline is used to map from the latent space to the output space. In a GGE model, the moral equivalent is to stipulate that each output dimension in y is monotonic in its corresponding latent-space dimension in z.
The current monotonicity implementation projects all the weights in g(z) to the non-negative orthant after each gradient step*, which is sufficient for monotonicity, but not necessary, and probably a stronger condition than we want. The downside is that it cannot accommodate tradeoffs: every directional derivative is positive, so there are no directions in feature space that increase binding at the expense of folding (or vice versa).
A weaker condition is to stipulate that the diagonal elements of the Jacobian are non-negative: ∂y_i/∂z_i ≥ 0. This keeps the biophysical interpretation of the latent space dimensions intact, but allows phenotype trade-offs.
It's not immediately obvious to me how to implement this; it's not a simple box constraint. Maybe it could be done as a soft penalty: relu(-∂y_1/∂z_1) + relu(-∂y_2/∂z_2).
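Here is a minimal sketch of that soft penalty using PyTorch autograd (the function and argument names are hypothetical, not torchdms API): compute the diagonal Jacobian entries ∂y_i/∂z_i on a batch of latent points and penalize their negative parts.

```python
import torch


def diagonal_monotonicity_penalty(g, z):
    """Soft penalty encouraging dy_i/dz_i >= 0 for each latent dimension i.

    Sketch only: `g` maps latent points z of shape (batch, d) to phenotypes
    y of shape (batch, d). Only the diagonal Jacobian entries are penalized,
    so off-diagonal trade-off directions stay unconstrained.
    """
    if not z.requires_grad:
        z = z.requires_grad_(True)
    y = g(z)
    penalty = 0.0
    for i in range(y.shape[1]):
        # Per-sample gradient of y_i w.r.t. z; keep the graph so the penalty
        # can be backpropagated as part of the training loss.
        grad_i = torch.autograd.grad(y[:, i].sum(), z, create_graph=True)[0]
        penalty = penalty + torch.relu(-grad_i[:, i]).mean()
    return penalty
```

The returned value would be added to the usual loss with some weight, so monotonicity along the diagonal is encouraged rather than enforced exactly.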
*As a side issue, Erick noticed that the projection happens before the gradient step, which is probably a bug:
torchdms/torchdms/analysis.py, lines 153 to 154 at commit e0bdb4c
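For reference, a hedged sketch of the intended ordering, not the actual analysis.py code (`model.monotonic_layers` is a hypothetical attribute): take the gradient step first, then project the weights back onto the non-negative orthant so the projection acts on the updated values.

```python
# Sketch of the intended order of operations per training batch.
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()              # gradient step first ...
with torch.no_grad():         # ... then project back to the feasible set
    for layer in model.monotonic_layers:  # hypothetical attribute
        layer.weight.clamp_(min=0.0)
```

Projecting before the step, as noted above, would let a single update leave the weights outside the non-negative orthant until the next batch's projection.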