[WIP] Start trying to explain convergence failures #22
base: main
Conversation
@Robbybp Maybe we have a high condition number in the NN because: the function is not as smooth as the ALAMO polynomials, there is a higher chance of numerical instability due to the number of parameters, and an overtrained NN gives divergent values for small variations in the input?
The divergence for Subset 1 occurs at iteration 13 of the full-space formulation solve. The highest residual occurs for the bypass_rejoin enthalpy mixing equations. Though I would like to do the more sophisticated analysis of making the Jacobian a square matrix, finding its block triangular form, and calculating the condition number of each resulting block to identify the culprit. I'll try to do this.
Really nice results, thanks. I'm impressed that the ALAMO and implicit function condition numbers track well.
I don't think this is unfortunate or unexpected. It's just interesting. It's interesting that the NN formulation sometimes tracks the full-space, and sometimes doesn't. Can you plot this for a case where full-space, NN, and implicit fail?
These are the plots for 4 unsuccessful instances. The implicit function failed due to eval errors, that's why the curve for this formulation stops early. Notice now the magnitude of the condition number. Tomorrow I'll get started on block triangularization of the Jacobian.
Thanks. From these plots, it seems like we jump into a "bad region" of high condition number, then never recover. I wonder if there is some difference between the jump that occurs around iteration 15 for the failing vs. successful cases.
Hi Robert, I'm running this code:
The output for col_partition (similar output to row_partition) is:
How can I reconstruct the block triangular form of the Jacobian matrix with this information?
Typically, I would get the condition number of the diagonal blocks with the following:

```python
import numpy as np
from pyomo.contrib.pynumero.interfaces.pyomo_nlp import PyomoNLP
from pyomo.contrib.incidence_analysis import IncidenceGraphInterface

results = solver.solve(m, tee=True)
# FIX DEGREES OF FREEDOM HERE!
nlp = PyomoNLP(m)
igraph = IncidenceGraphInterface(m, include_inequality=False)
vblocks, cblocks = igraph.block_triangularize()
submatrices = [
    nlp.extract_submatrix_jacobian(vb, cb) for vb, cb in zip(vblocks, cblocks)
]
for i, submat in enumerate(submatrices):
    cond = np.linalg.cond(submat.toarray())
    # Print or store the condition number as necessary
```

Hope this helps. In your particular example, I'm a little confused about why …
Thanks! Is there a way to make the color bar scale logarithmic?
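For reference, one way to do this in matplotlib is to pass a `LogNorm` to the plotting call. This is a minimal sketch (the grid values and the `pcolormesh` call are illustrative, not the project's actual plotting code):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripting
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

# Condition numbers span many orders of magnitude, so a LogNorm makes the
# color bar scale logarithmically instead of linearly.
cond_grid = np.logspace(0, 10, 64).reshape(8, 8)

fig, ax = plt.subplots()
mesh = ax.pcolormesh(
    cond_grid, norm=LogNorm(vmin=cond_grid.min(), vmax=cond_grid.max())
)
fig.colorbar(mesh, ax=ax, label="Condition number")
```

With `LogNorm`, a value of 1e5 maps to the midpoint of a [1, 1e10] color range, which is usually what you want for condition-number grids.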
Can you run the following commands please?
No, the code worked. I stopped the full-space solve at iteration 50, and had to deactivate the pressure equality to have 898 variables and 898 constraints. I did not have to fix any degrees of freedom after stopping the solve. Thank you!! I'll try that code.
Shouldn't we have 895 constraints and variables after fixing degrees of freedom in the full-space model? |
I see what happened. No, I did not fix any degrees of freedom; that's why I have 898 variables in that Jacobian. I had 899 constraints (895 equalities, 4 inequalities), and I mistakenly deactivated the pressure equality to make them match. I forgot that the Jacobian build only needed the 895 equality constraints, which in turn would need only 895 variables, with our 3 DOF fixed. Thanks.
Just tested the condition number parameter sweep plot with your latest modifications. Looks great! Very good agreement between unsuccessful instances and high condition numbers. |
```python
if args.optimize:
    P = 1947379.0
    X = 0.94
    m = make_optimization_model(X, P, initialize=True)
    solver = config.get_optimization_solver(iters=35)
    print(X, P)
    results = solver.solve(m, tee=True)
    # Fix degrees of freedom at the values from the interrupted solve
    m.fs.reformer_bypass.split_fraction[0, "bypass_outlet"].fix(
        m.fs.reformer_bypass.split_fraction[0, "bypass_outlet"].value
    )
    m.fs.reformer_mix.steam_inlet.flow_mol.fix(
        m.fs.reformer_mix.steam_inlet.flow_mol[0].value
    )
    m.fs.feed.outlet.flow_mol.fix(m.fs.feed.outlet.flow_mol[0].value)
    nlp = PyomoNLP(m)
    igraph = IncidenceGraphInterface(m, include_inequality=False)
    vblocks, cblocks = igraph.block_triangularize()
    for i, (vblock, cblock) in enumerate(zip(vblocks, cblocks)):
        submatrix = nlp.extract_submatrix_jacobian(vblock, cblock)
        cond = np.linalg.cond(submatrix.toarray())
        if cond > 1e10:
            print(f"block {i}: {cond}")
            for var in vblock:
                print(f"Variable: {var.name}")
            for con in cblock:
                print(f"Constraint: {con.name}")
```

Just did this test for the full-space formulation, and got these results:
Are you aware of any formal theorem that might say something like "The condition number of a block matrix is at least the largest condition number of one of its blocks"? Or something similar? I wonder if there is formal theory that might help explain these results even more. Also, would applying this procedure be a good way to identify which unit operation we have to replace with surrogates/implicit functions? Simply see at every iteration which block gives the highest condition number?
I thought this would be fairly straightforward, but tried for a little bit and didn't come up with anything. This post https://mathoverflow.net/questions/265887/bounding-the-minimum-singular-value-of-a-block-triangular-matrix provides bounds on the minimum singular value, but they involve the off-diagonal block. It looks like, while the determinant of a block triangular matrix can easily be characterized in terms of the diagonal blocks, the singular values cannot.
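A small numerical sketch (not from this thread) shows why the diagonal blocks alone can't bound the condition number of a block triangular matrix:

```python
import numpy as np

# Block upper-triangular 2x2 matrix whose 1x1 diagonal blocks are perfectly
# conditioned (cond = 1 each), but whose off-diagonal entry makes the whole
# matrix ill-conditioned: cond(A) grows roughly like t**2.
t = 1000.0
A = np.array([[1.0, t],
              [0.0, 1.0]])

cond_blocks = [np.linalg.cond(np.array([[1.0]])) for _ in range(2)]
cond_A = np.linalg.cond(A)

print(cond_blocks, cond_A)
```

So the converse direction also fails: a block triangular matrix can be far worse conditioned than all of its diagonal blocks, which is consistent with the MathOverflow bounds involving the off-diagonal block.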
To run the full space sweep with condition number calculation do:
To plot the condition number grid do:
```diff
-for key in config.PARAM_SWEEP_KEYS:
-    if key not in ("X", "P", "Termination"):
+for key in df_keys:
+    if key not in ("X", "P", "Termination", "Condition Number"):
```
In these cases where we hit an error, we probably do want to use INVALID for the condition number, right? The solution hasn't been loaded into the model, so any condition number we calculate won't be meaningful (it will just be computed at the initial guess). (Also, we're not calculating and appending the condition number here, so I think this will raise an error when we try to convert to a data frame.)
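The pattern suggested here can be sketched as follows. This is illustrative, not the repo's actual code: `INVALID` stands in for whatever sentinel the project uses, and `solve_instance` is a hypothetical helper that raises on an evaluation error:

```python
# Record a sentinel when a solve fails, so every row of the sweep data has
# the same keys and converts cleanly to a data frame.
INVALID = None  # placeholder for the project's sentinel value

def run_instance(solve_instance, X, P):
    row = {"X": X, "P": P}
    try:
        row["Termination"] = solve_instance(X, P)
        row["Condition Number"] = 1.0e3  # placeholder for the real calculation
    except RuntimeError:
        # The solution was never loaded into the model, so a condition number
        # computed here would only reflect the initial guess.
        row["Termination"] = INVALID
        row["Condition Number"] = INVALID
    return row
```

This keeps the per-instance dictionaries uniform whether or not the solve succeeds.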
Thank you for this Robert, I made the changes.
Before merging this PR, I'd like to separate data generation from plotting. I'm imagining a script like generate_iterate_data.py that runs the optimization for a specified instance or subset of instances, and writes out a CSV file for each instance with all the information we need from each iterate. Then we can have a separate script like plot_iterate_data.py with options for what to plot.
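The proposed split could look something like this minimal sketch of the data-generation side (the column names are illustrative; the real script would record whatever per-iterate information the plots need):

```python
import csv
import io

# Write one CSV per instance, one row per iterate, so plotting scripts can
# read the data back without re-running any optimization.
def write_iterate_csv(fileobj, iterate_data):
    fieldnames = ["iteration", "objective", "inf_pr", "condition_number"]
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames)
    writer.writeheader()
    for row in iterate_data:
        writer.writerow(row)

buf = io.StringIO()
write_iterate_csv(buf, [
    {"iteration": 0, "objective": 1.2, "inf_pr": 1e-1, "condition_number": 1e6},
    {"iteration": 1, "objective": 1.1, "inf_pr": 1e-3, "condition_number": 1e7},
])
```

A plot_iterate_data.py script would then just read these files and select columns, with no solver dependency.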
```python
for iters in range(1, full_subset[2] + 1):
    full_list.append(full(X=full_subset[0], P=full_subset[1], iters=iters))
```
We shouldn't need to re-run the optimization to compute the condition number each iteration. We can use ConditioningCallback to make this easier:

```python
from svi.cyipopt import ConditioningCallback

solver = config.get_optimization_solver()
# Set an intermediate callback that calculates the condition number each iteration
callback = ConditioningCallback()
solver.config.intermediate_callback = callback
solver.solve(model, tee=True)
full_list = callback.condition_numbers
```
Yes! Thank you, I was modifying this script to include the ConditioningCallback; I'll push it soon. I did not include it in run_fullspace_sweep.py because it took too long to run the 64 instances, but I'll include it in condition_num_plots.py.
Just as a personal reminder of our conversation, an example would be to plot iterations vs. the constraint residual of the sum of mole fractions in the reformer recuperator.

This is a fairly significant change to this PR, so I'm not in a hurry to get this merged. If I were you, I'd focus on constructing the KKT matrix and collecting useful information from it for now.
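Constructing the KKT matrix could be sketched as below. This is a pure-SciPy illustration with stand-in matrices; in the real script, H and J would presumably come from PyNumero's Hessian-of-the-Lagrangian and Jacobian evaluations at the current iterate:

```python
import numpy as np
import scipy.sparse as sps

# Assemble the (regularization-free) KKT matrix
#   [ H   J^T ]
#   [ J   0   ]
# from a Lagrangian Hessian H and constraint Jacobian J, then compute its
# condition number. H and J below are tiny placeholders.
def kkt_condition_number(H, J):
    kkt = sps.bmat([[H, J.T], [J, None]], format="csc")
    return np.linalg.cond(kkt.toarray())

H = sps.csc_matrix(np.diag([2.0, 4.0]))
J = sps.csc_matrix(np.array([[1.0, 1.0]]))
cond = kkt_condition_number(H, J)
```

Tracking this quantity per iteration (e.g. inside the intermediate callback) would complement the Jacobian-only condition numbers already being plotted.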
To get condition number vs. iteration plots, for unsuccessful instances:
Sounds good.
…ts that take a --model argument
…ion to include condition numbers in iterate data
@@ -0,0 +1,194 @@
I believe this script is superseded by generate_iterate_data.py and plot_iterate_condition_numbers.py.
@@ -0,0 +1,85 @@
This script has not been superseded by any of my (more) recent commits to this PR.
@@ -0,0 +1,143 @@
This script has not been superseded by any of my (more) recent commits to this PR.
I initially intended to compute condition numbers of the Jacobian and KKT matrix, but I got distracted by computing errors in surrogate models. With this branch, we can produce a new "surrogate error" file with:
This option computes the maximum relative error between a simulation of the surrogate model and a simulation of the full-space reactor model at the point where the optimization problem stops (whether or not it converged successfully). Using this option, I can get the following plots:
The transparent blocks in these grids (e.g. outside the training region with the NN) correspond to cases where either the full-space reactor model or the reactor surrogate fail to simulate. These would be worth investigating further.
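The "maximum relative error" described above amounts to the following calculation. This is a sketch with placeholder arrays; the real comparison extracts corresponding output values from the two flowsheet simulations:

```python
import numpy as np

# Maximum relative error between surrogate-model outputs and full-space
# reactor-model outputs at the same inputs.
def max_relative_error(surrogate_vals, fullspace_vals):
    surrogate_vals = np.asarray(surrogate_vals, dtype=float)
    fullspace_vals = np.asarray(fullspace_vals, dtype=float)
    return float(
        np.max(np.abs(surrogate_vals - fullspace_vals) / np.abs(fullspace_vals))
    )

err = max_relative_error([1.05, 2.0, 2.97], [1.0, 2.0, 3.0])  # 0.05
```

Note this uses the full-space values as the reference in the denominator, so it is undefined wherever the full-space simulation fails, consistent with the transparent blocks in the grids.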