Multiple rhs ma57 #58
Conversation
In PIPS-NLP/Core/LinearSolvers/Ma57Solver/Ma57Solver.C:

```diff
@@ -410,10 +408,22 @@ void Ma57Solver::solve(GenMatrix& rhs_in)
   assert(n==N);
   // we need checks on the residuals, can't do that with multiple RHS
-  for (int i = 0; i < NRHS; i++) {
-    SimpleVector v(rhs[i],N);
-    solve(v);
-  }
+
+  double *drhs = rhs.elements();
+  //std::cout << "[MA57] N: " << N << " NRHS: " << NRHS << std::endl;
+  int job = 1; // Solve using A
+  int one = 1;
+
+  int *iwork = new int[n];
+  double *work = new double[NRHS*n];
+  int lwork = NRHS*N;
+  FNAME(ma57cd)( &job, &n,
```
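The quoted diff is truncated mid-call. For orientation: the documented MA57C/CD argument list is (JOB, N, FACT, LFACT, IFACT, LIFACT, NRHS, RHS, LRHS, W, LW, IW, INFO), so the complete call would look roughly like the sketch below. Here fact, lfact, ifact, lifact, and info are assumed names for the factor data the solver class holds after the ma57bd factorization, not necessarily the actual member names:

```cpp
// Sketch only: fact/lfact/ifact/lifact/info are assumed member names for the
// factorization produced by ma57bd; argument order per the MA57 documentation.
FNAME(ma57cd)( &job, &n,
               fact, &lfact, ifact, &lifact,  // the factors
               &NRHS, drhs, &n,               // NRHS right-hand sides, leading dimension n
               work, &lwork, iwork, info );
delete[] iwork;  // scratch arrays are only needed for the duration of the call
delete[] work;
```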
The code was doing one rhs at a time to make sure each solve is done with iterative refinement.
When doing multiple rhs, ma57cd does not perform iterative refinement, or at least that was the case a couple of years ago. Please confirm this has changed and that iterative refinement is performed on multiple rhs; otherwise the accuracy of PIPS' linear solves will be severely affected.
If I understand correctly, you might be able to do the iterative refinement outside of ma57 in a multiple right-hand-side manner. I think you just need to compute the multiple right-hand-side residual with a multiple right-hand-side multiply (which will also be faster than single ones), compute the norms of all of the residuals, and then do a (possibly smaller) multiple right-hand-side solve on the columns selected for a second solve. The one part I don't understand is whether the multiply is done somehow with extended precision inside of ma57 but could not be done this way outside.
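A minimal sketch of that scheme, assuming column-major storage for the rhs block and two hypothetical hooks, multiRhsResidual and multiRhsSolve, standing in for PIPS' sparse multiply and the ma57cd-based multi-rhs solve (neither is an actual PIPS function):

```cpp
#include <cmath>
#include <cstring>
#include <vector>

// Hypothetical hooks (assumed, not actual PIPS API):
void multiRhsResidual(int n, int nrhs, const double* X, const double* B,
                      double* R);                // R = B - A*X in one sparse multiply
void multiRhsSolve(int n, int nrhs, double* RHS); // RHS <- A^{-1}*RHS, one ma57cd-style call

// One pass of iterative refinement done outside MA57, all rhs at once.
void refineMultiRhs(int n, int nrhs, double* X, const double* B, double tol) {
  std::vector<double> R((size_t)n * nrhs);
  multiRhsResidual(n, nrhs, X, B, R.data());     // one multi-rhs multiply

  // Select the columns whose residual norm is still above tol.
  std::vector<int> sel;
  for (int j = 0; j < nrhs; ++j) {
    double s = 0.0;
    for (int i = 0; i < n; ++i) s += R[(size_t)j*n + i] * R[(size_t)j*n + i];
    if (std::sqrt(s) > tol) sel.push_back(j);
  }
  if (sel.empty()) return;

  // Pack the selected residuals, do one (possibly smaller) multi-rhs solve
  // A*D = R_sel, and apply the corrections X_j += D_j.
  std::vector<double> D((size_t)n * sel.size());
  for (size_t k = 0; k < sel.size(); ++k)
    std::memcpy(&D[k*(size_t)n], &R[(size_t)sel[k]*n], n * sizeof(double));
  multiRhsSolve(n, (int)sel.size(), D.data());
  for (size_t k = 0; k < sel.size(); ++k)
    for (int i = 0; i < n; ++i) X[(size_t)sel[k]*n + i] += D[k*(size_t)n + i];
}
```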
Yes, we could/should do it outside; the ma57 manual does not say anything about extended precision. That is the short story. The long story is that this part of the code is responsible for the majority of the execution time (and, I believe, for the numerical issues on large problems, now that I see that in PIPS-NLP there is no iterative refinement done). In PIPS-IPM we managed to drastically reduce execution time with https://epubs.siam.org/doi/10.1137/130908737, which essentially exploits the sparsity of these multiple rhs; the accuracy was not bad at all.
@cnpetra I do iterative refinement (IR) outside. That is inspired by IPOPT, which also uses ma57cd and does IR outside. The key point of doing this is that we can do IR based on the full KKT residual instead of the residual of the augmented system. Currently PIPS-NLP has an option to choose which criterion to use (or use both). Note that I added this outside IR after I ran the benchmark with the CUTEr tests: doing IR on the full system helps convergence, but it probably helps only 5 problems out of 900.
I noticed that in my current implementation I only do IR for the KKT system, which comes with one rhs; I don't do IR when we solve multiple rhs to build the Schur complement, i.e., in void sLinsys::addTermToDenseSchurCompl(sData *prob, DenseSymMatrix& SC)
I agree with you and @BarrySmith that we should add this when we build the Schur complement.
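A minimal sketch of the outside-IR loop with the two criteria, using hypothetical hooks (augResidual, kktResidualNorm, and backsolve are stand-ins, not the actual PIPS-NLP functions):

```cpp
#include <cmath>
#include <vector>

// Hypothetical hooks (assumed, not actual PIPS-NLP API):
void augResidual(int n, const double* dx, const double* b,
                 double* r);                      // r = b - K_aug*dx
double kktResidualNorm(int n, const double* dx);  // residual of the full KKT system
void backsolve(int n, double* rhs);               // apply the MA57 factors (ma57cd)

// Outside IR for a single rhs; accept on the augmented-system residual,
// optionally also requiring the full KKT residual to be small.
void refineOutside(int n, double* dx, const double* b,
                   int maxIt, double tol, bool useFullKkt) {
  std::vector<double> r(n);
  for (int it = 0; it < maxIt; ++it) {
    augResidual(n, dx, b, r.data());
    double s = 0.0;
    for (int i = 0; i < n; ++i) s += r[i] * r[i];
    if (std::sqrt(s) <= tol && (!useFullKkt || kktResidualNorm(n, dx) <= tol))
      return;                                     // chosen criteria satisfied
    backsolve(n, r.data());                       // d = K_aug^{-1} r
    for (int i = 0; i < n; ++i) dx[i] += r[i];    // dx <- dx + d
  }
}
```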
SC linear algebra is a different animal than Ipopt/pipsnlpserial. Without IR for multiple rhs, the solve for the correction step in your outer IR is done with a matrix that can have potentially large errors in its components, which makes me doubt that the outer IR is as effective as it should be.
Yes, I just realized that we don't do IR when building the SC, once Michel's suggestion of solving multiple rhs with MA57 is in. I totally agree with you that we should have IR when building the SC. This probably explains why pipsnlp ends up getting slightly different steps on large problems with different numbers of processes.
glad we're on the same page
I guess you mean ma57dd, which uses iterative refinement and only supports a single rhs. Ma57cd doesn't do iterative refinement, and hence solving multiple rhs in a single call can be much faster.
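For reference, a sketch of the two call patterns. The argument lists follow my reading of the MA57 spec (MA57C/CD backsolves for NRHS columns at once; MA57D/DD refines a single rhs and also needs the original matrix in coordinate form to compute true residuals); all variable names here are assumptions, worth double-checking against the spec:

```cpp
// (a) ma57cd: all NRHS columns in one backsolve, no iterative refinement.
FNAME(ma57cd)(&job, &n, fact, &lfact, ifact, &lifact,
              &NRHS, drhs, &n, work, &lwork, iwork, info);

// (b) ma57dd: iterative refinement, one rhs per call, so NRHS calls; it also
//     takes the original matrix (ne, a, irn, jcn) to form true residuals.
//     jobIR selects the refinement mode (see the MA57 spec for the codes).
for (int j = 0; j < NRHS; ++j)
  FNAME(ma57dd)(&jobIR, &n, &ne, a, irn, jcn, fact, &lfact, ifact, &lifact,
                &drhs[(size_t)j * n], x, resid, workD, iwork,
                icntl, cntl, info, rinfo);
```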
BTW, I believe I did add a similar correction step in pipsnlp, either correcting the residual of the augmented system or correcting the step based on the residual of the KKT system (as suggested by IPOPT).
NY
Yes @nychiang, I stand corrected: ma57dd. I was confused by the fact that PIPS-NLP's MA57Solver class does not seem to call ma57dd, while I remember exactly that it is used in PIPS-IPM and helps a lot in terms of numerical stability in the linear solves. Am I missing something? Any idea why iterative refinement is not in PIPS-NLP? The lack of it would explain some of the numerical issues in #40.
@michel2323 I really think we should address #59 in this PR.
Letting the PR go through; it can't do any more harm.
This, and using ifort for MA57, considerably speeds up the backsolve. On Theta it is 40%; on other CPUs, even more.