NEW: Use properties of linear operators to speed up linesearch #61
Conversation
Hello @carterbox! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-05-22 01:26:06 UTC
@carterbox, first we need to handle #52, i.e. separate the grad and cost functions from the operators. Then it becomes clear that different line search implementations can also be separated. Instead of giving them the cost function, we can send the operator. We may have line_search(fwd, ...), line_search_sqr(fwd, ...), etc. At the beginning of such a line search function, we precompute temporary variables, like p1 = fwd(psi), p2 = fwd(dpsi), etc., and then use them to find the direction.
@nikitinvv I don't understand your idea fully. The cost and gradient are paired, but the operator and gradient are not? So it doesn't make sense to provide an operator and gradient. Please provide a full example API of what you are proposing.
@carterbox ok my point is that the line search should be defined in the same place as the cost and grad functions, because its optimal calculation depends on the cost function and operators. So there are two ways: either you have the line search as a method in each operator (like it is now with cost and grad), or you have all the functions (grad, cost, line search) in opt.py, where they take operators as a parameter.
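A minimal sketch of the second option (free functions in opt.py that take the operator as a parameter). The names, signatures, and least-squares cost are illustrative assumptions, not the actual tike API:

```python
import numpy as np

def line_search_sqr(fwd, psi, dpsi, data, step=1.0, shrink=0.5, max_steps=8):
    """Backtracking line search for a least-squares cost with a linear fwd.

    fwd(psi) and fwd(dpsi) are precomputed once, so each trial step costs
    only elementwise arithmetic instead of a forward-operator evaluation.
    """
    p1 = fwd(psi)   # cached: forward of the current iterate
    p2 = fwd(dpsi)  # cached: forward of the search direction
    f0 = np.linalg.norm(p1 - data)**2
    for _ in range(max_steps):
        # Linearity: fwd(psi + step * dpsi) == p1 + step * p2
        if np.linalg.norm(p1 + step * p2 - data)**2 < f0:
            break
        step *= shrink
    return step
```

With this layout, a cg solver would receive such a function as a parameter rather than calling a method on the operator.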
The problem with moving the ptychography cost and grad functions to

Unless you can think of a graceful way to encapsulate these irregularities into the opt interface (?), we should add a

I guess this is the kind of irregularity that causes both scipy.optimize and ODL to use the cost/grad interface (with optional line search).
@carterbox ok, that sounds reasonable. In my opinion, for consistency with the definitions of the cost and grad functions, we should have the line search functions also as part of the operators. Then the cg solver will take cost, grad, and line search functions as parameters. Up to you.
@carterbox Have you observed the gamma step sizes changing very often during iterations? Do they stabilize after a certain number of iterations, or do we get something like 0.5, 0.25, 0.5, 1e-7, 0.5? If they do not change much, then we can try to avoid the line search.
Yes, this is the approach that seems to make the most sense. I'm just afraid of complexity. 😨
I stopped monitoring this parameter after I thought the gradients were correct. Should this parameter go to the INFO logger or the DEBUG logger?
For @nikitinvv 's derivation, the linear term of the quadratic is:
But for mine, the linear term is:
I think they are not equivalent and mine is correct. Full derivation here.
For the catalyst dataset, it seems like the initial step length for the probe is too small because it is always

For typical optimization problems we wouldn't need this, but we are constantly switching between many sub-problems.
Sure, I just forgot to write the 2.
To get normal gamma steps, you should find a constant r such that r R^H R f ~ f (of the same order), i.e. r ≈ <R^H R f, f> / <R^H R f, R^H R f>. Then include this r in the gradient step, so that gamma r R^H(Rf - g) ~ f, i.e. gamma should not be too low. Typically, this constant should only depend on the data sizes. I always find it experimentally (varying the sizes n, ntheta, nscan, ... and observing the dependence). For example, in laminography I have something like 1/ntheta/n/n. See 'Norm test' in https://github.com/nikitinvv/lamcg/blob/master/tests/test_adjoint.py. I remember I have also played with it in ptychography.
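The norm test described above can be sketched as follows (a minimal illustration, assuming R and RH are a linear forward operator and its adjoint; this is not the actual lamcg code):

```python
import numpy as np

def estimate_step_scale(R, RH, f):
    """Estimate r such that r * R^H R f is of the order of f.

    Uses r = <R^H R f, f> / <R^H R f, R^H R f>; np.vdot conjugates its
    first argument, so this is the correct complex inner product.
    """
    g = RH(R(f))
    return np.vdot(g, f).real / np.vdot(g, g).real
```

For example, if R is 3 times the identity, then R^H R f = 9 f and r = 1/9, which rescales the gradient step back to the order of f.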
Waiting for #67 to be completed. |
Closing this because I think tuning the step size so line searches are avoided or minimal is enough. |
Purpose
Related to #60. In this PR, I'm trying to reduce the time spent on the backtracking line search in exchange for memory.
Approach
Our Operators are linear operators, e.g.

F(a + c * b) = F(a) + c * F(b)

where c is a scalar, so we can speed up the line search by caching F(x) and F(d), where d is the search direction and F() is the function being minimized.

General
This might work quite well if the ptychography cost functions easily split into linear and non-linear parts. The cost function for a single mode could be broken this way:
However, when we have a probe with multiple incoherent modes, then life becomes difficult. 😢
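For instance, with an illustrative least-squares intensity cost (an assumed stand-in, not necessarily the exact cost used in this repo), the single-mode split would look like:

$$ f(x + c\,d) = \sum \left( \lvert F(x) + c\,F(d) \rvert^2 - I \right)^2 $$

With $a = F(x)$ and $b = F(d)$ cached, $\lvert a + c\,b \rvert^2 = \lvert a \rvert^2 + 2c\,\mathrm{Re}(\bar{a} b) + c^2 \lvert b \rvert^2$, so the cost becomes a polynomial in the scalar step $c$, and trial steps need no new forward evaluations.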
Ptycho-specific
We could write a specific line searcher for ptychography by refactoring the intensity sum as follows:
where a = fwd(x), b = fwd(d), and c = step.
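A minimal sketch of that refactoring (illustrative only, not the actual tike code; a and b are assumed to be the complex outputs of the forward operator):

```python
import numpy as np

def precompute(a, b):
    """Cache the three arrays once per line search: a = fwd(x), b = fwd(d)."""
    return (
        np.abs(a)**2,                  # |a|^2
        2 * np.real(np.conj(a) * b),   # cross term 2 Re(conj(a) b)
        np.abs(b)**2,                  # |b|^2
    )

def intensity(cache, c):
    """Evaluate |a + c*b|^2 for a trial step c using only the cache."""
    aa, ab, bb = cache
    return aa + c * ab + c * c * bb
```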
This allows us to cache three arrays and change the step size without calling the forward operator every time. I guess this is fine because the code is modular.

Pre-Merge Checklists
Submitter
Use yapf to format python code.

Reviewer