
NEW: Use properties of linear operators to speed up linesearch #61

Closed

Conversation

carterbox
Contributor

Purpose

Related to #60. In this PR, I'm trying to reduce the time spent on the backtracking line search in exchange for additional memory use.

Approach

Our Operators are linear, i.e. F(a + c * b) = F(a) + c * F(b) where c is a scalar, so we can speed up the line search by caching F(x) and F(d), where d is the search direction and F() is the linear part of the function being minimized.
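
As a sketch of the idea (hypothetical names, not the tike API), the backtracking loop can then try new step sizes without re-applying the operator:

```python
def backtracking_line_search(cost, Fx, Fd, f0, step=1.0, shrink=0.5, max_iter=32):
    """Shrink `step` until the cost decreases below f0 = cost(Fx).

    Because F is linear, F(x + step * d) == Fx + step * Fd, so the cached
    arrays Fx = F(x) and Fd = F(d) are the only operator applications needed.
    """
    for _ in range(max_iter):
        if cost(Fx + step * Fd) < f0:
            break
        step *= shrink
    return step
```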

General

This might work quite well if the ptychography cost function splits easily into linear and non-linear parts. The cost function for a single probe mode could be broken up this way:

def nonlinear(x):
    # Non-linear part: convert the farplane wave to intensity and compare.
    return gaussian_cost(data, np.square(np.abs(x)))

def linear(x, *args):
    # Linear part: just the forward operator.
    return fwd(x, *args)

However, when the probe has multiple incoherent modes, life becomes difficult. 😢

def nonlinear(x, *args):
    # The mode intensities are summed before the cost is evaluated, so the
    # forward operator can no longer be factored out of the non-linear part.
    intensity = 0
    for mode in modes:
        intensity += np.square(np.abs(fwd(x, mode, *args)))
    return gaussian_cost(data, intensity)

def linear(x):
    # Nothing useful is left to cache.
    return x

Ptycho-specific

We could write a specific line searcher for ptychography by refactoring the intensity sum as follows:

  sum(square(abs(a + c * b)))
= sum((a + c * b) * conj(a + c * b))
= sum(square(abs(a))) + c**2 * sum(square(abs(b))) + c * sum(conj(a) * b + a * conj(b))
= sum(square(abs(a))) + c**2 * sum(square(abs(b))) + 2 * c * sum(real(conj(a) * b))

where a = fwd(x), b = fwd(d), and c = step. This allows us to cache three arrays and change the step size without calling the forward operator every time. I guess this is fine because the code is modular.
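
For example, a minimal sketch of the cached evaluation (illustrative names, not the tike API):

```python
import numpy as np

def cache_terms(a, b):
    """Precompute the three arrays; a = fwd(x) and b = fwd(d) are applied once."""
    return (np.square(np.abs(a)),          # |a|^2
            np.square(np.abs(b)),          # |b|^2
            2 * np.real(np.conj(a) * b))   # cross term

def intensity(terms, c):
    """Intensity at step size c using only the cached terms."""
    aa, bb, ab = terms
    return aa + c**2 * bb + c * ab
```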

Pre-Merge Checklists

Submitter

  • Write a helpfully descriptive pull request title.
  • Organize changes into logically grouped commits with descriptive commit messages.
  • Document all new functions.
  • Write tests for new functions or explain why they are not needed.
  • Build the documentation successfully.
  • Use yapf to format python code.

Reviewer

  • Actually read all of the code.
  • Run the new code yourself.
  • Write a summary of the changes as you understand them.
  • Thank the submitter.

@pep8speaks

pep8speaks commented May 19, 2020

Hello @carterbox! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-05-22 01:26:06 UTC

@nikitinvv
Contributor

nikitinvv commented May 19, 2020

@carterbox, first we need to handle #52, i.e. separate the grad and cost functions from the operators. Then it becomes clear that different line search implementations can also be separated. Instead of giving them the cost function, we can pass the operator. We may have line_search(fwd, ...), line_search_sqr(fwd, ...), etc. At the beginning of such a line search function, we precompute temporary variables like p1 = fwd(psi), p2 = fwd(dpsi), etc., and then use them to search along the direction.
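
A hypothetical sketch of that interface, reusing the backtracking_line_search sketched in the PR description (illustrative names, not existing tike functions):

```python
def line_search(fwd, cost, psi, dpsi, **kwargs):
    # Apply the linear operator exactly twice, then backtrack on cached arrays.
    p1, p2 = fwd(psi), fwd(dpsi)
    return backtracking_line_search(cost, p1, p2, f0=cost(p1), **kwargs)
```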

@nikitinvv nikitinvv closed this May 19, 2020
@nikitinvv nikitinvv reopened this May 19, 2020
@carterbox
Contributor Author

@nikitinvv I don't fully understand your idea. The cost and gradient are paired, but the operator and gradient are not? So it doesn't make sense to provide an operator and a gradient. Please provide a full example API of what you are proposing.

@nikitinvv
Contributor

@carterbox ok, my point is that the line search should be defined in the same place as the cost and grad functions because its optimal calculation depends on the cost function and the operators. So there are two ways: either you have the line search as a method on each operator (as cost and grad are now), or you have all of the functions (grad, cost, line search) in opt.py, where they take operators as parameters.

@carterbox
Contributor Author

The problem with moving the ptychography cost and grad functions to tike.opt is that the ptychography forward operator is not the forward model because:

  1. The farplane wave is converted to intensity before it is compared with the data.
  2. We are updating each probe mode sequentially instead of simultaneously

Unless you can think of a graceful way to encapsulate these irregularities into the opt interface (?), we should add a linesearch method to the Ptycho operator (if you think it is going to provide speedups of more than 10x).

I guess this is the kind of irregularity that causes both scipy.optimize and ODL to use the cost/grad interface (with optional line search).

@nikitinvv
Contributor

@carterbox ok, that sounds reasonable. In my opinion, for consistency with the definitions of the cost and grad functions, we should have the line search functions also be part of the operators. Then the cg solver will take cost, grad, and line search functions as parameters. Up to you.
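
A hypothetical sketch of that solver interface (not the actual tike code):

```python
import numpy as np

def conjugate_gradient(x, cost, grad, line_search, num_iter=10):
    """Each operator supplies its own cost, grad, and line_search; the
    solver only wires them together."""
    g = grad(x)
    d = -g
    for _ in range(num_iter):
        gamma = line_search(x, d)  # operator-specific step length
        x = x + gamma * d
        g1 = grad(x)
        beta = np.vdot(g1, g1).real / np.vdot(g, g).real  # Fletcher-Reeves
        d = -g1 + beta * d
        g = g1
    return x, cost(x)
```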

@nikitinvv
Contributor

@carterbox Have you observed frequent changes in the gamma step sizes during iterations? Do they stabilize after a certain number of iterations, or do we see something like 0.5, 0.25, 0.5, 1e-7, 0.5? If they do not change much, then we can try to avoid the line search.

@carterbox
Contributor Author

carterbox commented May 20, 2020

> Then the cg solver will take cost, grad, and line search functions as parameters

Yes, this is the approach that seems to make the most sense. I'm just afraid of complexity. 😨

> Have you observed frequent changes in the gamma step sizes during iterations?

I stopped monitoring this parameter once I believed the gradients were correct. Should this parameter go to the INFO logger or the DEBUG logger?

@carterbox
Contributor Author

For @nikitinvv's derivation, the linear term of the quadratic is:

sum_j ( G(x_j).real * G(x_j).real + 2 * G(d_j).imag * G(d_j).imag )

But for mine, the linear term is:

2 * sum_j( G(x_j).real * G(d_j).real + G(x_j).imag * G(d_j).imag )

I think they are not equivalent and mine is correct. Full derivation here.
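
A quick numerical check (mine, not from the PR) confirms that the second expression gives the linear-in-step term of sum(|a + c*b|**2):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(8) + 1j * rng.standard_normal(8)
b = rng.standard_normal(8) + 1j * rng.standard_normal(8)
c = 0.37

lhs = np.sum(np.abs(a + c * b)**2)
rhs = (np.sum(np.abs(a)**2)
       + c**2 * np.sum(np.abs(b)**2)
       + 2 * c * np.sum(a.real * b.real + a.imag * b.imag))  # linear term
assert np.isclose(lhs, rhs)
```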

@carterbox
Contributor Author

carterbox commented May 22, 2020

(tike) bash-4.2$ python tike-recon.py catalyst combined 1 --folder output/line
scan range is (0, 651.836181640625), (0, 1135.3802490234375).
scan positions are (1, 1848, 2), float32
data is (1, 1848, 128, 128), float32
probe is (1, 1, 1, 7, 128, 128), complex64
2020-05-21 20:51:00 INFO combined for 1,848 - 128 by 128 frames for 10 iterations.
2020-05-21 20:51:00 INFO object and probe rescaled by 41.459629
2020-05-21 20:51:01 DEBUG step 0; length = 1.526e-05; cost = 1.869670e+10
2020-05-21 20:51:01 DEBUG step 1; length = 1.526e-05; cost = 1.713484e+10
2020-05-21 20:51:02 DEBUG step 2; length = 3.052e-05; cost = 1.697497e+10
2020-05-21 20:51:03 DEBUG step 3; length = 3.052e-05; cost = 1.697411e+10
2020-05-21 20:51:03 INFO     object cost is +1.69741e+10
2020-05-21 20:51:03 DEBUG step 0; length = 1.000e+00; cost = 5.643800e+09
2020-05-21 20:51:03 DEBUG step 1; length = 1.000e+00; cost = 3.995764e+09
2020-05-21 20:51:03 DEBUG step 2; length = 1.000e+00; cost = 3.782804e+09
2020-05-21 20:51:03 DEBUG step 3; length = 1.000e+00; cost = 3.710062e+09
2020-05-21 20:51:03 INFO      probe cost is +3.71006e+09

For the catalyst dataset, it seems like the initial step length for the probe is too small because it never shrinks from 1e0, while for the object it is too large because it consistently shrinks to about 1e-5 (roughly 16 backtracks). It may be worth our time to implement something that automatically adjusts the initial step size for the line search.

For typical optimization problems we wouldn't need this, but we are constantly switching between many sub-problems.
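
One common heuristic (an assumption on my part, nothing implemented here) is to warm-start each search from the last accepted step and grow it whenever no backtracking was needed:

```python
def next_initial_step(last_step, num_backtracks, grow=2.0):
    """Warm-start the next line search from the previously accepted step."""
    if num_backtracks == 0:
        # The first trial was accepted; a larger step may also be acceptable.
        return last_step * grow
    return last_step
```

Carrying the step between calls would let the object and probe sub-problems each settle at their own scale instead of restarting from the same initial value.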

@nikitinvv
Contributor

> For @nikitinvv's derivation, the linear term of the quadratic is:
>
> sum_j ( G(x_j).real * G(x_j).real + 2 * G(d_j).imag * G(d_j).imag )
>
> But for mine, the linear term is:
>
> 2 * sum_j( G(x_j).real * G(d_j).real + G(x_j).imag * G(d_j).imag )
>
> I think they are not equivalent and mine is correct. Full derivation here.

Sure, I just forgot to write the 2. From my code:

# p1: total intensity from all probe modes at the current point
# p2: intensity of the search direction for mode m
# p3: cross term, 2 * real(conj(fwd(psi)) * fwd(dprb))
p1 = cp.zeros_like(data)
for k in range(probe.shape[1]):
    p1 += cp.abs(self.fwd(psi, scan, probe[:, k]))**2
tmp1 = self.fwd(psi, scan, probe[:, m])
tmp2 = self.fwd(psi, scan, dprb[:, m])
p2 = cp.abs(tmp2)**2
p3 = 2 * (tmp1.real * tmp2.real + tmp1.imag * tmp2.imag)

@nikitinvv
Contributor

nikitinvv commented May 22, 2020

> For the catalyst dataset, it seems like the initial step length for the probe is too small […] It may be worth our time to implement something that automatically adjusts the initial step size for the line search.
>
> For typical optimization problems we wouldn't need this, but we are constantly switching between many sub-problems.

To get normal gamma steps, you should find a constant r such that r * R^H R f ~ f (of the same order), i.e. r ≈ <R^H R f, f> / <R^H R f, R^H R f>. Then include this r in the gradient step: gamma * r * R^H (R f - g) ~ f, so gamma should not become too low. Typically, this constant should only depend on the data sizes. I always find it experimentally (varying the sizes n, ntheta, nscan, ... and observing the dependence). For example, in laminography I have something like 1/ntheta/n/n. See 'Norm test' in https://github.com/nikitinvv/lamcg/blob/master/tests/test_adjoint.py. I remember I have also played with it in ptychography.
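
A minimal sketch of that norm test (assuming callables fwd for R and adj for R^H; illustrative names):

```python
import numpy as np

def estimate_r(fwd, adj, f):
    """Estimate r such that r * R^H R f is on the same scale as f."""
    w = adj(fwd(f))  # R^H R f
    return np.vdot(w, f).real / np.vdot(w, w).real
```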

@carterbox
Contributor Author

Waiting for #67 to be completed.

@carterbox
Contributor Author

Closing this because I think tuning the step size so that line searches are avoided or minimal is enough.

@carterbox carterbox closed this Mar 3, 2021