Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Subspace Freq domain statistic #92

Merged
merged 16 commits into from
Jun 15, 2017
Merged

Subspace Freq domain statistic #92

merged 16 commits into from
Jun 15, 2017

Conversation

cjhopp
Copy link
Member

@cjhopp cjhopp commented Apr 28, 2017

Rebase seemed to work....maybe.

Your original suggestion of reading in the test detectors, renaming them, and then re-writing them should work with some minor transposing of matrices. I'll push those in a bit.

@calum-chamberlain calum-chamberlain added this to the 1.1.0 milestone Jun 2, 2017
@cjhopp cjhopp changed the title svd() changes and new subspace_fc_plot() function Subspace Freq domain statistic Jun 8, 2017
@cjhopp
Copy link
Member Author

cjhopp commented Jun 8, 2017

Hey @calum-chamberlain, ever seen the current Travis fail for 3.5 on Linux? Some strange issue with an invalid resize at the end of channel_loop....Did I break it???

@calum-chamberlain
Copy link
Member

@cjhopp No I have not... Curious, going to try and re-run, and will try the tests locally also.
@d-chambers have you had a look at this? We were struggling to reason why @cjhopp had to decimate not from the zeroth index of the detection statistic in the frequency domain - it kind of makes sense, but I can't describe it - any insight or recommendations?

@cjhopp
Copy link
Member Author

cjhopp commented Jun 8, 2017

To tag on to that a bit, @d-chambers, I'm curious if you've been able to implement the aliasing in the frequency domain described by Harris (and, if so, were the time savings worth the trouble?). We've had a crack at it, but it isn't producing the correct results.

Dear Panda, No hard feelings. -Chet

@d-chambers
Copy link
Collaborator

hey @cjhopp, in detex I didn't implement the freq. domain decimation. I think when I got to that point I was in a hurry to graduate so I just had to get the code working.

Happy to take a look at your implementation though. Will try to add a few comments when I get a chance.

@d-chambers
Copy link
Collaborator

@cjhopp, where are you doing frequency domain decimation? I think I am missing it.

@cjhopp
Copy link
Member Author

cjhopp commented Jun 9, 2017

@d-chambers Sorry, I should have been more clear with that question! I didn't end up putting it in this PR as we couldn't make sense of it. From my point of view, the only extra step would be adding something like this: num_cor.reshape(num_cor.shape[0], Nc, M).sum(axis=1) here where M should be len(template) + len(data) - 1.

Hopefully, I can get another crack at it next week, but if you have any thoughts about how best to do it, I'm all ears.

@cjhopp
Copy link
Member Author

cjhopp commented Jun 9, 2017

This was as far as the implementation got.

def do_ffts(detector, stream, Nc):
min_fftlen = int(stream[0][0].data.shape[0] +
detector.data[0].shape[0] - Nc)
fftlen = 1 << min_fftlen.bit_length()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this is optimal anymore... I know scipy doesn't require a power of 2 length (although the docs do say it is optimal I suspect it is no longer the case). Consider this (note I am using Anaconda which uses the MLK ):

import numpy as np
import scipy.fftpack

ar_len = 3600 * 400 + 12  # a resonable size for a seismic trace
nearest_2 = 1 << (ar_len - 1).bit_length()
nearest_fast = scipy.fftpack.next_fast_len(ar_len)

%time np.fft.fft(ar, n=nearest_2)  # takes 386 ms
%time np.fft.fft(ar, n=nearest_fast)  # takes 228 ms

Moreover, if you are only expecting real inputs (A fair assumption in seismology no?) consider using rfft:

%time np.fft.rfft(ar, n=nearest_2)  # takes 200 ms
%time np.fft.rfft(ar, n=nearest_fast)  # takes 123 ms

Using rfft with next_fast_length would give nearly a 4x speed up (similar on ifft)

@@ -573,6 +581,39 @@ def _detect(detector, st, threshold, trig_int, moveout=0, min_trig=0,
return detections


def do_ffts(detector, stream, Nc):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a private method? If so it should start with an underscore, if not probably needs a docstring. I think a good rule is to add docstring to all functions. For private functions/methods a simple sentence or two that explains what it does is fine, but all public methods should have fancy formatted docstrings sphinx can use.

@d-chambers
Copy link
Collaborator

Thanks for pointing that out, I am fairly new to a lot of the github functionality so the extra help is appreciated.

I left you two general comments in the review.

Interesting frequency domain normalization here-ish. Did you profile against time domain implementations (with bottleneck moving variance for example?) or maybe this issue makes it better to avoid bottleneck so float 32 can be used?

Regarding the frequency domain decimation; I haven't been able to try it out yet. I have a deadline today and I am on vacation next week, so it may be awhile before I can really dig into it. it is probably worth going ahead without it, we can also come back to it (it shouldn't affect any APIs). Might be a reshape issue? Maybe the order should be 'f' rather than "c"?

I sent @calum-chamberlain a notebook awhile ago with my current thoughts on the subspace implementation and general UX concepts I am partial to, I will send it to you as well. I like the idea of using xarray to store all data then using broadcasting to avoid python loops. I am currently trying to get a prototype working. I am not sure how hard it would be to implement such a data structure in eqcorrscan, I think calum is exploring that space.

@cjhopp
Copy link
Member Author

cjhopp commented Jun 11, 2017

@d-chambers Extremely useful comments, thanks! do_ffts() should definitely be private and I wasn't aware that the ffts were no longer sensitive to power-of-2 lengths.

The frequency domain normalization you reference was really just an attempt to recreate Harris' approach literally. I haven't profiled a time domain approach but I don't have any attachment to the way it is currently so I'll give bottleneck a shot tomorrow.

Re: the frequency decimation, I hadn't considered the array order, so it may very well be a C/Fortran order issue. I'll give that a try. I agree that it isn't imperative right away, though.

Also, thanks for sending over your notebook. I'll give it a closer read first thing in the morning. Calum is close to pushing some internal changes involving xarray, I believe.

Hope you have a nice bit of vacay this week!

@calum-chamberlain
Copy link
Member

calum-chamberlain commented Jun 11, 2017 via email

@cjhopp
Copy link
Member Author

cjhopp commented Jun 11, 2017

Benchmarks for _do_ffts()

# Next power of 2
%timeit _do_ffts(detector, stream, Nc)
1 loop, best of 3: 1.06 s per loop 
# next_fast_len()
%timeit _do_ffts(detector, stream, Nc)
1 loop, best of 3: 388 ms per loop
# next_fast_len() and rffts
%timeit _do_ffts(detector, stream, Nc)
1 loop, best of 3: 202 ms per loop

Then, for the detection statistic calculation:

# Full ifft
%timeit _det_stat_freq(det_freq, data_freq_sq, data_freq, fft_vars[3], Nc, fft_vars[4], fft_vars[5])
1 loop, best of 3: 837 ms per loop
# Using rfft
%timeit _det_stat_freq(det_freq, data_freq_sq, data_freq, fft_vars[3], Nc, fft_vars[4], fft_vars[5])
1 loop, best of 3: 180 ms per loop

Pushing changes now. Thanks for the feedback, @d-chambers!

Copy link
Member

@calum-chamberlain calum-chamberlain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, looks good, two things to change then good to go IMHO

else:
inc = np.uint32(1)
# Stats must be same size for multiplexed or non multiplexed!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maintain indent for comment here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment removed in some recent commit.

@@ -586,6 +582,41 @@ def _detect(detector, st, threshold, trig_int, moveout=0, min_trig=0,
return detections


def _do_ffts(detector, stream, Nc):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a doc-string for this please? I like to have them even for internal functions so we know what's going on (I probably don't have them for everything though!)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On it.

@calum-chamberlain
Copy link
Member

Thanks @cjhopp - worth just merging it seeing as no changes made would have affected the code (all doc-strings). Press the button and update #100!

@cjhopp cjhopp merged commit d074481 into develop Jun 15, 2017
@cjhopp cjhopp deleted the cjhopp-svd-patch branch June 23, 2017 02:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants