-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Subspace Freq domain statistic #92
Conversation
merge det_stat_freq into svd-patch
…rrscan into cjhopp-svd-patch
Hey @calum-chamberlain, ever seen the current Travis fail for 3.5 on Linux? Some strange issue with an invalid resize at the end of channel_loop....Did I break it??? |
@cjhopp No I have not... Curious, going to try and re-run, and will try the tests locally also. |
To tag on to that a bit, @d-chambers, I'm curious if you've been able to implement the aliasing in the frequency domain described by Harris (and, if so, were the time savings worth the trouble?). We've had a crack at it, but it isn't producing the correct results. Dear Panda, No hard feelings. -Chet |
hey @cjhopp, in detex I didn't implement the freq. domain decimation. I think when I got to that point I was in a hurry to graduate so I just had to get the code working. Happy to take a look at your implementation though. Will try to add a few comments when I get a chance. |
@cjhopp, where are you doing frequency domain decimation? I think I am missing it. |
@d-chambers Sorry, I should have been more clear with that question! I didn't end up putting it in this PR as we couldn't make sense of it. From my point of view, the only extra step would be adding something like this: Hopefully, I can get another crack at it next week, but if you have any thoughts about how best to do it, I'm all ears. |
This was as far as the implementation got. |
eqcorrscan/core/subspace.py
Outdated
def do_ffts(detector, stream, Nc): | ||
min_fftlen = int(stream[0][0].data.shape[0] + | ||
detector.data[0].shape[0] - Nc) | ||
fftlen = 1 << min_fftlen.bit_length() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure this is optimal anymore... I know scipy doesn't require a power of 2 length (although the docs do say it is optimal I suspect it is no longer the case). Consider this (note I am using Anaconda which uses the MLK ):
import numpy as np
import scipy.fftpack
ar_len = 3600 * 400 + 12 # a resonable size for a seismic trace
nearest_2 = 1 << (ar_len - 1).bit_length()
nearest_fast = scipy.fftpack.next_fast_len(ar_len)
%time np.fft.fft(ar, n=nearest_2) # takes 386 ms
%time np.fft.fft(ar, n=nearest_fast) # takes 228 ms
Moreover, if you are only expecting real inputs (A fair assumption in seismology no?) consider using rfft:
%time np.fft.rfft(ar, n=nearest_2) # takes 200 ms
%time np.fft.rfft(ar, n=nearest_fast) # takes 123 ms
Using rfft with next_fast_length would give nearly a 4x speed up (similar on ifft)
eqcorrscan/core/subspace.py
Outdated
@@ -573,6 +581,39 @@ def _detect(detector, st, threshold, trig_int, moveout=0, min_trig=0, | |||
return detections | |||
|
|||
|
|||
def do_ffts(detector, stream, Nc): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a private method? If so it should start with an underscore, if not probably needs a docstring. I think a good rule is to add docstring to all functions. For private functions/methods a simple sentence or two that explains what it does is fine, but all public methods should have fancy formatted docstrings sphinx can use.
Thanks for pointing that out, I am fairly new to a lot of the github functionality so the extra help is appreciated. I left you two general comments in the review. Interesting frequency domain normalization here-ish. Did you profile against time domain implementations (with bottleneck moving variance for example?) or maybe this issue makes it better to avoid bottleneck so float 32 can be used? Regarding the frequency domain decimation; I haven't been able to try it out yet. I have a deadline today and I am on vacation next week, so it may be awhile before I can really dig into it. it is probably worth going ahead without it, we can also come back to it (it shouldn't affect any APIs). Might be a reshape issue? Maybe the order should be 'f' rather than "c"? I sent @calum-chamberlain a notebook awhile ago with my current thoughts on the subspace implementation and general UX concepts I am partial to, I will send it to you as well. I like the idea of using xarray to store all data then using broadcasting to avoid python loops. I am currently trying to get a prototype working. I am not sure how hard it would be to implement such a data structure in eqcorrscan, I think calum is exploring that space. |
@d-chambers Extremely useful comments, thanks! do_ffts() should definitely be private and I wasn't aware that the ffts were no longer sensitive to power-of-2 lengths. The frequency domain normalization you reference was really just an attempt to recreate Harris' approach literally. I haven't profiled a time domain approach but I don't have any attachment to the way it is currently so I'll give bottleneck a shot tomorrow. Re: the frequency decimation, I hadn't considered the array order, so it may very well be a C/Fortran order issue. I'll give that a try. I agree that it isn't imperative right away, though. Also, thanks for sending over your notebook. I'll give it a closer read first thing in the morning. Calum is close to pushing some internal changes involving xarray, I believe. Hope you have a nice bit of vacay this week! |
No xarray, still a ways from that guys, sorry! My branch is painfully misnamed.
Thanks though guys, really awesome to have more eyes and knowledge on things!
Calum Chamberlain
Postdoctoral Fellow, Seismology
Victoria University of Wellington
Sent from mobile.
…________________________________
From: Chet Hopp <[email protected]>
Sent: Sunday, June 11, 2017 10:01:44 AM
To: eqcorrscan/EQcorrscan
Cc: Calum Chamberlain; Mention
Subject: Re: [eqcorrscan/EQcorrscan] Subspace Freq domain statistic (#92)
@d-chambers<https://github.com/d-chambers> Extremely useful comments, thanks! do_ffts() should definitely be private and I wasn't aware that the ffts were no longer sensitive to power-of-2 lengths.
The frequency domain normalization you reference was really just an attempt to recreate Harris' approach literally. I haven't profiled a time domain approach but I don't have any attachment to the way it is currently so I'll give bottleneck a shot tomorrow.
Re: the frequency decimation, I hadn't considered the array order, so it may very well be a C/Fortran order issue. I'll give that a try. I agree that it isn't imperative right away, though.
Also, thanks for sending over your notebook. I'll give it a closer read first thing in the morning. Calum is close to pushing some internal changes involving xarray, I believe.
Hope you have a nice bit of vacay this week!
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub<#92 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AKaGc9MwagVwfAnhkRA1mZmfpMp6BLx_ks5sC6z4gaJpZM4NK_Pf>.
|
Benchmarks for _do_ffts() # Next power of 2
%timeit _do_ffts(detector, stream, Nc)
1 loop, best of 3: 1.06 s per loop
# next_fast_len()
%timeit _do_ffts(detector, stream, Nc)
1 loop, best of 3: 388 ms per loop
# next_fast_len() and rffts
%timeit _do_ffts(detector, stream, Nc)
1 loop, best of 3: 202 ms per loop Then, for the detection statistic calculation: # Full ifft
%timeit _det_stat_freq(det_freq, data_freq_sq, data_freq, fft_vars[3], Nc, fft_vars[4], fft_vars[5])
1 loop, best of 3: 837 ms per loop
# Using rfft
%timeit _det_stat_freq(det_freq, data_freq_sq, data_freq, fft_vars[3], Nc, fft_vars[4], fft_vars[5])
1 loop, best of 3: 180 ms per loop Pushing changes now. Thanks for the feedback, @d-chambers! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, looks good, two things to change then good to go IMHO
else: | ||
inc = np.uint32(1) | ||
# Stats must be same size for multiplexed or non multiplexed! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maintain indent for comment here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment removed in some recent commit.
@@ -586,6 +582,41 @@ def _detect(detector, st, threshold, trig_int, moveout=0, min_trig=0, | |||
return detections | |||
|
|||
|
|||
def _do_ffts(detector, stream, Nc): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have a doc-string for this please? I like to have them even for internal functions so we know what's going on (I probably don't have them for everything though!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On it.
Rebase seemed to work....maybe.
Your original suggestion of reading in the test detectors, renaming them, and then re-writing them should work with some minor transposing of matrices. I'll push those in a bit.