Subspace Freq domain statistic #92

cjhopp · 2017-04-28T02:38:46Z

Rebase seemed to work....maybe.

Your original suggestion of reading in the test detectors, renaming them, and then re-writing them should work with some minor transposing of matrices. I'll push those in a bit.

merge det_stat_freq into svd-patch

…rrscan into cjhopp-svd-patch

cjhopp · 2017-06-08T06:07:34Z

Hey @calum-chamberlain, ever seen the current Travis fail for 3.5 on Linux? Some strange issue with an invalid resize at the end of channel_loop....Did I break it???

calum-chamberlain · 2017-06-08T21:09:30Z

@cjhopp No I have not... Curious, going to try and re-run, and will try the tests locally also.
@d-chambers have you had a look at this? We were struggling to reason why @cjhopp had to decimate not from the zeroth index of the detection statistic in the frequency domain - it kind of makes sense, but I can't describe it - any insight or recommendations?

cjhopp · 2017-06-08T21:45:11Z

To tag on to that a bit, @d-chambers, I'm curious if you've been able to implement the aliasing in the frequency domain described by Harris (and, if so, were the time savings worth the trouble?). We've had a crack at it, but it isn't producing the correct results.

Dear Panda, No hard feelings. -Chet

d-chambers · 2017-06-08T23:10:48Z

hey @cjhopp, in detex I didn't implement the freq. domain decimation. I think when I got to that point I was in a hurry to graduate so I just had to get the code working.

Happy to take a look at your implementation though. Will try to add a few comments when I get a chance.

d-chambers · 2017-06-09T00:12:48Z

@cjhopp, where are you doing frequency domain decimation? I think I am missing it.

cjhopp · 2017-06-09T00:56:51Z

@d-chambers Sorry, I should have been more clear with that question! I didn't end up putting it in this PR as we couldn't make sense of it. From my point of view, the only extra step would be adding something like this: num_cor.reshape(num_cor.shape[0], Nc, M).sum(axis=1) here where M should be len(template) + len(data) - 1.

Hopefully, I can get another crack at it next week, but if you have any thoughts about how best to do it, I'm all ears.

cjhopp · 2017-06-09T02:14:45Z

This was as far as the implementation got.

d-chambers · 2017-06-08T23:58:08Z

eqcorrscan/core/subspace.py

+def do_ffts(detector, stream, Nc):
+    min_fftlen = int(stream[0][0].data.shape[0] +
+                     detector.data[0].shape[0] - Nc)
+    fftlen = 1 << min_fftlen.bit_length()


I am not sure this is optimal anymore... I know scipy doesn't require a power of 2 length (although the docs do say it is optimal I suspect it is no longer the case). Consider this (note I am using Anaconda which uses the MLK ):

import numpy as np import scipy.fftpack ar_len = 3600 * 400 + 12 # a resonable size for a seismic trace nearest_2 = 1 << (ar_len - 1).bit_length() nearest_fast = scipy.fftpack.next_fast_len(ar_len) %time np.fft.fft(ar, n=nearest_2) # takes 386 ms %time np.fft.fft(ar, n=nearest_fast) # takes 228 ms

Moreover, if you are only expecting real inputs (A fair assumption in seismology no?) consider using rfft:

%time np.fft.rfft(ar, n=nearest_2) # takes 200 ms %time np.fft.rfft(ar, n=nearest_fast) # takes 123 ms

Using rfft with next_fast_length would give nearly a 4x speed up (similar on ifft)

d-chambers · 2017-06-08T23:58:44Z

eqcorrscan/core/subspace.py

@@ -573,6 +581,39 @@ def _detect(detector, st, threshold, trig_int, moveout=0, min_trig=0,
    return detections


+def do_ffts(detector, stream, Nc):


Should this be a private method? If so it should start with an underscore, if not probably needs a docstring. I think a good rule is to add docstring to all functions. For private functions/methods a simple sentence or two that explains what it does is fine, but all public methods should have fancy formatted docstrings sphinx can use.

d-chambers · 2017-06-09T16:46:03Z

Thanks for pointing that out, I am fairly new to a lot of the github functionality so the extra help is appreciated.

I left you two general comments in the review.

Interesting frequency domain normalization here-ish. Did you profile against time domain implementations (with bottleneck moving variance for example?) or maybe this issue makes it better to avoid bottleneck so float 32 can be used?

Regarding the frequency domain decimation; I haven't been able to try it out yet. I have a deadline today and I am on vacation next week, so it may be awhile before I can really dig into it. it is probably worth going ahead without it, we can also come back to it (it shouldn't affect any APIs). Might be a reshape issue? Maybe the order should be 'f' rather than "c"?

I sent @calum-chamberlain a notebook awhile ago with my current thoughts on the subspace implementation and general UX concepts I am partial to, I will send it to you as well. I like the idea of using xarray to store all data then using broadcasting to avoid python loops. I am currently trying to get a prototype working. I am not sure how hard it would be to implement such a data structure in eqcorrscan, I think calum is exploring that space.

cjhopp · 2017-06-11T09:01:44Z

@d-chambers Extremely useful comments, thanks! do_ffts() should definitely be private and I wasn't aware that the ffts were no longer sensitive to power-of-2 lengths.

The frequency domain normalization you reference was really just an attempt to recreate Harris' approach literally. I haven't profiled a time domain approach but I don't have any attachment to the way it is currently so I'll give bottleneck a shot tomorrow.

Re: the frequency decimation, I hadn't considered the array order, so it may very well be a C/Fortran order issue. I'll give that a try. I agree that it isn't imperative right away, though.

Also, thanks for sending over your notebook. I'll give it a closer read first thing in the morning. Calum is close to pushing some internal changes involving xarray, I believe.

Hope you have a nice bit of vacay this week!

calum-chamberlain · 2017-06-11T21:13:16Z

No xarray, still a ways from that guys, sorry! My branch is painfully misnamed. Thanks though guys, really awesome to have more eyes and knowledge on things! Calum Chamberlain Postdoctoral Fellow, Seismology Victoria University of Wellington Sent from mobile.

…

________________________________ From: Chet Hopp <[email protected]> Sent: Sunday, June 11, 2017 10:01:44 AM To: eqcorrscan/EQcorrscan Cc: Calum Chamberlain; Mention Subject: Re: [eqcorrscan/EQcorrscan] Subspace Freq domain statistic (#92) @d-chambers<https://github.com/d-chambers> Extremely useful comments, thanks! do_ffts() should definitely be private and I wasn't aware that the ffts were no longer sensitive to power-of-2 lengths. The frequency domain normalization you reference was really just an attempt to recreate Harris' approach literally. I haven't profiled a time domain approach but I don't have any attachment to the way it is currently so I'll give bottleneck a shot tomorrow. Re: the frequency decimation, I hadn't considered the array order, so it may very well be a C/Fortran order issue. I'll give that a try. I agree that it isn't imperative right away, though. Also, thanks for sending over your notebook. I'll give it a closer read first thing in the morning. Calum is close to pushing some internal changes involving xarray, I believe. Hope you have a nice bit of vacay this week! — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub<#92 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AKaGc9MwagVwfAnhkRA1mZmfpMp6BLx_ks5sC6z4gaJpZM4NK_Pf>.

cjhopp · 2017-06-11T22:49:48Z

Benchmarks for _do_ffts()

# Next power of 2
%timeit _do_ffts(detector, stream, Nc)
1 loop, best of 3: 1.06 s per loop 
# next_fast_len()
%timeit _do_ffts(detector, stream, Nc)
1 loop, best of 3: 388 ms per loop
# next_fast_len() and rffts
%timeit _do_ffts(detector, stream, Nc)
1 loop, best of 3: 202 ms per loop

Then, for the detection statistic calculation:

# Full ifft
%timeit _det_stat_freq(det_freq, data_freq_sq, data_freq, fft_vars[3], Nc, fft_vars[4], fft_vars[5])
1 loop, best of 3: 837 ms per loop
# Using rfft
%timeit _det_stat_freq(det_freq, data_freq_sq, data_freq, fft_vars[3], Nc, fft_vars[4], fft_vars[5])
1 loop, best of 3: 180 ms per loop

Pushing changes now. Thanks for the feedback, @d-chambers!

calum-chamberlain

Nice, looks good, two things to change then good to go IMHO

calum-chamberlain · 2017-06-15T05:14:50Z

eqcorrscan/core/subspace.py

    else:
-        inc = np.uint32(1)
-    # Stats must be same size for multiplexed or non multiplexed!


Maintain indent for comment here.

Comment removed in some recent commit.

calum-chamberlain · 2017-06-15T05:20:42Z

eqcorrscan/core/subspace.py

@@ -586,6 +582,41 @@ def _detect(detector, st, threshold, trig_int, moveout=0, min_trig=0,
    return detections


+def _do_ffts(detector, stream, Nc):


Can we have a doc-string for this please? I like to have them even for internal functions so we know what's going on (I probably don't have them for everything though!)

calum-chamberlain · 2017-06-15T05:50:12Z

Thanks @cjhopp - worth just merging it seeing as no changes made would have affected the code (all doc-strings). Press the button and update #100!

cjhopp and others added 6 commits May 23, 2017 18:37

implemented frequency domain det_statistic calculation; still cleaning

901a515

Removing detection statistic and cython things

701dc88

Merge pull request #95 from eqcorrscan/cjhopp-det_stat_freq

bb4ddfd

merge det_stat_freq into svd-patch

working on literal implementation of Harris

dc08366

Aliasing now probably correctly implemented, still nonsensical stats

571261c

Merge branch 'develop' into cjhopp-svd-patch

5daa1f5

calum-chamberlain added this to the 1.1.0 milestone Jun 2, 2017

calum-chamberlain added the enhancement label Jun 2, 2017

cjhopp added 6 commits June 7, 2017 13:39

subspace tests passing for freq domain stat

34a8621

Merge branch 'cjhopp-svd-patch' of https://github.com/eqcorrscan/EQco…

c33e294

…rrscan into cjhopp-svd-patch

get rid of pandas, stupid

7d7222c

get rid of pandas, stupid

f3988ff

PEP8 and testing changes from working-subspace

5b61480

PEP8 and subspace tutorias test change

b1935e3

cjhopp changed the title ~~svd() changes and new subspace_fc_plot() function~~ Subspace Freq domain statistic Jun 8, 2017

resolving conflicts with develop

ccd3529

d-chambers reviewed Jun 9, 2017

View reviewed changes

cjhopp added 2 commits June 12, 2017 10:56

fft speedups per d-chambers

75b99b5

pep8

615dc81

cjhopp mentioned this pull request Jun 15, 2017

Add 'metric' argument to Party.decluster() #100

Merged

calum-chamberlain requested changes Jun 15, 2017

View reviewed changes

Real docstrings for subspace private functions

0b60356

calum-chamberlain approved these changes Jun 15, 2017

View reviewed changes

cjhopp merged commit d074481 into develop Jun 15, 2017

cjhopp deleted the cjhopp-svd-patch branch June 23, 2017 02:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Subspace Freq domain statistic #92

Subspace Freq domain statistic #92

cjhopp commented Apr 28, 2017

cjhopp commented Jun 8, 2017

calum-chamberlain commented Jun 8, 2017

cjhopp commented Jun 8, 2017

d-chambers commented Jun 8, 2017

d-chambers commented Jun 9, 2017

cjhopp commented Jun 9, 2017

cjhopp commented Jun 9, 2017

d-chambers Jun 8, 2017

d-chambers Jun 8, 2017

d-chambers commented Jun 9, 2017

cjhopp commented Jun 11, 2017

calum-chamberlain commented Jun 11, 2017 via email

cjhopp commented Jun 11, 2017 •

edited

Loading

calum-chamberlain left a comment

calum-chamberlain Jun 15, 2017

cjhopp Jun 15, 2017

calum-chamberlain Jun 15, 2017

cjhopp Jun 15, 2017

calum-chamberlain commented Jun 15, 2017

		@@ -573,6 +581,39 @@ def _detect(detector, st, threshold, trig_int, moveout=0, min_trig=0,
		return detections


		def do_ffts(detector, stream, Nc):

		@@ -586,6 +582,41 @@ def _detect(detector, st, threshold, trig_int, moveout=0, min_trig=0,
		return detections


		def _do_ffts(detector, stream, Nc):

Subspace Freq domain statistic #92

Subspace Freq domain statistic #92

Conversation

cjhopp commented Apr 28, 2017

cjhopp commented Jun 8, 2017

calum-chamberlain commented Jun 8, 2017

cjhopp commented Jun 8, 2017

d-chambers commented Jun 8, 2017

d-chambers commented Jun 9, 2017

cjhopp commented Jun 9, 2017

cjhopp commented Jun 9, 2017

d-chambers Jun 8, 2017

Choose a reason for hiding this comment

d-chambers Jun 8, 2017

Choose a reason for hiding this comment

d-chambers commented Jun 9, 2017

cjhopp commented Jun 11, 2017

calum-chamberlain commented Jun 11, 2017 via email

cjhopp commented Jun 11, 2017 • edited Loading

calum-chamberlain left a comment

Choose a reason for hiding this comment

calum-chamberlain Jun 15, 2017

Choose a reason for hiding this comment

cjhopp Jun 15, 2017

Choose a reason for hiding this comment

calum-chamberlain Jun 15, 2017

Choose a reason for hiding this comment

cjhopp Jun 15, 2017

Choose a reason for hiding this comment

calum-chamberlain commented Jun 15, 2017

cjhopp commented Jun 11, 2017 •

edited

Loading