-
Notifications
You must be signed in to change notification settings - Fork 995
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closes #3639 -- proper handling of CPLXSXP in dogroups #3689
Conversation
fd4b465
to
c17bb81
Compare
remove debugging helpers remove debugging helpers remove debugging helpers
c17bb81
to
bf4fb10
Compare
Codecov Report
@@ Coverage Diff @@
## master #3689 +/- ##
==========================================
- Coverage 98.25% 98.23% -0.03%
==========================================
Files 69 69
Lines 13088 13099 +11
==========================================
+ Hits 12860 12868 +8
- Misses 228 231 +3
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #3689 +/- ##
==========================================
- Coverage 98.25% 98.24% -0.02%
==========================================
Files 69 69
Lines 13094 13122 +28
==========================================
+ Hits 12866 12892 +26
- Misses 228 230 +2
Continue to review full report at Codecov.
|
The code is working, just need to improve coverage (any ideas?) |
double *td = REAL(target); | ||
double *sd = REAL(source); | ||
for (int r=0; r<maxn; ++r) td[ansloc+r] = sd[igrp]; | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one is in the ngrpcols
loop, so covering it requires being able to group by
complex, which we can't (yet)
src/dogroups.c
Outdated
case CPLXSXP : | ||
COMPLEX(VECTOR_ELT(xSD, j))[0].r = NA_REAL; | ||
COMPLEX(VECTOR_ELT(xSD, j))[0].i = NA_REAL; | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC this is coming from a join, so we won't be able to cover it yet.
Minimal example to trigger the INTSXP
branch of this switch
(adapted test 1283.3):
dt1 <- data.table(a=rep(1:2, each=2), c=c(7L, 2L, 4L, 8L))
dt2 <- data.table(b=rep(2:3), c=c(2L, 6L), d=c(20L, 8L))
dt1[dt2, on = c(a = 'b'), .(a=a), by=.EACHI]
I believe it's because j
uses a column from the LHS of on
.
But joins aren't available for complex yet...
OK. The remaining codecov problems I think are outside the scope of this PR (see comments). so I guess we should just wait to merge this one until after #3701, at which point we can add tests here. Will also file a coverage commit for |
inst/tests/tests.Rraw
Outdated
DT3 = DT3[sample(nrow(DT3))] | ||
DT1[ , x_num := match(x, letters) + .1] | ||
DT3[ , x_num := match(x, letters) + .1] | ||
test(1540.36, DT1[DT3, .(x_num), by=.EACHI, on=c(x_num="x_num")]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strangely enough, I couldn't get this test to hit the LGLSXP
branch:
DT1[ , x_lgl := x > = 'a']
DT3[ , x_lgl := x > 'a']
DT1[DT3, .(x_lgl), by=.EACHI, on=c(x_lgl="x_lgl")]
doesn't do it...
case CPLXSXP : | ||
COMPLEX(VECTOR_ELT(xSD, j))[0].r = NA_REAL; | ||
COMPLEX(VECTOR_ELT(xSD, j))[0].i = NA_REAL; | ||
break; | ||
case STRSXP : | ||
SET_STRING_ELT(VECTOR_ELT(xSD,j),0,NA_STRING); | ||
break; | ||
case VECSXP : | ||
SET_VECTOR_ELT(VECTOR_ELT(xSD,j),0,R_NilValue); | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we nocov this branch (or set it to error even?)
IIUC hitting this branch would require first joining on VECSXP
column in on
, I don't think we have any plan to support list-to-list joins?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't no cov it otherwise it'll drop off our todo list. I'd say don't error either because something somewhere might be using it. I'd say just leave it as-is for now until it can be covered at a later date.
src/assign.c
Outdated
@@ -1002,6 +1011,11 @@ void writeNA(SEXP v, const int from, const int n) | |||
for (int i=from; i<=to; ++i) vd[i] = NA_REAL; | |||
} | |||
} break; | |||
case CPLXSXP: { | |||
Rcomplex *vd = COMPLEX(v); | |||
Rcomplex NA_CPLX = {NA_REAL, NA_REAL}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a bit strange why we have to define this ourselves but I see it done like this in a few places in r-source
(e.g. subset.c
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I see 3 in src/main/subset.c too. Those are the only occurence of the string "NA_C" (to catch both NA_COMPLEX, NA_CPLX, and anything similar starting with C for complex) in the entire source tree of R. So since it's so rarely used, I guess it just hasn't been moved into a header. Which seems reasonable. More to the point, because it needs to be a variable, it can't be plain compiled constant like the other NA_* definitions.
Have we heard from many users requesting complex support? Some links to users and fields requesting complex support please, would keep me happy that it's worth doing. |
I'm one happy data.table user that uses complex numbers frequently and would be even happier if complex support is enhanced 😄. Perusing github I see there are some issues addressing it:
Not a lot (understandably), but non-zero. |
@eliocamp Great, thanks. I looked through them. It looks to me as if it's programmers checking that all types are supported for completeness reasons, rather than users in the real-world actually needing complex number support. I did find one field mentioned in those issues though :
What's your field that you use complex numbers in? |
Meteorology, but I use some signal processing methods to analyse fields. Complex Principal Component Analysis, for example. |
Good to know! Are there any articles that show how meteorology, signal processing and/or complex-PCA need complex numbers in the context of data.table-like operations like grouping? I did complex number theory in undergraduate applied math and much enjoyed it, but I've never used complex numbers since then. |
All looks good. I realize the diff coverage is under target but that's ok, because as Michael wrote, other PRs will cover those in future. |
in fact that's the main thinking. I imagine there are some use cases around Fourier series running around but yea, I haven't come across it myself in a while either |
@@ -77,7 +77,6 @@ static void subsetVectorRaw(SEXP ans, SEXP source, SEXP idx, const bool anyNA) | |||
case CPLXSXP : { | |||
Rcomplex *sp = COMPLEX(source); | |||
Rcomplex *ap = COMPLEX(ans); | |||
Rcomplex NA_CPLX = { NA_REAL, NA_REAL }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking at it again, would just inlining = (Rcomplex){NA_REAL, NA_REAL}
do the trick?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think (Rcomplex){NA_REAL, NA_REAL}
will work. Because Rcomplex
is a struct
. The {NA_REAL, NA_REAL} is special syntax just for initialization of a variable/struct.
I've never read anything specific about data.table. Here's a paper about the method: https://journals.ametsoc.org/doi/abs/10.1175/1520-0450%281984%29023%3C1660%3ACPCATA%3E2.0.CO%3B2 This is the code I'm using now. It gets the Hilbert transform of the data and then does an EOF (Empirical Orthogonal Functions --what we meteorologists call PCA) for each season, dataset and height level. Because of #3639 I had to use the workaround of separating the real an imaginary parts and then joining them together after the grouping was done. ceof <- data %>%
.[, c("R", "I") := ReIm(spectral::analyticFunction(hgt.minus)),
by = .(lat, season, year, dataset, lev)] %>%
.[, {
d <- copy(.SD)[, hgt := R + I*1i]
list(ceof = list(
EOF(hgt ~ year | lon + lat, n = 1:2,
B = 5000, suffix = "cPC", data = d)
))
}, by = .(dataset, season, lev)]
|
Thanks @eliocamp, helps a lot! Continuing here: #3701 (comment) |
#3639
Haven't quite figured out the ins and outs of
CPLXSXP
andCOMPLEX()
usage yet but I think this gets at the kernel of the issue --dogroups
assumesSIZEOF
is either 4 or 8, not so forCPLXSXP
(16). Will come back to this.FWIW
grep -Er "==\s?4" src
only turns upreorder
anddogroups
, so I think this and #3688 should cover this class of issue.