Replace integers with explicit integers (1 -> 1L, etc.) #2573
b329434 to de4c4ec
Very nice!
Doesn't make sense to require users to run Investigating, though I'm in the dark here for a moment. Sometimes strong typing would be nice 😬
Also, should probably turn the lines like:
into a function since they're so common -- @mattdowle IIUC this is what the @mattdowle edit : yes, it should be using
R/data.table.R
Outdated
as.character(q[[1]]) %chin% "[" && is.numeric(q[[3]]) &&
length(q[[3]])==1 && q[[3]]>0 )
ans = cond && length(q)==3L && ( as.character(q[[1L]]) %chin% c("head", "tail") &&
(identical(q[[3L]], 1L) || identical(q[[3L]], 1L)) ||
Any reason this can't just be `q[[3L]] == 1L`?
de4c4ec to ee82f0f
R/data.table.R
Outdated
ans = cond && length(q) == 3L &&
length(q[[3L]]) == 1L && is.numeric(q[[3L]]) && (
as.character(q[[1L]]) %chin% c("head", "tail") && q[[3L]] == 1L ||
as.character(q[[1L]]) %chin% "[" && q[[3L]] > 0 )
The logic now seems more natural to me (in particular, I see no reason to check the `is.numeric` and `length` conditions only for the `[` case and not for the `head`/`tail` case -- perhaps a few tests are in order?), and the code reads better too.
The use of `identical` may have had `NA` in mind. Also, for the `head`/`tail` case, `is.numeric` and `length` were being checked via `identical()`. Yes, more tests would be great. Seems like Codecov agrees and is not passing, which is good (confirms it's a good exercise).
The line needs a comment saying what it's supposed to be doing, doesn't it!
Can you remove the added spaces around `==` please. To my eyes there are now lots of separate things all with space between them and it's harder to see the tree of logic. `=` is already a different character to letters (and is not valid in symbols) so `=` is already a visual separator. Removing the spaces around `==` allows you to budge the pieces together in this case, so that the remaining spaces are more helpful.
Isn't `head` and `tail` with `n>1` now going to pass? That doesn't seem to be the intent of the original line (only `n==1L` and `n==1` were allowed, iiuc). Ok I see the `q[[3L]]==1L` now.
An extra set of parens around the left and right of the last `||` would be clearer. There was a paren around the left of that `||` in the original but not on its right.
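A minimal R sketch of the trade-off discussed in this thread, assuming the intent is to accept only a single non-missing `1` or `1L`. `identical()` checks type, length and missingness in one call, whereas `==` coerces types and propagates `NA`, so the same guarantees need explicit guards. `is_one` is a hypothetical helper for illustration, not code from the PR.

```r
# identical() is strict about type; == coerces between integer and double.
identical(1, 1L)            # FALSE: double vs integer
1 == 1L                     # TRUE

# identical() never returns NA; == propagates it.
identical(NA_integer_, 1L)  # FALSE
NA_integer_ == 1L           # NA

# identical() rejects longer vectors; == is vectorised.
identical(c(1L, 1L), 1L)    # FALSE
c(1L, 1L) == 1L             # TRUE TRUE

# Hypothetical helper: the guards that make == safe inside && / ||.
is_one <- function(x) {
  is.numeric(x) && length(x) == 1L && !is.na(x) && x == 1L
}

is_one(1)            # TRUE
is_one(1L)           # TRUE
is_one(NA_integer_)  # FALSE
is_one(c(1L, 1L))    # FALSE
```

Without the `length` and `is.na` guards, a bare `x == 1L` inside `&&` can yield `NA` or a length-greater-than-one condition, which is one reading of why `identical` was used originally.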
Codecov Report
@@            Coverage Diff             @@
##           master    #2573      +/-   ##
==========================================
+ Coverage   91.49%   92.94%   +1.45%
==========================================
  Files          63       61       -2
  Lines       12229    12107     -122
==========================================
+ Hits        11189    11253      +64
+ Misses       1040      854     -186
@mattdowle Ran this to test speed:
Average duration from testing 10 times on version currently at
Compared to this branch:
About a 3% improvement... pretty damn good if you ask me! Certainly more than I expected. (I do have many of the external packages; perhaps we should add a
Nice result. What's the improvement in the
Certainly easier to do, but I wasn't sure if there's any overhead in
I'd be surprised if the overhead is more than negligible. Easy to check...
00:01:23 == 82.933
Hmm, slightly different for me:
anyway i'll run with
Not as impressive this time (<1%):
(cobbled together from
}
stop("problem recycling column ",i,", try a simpler type")
# }
stop("argument ",i," (nrow ",nrows[i],") cannot be recycled without remainder to match longest nrow (",nr,")")
As near as I can tell this error is superseded by the above warning? Also eliminated the commented branch since it seems to be vestigial.
Agreed. +1
@@ -22,18 +22,18 @@ inrange <- function(x,lower,upper,incbounds=TRUE) {
  subject = setDT(list(l=lower, u=upper))
  ops = if (incbounds) c(4L, 2L) else c(5L, 3L) # >=,<= and >,<
  verbose = getOption("datatable.verbose")
  if (verbose) {last.started.at=proc.time()[3];cat("forderv(query) took ... ");flush.console()}
@mattdowle what about two functions, `timestart` and `timetaken`? `timestart` would accept a message as input, `cat` it, `flush.console()`, and return the `proc.time()`; `timetaken` takes the result of `timestart`, `cat`s it, and `flush.console()`s, and returns nothing.
(also perhaps this should be done as a separate PR?)
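The proposal could look roughly like the sketch below. The names `time_start` and `time_report` are hypothetical and chosen so as not to be confused with data.table's existing `timetaken()`; the `"%.3f secs"` output format is also an assumption.

```r
# Hypothetical helpers sketching the proposal; not data.table's actual API.

# Print a message, flush the console, and return the elapsed-time stamp.
time_start <- function(msg) {
  cat(msg)
  flush.console()
  proc.time()[["elapsed"]]
}

# Print the seconds elapsed since `started_at`, flush, and return nothing.
time_report <- function(started_at) {
  elapsed <- proc.time()[["elapsed"]] - started_at
  cat(sprintf("%.3f secs\n", elapsed))
  flush.console()
  invisible(NULL)
}

started_at <- time_start("sorting took ... ")
invisible(sort(runif(1e5)))
time_report(started_at)
```

Returning the start time rather than storing it in a shared variable keeps nested timing blocks independent, since each block holds its own stamp.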
Yes, separate PR. Sometimes there are several nested timing blocks in play. Storing the `started.at` time in a variable and then passing that (or the appropriate one) to `timetaken`; I think I like the balance of using a common function along with the flexibility of using it in different ways depending on the circumstances, i.e. current `timetaken()` approach, but actually using it here. IIUC.
5746f5d to f1bdd23
On the contrary, I'd say the
Looks like it's a correct fail on test 1187.6 I see locally. I'll leave to you.
9f392fb to 4c8162d
97b1f47 to bf9fdf6
bf9fdf6 to 7c73e5b
R/foverlaps.R
Outdated
@@ -128,24 +129,15 @@ foverlaps <- function(x, y, by.x = if (!is.null(key(x))) key(x) else key(y), by.
}
# nomatch has no effect here, just for passing arguments consistently to `bmerge`
.Call(Clookup, uy, nrow(y), indices(uy, y, yintervals, nomatch=0L, roll=roll), maxgap, minoverlap, mult, type, verbose)
if (maxgap == 0L && minoverlap == 1L) {
@arunsrinivasan check out this diff. Axed `iintervals` as it appears unused. And axed the extra argument checks to simplify so long as the other options are unimplemented.
@MichaelChirico good catch regarding `iintervals`. Seems like it wasn't used right from the first commit of `foverlaps`.
I understand why you simplified the code, but I left it there to serve as a placeholder for when I come back to work on it... The first `if` statement would always be run currently. So that's the only cost here, which is fine. Would it be possible to revert this part to how it was please?
The PR overall looks great! Apologies for not being able to write earlier.
@arunsrinivasan I figured that's why you left it like that... only changed the logic to make testing easier, Codecov was complaining since it's impossible to get to the other branches to test them (all hail the mighty Codecov 🌞 ⛪️ ). I figured it'd be easy enough to revert to the old code when needed (that's what git's for right?)
as a compromise, I could leave the old code in, just commented out?
Why not wrap those lines with `# nocov start` and `# nocov end` so as to skip coverage tests? See https://github.com/Rdatatable/data.table/blob/70db3e48738026388c6487e83ee902ad37256ca5/R/c.factor.R
Yes, commented out with a comment pointing to this PR would be okay too.
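The suggestion can be sketched as follows, assuming coverage is measured with covr, which skips lines between `# nocov start` and `# nocov end` comments. `overlap_mode` is a hypothetical stand-in for the placeholder branches in `foverlaps`, not the real code.

```r
# Hypothetical stand-in for a function with a not-yet-reachable branch.
overlap_mode <- function(maxgap = 0L, minoverlap = 1L) {
  # An early guard rejects every non-default combination ...
  if (maxgap != 0L || minoverlap != 1L)
    stop("maxgap and minoverlap arguments are not yet implemented.")

  if (maxgap == 0L && minoverlap == 1L) {
    mode <- "default"
  } else {
    # ... so this placeholder branch cannot be reached by any test.
    # nocov start
    mode <- "placeholder"
    # nocov end
  }
  mode
}

overlap_mode()  # "default"
```

This keeps the placeholder code in the file for later work while stopping the coverage report from counting lines that no test can reach.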
Simply because I didn't know that was possible! I'll add that to `onAttach` and `onLoad` as well... thanks!
af1670d to 6828e3f
@mattdowle OK I think I've hammered away enough at the
6828e3f to 689e737
Fantastic!
This could help memory usage and `gc()` time, too (not just the 3-4% you found), by avoiding all those length 1 numeric vectors. Anyway, it puts data.table in the best position to benefit from the global-small-integer optimization on the way in R itself.
And increase of coverage by 0.33% (from 91.48% to 91.81%) is worth it by itself.
I've requested review from @arunsrinivasan re the foverlaps changes. Maybe those removed lines could be captured in an issue to implement or as a comment, if not already.
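For readers less familiar with the literals involved, a small R sketch (variable names are illustrative only): a bare `1` is a double and `1L` is an integer, integer storage is 4 bytes per element against 8 for double, and arithmetic still promotes to double when either operand is one, which is why the `L` suffix belongs only on values that are genuinely integer. It does not attempt to measure the effect of this PR.

```r
# Literal types: the L suffix is what makes a constant an integer.
typeof(1)    # "double"
typeof(1L)   # "integer"

# Integer vectors use 4 bytes per element, doubles use 8,
# which becomes visible for long vectors.
n <- 1e6
int_size <- as.numeric(object.size(integer(n)))
dbl_size <- as.numeric(object.size(numeric(n)))
int_size < dbl_size   # TRUE

# Mixing types promotes to double, so integer-ness is preserved
# only when every operand is an integer.
typeof(2L + 1L)   # "integer"
typeof(2L + 1)    # "double"
typeof(2.5 + 1L)  # "double"
```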
…f internal-to-R global small-integers
@MichaelChirico overall great PR! I've commented under `foverlaps`.
Apologies for not replying earlier.
R/foverlaps.R
Outdated
@@ -15,7 +15,7 @@ foverlaps <- function(x, y, by.x = if (!is.null(key(x))) key(x) else key(y), by.
mult = match.arg(mult)
if (type == "equal")
stop("type = 'equal' is not implemented yet. But note that this is just the same as a normal data.table join y[x, ...], unless you are also interested in setting 'minoverlap / maxgap' arguments. But those arguments are not implemented yet as well.")
if (maxgap > 0L || minoverlap > 1L)
if (maxgap != 0L || minoverlap != 1L)
maxgap < 0 and minoverlap < 1 both are checked in lines 6 and 8. What's the purpose of this change?
R/foverlaps.R
Outdated
setkey(uy)[, `:=`(lookup = list(list(integer(0))), type_lookup = list(list(integer(0))), count=0L, type_count=0L)]
if (verbose) {cat(round(proc.time()[3]-last.started.at,3),"secs\n");flush.console}
setkey(uy)[, `:=`(lookup = list(list(integer(0L))), type_lookup = list(list(integer(0L))), count=0L, type_count=0L)]
if (verbose) {cat(timetaken(last.started.at)); flush.console()}
I'm a bit confused as to how these changes are related to this PR (1 -> 1L like enhancements). But I'm guessing you've discussed these changes with Matt.
@arunsrinivasan the Codecov report was failing pretty miserably on the PR, since it seems to have uncovered a lot of un-tested lines (this one in particular would have come up from `3` becoming `3L`, for example)
Merge branch 'master' of https://github.com/Rdatatable/data.table into explicitIntegers # Conflicts: # R/data.table.R # inst/tests/tests.Rraw
Merge branch 'explicitIntegers' of https://github.com/Rdatatable/data.table into explicitIntegers # Conflicts: # inst/tests/tests.Rraw
R/test.data.table.R
Outdated
@@ -1,10 +1,10 @@

# nocov start
Why a `nocov start` here? Surely `test.data.table` is tested; it's the main entry point to the test suite!
R/test.data.table.R
Outdated
@@ -56,6 +61,7 @@ compactprint <- function(DT, topn=2) {
  invisible()
}

# nocov start
Also here. INT() is used in lots of tests. Why exclude it from code cov?
…live. Added nocov block instead.
R/foverlaps.R
Outdated
if (verbose) {cat(timetaken(last.started.at));flush.console()}
olaps = .Call(Coverlaps, uy, xmatches, mult, type, nomatch, verbose)
}
# nocov start
looks good
… newline in its return value; e.g. at the end of tests.Rraw it's used assuming no newline included
@mattdowle thanks for the detailed review, I learned a lot about Codecov here :)
Part of #2572, tried to make sure I respected "strong typing" (don't force `integer` on potential `numeric`s).
Command to find these: