-
Notifications
You must be signed in to change notification settings - Fork 990
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
allow implicit var.value in formula #5825
Conversation
If tests are failing then it will be good to check how many revdeps are affected. |
@jangorecki Thanks. I was going to ask you about revdeps, but for the moment I will close this PR as it fails more than I expected. |
first step should be fixing tests and then merging PR into master. |
I see only two errors, I would encourage you to reopen and try to fix
|
(you can leave it open as a draft, and then change it to "ready to merge" when tests are passing) |
@tdhock Thanks! Let's see. |
@MichaelChirico |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #5825 +/- ##
=======================================
Coverage 97.51% 97.51%
=======================================
Files 80 80
Lines 14979 14942 -37
=======================================
- Hits 14607 14571 -36
+ Misses 372 371 -1 ☔ View full report in Codecov by Sentry. |
Updates here.This is not more a draft. OptionsFrom the options above in main comment I implemented the third one, which is implementing new behaviour through new arguments, but keeping current behaviour as default and documenting it. Therefore, after the commit we get: Current behaviour> dcast(DT, ...~label, value.var = "val", sum)
Key: <.>
. Apr Feb Jan Mar May
<char> <num> <num> <num> <num> <num>
1: . 0 0 0 0 0
> dcast(DT, val ~ label, value.var = "val", sum)
Key: <val>
val Apr Feb Jan Mar May
<num> <num> <num> <num> <num> <num>
1: 0 0 0 0 0 0
> dcast(DT, ...~label, value.var = "val", length)
Key: <.>
. Apr Feb Jan Mar May
<char> <int> <int> <int> <int> <int>
1: . 1 1 1 1 1
> dcast(DT, val ~ label, value.var = "val", length)
Key: <val>
val Apr Feb Jan Mar May
<num> <int> <int> <int> <int> <int>
1: 0 1 1 1 1 1
> dcast(DT2, ...~label, value.var = "val", sum)
Key: <.>
. Apr Feb Jan Mar May
<char> <int> <int> <int> <int> <int>
1: . 4 2 1 3 5
> dcast(DT2, val ~ label, value.var = "val", sum)
Key: <val>
val Apr Feb Jan Mar May
<int> <int> <int> <int> <int> <int>
1: 1 0 0 1 0 0
2: 2 0 2 0 0 0
3: 3 0 0 0 3 0
4: 4 4 0 0 0 0
5: 5 0 0 0 0 5
> dcast(DT2, ...~label, value.var = "val", length)
Key: <.>
. Apr Feb Jan Mar May
<char> <int> <int> <int> <int> <int>
1: . 1 1 1 1 1
> dcast(DT2, val ~ label, value.var = "val", length)
Key: <val>
val Apr Feb Jan Mar May
<int> <int> <int> <int> <int> <int>
1: 1 0 0 1 0 0
2: 2 0 1 0 0 0
3: 3 0 0 0 1 0
4: 4 1 0 0 0 0
5: 5 0 0 0 0 1
> (out.dots <- dcast(out1, r + ... + indexU ~ indexU, fun = length))
Key: <r, var1, var2, index, a, indexU>
r var1 var2 index a indexU k l
<int> <int> <int> <char> <int> <char> <int> <int>
1: 1 1 5 a 1 k 1 0
2: 2 2 6 b 0 l 0 1
3: 3 3 7 a 1 k 1 0
4: 4 4 8 b 0 l 0 1
> (out.dots <- dcast(out1, r + var1 + var2 + index + a + b + indexU ~ indexU, fun = length))
Key: <r, var1, var2, index, a, b, indexU>
r var1 var2 index a b indexU k l
<int> <int> <int> <char> <int> <int> <char> <int> <int>
1: 1 1 5 a 1 0 k 1 0
2: 2 2 6 b 0 1 l 0 1
3: 3 3 7 a 1 0 k 1 0
4: 4 4 8 b 0 1 l 0 1
> dcast(DT, .~label, value.var = "val", sum)
Key: <.>
. Apr Feb Jan Mar May
<char> <num> <num> <num> <num> <num>
1: . 0 0 0 0 0
> dcast(DT, .~label, value.var = "val", length)
Key: <.>
. Apr Feb Jan Mar May
<char> <int> <int> <int> <int> <int>
1: . 1 1 1 1 1
> dcast(DT2, .~label, value.var = "val", sum)
Key: <.>
. Apr Feb Jan Mar May
<char> <int> <int> <int> <int> <int>
1: . 4 2 1 3 5
> dcast(DT2, .~label, value.var = "val", length)
Key: <.>
. Apr Feb Jan Mar May
<char> <int> <int> <int> <int> <int>
1: . 1 1 1 1 1 New behaviour> dcast(DT, ...~label, value.var = "val", sum, value.var.in.dots = TRUE)
Key: <val>
val Apr Feb Jan Mar May
<num> <num> <num> <num> <num> <num>
1: 0 0 0 0 0 0
> dcast(DT, val ~ label, value.var = "val", sum, value.var.in.dots = TRUE)
Key: <val>
val Apr Feb Jan Mar May
<num> <num> <num> <num> <num> <num>
1: 0 0 0 0 0 0
> dcast(DT, ...~label, value.var = "val", length, value.var.in.dots = TRUE)
Key: <val>
val Apr Feb Jan Mar May
<num> <int> <int> <int> <int> <int>
1: 0 1 1 1 1 1
> dcast(DT, val ~ label, value.var = "val", length, value.var.in.dots = TRUE)
Key: <val>
val Apr Feb Jan Mar May
<num> <int> <int> <int> <int> <int>
1: 0 1 1 1 1 1
> dcast(DT2, ...~label, value.var = "val", sum, value.var.in.dots = TRUE)
Key: <val>
val Apr Feb Jan Mar May
<int> <int> <int> <int> <int> <int>
1: 1 0 0 1 0 0
2: 2 0 2 0 0 0
3: 3 0 0 0 3 0
4: 4 4 0 0 0 0
5: 5 0 0 0 0 5
> dcast(DT2, val ~ label, value.var = "val", sum, value.var.in.dots = TRUE)
Key: <val>
val Apr Feb Jan Mar May
<int> <int> <int> <int> <int> <int>
1: 1 0 0 1 0 0
2: 2 0 2 0 0 0
3: 3 0 0 0 3 0
4: 4 4 0 0 0 0
5: 5 0 0 0 0 5
> dcast(DT2, ...~label, value.var = "val", length, value.var.in.dots = TRUE)
Key: <val>
val Apr Feb Jan Mar May
<int> <int> <int> <int> <int> <int>
1: 1 0 0 1 0 0
2: 2 0 1 0 0 0
3: 3 0 0 0 1 0
4: 4 1 0 0 0 0
5: 5 0 0 0 0 1
> dcast(DT2, val ~ label, value.var = "val", length, value.var.in.dots = TRUE)
Key: <val>
val Apr Feb Jan Mar May
<int> <int> <int> <int> <int> <int>
1: 1 0 0 1 0 0
2: 2 0 1 0 0 0
3: 3 0 0 0 1 0
4: 4 1 0 0 0 0
5: 5 0 0 0 0 1
> (out.dots <- dcast(out1, r + ... + indexU ~ indexU, fun = length, value.var.in.dots = TRUE))
Key: <r, var1, var2, index, a, b, indexU>
r var1 var2 index a b indexU k l
<int> <int> <int> <char> <int> <int> <char> <int> <int>
1: 1 1 5 a 1 0 k 1 0
2: 2 2 6 b 0 1 l 0 1
3: 3 3 7 a 1 0 k 1 0
4: 4 4 8 b 0 1 l 0 1
> (out.dots <- dcast(out1, r + var1 + var2 + index + a + b + indexU ~ indexU, fun = length, value.var.in.dots = TRUE))
Key: <r, var1, var2, index, a, b, indexU>
r var1 var2 index a b indexU k l
<int> <int> <int> <char> <int> <int> <char> <int> <int>
1: 1 1 5 a 1 0 k 1 0
2: 2 2 6 b 0 1 l 0 1
3: 3 3 7 a 1 0 k 1 0
4: 4 4 8 b 0 1 l 0 1
> dcast(DT, .~label, value.var = "val", sum, value.var.in.dots = TRUE)
Key: <.>
. Apr Feb Jan Mar May
<char> <num> <num> <num> <num> <num>
1: . 0 0 0 0 0
> dcast(DT, .~label, value.var = "val", length, value.var.in.dots = TRUE)
Key: <.>
. Apr Feb Jan Mar May
<char> <int> <int> <int> <int> <int>
1: . 1 1 1 1 1
> dcast(DT2, .~label, value.var = "val", sum, value.var.in.dots = TRUE)
Key: <.>
. Apr Feb Jan Mar May
<char> <int> <int> <int> <int> <int>
1: . 4 2 1 3 5
> dcast(DT2, .~label, value.var = "val", length, value.var.in.dots = TRUE)
Key: <.>
. Apr Feb Jan Mar May
<char> <int> <int> <int> <int> <int>
1: . 1 1 1 1 1 I include examples with a unique dot to show that they are not modified. RHS commentThis is solved too: > (DT = data.table(a=sample(10), b=2013:2014, variable=rep(c("c", "d"), each=10), value=runif(20)))
a b variable value
<int> <int> <char> <num>
1: 8 2013 c 0.27318542
2: 6 2014 c 0.94353915
3: 3 2013 c 0.82729946
4: 5 2014 c 0.89831191
5: 7 2013 c 0.78324213
6: 9 2014 c 0.32990720
7: 4 2013 c 0.43093672
8: 10 2014 c 0.90222205
9: 1 2013 c 0.30670624
10: 2 2014 c 0.97833887
11: 8 2013 d 0.91861714
12: 6 2014 d 0.67360817
13: 3 2013 d 0.39083139
14: 5 2014 d 0.03310679
15: 7 2013 d 0.56087016
16: 9 2014 d 0.94335232
17: 4 2013 d 0.90780129
18: 10 2014 d 0.33455651
19: 1 2013 d 0.91730701
20: 2 2014 d 0.56768771
a b variable value
> dcast(DT, a ~ ... + b, value.var="value")
Key: <a>
a c_2013 c_2014 d_2013 d_2014
<int> <num> <num> <num> <num>
1: 1 0.3067062 NA 0.9173070 NA
2: 2 NA 0.9783389 NA 0.56768771
3: 3 0.8272995 NA 0.3908314 NA
4: 4 0.4309367 NA 0.9078013 NA
5: 5 NA 0.8983119 NA 0.03310679
6: 6 NA 0.9435392 NA 0.67360817
7: 7 0.7832421 NA 0.5608702 NA
8: 8 0.2731854 NA 0.9186171 NA
9: 9 NA 0.3299072 NA 0.94335232
10: 10 NA 0.9022220 NA 0.33455651
> identical(dcast(DT, a ~ variable + value + b, value.var="value"), dcast(DT, a ~ ... + b, value.var="value", value.var.in.dots=TRUE))
[1] TRUE
> identical(dcast(DT, a ~ variable + value + b, value.var="value"), dcast(DT, a ~ ... + b, value.var="value", value.var.in.RHSdots=TRUE))
[1] TRUE SummaryWhat do you think? @jangorecki @tdhock @MichaelChirico @ben-schwen Thanks! Finally, if you find suitable to merge this PR (maybe after some modifications): TODO
|
thanks! this looks good to me so far. |
@tdhock I don't think suggesting for addition to CODEOWNERS is a good idea because being there requires some commitment from a person, and that we should not expect from every contributor, and especially a first time contributor.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As Toby said, we need tests. Now codecov says only 21.73% of diff has been test covered.
I have just pushed some modifications to the code and added the tests.
Let's see how much of diff is test covered now. Tell me if something more is needed. I agree with @jangorecki is not a good idea including me as a CODEOWNER. I do not know most of the code involved in |
Thanks Iago, I don't think anyone active knows most of the dcast code (I certainly do not), and since you have been working on dcast, you probably know it as well as anyone, so I think it would definitely be helpful to the community, if you would be willing to volunteer as reviewer. Again no problem if you do not want to volunteer though. |
this looks great to me. I especially like your documentation and examples for the new arguments to dcast. |
….. in formula or not (current and default)
inst/tests/tests.Rraw
Outdated
DT2 = data.table(index = c("a","b"), x = 1:2, y = rnorm(2)) | ||
DT3 = data.table(year = c(rep(1986,4), rep(1987,3), rep(1988,2), 1989:1991), continent = rep(c("Europe","Asia"), each = 6), country = rep(c("Sweden","Germany","France","India","Japan","China"), each = 2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would recommend avoiding variable names with numbers in them like DT3
and DT3_dcasted2
-- can you please fix by changing numbers to informative names? (something that communicates expectation / intent would be appreciated)
also you may consider changing DT2 to DT, but moving the definition down below to where it is actually used (and after the tests involving the first DT)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I follow both of your suggestions. On one hand, reorganizing the code to have the definitions only when they will be used, and on the other hand with some names a bit more informative. I will push in a few seconds.
Thanks to reviewer Toby Dylan Hocking (@tdhock) and his comments.
in order to remove the repetition of `vars` redefinition and the `split_deparsing` calls through multiple `if_else`. Thanks again to Toby Dylan Hocking (@tdhock) for reviewing and suggesting this improvement.
instead of `isTRUE` and `isFALSE`.
nice 2- trick! I like the simplification /removal of repetition that you proposed. |
@tdhock |
Great thanks! I will merge and then we can see what happens to revdeps tomorrow. |
Closes #5824.
Introduction
This is a draft as, with this modification, a test fails (one test, for the moment), and you may decide the definite behaviour to keep.
As @tdhock remarks in #5824 (comment),
?dcast
saysbut this behaviour is not fully satisfied, at least when
...
is part of the LHS of the formula. This PR modifies just its behaviour in the LHS of the formula.RHS comment
Another issue is the "same " in RHS. Again from @tdhock comment we see that
but this could be considered in another issue/PR. Or in a later commit in this PR; in this case,the test including the code shown just above would fail too.
Options
value.var
columns are not present in the output ofdcast
even if they are included in LHS of formula through...
#5824 will remain. The issue could be closed documenting that...
do not includevalue.var
variables.revdep
s are affecteddcast
allowing to keep old behaviour or the new behaviour, choosing as defaultvalue.var
columns are not present in the output ofdcast
even if they are included in LHS of formula through...
#5824 is not an issue but an expected behaviour under the option default.revdep
s.Let's see some examples from the failing test and from the issue #5824..
Code/Examples
Setup
Previous/Current behaviour
New behaviour