Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow implicit var.value in formula #5825

Merged
merged 14 commits into from
May 3, 2024
Merged

Conversation

iagogv3
Copy link
Contributor

@iagogv3 iagogv3 commented Dec 13, 2023

Closes #5824.

Introduction

This is a draft as, with this modification, a test fails (one test, for the moment), and you may decide the definite behaviour to keep.

As @tdhock remarks in #5824 (comment), ?dcast says

There are
     two special variables: '.' represents no variable, while '...'
     represents all variables not otherwise mentioned in 'formula';

but this behaviour is not fully satisfied, at least when ... is part of the LHS of the formula. This PR modifies just its behaviour in the LHS of the formula.

RHS comment

Another issue is the "same " in RHS. Again from @tdhock comment we see that

> DT = data.table(a=sample(10), b=2013:2014, variable=rep(c("c", "d"), each=10), value=runif(20))
> dcast(DT, a ~ ... + b, value.var="value")
     a    c_2013    c_2014    d_2013    d_2014
 1:  1        NA 0.9480322        NA 0.9989579
 2:  2        NA 0.6857395        NA 0.3604702
 3:  3        NA 0.5899338        NA 0.3327320
 4:  4 0.4278920        NA 0.1448164        NA
 5:  5 0.1815088        NA 0.8726709        NA
 6:  6        NA 0.8010612        NA 0.7053530
 7:  7 0.0166915        NA 0.9647718        NA
 8:  8        NA 0.7038001        NA 0.6355682
 9:  9 0.2878514        NA 0.4578474        NA
10: 10 0.3723362        NA 0.7866855        NA

but this could be considered in another issue/PR. Or in a later commit in this PR; in this case,the test including the code shown just above would fail too.

Options

Let's see some examples from the failing test and from the issue #5824..

Code/Examples

Setup

> (DT = data.table(label= month.abb[1:5], val=0))
   label val
1:   Jan   0
2:   Feb   0
3:   Mar   0
4:   Apr   0
5:   May   0
> (DT2 = data.table(label= month.abb[1:5], val=1:5))
   label val
1:   Jan   1
2:   Feb   2
3:   Mar   3
4:   Apr   4
5:   May   5
> test <- data.table(index=c("a","b"), indexU=c("k","l"), var1=1:4, var2=5:8)
> (out1 <- dcast(test[, r := .I], r + ... + index ~ index, fun = length))
Key: <r, indexU, var1, var2, index>
       r indexU  var1  var2  index     a     b
   <int> <char> <int> <int> <char> <int> <int>
1:     1      k     1     5      a     1     0
2:     2      l     2     6      b     0     1
3:     3      k     3     7      a     1     0
4:     4      l     4     8      b     0     1

Previous/Current behaviour

> dcast(DT, ...~label, value.var = "val", sum)
   . Apr Feb Jan Mar May
1: .   0   0   0   0   0
> dcast(DT, val ~ label, value.var = "val", sum)
   val Apr Feb Jan Mar May
1:   0   0   0   0   0   0
> dcast(DT, ...~label, value.var = "val", length)
   . Apr Feb Jan Mar May
1: .   1   1   1   1   1
> dcast(DT, val ~ label, value.var = "val", length)
   val Apr Feb Jan Mar May
1:   0   1   1   1   1   1
> dcast(DT2, ...~label, value.var = "val", sum)
   . Apr Feb Jan Mar May
1: .   4   2   1   3   5
> dcast(DT2, val ~ label, value.var = "val", sum)
   val Apr Feb Jan Mar May
1:   1   0   0   1   0   0
2:   2   0   2   0   0   0
3:   3   0   0   0   3   0
4:   4   4   0   0   0   0
5:   5   0   0   0   0   5
> dcast(DT2, ...~label, value.var = "val", length)
   . Apr Feb Jan Mar May
1: .   1   1   1   1   1
> dcast(DT2, val ~ label, value.var = "val", length)
   val Apr Feb Jan Mar May
1:   1   0   0   1   0   0
2:   2   0   1   0   0   0
3:   3   0   0   0   1   0
4:   4   1   0   0   0   0
5:   5   0   0   0   0   1
> (out.dots <- dcast(out1, r + ... + indexU ~ indexU, fun = length))
   r var1 var2 index a indexU k l
1: 1    1    5     a 1      k 1 0
2: 2    2    6     b 0      l 0 1
3: 3    3    7     a 1      k 1 0
4: 4    4    8     b 0      l 0 1
> (out.dots <- dcast(out1, r + var1 + var2 + index + a + b + indexU ~ indexU, fun = length))
   r var1 var2 index a b indexU k l
1: 1    1    5     a 1 0      k 1 0
2: 2    2    6     b 0 1      l 0 1
3: 3    3    7     a 1 0      k 1 0
4: 4    4    8     b 0 1      l 0 1
> dcast(DT, .~label, value.var = "val", sum)
   . Apr Feb Jan Mar May
1: .   0   0   0   0   0
> dcast(DT, .~label, value.var = "val", length)
   . Apr Feb Jan Mar May
1: .   1   1   1   1   1
> dcast(DT2, .~label, value.var = "val", sum)
   . Apr Feb Jan Mar May
1: .   4   2   1   3   5
> dcast(DT2, .~label, value.var = "val", length)
   . Apr Feb Jan Mar May
1: .   1   1   1   1   1

New behaviour

> dcast(DT, ...~label, value.var = "val", sum)
   val Apr Feb Jan Mar May
1:   0   0   0   0   0   0
> dcast(DT, ...~label, value.var = "val", length)
   val Apr Feb Jan Mar May
1:   0   1   1   1   1   1
> dcast(DT2, ...~label, value.var = "val", sum)
   val Apr Feb Jan Mar May
1:   1   0   0   1   0   0
2:   2   0   2   0   0   0
3:   3   0   0   0   3   0
4:   4   4   0   0   0   0
5:   5   0   0   0   0   5
> dcast(DT2, ...~label, value.var = "val", length)
   val Apr Feb Jan Mar May
1:   1   0   0   1   0   0
2:   2   0   1   0   0   0
3:   3   0   0   0   1   0
4:   4   1   0   0   0   0
5:   5   0   0   0   0   1
> (out.dots <- dcast(out1, r + ... + indexU ~ indexU, fun = length))
   r var1 var2 index a b indexU k l
1: 1    1    5     a 1 0      k 1 0
2: 2    2    6     b 0 1      l 0 1
3: 3    3    7     a 1 0      k 1 0
4: 4    4    8     b 0 1      l 0 1
> dcast(DT, .~label, value.var = "val", sum)
   . Apr Feb Jan Mar May
1: .   0   0   0   0   0
> dcast(DT, .~label, value.var = "val", length)
   . Apr Feb Jan Mar May
1: .   1   1   1   1   1
> dcast(DT2, .~label, value.var = "val", sum)
   . Apr Feb Jan Mar May
1: .   4   2   1   3   5
> dcast(DT2, .~label, value.var = "val", length)
   . Apr Feb Jan Mar May
1: .   1   1   1   1   1

@iagogv3 iagogv3 requested a review from mattdowle as a code owner December 13, 2023 14:44
@iagogv3 iagogv3 marked this pull request as draft December 13, 2023 14:46
@jangorecki
Copy link
Member

If tests are failing then it will be good to check how many revdeps are affected.

@iagogv3
Copy link
Contributor Author

iagogv3 commented Dec 13, 2023

@jangorecki Thanks. I was going to ask you about revdeps, but for the moment I will close this PR as it fails more than I expected.

@iagogv3 iagogv3 closed this Dec 13, 2023
@tdhock
Copy link
Member

tdhock commented Dec 14, 2023

first step should be fixing tests and then merging PR into master.
then revdep checks will be run automatically the next day, and if there are any new issues we can add new tests after that. (it is not really feasible to expect PR submitter to do revdep checks since there are so many)

@tdhock
Copy link
Member

tdhock commented Dec 14, 2023

I see only two errors, I would encourage you to reopen and try to fix

test.data.table() running: /home/runner/work/data.table/data.table/check/data.table.Rcheck/data.table/tests/tests.Rraw
Test 1102.18 ran without errors but failed check that x equals y:
> x = names(dcast(DT, a ~ ... + b, value.var = "value")) 
First 6 of 21 (type 'character'): 
[1] "a"                        "c_0.176556752528995_2014"
[3] "c_0.205974574899301_2013" "c_0.380035179434344_2013"
[5] "c_0.384103718213737_2014" "c_0.497699242085218_2014"
> y = c("a", "c_2013", "c_2014", "d_2013", "d_2014") 
First 5 of 5 (type 'character'): 
[1] "a"      "c_2013" "c_2014" "d_2013" "d_2014"
Lengths (21, 5) differ (string compare on first 5)
4 string mismatches
Test 1102.28 ran without errors but failed check that x equals y:
> x = dcast(DT, ... ~ label, value.var = "val", sum) 
   val Apr Feb Jan Mar May
1:   0   0   0   0   0   0
    [Key=val Types=dou,dou,dou,dou,dou,dou Classes=num,num,num,num,num,num]
1:                                                                         
> y = data.table(. = ".", Apr = 0, Feb = 0, Jan = 0, Mar = 0, May = 0,      key = ".") 
   . Apr Feb Jan Mar May
1: .   0   0   0   0   0
    [Key=. Types=cha,dou,dou,dou,dou,dou Classes=cha,num,num,num,num,num]
1:                                                                       
Error: Error: R CMD check found ERRORs
Execution halted
Different column names

Wed Dec 13 14:48:19 2023  endian==little, sizeof(long double)==16, longdouble.digits==64, sizeof(pointer)==8, TZ=='UTC', Sys.timezone()=='UTC', Sys.getlocale()=='LC_CTYPE=C.UTF-8;LC_NUMERIC=C;LC_TIME=C.UTF-8;LC_COLLATE=C;LC_MONETARY=C.UTF-8;LC_MESSAGES=C.UTF-8;LC_PAPER=C.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=C.UTF-8;LC_IDENTIFICATION=C', l10n_info()=='MBCS=TRUE; UTF-8=TRUE; Latin-1=FALSE; codeset=UTF-8', getDTthreads()=='OpenMP version (_OPENMP)==[201](https://github.com/Rdatatable/data.table/actions/runs/7196859206/job/19602648644#step:9:202)511; omp_get_num_procs()==4; R_DATATABLE_NUM_PROCS_PERCENT==unset (default 50); R_DATATABLE_NUM_THREADS==unset; R_DATATABLE_THROTTLE==unset (default 1024); omp_get_thread_limit()==2147483647; omp_get_max_threads()==4; OMP_THREAD_LIMIT==unset; OMP_NUM_THREADS==unset; RestoreAfterFork==true; data.table is using 2 threads with throttle==1024. See ?setDTthreads.', zlibVersion()==1.2.11 ZLIB_VERSION==1.2.11
Error: Error: 2 error(s) out of 10992. Search tests/tests.Rraw for test number(s) 1102.18, 1102.28. Duration: 26.8s elapsed (29.3s cpu).

@tdhock
Copy link
Member

tdhock commented Dec 14, 2023

(you can leave it open as a draft, and then change it to "ready to merge" when tests are passing)

@iagogv3
Copy link
Contributor Author

iagogv3 commented Dec 14, 2023

@tdhock Thanks! Let's see.
Test 1102.28 error/fail is expected (see initial comment and examples). Test 1102.18 error/fail was unexpected, but I solved with a posterior commit.

@iagogv3 iagogv3 reopened this Dec 14, 2023
@iagogv3
Copy link
Contributor Author

iagogv3 commented Dec 14, 2023

@MichaelChirico
Regarding #5824 (comment) there are the options specified above in the first comment. What do you think?

@iagogv3 iagogv3 marked this pull request as ready for review December 15, 2023 10:07
Copy link

codecov bot commented Dec 15, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.51%. Comparing base (6f3fc8d) to head (0efc16f).
Report is 30 commits behind head on master.

❗ Current head 0efc16f differs from pull request most recent head 06fde87. Consider uploading reports for the commit 06fde87 to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #5825   +/-   ##
=======================================
  Coverage   97.51%   97.51%           
=======================================
  Files          80       80           
  Lines       14979    14942   -37     
=======================================
- Hits        14607    14571   -36     
+ Misses        372      371    -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@iagogv3
Copy link
Contributor Author

iagogv3 commented Dec 15, 2023

Updates here.

This is not more a draft.

Options

From the options above in main comment I implemented the third one, which is implementing new behaviour through new arguments, but keeping current behaviour as default and documenting it. Therefore, after the commit we get:

Current behaviour

> dcast(DT, ...~label, value.var = "val", sum)
Key: <.>
        .   Apr   Feb   Jan   Mar   May
   <char> <num> <num> <num> <num> <num>
1:      .     0     0     0     0     0
> dcast(DT, val ~ label, value.var = "val", sum)
Key: <val>
     val   Apr   Feb   Jan   Mar   May
   <num> <num> <num> <num> <num> <num>
1:     0     0     0     0     0     0
> dcast(DT, ...~label, value.var = "val", length)
Key: <.>
        .   Apr   Feb   Jan   Mar   May
   <char> <int> <int> <int> <int> <int>
1:      .     1     1     1     1     1
> dcast(DT, val ~ label, value.var = "val", length)
Key: <val>
     val   Apr   Feb   Jan   Mar   May
   <num> <int> <int> <int> <int> <int>
1:     0     1     1     1     1     1
> dcast(DT2, ...~label, value.var = "val", sum)
Key: <.>
        .   Apr   Feb   Jan   Mar   May
   <char> <int> <int> <int> <int> <int>
1:      .     4     2     1     3     5
> dcast(DT2, val ~ label, value.var = "val", sum)
Key: <val>
     val   Apr   Feb   Jan   Mar   May
   <int> <int> <int> <int> <int> <int>
1:     1     0     0     1     0     0
2:     2     0     2     0     0     0
3:     3     0     0     0     3     0
4:     4     4     0     0     0     0
5:     5     0     0     0     0     5
> dcast(DT2, ...~label, value.var = "val", length)
Key: <.>
        .   Apr   Feb   Jan   Mar   May
   <char> <int> <int> <int> <int> <int>
1:      .     1     1     1     1     1
> dcast(DT2, val ~ label, value.var = "val", length)
Key: <val>
     val   Apr   Feb   Jan   Mar   May
   <int> <int> <int> <int> <int> <int>
1:     1     0     0     1     0     0
2:     2     0     1     0     0     0
3:     3     0     0     0     1     0
4:     4     1     0     0     0     0
5:     5     0     0     0     0     1
> (out.dots <- dcast(out1, r + ... + indexU ~ indexU, fun = length))
Key: <r, var1, var2, index, a, indexU>
       r  var1  var2  index     a indexU     k     l
   <int> <int> <int> <char> <int> <char> <int> <int>
1:     1     1     5      a     1      k     1     0
2:     2     2     6      b     0      l     0     1
3:     3     3     7      a     1      k     1     0
4:     4     4     8      b     0      l     0     1
> (out.dots <- dcast(out1, r + var1 + var2 + index + a + b + indexU ~ indexU, fun = length))
Key: <r, var1, var2, index, a, b, indexU>
       r  var1  var2  index     a     b indexU     k     l
   <int> <int> <int> <char> <int> <int> <char> <int> <int>
1:     1     1     5      a     1     0      k     1     0
2:     2     2     6      b     0     1      l     0     1
3:     3     3     7      a     1     0      k     1     0
4:     4     4     8      b     0     1      l     0     1
> dcast(DT, .~label, value.var = "val", sum)
Key: <.>
        .   Apr   Feb   Jan   Mar   May
   <char> <num> <num> <num> <num> <num>
1:      .     0     0     0     0     0
> dcast(DT, .~label, value.var = "val", length)
Key: <.>
        .   Apr   Feb   Jan   Mar   May
   <char> <int> <int> <int> <int> <int>
1:      .     1     1     1     1     1
> dcast(DT2, .~label, value.var = "val", sum)
Key: <.>
        .   Apr   Feb   Jan   Mar   May
   <char> <int> <int> <int> <int> <int>
1:      .     4     2     1     3     5
> dcast(DT2, .~label, value.var = "val", length)
Key: <.>
        .   Apr   Feb   Jan   Mar   May
   <char> <int> <int> <int> <int> <int>
1:      .     1     1     1     1     1

New behaviour

> dcast(DT, ...~label, value.var = "val", sum, value.var.in.dots = TRUE)
Key: <val>
     val   Apr   Feb   Jan   Mar   May
   <num> <num> <num> <num> <num> <num>
1:     0     0     0     0     0     0
> dcast(DT, val ~ label, value.var = "val", sum, value.var.in.dots = TRUE)
Key: <val>
     val   Apr   Feb   Jan   Mar   May
   <num> <num> <num> <num> <num> <num>
1:     0     0     0     0     0     0
> dcast(DT, ...~label, value.var = "val", length, value.var.in.dots = TRUE)
Key: <val>
     val   Apr   Feb   Jan   Mar   May
   <num> <int> <int> <int> <int> <int>
1:     0     1     1     1     1     1
> dcast(DT, val ~ label, value.var = "val", length, value.var.in.dots = TRUE)
Key: <val>
     val   Apr   Feb   Jan   Mar   May
   <num> <int> <int> <int> <int> <int>
1:     0     1     1     1     1     1
> dcast(DT2, ...~label, value.var = "val", sum, value.var.in.dots = TRUE)
Key: <val>
     val   Apr   Feb   Jan   Mar   May
   <int> <int> <int> <int> <int> <int>
1:     1     0     0     1     0     0
2:     2     0     2     0     0     0
3:     3     0     0     0     3     0
4:     4     4     0     0     0     0
5:     5     0     0     0     0     5
> dcast(DT2, val ~ label, value.var = "val", sum, value.var.in.dots = TRUE)
Key: <val>
     val   Apr   Feb   Jan   Mar   May
   <int> <int> <int> <int> <int> <int>
1:     1     0     0     1     0     0
2:     2     0     2     0     0     0
3:     3     0     0     0     3     0
4:     4     4     0     0     0     0
5:     5     0     0     0     0     5
> dcast(DT2, ...~label, value.var = "val", length, value.var.in.dots = TRUE)
Key: <val>
     val   Apr   Feb   Jan   Mar   May
   <int> <int> <int> <int> <int> <int>
1:     1     0     0     1     0     0
2:     2     0     1     0     0     0
3:     3     0     0     0     1     0
4:     4     1     0     0     0     0
5:     5     0     0     0     0     1
> dcast(DT2, val ~ label, value.var = "val", length, value.var.in.dots = TRUE)
Key: <val>
     val   Apr   Feb   Jan   Mar   May
   <int> <int> <int> <int> <int> <int>
1:     1     0     0     1     0     0
2:     2     0     1     0     0     0
3:     3     0     0     0     1     0
4:     4     1     0     0     0     0
5:     5     0     0     0     0     1
> (out.dots <- dcast(out1, r + ... + indexU ~ indexU, fun = length, value.var.in.dots = TRUE))
Key: <r, var1, var2, index, a, b, indexU>
       r  var1  var2  index     a     b indexU     k     l
   <int> <int> <int> <char> <int> <int> <char> <int> <int>
1:     1     1     5      a     1     0      k     1     0
2:     2     2     6      b     0     1      l     0     1
3:     3     3     7      a     1     0      k     1     0
4:     4     4     8      b     0     1      l     0     1
> (out.dots <- dcast(out1, r + var1 + var2 + index + a + b + indexU ~ indexU, fun = length, value.var.in.dots = TRUE))
Key: <r, var1, var2, index, a, b, indexU>
       r  var1  var2  index     a     b indexU     k     l
   <int> <int> <int> <char> <int> <int> <char> <int> <int>
1:     1     1     5      a     1     0      k     1     0
2:     2     2     6      b     0     1      l     0     1
3:     3     3     7      a     1     0      k     1     0
4:     4     4     8      b     0     1      l     0     1
> dcast(DT, .~label, value.var = "val", sum, value.var.in.dots = TRUE)
Key: <.>
        .   Apr   Feb   Jan   Mar   May
   <char> <num> <num> <num> <num> <num>
1:      .     0     0     0     0     0
> dcast(DT, .~label, value.var = "val", length, value.var.in.dots = TRUE)
Key: <.>
        .   Apr   Feb   Jan   Mar   May
   <char> <int> <int> <int> <int> <int>
1:      .     1     1     1     1     1
> dcast(DT2, .~label, value.var = "val", sum, value.var.in.dots = TRUE)
Key: <.>
        .   Apr   Feb   Jan   Mar   May
   <char> <int> <int> <int> <int> <int>
1:      .     4     2     1     3     5
> dcast(DT2, .~label, value.var = "val", length, value.var.in.dots = TRUE)
Key: <.>
        .   Apr   Feb   Jan   Mar   May
   <char> <int> <int> <int> <int> <int>
1:      .     1     1     1     1     1

I include examples with a unique dot to show that they are not modified.

RHS comment

This is solved too:

> (DT = data.table(a=sample(10), b=2013:2014, variable=rep(c("c", "d"), each=10), value=runif(20)))
        a     b variable      value
    <int> <int>   <char>      <num>
 1:     8  2013        c 0.27318542
 2:     6  2014        c 0.94353915
 3:     3  2013        c 0.82729946
 4:     5  2014        c 0.89831191
 5:     7  2013        c 0.78324213
 6:     9  2014        c 0.32990720
 7:     4  2013        c 0.43093672
 8:    10  2014        c 0.90222205
 9:     1  2013        c 0.30670624
10:     2  2014        c 0.97833887
11:     8  2013        d 0.91861714
12:     6  2014        d 0.67360817
13:     3  2013        d 0.39083139
14:     5  2014        d 0.03310679
15:     7  2013        d 0.56087016
16:     9  2014        d 0.94335232
17:     4  2013        d 0.90780129
18:    10  2014        d 0.33455651
19:     1  2013        d 0.91730701
20:     2  2014        d 0.56768771
        a     b variable      value
> dcast(DT, a ~ ... + b, value.var="value")
Key: <a>
        a    c_2013    c_2014    d_2013     d_2014
    <int>     <num>     <num>     <num>      <num>
 1:     1 0.3067062        NA 0.9173070         NA
 2:     2        NA 0.9783389        NA 0.56768771
 3:     3 0.8272995        NA 0.3908314         NA
 4:     4 0.4309367        NA 0.9078013         NA
 5:     5        NA 0.8983119        NA 0.03310679
 6:     6        NA 0.9435392        NA 0.67360817
 7:     7 0.7832421        NA 0.5608702         NA
 8:     8 0.2731854        NA 0.9186171         NA
 9:     9        NA 0.3299072        NA 0.94335232
10:    10        NA 0.9022220        NA 0.33455651
> identical(dcast(DT, a ~ variable + value + b, value.var="value"), dcast(DT, a ~ ... + b, value.var="value", value.var.in.dots=TRUE))
[1] TRUE
> identical(dcast(DT, a ~ variable + value + b, value.var="value"), dcast(DT, a ~ ... + b, value.var="value", value.var.in.RHSdots=TRUE))
[1] TRUE

Summary

What do you think? @jangorecki @tdhock @MichaelChirico @ben-schwen

Thanks!

Finally, if you find suitable to merge this PR (maybe after some modifications):

TODO

  • Include some test (I am not sure the path you follow to include new tests, so I didn't)
  • Include in NEWS.md

@iagogv3 iagogv3 changed the title allow implicit var.value in LHS of formula allow implicit var.value in formula Dec 15, 2023
@tdhock
Copy link
Member

tdhock commented Dec 18, 2023

thanks! this looks good to me so far.
Please add your name to CODEOWNERS.
Also please add a test in inst/tests/tests.Rraw -- preferably something similar (simpler?) to the original example which your issue described. Put it nearby another dcast example with ..., for example test number 1102.18 .
If you put it after that one, then you have to give a new test number which is between that number and the next one (1102.19) so a new test number like 1102.181 would be ok.

@jangorecki
Copy link
Member

jangorecki commented Dec 18, 2023

@tdhock I don't think suggesting for addition to CODEOWNERS is a good idea because being there requires some commitment from a person, and that we should not expect from every contributor, and especially a first time contributor.
What feels ok is something like this

if you are willing to actively maintain dcast function you may consider adding yourself to CODEOWNERS file

Copy link
Member

@jangorecki jangorecki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Toby said, we need tests. Now codecov says only 21.73% of diff has been test covered.

@iagogv3
Copy link
Contributor Author

iagogv3 commented Dec 28, 2023

I have just pushed some modifications to the code and added the tests.

As Toby said, we need tests. Now codecov says only 21.73% of diff has been test covered.

Let's see how much of diff is test covered now.

Tell me if something more is needed.

I agree with @jangorecki is not a good idea including me as a CODEOWNER. I do not know most of the code involved in dcast.

@tdhock
Copy link
Member

tdhock commented Dec 29, 2023

Thanks Iago, I don't think anyone active knows most of the dcast code (I certainly do not), and since you have been working on dcast, you probably know it as well as anyone, so I think it would definitely be helpful to the community, if you would be willing to volunteer as reviewer. Again no problem if you do not want to volunteer though.

@tdhock
Copy link
Member

tdhock commented Dec 31, 2023

this looks great to me. I especially like your documentation and examples for the new arguments to dcast.

Comment on lines 3673 to 3674
DT2 = data.table(index = c("a","b"), x = 1:2, y = rnorm(2))
DT3 = data.table(year = c(rep(1986,4), rep(1987,3), rep(1988,2), 1989:1991), continent = rep(c("Europe","Asia"), each = 6), country = rep(c("Sweden","Germany","France","India","Japan","China"), each = 2))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend avoiding variable names with numbers in them like DT3 and DT3_dcasted2 -- can you please fix by changing numbers to informative names? (something that communicates expectation / intent would be appreciated)
also you may consider changing DT2 to DT, but moving the definition down below to where it is actually used (and after the tests involving the first DT)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I follow both of your suggestions. On one hand, reorganizing the code to have the definitions only when they will be used, and on the other hand with some names a bit more informative. I will push in a few seconds.

Thanks to reviewer Toby Dylan Hocking (@tdhock) and his comments.
R/fcast.R Outdated Show resolved Hide resolved
R/fcast.R Outdated Show resolved Hide resolved
Iago Giné-Vázquez added 2 commits May 2, 2024 12:01
in order to remove the repetition of `vars` redefinition and the
`split_deparsing` calls through multiple `if_else`.

Thanks again to Toby Dylan Hocking (@tdhock) for reviewing and
suggesting this improvement.
R/fcast.R Outdated Show resolved Hide resolved
instead of `isTRUE` and `isFALSE`.
@tdhock
Copy link
Member

tdhock commented May 2, 2024

nice 2- trick! I like the simplification /removal of repetition that you proposed.
Can you please add yourself as a contributor in DESCRIPTION, and add an item to NEWS? see https://github.com/Rdatatable/data.table/blob/master/.github/CONTRIBUTING.md#pull-requests-prs for guidelines about writing NEWS.
Also I added you as a project member, which means you can now create branches in this repo, so in the future, please send PRs from branches in this repo, rather than from your fork. Welcome!

@iagogv3
Copy link
Contributor Author

iagogv3 commented May 3, 2024

@tdhock
I've just added myself as contributor and included a NEWS entry. Thanks for adding me as a project member and for your help and guidance.

@tdhock
Copy link
Member

tdhock commented May 3, 2024

Great thanks! I will merge and then we can see what happens to revdeps tomorrow.

@tdhock tdhock merged commit d19bfef into Rdatatable:master May 3, 2024
3 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants