Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

keep.rownames argument for transpose #3715

Merged
merged 5 commits into from
Aug 8, 2019
Merged

keep.rownames argument for transpose #3715

merged 5 commits into from
Aug 8, 2019

Conversation

MichaelChirico
Copy link
Member

@MichaelChirico MichaelChirico commented Jul 20, 2019

Closes #1886
via #3189

I think the logic is basically done. Just want to do some more robustness checks & minor details, e.g.

As of now, the new column is named V1, should we name it rn & bump the others to 1-n? 2-(n+1)?

@mattdowle
Copy link
Member

mattdowle commented Jul 22, 2019

The confusing thing for me on this one is that t(dt) already does keep the row names :

> dt = data.table(x=1:5, y=6:10)
> dt
       x     y
   <int> <int>
1:     1     6
2:     2     7
3:     3     8
4:     4     9
5:     5    10
> t(dt)
  [,1] [,2] [,3] [,4] [,5]
x    1    2    3    4    5    # there are the row names kept (x and y)
y    6    7    8    9   10
> class(t(dt))
[1] "matrix"
>

So the new keep.rownames argument feels more like it should be named return.data.table perhaps.
Also I can see a new request being created in future that DT == t(t(DT)) should be true. That's not true currently because t() returns a matrix. Maybe a new function name like twist, mirror, rotate, invert, or spin to convey that a data.table would be returned.
If one were to rotate a DT 90 degrees clockwise, then the user specifying which column should become the column names would make sense (and that argument would be col.names= or similar).
If one were to rotate a DT 90 degrees anticlockwise, then the column names becoming the first column called rn by default makes perfect sense (but setting col.names=NULL would drop them, perhaps).
I briefly considered screw and unscrew. But I guess a lot of people don't know that screw=clockwise and unscrew=anticlockwise.

In order words, given :

       rn   one   two three
   <char> <int> <int> <int>
1:      x     1     3     5
2:      y     2     4     6

how to specify the following result should be returned by t() :

       rn     x     y
   <char> <int> <int>
1:    one     1     2
2:    two     3     4
3:  three     5     6

@MichaelChirico
Copy link
Member Author

MichaelChirico commented Jul 22, 2019

Also, screw(my_data) sounds like a frustrated Friday evening more than a successful manipulation. and unscrew(my_data) a plea for help 😉

@MichaelChirico
Copy link
Member Author

BTW @mattdowle this PR / issue is for transpose, not t

@mattdowle
Copy link
Member

Oh. We have our own transpose? There doesn't seem to be a data.table method for t().

@MichaelChirico
Copy link
Member Author

transpose is the workhorse of tstrsplit & it's also been exported since then yep

@mattdowle
Copy link
Member

mattdowle commented Jul 22, 2019

As of now, the new column is named V1, should we name it rn

Yes rn feels right to me.

bump the others to 1-n? 2-(n+1)?

Not sure what you mean here, but rn, V1, V2, ... seems right to me just like the original issue showed.

I'm not sure about the argument name keep.rownames. To me the name conveys taking the row names that exist and keeping them. But that's not what this argument does at all. In fact it is to do with keeping column names, not row names. Now that I realize that transpose takes a data.table input and returns a data.table, then rownames don't come into it really. Plus there needs to be another argument to specify which column becomes the column names (if any) and what should that be called? How about changing keep.rownames to names.to.column, and adding column.to.names.

Hm, I'm thinking there is no argument name that easily conveys what it does. How about changing keep.rownames to just rn. In other places in data.table we have an rn column at the beginning so users might easily guess what that means? So rn=TRUE would mean yes-create-rn (i.e. use column names to create the rn column). Then cn="rn" (a 2nd new argument) could be used to do the reverse (i.e. use the "rn" column to create the column names in the result). Using both rn= and cn= at the same time would make sense. rn= could be passed a name too to use instead of "rn"; e.g. user could specify rn="vars".

@mattdowle mattdowle changed the title Closes #1886 -- keep.rownames argument for transpose keep.rownames argument for transpose Jul 22, 2019
@mattdowle mattdowle added this to the 1.12.4 milestone Aug 2, 2019
@codecov
Copy link

codecov bot commented Aug 2, 2019

Codecov Report

Merging #3715 into master will decrease coverage by 0.03%.
The diff coverage is 85.29%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3715      +/-   ##
==========================================
- Coverage   98.81%   98.78%   -0.04%     
==========================================
  Files          70       70              
  Lines       13373    13393      +20     
==========================================
+ Hits        13215    13230      +15     
- Misses        158      163       +5
Impacted Files Coverage Δ
R/transpose.R 96.29% <100%> (ø) ⬆️
src/transpose.c 94.31% <84.84%> (-5.69%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a8e0230...6fc7309. Read the comment docs.

NEWS.md Outdated

```R
# default 4 threads on a laptop with 16GB RAM and 8 logical CPU
x = sample(c(TRUE,FALSE), 3e8, replace=TRUE) # 1GB
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why you went 5e8 to 3e8 for ifelse benchmark?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make it 1GB. I don't remember changing it from 5e8 to 3e8 though; can't remember -- didn't I change something else at the same time too? Main change was getting away from 100 runs on small size. It can be 5e8 too.

NEWS.md Show resolved Hide resolved
@mattdowle mattdowle removed the WIP label Aug 8, 2019
@mattdowle
Copy link
Member

Follow-up PRs expected to be needed.
I can see coverage failing on Travis but I don't see why. Will merge anyway and see if master checks throw any light on it.

@mattdowle mattdowle merged commit 277e14a into master Aug 8, 2019
@mattdowle mattdowle deleted the transpose_rownames branch August 8, 2019 02:10
@mattdowle
Copy link
Member

Coverage issue seems to be related to covr upgrade yesterday. I filed issue: r-lib/covr#391

@mattdowle mattdowle mentioned this pull request Aug 10, 2019
@MichaelChirico
Copy link
Member Author

covr 3.3.1 is on CRAN now btw

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feature request : transpose(keep.rownames = T)?
3 participants