Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

data_modify() gains .it and .at #478

Merged
merged 12 commits into from
Feb 7, 2024
Merged

data_modify() gains .it and .at #478

merged 12 commits into from
Feb 7, 2024

Conversation

strengejacke
Copy link
Member

library(datawizard)
data(iris)
# modify at specific positions or if condition is met
d <- iris[1:5, ]
data_modify(d, .at = "Species", .modify = as.numeric)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2       1
#> 2          4.9         3.0          1.4         0.2       1
#> 3          4.7         3.2          1.3         0.2       1
#> 4          4.6         3.1          1.5         0.2       1
#> 5          5.0         3.6          1.4         0.2       1
data_modify(d, .if = is.factor, .modify = as.numeric)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2       1
#> 2          4.9         3.0          1.4         0.2       1
#> 3          4.7         3.2          1.3         0.2       1
#> 4          4.6         3.1          1.5         0.2       1
#> 5          5.0         3.6          1.4         0.2       1

# can be combined with dots
data_modify(d, new_length = Petal.Length * 2, .at = "Species", .modify = as.numeric)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species new_length
#> 1          5.1         3.5          1.4         0.2       1        2.8
#> 2          4.9         3.0          1.4         0.2       1        2.8
#> 3          4.7         3.2          1.3         0.2       1        2.6
#> 4          4.6         3.1          1.5         0.2       1        3.0
#> 5          5.0         3.6          1.4         0.2       1        2.8

Created on 2024-02-06 with reprex v2.1.0

Copy link
Member Author

@strengejacke strengejacke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@etiennebacher what do you think? If you like this idea, I would add tests.

Copy link

codecov bot commented Feb 6, 2024

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (9f39fcd) 89.40% compared to head (3869b37) 89.47%.
Report is 1 commits behind head on main.

❗ Current head 3869b37 differs from pull request most recent head 513341c. Consider uploading reports for the commit 513341c to get more accurate results

Files Patch % Lines
R/data_modify.R 94.66% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #478      +/-   ##
==========================================
+ Coverage   89.40%   89.47%   +0.06%     
==========================================
  Files          72       72              
  Lines        5419     5453      +34     
==========================================
+ Hits         4845     4879      +34     
  Misses        574      574              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@etiennebacher etiennebacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @strengejacke, I have several comments relative to the docs and error behavior.

I'm also wondering if we should have an additional function to be used in data_modify() so that we can clearly make the distinction between what is passed in ... and the other args (something similar to across() in dplyr). For example, I think that

mtcars |> 
  data_modify(
    x = mpg + 2,
    y = center(drat),
    group_expr(
      .if = is.numeric,
      .modify = as.factor
    )
  )

is more readable than

mtcars |> 
  data_modify(
    x = mpg + 2,
    y = center(drat),
    .if = is.numeric,
    .modify = as.factor
  )

(I just use group_expr() as a placeholder here, I think it should have another name). What do you think?

R/data_modify.R Outdated Show resolved Hide resolved
R/data_modify.R Outdated Show resolved Hide resolved
R/data_modify.R Outdated Show resolved Hide resolved
R/data_modify.R Outdated Show resolved Hide resolved
tests/testthat/test-data_modify.R Show resolved Hide resolved
R/data_modify.R Outdated Show resolved Hide resolved
@strengejacke
Copy link
Member Author

strengejacke commented Feb 6, 2024

Thanks, I addressed all your comments. Regarding group_expr(): I personally think it's easier to handle if we have direct arguments, instead of a nested function. I'm no fan of across(), and usually, .if/.at and .modify are used instead of ..., so in most use cases, you will use one of the two, but not both. Nesting functions (i.e. having functions as arguments) is in my experience highly confusing for "casual" R users (that's why I don't teach tidyverse anymore, it's far too complex for the regular R user outside the "tidyverse" bubble - and I bet the most R users are outside this expert bubble)

@etiennebacher
Copy link
Member

etiennebacher commented Feb 7, 2024

usually, .if/.at and .modify are used instead of ..., so in most use cases, you will use one of the two, but not both

Maybe, but then should we forbid the use of ... when .modify is provided? I'm just trying to think if there would be some confusion arising from the fact that variables defined in ... are not included in the .if/.at arguments. But again clearly explaining this in the function documentation could be enough.

@strengejacke
Copy link
Member Author

Yes, I'm not decided whether to allow newly defined variables in ... in .at or .if - we could easily do this. My thought was that if I would create a new variable in ..., and use .at/.if`, why would I need to include the new variable there, too? What would be the benefit of modifying that I couldn't have done directly?

But maybe we decide on this later, after some experience with this feature?

@strengejacke strengejacke merged commit d129767 into main Feb 7, 2024
22 of 25 checks passed
@strengejacke strengejacke deleted the data_modify_if_at branch February 7, 2024 12:17
@etiennebacher
Copy link
Member

For example you could create/modify variables as usual, including some numeric variables, and then say "ok let's round() all numeric columns". In this case you would have to remember that you have to do that in two different data_modify() calls because doing this in the same data_modify() wouldn't affect the newly created variables.

If we forbid the use of ... and .modify in the same call then you also need two data_modify() calls, but at least you don't have to remember the fact that you need two calls, the error message tells you that directly.

@strengejacke
Copy link
Member Author

hm, makes sense. I think we should change to that default behaviour.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants