check_model fails if dependent variable is labelled #727

Closed
sjewo opened this issue May 31, 2024 · 5 comments · May be fixed by easystats/insight#880


sjewo commented May 31, 2024

Hi there,

I ran into a bug with labelled data, similar to #629.

check_model() fails if the dependent variable is labelled:

library(labelled)
library(performance)
library(see)

var_label(mtcars$wt) <- "Weight (1000 lbs)"
var_label(mtcars$mpg) <- "Miles/(US) gallon"
mtcars$am <- labelled(mtcars$am, c("automatic" = 0, "manual" = 1))

# this variable causes the error
mtcars$mpg <- labelled(mtcars$mpg, c("21" = 21))

m <- lm(mpg ~ wt + cyl + gear + disp + am, data = mtcars)

check_model(m)
> check_model(m)
Error: `check_model()` returned following error: Can't combine `..1` <character> and `..2` <double>.
  
If the error message does not help identifying your problem, another reason why `check_model()` failed might be that models of class `lm` are not yet
  supported.
> sessionInfo()
R version 4.4.0 (2024-04-24)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.5

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] see_0.8.4            performance_0.11.0.9 labelled_2.13.0     

loaded via a namespace (and not attached):
 [1] vctrs_0.6.5       cli_3.6.2         rlang_1.1.3       forcats_1.0.0     haven_2.5.4       generics_0.1.3    glue_1.7.0        colorspace_2.1-0  datawizard_0.10.0
[10] hms_1.1.3         scales_1.3.0      fansi_1.0.6       grid_4.4.0        munsell_0.5.1     tibble_3.2.1      lifecycle_1.0.4   insight_0.19.11   compiler_4.4.0   
[19] dplyr_1.1.4       pkgconfig_2.0.3   rstudioapi_0.16.0 R6_2.5.1          tidyselect_1.2.1  utf8_1.2.4        pillar_1.9.0      magrittr_2.0.3    tools_4.4.0      
[28] gtable_0.3.5      bayestestR_0.13.2 ggplot2_3.5.1    
@sjewo sjewo changed the title check_model fails if depend variable is labelled check_model fails if dependent variable is labelled May 31, 2024
bwiernik added a commit to easystats/insight that referenced this issue Jun 1, 2024
strengejacke (Member) commented

Thanks, this should be fixed in insight, which will be submitted to CRAN in the next few days.

bwiernik added a commit to easystats/insight that referenced this issue Jun 3, 2024

strengejacke commented Jun 5, 2024

@larmarange Is it necessary to preserve haven_labelled and vctrs class attributes when labelled::labelled() is used?

See from ?haven::labelled:

This class provides few methods, as I expect you'll coerce to a standard R class (e.g. a factor) soon after importing.

Label attributes can be used with standard R classes, so there is no need to keep the vctrs class attribute. That class behaves differently from standard R classes, which can cause errors (like the one described in this issue) that are a pain to debug. It is also not at all clear to users where the error comes from: standard R behaviour is effectively "broken", even though there is no bug in the package's code.

If not really necessary in your package, maybe it's possible to remove the haven_labelled and vctrs class attributes?

larmarange commented

Hi. labelled::labelled() is identical to haven::labelled()

The labelled package just provides functions to manipulate such vectors.

Such vectors are not intended to be used in a model. They should be transformed into factors with to_factor() or numeric/character vectors with unclass() before modelling (You could also use unlabelled()).

In performance, I do not see the need to support such vectors. In gtsummary, for example, a warning asks users whether they forgot to transform these vectors before analysis.
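As a rough sketch of the conversion described above (not taken from either package's documentation), the example from the original report could be prepared like this before fitting. The object names `dat`, `dat2` and `m` are made up for illustration, and the final `check_model()` call is left commented out because it needs performance and see to be installed.

```r
library(labelled)

# Rebuild the labelled columns from the original report on a copy of mtcars
dat <- mtcars
dat$am  <- labelled(dat$am, c("automatic" = 0, "manual" = 1))
dat$mpg <- labelled(dat$mpg, c("21" = 21))

# Alternative: let unlabelled() decide per column (applied before the manual
# conversion below, so the columns are still labelled at this point)
dat2 <- unlabelled(dat)

# Manual conversion before modelling:
# categorical variable -> factor, numeric outcome -> plain numeric vector
dat$am  <- to_factor(dat$am)
dat$mpg <- unclass(dat$mpg)

m <- lm(mpg ~ wt + cyl + gear + disp + am, data = dat)

# performance::check_model(m)
```

Converting `am` to a factor also changes how the model treats it: it enters as a categorical predictor instead of a 0/1 numeric column.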


larmarange commented Jun 5, 2024

So the error here is using a haven_labelled vector in a model. The variable am should have been transformed into a factor so that the model correctly treats it as categorical.

strengejacke (Member) commented

Yes, I agree. The problem is often that users aren't aware that labelled data can carry the classes haven_labelled and vctrs, which is where problems arise. We fixed this issue in our packages by removing those class attributes whenever model data is extracted.
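To illustrate the idea of dropping the class attributes while keeping the labels, here is a minimal sketch. It is not the actual code from insight; `drop_labelled_class()` is a hypothetical helper written for this example, and it relies only on base R's `unclass()`, which removes the class but leaves other attributes such as `label` and `labels` in place.

```r
library(labelled)

# Hypothetical helper, not part of insight or performance:
# remove the haven_labelled / vctrs classes from a single vector.
drop_labelled_class <- function(x) {
  if (inherits(x, "haven_labelled")) unclass(x) else x
}

dat <- mtcars
dat$mpg <- labelled(dat$mpg, c("21" = 21), label = "Miles/(US) gallon")

# Apply the helper to every column; unlabelled columns pass through unchanged
dat[] <- lapply(dat, drop_labelled_class)

class(dat$mpg)          # plain numeric vector again
attr(dat$mpg, "label")  # the variable label is still attached

m <- lm(mpg ~ wt + cyl + gear + disp + am, data = dat)
```

Unlike `to_factor()`, this keeps every column's underlying type, so a 0/1 variable such as `am` would still be treated as numeric by the model.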
