..new is calculated wrong in lencode steps #243

Open
EmilHvitfeldt opened this issue Jan 29, 2025 · 0 comments
Labels
bug an unexpected problem or unintended behavior

Comments

@EmilHvitfeldt
Member

The value for unseen levels (`..new`) is calculated from the mean of the coefficients rather than from the mean of the global data. This should be fixed to better reflect the literature.

Make sure that the documentation is changed accordingly.

This change should be easy to keep backward compatible, since it only changes the value assigned to new levels.

```r
data <- data.frame(
  outcome = rnorm(1000) + c(rep(10, 900), rep(0, 100)),
  predictor = c(rep("Big", 900), rep(letters[1:10], each = 10))
)

library(tidyverse)

data |>
  count(predictor)
#>    predictor   n
#> 1        Big 900
#> 2          a  10
#> 3          b  10
#> 4          c  10
#> 5          d  10
#> 6          e  10
#> 7          f  10
#> 8          g  10
#> 9          h  10
#> 10         i  10
#> 11         j  10

data |>
  summarize(
    mean = mean(outcome),
    .by = predictor
  )
#>    predictor        mean
#> 1        Big  9.92621834
#> 2          a -0.12884918
#> 3          b  0.24802560
#> 4          c  0.12339453
#> 5          d  0.33307724
#> 6          e  0.08705590
#> 7          f  0.86433875
#> 8          g  0.42452332
#> 9          h  0.42548890
#> 10         i -0.07257279
#> 11         j -0.67403943

embed:::glm_coefs(y = select(data, outcome), x = pull(data, predictor))
#> # A tibble: 12 × 2
#>    ..level ..value
#>    <chr>     <dbl>
#>  1 a       -0.129
#>  2 b        0.248
#>  3 Big      9.93
#>  4 c        0.123
#>  5 d        0.333
#>  6 e        0.0871
#>  7 f        0.864
#>  8 g        0.425
#>  9 h        0.425
#> 10 i       -0.0726
#> 11 j       -0.674
#> 12 ..new    0.256

mean(data$outcome, trim = 0.1)
#> [1] 9.717217
```
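As a rough check of the arithmetic (not the package's code), the per-level means printed by `summarize()` above can be plugged back in by hand. For this no-intercept Gaussian fit they match the `glm_coefs()` coefficients. The printed `..new` value of 0.256 is consistent with a 10% trimmed mean over the 11 coefficients, where every level counts equally regardless of size. By contrast, weighting the level means by their counts recovers the global mean of `outcome`, which is dominated by the 900 `Big` rows. The vectors `coefs` and `n` are illustrative names introduced here.

```r
# Per-level means copied from the summarize() output above; for an
# intercept-free Gaussian GLM these coincide with the fitted coefficients.
coefs <- c(
  Big = 9.92621834, a = -0.12884918, b = 0.24802560, c = 0.12339453,
  d = 0.33307724, e = 0.08705590, f = 0.86433875, g = 0.42452332,
  h = 0.42548890, i = -0.07257279, j = -0.67403943
)

# 10% trimmed mean over 11 values drops one value from each end
# (-0.674 and 9.93) and averages the remaining 9 levels equally.
round(mean(coefs, trim = 0.1), 3)
#> [1] 0.256

# Level sizes from the count() output above.
n <- c(Big = 900, setNames(rep(10, 10), letters[1:10]))

# Count-weighted mean of the level means equals mean(data$outcome),
# i.e. the untrimmed mean of the global data.
round(weighted.mean(coefs, n), 2)
#> [1] 8.95
```

The issue's own comparison uses the 10% trimmed global mean (9.717), which needs the raw rows rather than the level summaries, but either global statistic lands near 10 rather than near 0.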