Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error in dplyr::select() #84

Open
spinkney opened this issue Aug 30, 2023 · 2 comments
Open

Error in dplyr::select() #84

spinkney opened this issue Aug 30, 2023 · 2 comments

Comments

@spinkney
Copy link

I'm not sure how to proceed after getting this error

Simulation 1 resulted in error when post-processing the fit.
Calling `recompute_SBC_statistics` after you've found and fixed the problem could let you move further without refitting
Error in `dplyr::select()`:
! Can't subset columns that don't exist.
✖ Column `variable` doesn't exist.

With this code

library(SBC) # remotes::install_github("hyunjimoon/SBC")
library(cmdstanr)
library(MCMCpack)
library(ggplot2)

library(future)
plan(multisession)
options(SBC.min_chunk_size = 5)

m <- cmdstan_model("softmax.stan")
backend <- SBC_backend_cmdstan_sample(m, chains = 2)

generate_one_dataset <- function(N, K, prior_alpha = 1) {
    x <- rdirichlet(1, alpha = rep(prior_alpha, K))
    observed <- as.integer(rmultinom(1, size = N, prob = x))
    
    list(
        variables = list(x = x),
        generated = list(K = K, observed = observed, prior_alpha = prior_alpha)
    )
}

y <- generate_one_dataset(N = 100, K = 20, prior_alpha = 4)

y_out <- m$sample(
    data = list(
        K = y$generated$K,
        observed = y$generated$observed,
        prior_alpha = y$generated$prior_alpha
    ),
    parallel_chains = 4
)


datasets_flat <- generate_datasets(
    SBC_generator_function(generate_one_dataset, N = 100, K = 20),
    n_sims = 10)


res_flat <- compute_SBC(datasets_flat, backend)

and Stan model

data {
  int K;
  array[K] int<lower=0> observed;
  real<lower=0> prior_alpha;
}

parameters {
  vector[K - 1] y;
}

transformed parameters {
  simplex[K] x = softmax(append_row(y, 0.));
}

model {
  x ~ dirichlet(rep_vector(prior_alpha, K));
  observed ~ multinomial(x);
}
@martinmodrak
Copy link
Collaborator

Thanks for reporting!
The underlying problem is that if you look at datasets_flat$variables, you'll notice that it treats x as 2D array, so the variables are called x[1,1], x[1,2] ... , while the variables in Stan fit are called x[1], x[2]. SBC then finds no shared variables and this leads to problems. This deserves a better error message.

However, you can get the code to work by changing to:

generate_one_dataset <- function(N, K, prior_alpha = 1) {
  x <- as.numeric(rdirichlet(1, alpha = rep(prior_alpha, K)))
  observed <- as.integer(rmultinom(1, size = N, prob = x))
  
  list(
    variables = list(x = x),
    generated = list(K = K, observed = observed, prior_alpha = prior_alpha)
  )
}

Also note that the $generated element of datasets is already in format to be pushed to Stan, so you can simplify your test with:

y_out <- m$sample(
  data = y$generated,
  parallel_chains = 4
)

I'll try to inform messages around this issue...

@spinkney
Copy link
Author

Thanks for the quick response @martinmodrak!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants