consider evaluating predictive cdfs at a vector of category boundaries rather than looping over them #4

elray1 · 2024-01-30T16:42:01Z

looking at this code:

idforecastutils/R/get_pmf_forecasts_from_quantile.R

Lines 91 to 120 in b137497

    
    for (i in 1:(num_cat-1)) {
      truth_df_filtered[["crit_current"]] <- truth_df_filtered[[paste0("crit", i, sep="")]]
      train_temp <- truth_df_filtered |>
        dplyr::group_by(model_id, date, location, horizon, target, target_end_date) |>
        dplyr::summarize(
          cdf_crit_current = distfromq::make_p_fn(
            ps = output_type_id,
            qs = value)(unique(crit_current), log = FALSE)
        ) |>
        dplyr::ungroup() |>
        dplyr::select(-c(target, target_end_date))
      train_forecasts[[paste0("cdf_crit", i)]] <- train_temp[["cdf_crit_current"]]
    }
    #calculate percentages, correcting for negative numbers
    exp_forecast <- train_forecasts |>
      dplyr::ungroup() |>
      dplyr::rename(reference_date=date, target=target_variable) |>
      dplyr::mutate(cdf_crit0=1, .before=cdf_crit1)
    exp_forecast[[paste0("cdf_crit", num_cat)]] <- exp_forecast[[paste0("crit", num_cat)]] <- 0
    cdf_crit_sum <- 0
    for (i in 1:(num_cat)) {
      if (cdf_crit_sum < 1) {
        exp_forecast[[categories[i]]] <- exp_forecast[[paste0("cdf_crit", i-1)]] - ifelse(exp_forecast[[paste0("crit", i)]] > 0, exp_forecast[[paste0("cdf_crit", i)]], 0)
      } else {
        exp_forecast[[categories[i]]] <- 0
      }
      cdf_crit_sum <- cdf_crit_sum + mean(exp_forecast[[categories[i]]])
    }

It seems like we should be able to avoid the for loops here by:

- evaluating cdfs that come from distfromq at the vector of all endpoints in columns like truth_df_filtered[[paste0("crit", seq_len(num_cat - 1), sep="")]]
- appending a 1 to the end of that vector (and maybe a 0 to the beginning)
- taking the diff of the result to get from "cumulative category probabilities" to the pmf values
- you may need to unnest the results at the end
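For a single forecast, the steps above could look roughly like the sketch below. It is only an illustration: the quantile levels, quantile values and boundaries are made up, the boundaries are assumed to be sorted in increasing order, and `stats::approxfun` is used as a crude stand-in for the cdf that `distfromq::make_p_fn(ps = ..., qs = ...)` would return, so that the example runs without distfromq installed.

```r
# Made-up quantile forecast: probability levels and the matching quantile values.
ps <- c(0.1, 0.25, 0.5, 0.75, 0.9)
qs <- c(10, 20, 30, 40, 50)

# Stand-in cdf (assumption): linear interpolation between the quantiles.
# In the package this would be distfromq::make_p_fn(ps = ps, qs = qs).
p_fn <- stats::approxfun(qs, ps, yleft = 0, yright = 1)

# Category boundaries, assumed sorted in increasing order (made-up values).
crits <- c(15, 25, 45)

# Evaluate the cdf at all boundaries in one call instead of looping over them.
cdf_vals <- p_fn(crits)

# Pad with 0 and 1, then difference to go from cumulative
# category probabilities to pmf values (one per category).
pmf <- diff(c(0, cdf_vals, 1))
```

With three boundaries this gives four category probabilities that sum to 1 by construction.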

It may be helpful to split all of that functionality out into a helper function
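One possible shape for such a helper is sketched here. The name `cdf_to_pmf`, the toy data and the use of a single shared set of boundaries are all assumptions made for illustration; `stats::approxfun` again stands in for the cdf from `distfromq::make_p_fn`, and base R grouping is used so the example is self-contained.

```r
# Hypothetical helper: turn one predictive cdf into category probabilities.
# `p_fn` is any vectorized cdf; `boundaries` must be sorted in increasing order.
cdf_to_pmf <- function(p_fn, boundaries, categories) {
  stopifnot(
    length(categories) == length(boundaries) + 1,
    !is.unsorted(boundaries)
  )
  cdf_vals <- p_fn(boundaries)
  data.frame(
    output_type_id = categories,
    value = diff(c(0, cdf_vals, 1))
  )
}

# Toy quantile forecasts for two locations (made-up numbers).
forecasts <- data.frame(
  location = rep(c("A", "B"), each = 3),
  output_type_id = rep(c(0.25, 0.5, 0.75), times = 2),
  value = c(10, 20, 30, 40, 50, 60)
)
boundaries <- c(15, 45)
categories <- c("low", "medium", "high")

# Apply the helper once per group, then stack the per-group results.
pmf_by_group <- lapply(split(forecasts, forecasts$location), function(df) {
  # Stand-in cdf (assumption); the package would use distfromq::make_p_fn here.
  p_fn <- stats::approxfun(df$value, df$output_type_id, yleft = 0, yright = 1)
  cbind(location = df$location[1], cdf_to_pmf(p_fn, boundaries, categories))
})
pmf_forecasts <- do.call(rbind, pmf_by_group)
rownames(pmf_forecasts) <- NULL
```

In a dplyr pipeline the same helper could presumably be called inside `dplyr::summarize()` with its result wrapped in `list()`, followed by `tidyr::unnest()` to expand the list-column, which is where the unnesting step would come in.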


elray1 · 2024-02-06T16:06:13Z

You may be able to adapt some ideas in this code: https://github.com/Infectious-Disease-Modeling-Hubs/example-complex-forecast-hub/blob/main/internal-data-raw/create_model_output_data.R
