diff --git a/dev/articles/scoring-rules.html b/dev/articles/scoring-rules.html index ad9fca7f..d904f3f4 100644 --- a/dev/articles/scoring-rules.html +++ b/dev/articles/scoring-rules.html @@ -63,7 +63,7 @@

Scoring rules in `scoringutils`

Nikos Bosse

-

2024-10-14

+

2024-10-22

Source: vignettes/scoring-rules.Rmd
scoring-rules.Rmd
diff --git a/dev/news/index.html b/dev/news/index.html index f398fdba..9387f985 100644 --- a/dev/news/index.html +++ b/dev/news/index.html @@ -61,6 +61,7 @@

score()get_metrics(), which is a generic with S3 methods for each forecast type. It returns a named list of scoring rules suitable for the respective forecast object. For example, you could call get_metrics(example_quantile). Column names of scores in the output of score() correspond to the names of the scoring rules (i.e. the names of the functions in the list of metrics).
  • Instead of supplying arguments to score() to manipulate individual scoring rules, users should now manipulate the metric list being supplied using purrr::partial() and select_metric(). See ?score() for more information.
  • the CRPS is now reported as a decomposition into dispersion, overprediction and underprediction.
  • +
  • functionality to calculate the Probability Integral Transform (PIT) has been deprecated and replaced by functionality to calculate PIT histograms, using the get_pit_histogram() function; as part of this change, nonrandomised PITs can now be calculated for count data, and this is done by default
  • @@ -128,8 +129,6 @@

    Renamed functions: “range” was consistently renamed to “interval_range” in the code. The “range” format (which was mostly used internally) was renamed to the “interval” format
  • Renamed correlation() to get_correlations() and plot_correlation() to plot_correlations()
  • -
  • -pit() was renamed to get_pit() and converted to an S3 method.
  • Deleted functions

    @@ -141,6 +140,8 @@

    Deleted functions: Removed interval_coverage_sample() as users are now expected to convert to a quantile-based format before scoring.
  • Function set_forecast_unit() was deleted. Instead there is now a forecast_unit argument in as_forecast_<type>() as well as in get_duplicate_forecasts().
  • Removed interval_coverage_dev_quantile(). Users can still access the difference between nominal and actual interval coverage using get_coverage().
  • +
  • +pit(), pit_sample() and plot_pit() have been removed and replaced by functionality to create PIT histograms (pit_histogram_sample() and get_pit_histogram())
  • Function changes

    @@ -335,8 +336,8 @@

    New functions and function cha
  • New function avail_forecasts() allows visualising the number of available forecasts.
  • New function find_duplicates() to find duplicate forecasts which cause an error.
  • All plotting functions were renamed to begin with plot_. Arguments were simplified.
  • -
  • The function pit() now works based on data.frames. The old pit function was renamed to pit_sample(). PIT p-values were removed entirely.
  • -
  • The function plot_pit() now works directly with input as produced by pit() +
  • The function pit() now works based on data.frames. The old pit function was renamed to pit_sample(). PIT p-values were removed entirely.
  • +
  • The function plot_pit() now works directly with input as produced by pit()
  • Many data-handling functions were removed and input types for score() were restricted to sample-based, quantile-based or binary forecasts.
  • The function brier_score() now returns all brier scores, rather than taking the mean before returning an output.
  • diff --git a/dev/pkgdown.yml b/dev/pkgdown.yml index 09d83084..033bc652 100644 --- a/dev/pkgdown.yml +++ b/dev/pkgdown.yml @@ -1,11 +1,11 @@ pandoc: 3.1.11 pkgdown: 2.1.1.9000 -pkgdown_sha: 71fd32ce95f17b379e57e7e33f2294bc034727fd +pkgdown_sha: 8d9cba19fe597bba170f8d4c99f351491ab4f4d1 articles: Deprecated-functions: Deprecated-functions.html Deprecated-visualisations: Deprecated-visualisations.html scoring-rules: scoring-rules.html -last_built: 2024-10-14T11:46Z +last_built: 2024-10-22T07:21Z urls: reference: https://epiforecasts.io/scoringutils/reference article: https://epiforecasts.io/scoringutils/articles diff --git a/dev/reference/get_metrics.forecast_binary.html b/dev/reference/get_metrics.forecast_binary.html index 782267f5..fd699f88 100644 --- a/dev/reference/get_metrics.forecast_binary.html +++ b/dev/reference/get_metrics.forecast_binary.html @@ -112,7 +112,7 @@

    Examples#> brierscore <- (observed - predicted)^2 #> return(brierscore) #> } -#> <bytecode: 0x55bae1b9b798> +#> <bytecode: 0x55745b0c4370> #> <environment: namespace:scoringutils> #> #> $log_score @@ -123,7 +123,7 @@

    Examples#> logs <- -log(1 - abs(observed - predicted)) #> return(logs) #> } -#> <bytecode: 0x55bae1b9e790> +#> <bytecode: 0x55745a4f9798> #> <environment: namespace:scoringutils> #> get_metrics(example_binary, select = "brier_score") @@ -135,7 +135,7 @@

    Examples#> brierscore <- (observed - predicted)^2 #> return(brierscore) #> } -#> <bytecode: 0x55bae1b9b798> +#> <bytecode: 0x55745b0c4370> #> <environment: namespace:scoringutils> #> get_metrics(example_binary, exclude = "log_score") @@ -147,7 +147,7 @@

    Examples#> brierscore <- (observed - predicted)^2 #> return(brierscore) #> } -#> <bytecode: 0x55bae1b9b798> +#> <bytecode: 0x55745b0c4370> #> <environment: namespace:scoringutils> #>

    diff --git a/dev/reference/get_metrics.forecast_nominal.html b/dev/reference/get_metrics.forecast_nominal.html index dbeee1fe..d3865b28 100644 --- a/dev/reference/get_metrics.forecast_nominal.html +++ b/dev/reference/get_metrics.forecast_nominal.html @@ -105,7 +105,7 @@

    Examples#> logs <- -log(pred_for_observed) #> return(logs) #> } -#> <bytecode: 0x55bade54e758> +#> <bytecode: 0x55745512c190> #> <environment: namespace:scoringutils> #> diff --git a/dev/reference/get_metrics.forecast_point.html b/dev/reference/get_metrics.forecast_point.html index 592be9dd..60dee6b9 100644 --- a/dev/reference/get_metrics.forecast_point.html +++ b/dev/reference/get_metrics.forecast_point.html @@ -143,7 +143,7 @@

    Examples#> { #> return(ae(actual, predicted)/abs(actual)) #> } -#> <bytecode: 0x55bae09b8d68> +#> <bytecode: 0x557457a82fc8> #> <environment: namespace:Metrics> #> diff --git a/dev/reference/get_metrics.forecast_quantile.html b/dev/reference/get_metrics.forecast_quantile.html index 2c6ce73f..c1c70271 100644 --- a/dev/reference/get_metrics.forecast_quantile.html +++ b/dev/reference/get_metrics.forecast_quantile.html @@ -191,7 +191,7 @@

    Examples#> return(reformatted$wis) #> } #> } -#> <bytecode: 0x55bae1608340> +#> <bytecode: 0x5574584c9230> #> <environment: namespace:scoringutils> #> diff --git a/dev/reference/get_metrics.forecast_sample.html b/dev/reference/get_metrics.forecast_sample.html index 795756eb..30d70c7f 100644 --- a/dev/reference/get_metrics.forecast_sample.html +++ b/dev/reference/get_metrics.forecast_sample.html @@ -141,7 +141,7 @@

    Examples#> return(res) #> } #> } -#> <bytecode: 0x55bae2e13a78> +#> <bytecode: 0x55745995c008> #> <environment: namespace:scoringutils> #> #> $dss @@ -150,7 +150,7 @@

    Examples#> assert_input_sample(observed, predicted) #> scoringRules::dss_sample(y = observed, dat = predicted, ...) #> } -#> <bytecode: 0x55bae19cafc8> +#> <bytecode: 0x5574584f32d8> #> <environment: namespace:scoringutils> #> #> $crps @@ -182,7 +182,7 @@

    Examples#> return(crps) #> } #> } -#> <bytecode: 0x55bae04149b8> +#> <bytecode: 0x557456f49308> #> <environment: namespace:scoringutils> #> #> $overprediction @@ -192,7 +192,7 @@

    Examples#> ...) #> return(crps$overprediction) #> } -#> <bytecode: 0x55bae1a63fe0> +#> <bytecode: 0x5574586a19d8> #> <environment: namespace:scoringutils> #> #> $underprediction @@ -202,7 +202,7 @@

    Examples#> ...) #> return(crps$underprediction) #> } -#> <bytecode: 0x55bae1a634b8> +#> <bytecode: 0x5574586a0e78> #> <environment: namespace:scoringutils> #> #> $dispersion @@ -212,7 +212,7 @@

    Examples#> ...) #> return(crps$dispersion) #> } -#> <bytecode: 0x55bae1a62990> +#> <bytecode: 0x5574583e71c8> #> <environment: namespace:scoringutils> #> #> $log_score @@ -222,7 +222,7 @@

    Examples#> scoringRules::logs_sample(y = observed, dat = predicted, #> ...) #> } -#> <bytecode: 0x55bae1a65c98> +#> <bytecode: 0x5574583e66a0> #> <environment: namespace:scoringutils> #> #> $ae_median @@ -234,7 +234,7 @@

    Examples#> ae_median <- abs(observed - median_predictions) #> return(ae_median) #> } -#> <bytecode: 0x55bae18f0080> +#> <bytecode: 0x5574584539c8> #> <environment: namespace:scoringutils> #> #> $se_mean @@ -245,7 +245,7 @@

    Examples#> se_mean <- (observed - mean_predictions)^2 #> return(se_mean) #> } -#> <bytecode: 0x55bae1a64370> +#> <bytecode: 0x5574583e89e8> #> <environment: namespace:scoringutils> #> diff --git a/dev/reference/get_pit_histogram-1.png b/dev/reference/get_pit_histogram-1.png new file mode 100644 index 00000000..d6a51759 Binary files /dev/null and b/dev/reference/get_pit_histogram-1.png differ diff --git a/dev/reference/get_pit_histogram-2.png b/dev/reference/get_pit_histogram-2.png new file mode 100644 index 00000000..b834bc64 Binary files /dev/null and b/dev/reference/get_pit_histogram-2.png differ diff --git a/dev/reference/get_pit_histogram.default.html b/dev/reference/get_pit_histogram.default.html new file mode 100644 index 00000000..9918995a --- /dev/null +++ b/dev/reference/get_pit_histogram.default.html @@ -0,0 +1,8 @@ + + + + + + + + diff --git a/dev/reference/get_pit_histogram.forecast_sample.html b/dev/reference/get_pit_histogram.forecast_sample.html new file mode 100644 index 00000000..9918995a --- /dev/null +++ b/dev/reference/get_pit_histogram.forecast_sample.html @@ -0,0 +1,8 @@ + + + + + + + + diff --git a/dev/reference/get_pit_histogram.html b/dev/reference/get_pit_histogram.html new file mode 100644 index 00000000..db577799 --- /dev/null +++ b/dev/reference/get_pit_histogram.html @@ -0,0 +1,190 @@ + +Probability integral transformation histogram — get_pit_histogram.forecast_quantile • scoringutils + Skip to contents + + +
    +
    +
    + +
    +

    Generate a Probability Integral Transformation (PIT) histogram for +validated forecast objects.

    +

    See the examples for how to plot the result of this function.

    +
    + +
    +

    Usage

    +
    # S3 method for class 'forecast_quantile'
    +get_pit_histogram(forecast, num_bins = NULL, breaks = NULL, by, ...)
    +
    +# S3 method for class 'forecast_sample'
    +get_pit_histogram(
    +  forecast,
    +  num_bins = 10,
    +  breaks = NULL,
    +  by,
    +  integers = c("nonrandom", "random", "ignore"),
    +  n_replicates = NULL,
    +  ...
    +)
    +
    +get_pit_histogram(forecast, num_bins, breaks, by, ...)
    +
    +# Default S3 method
    +get_pit_histogram(forecast, num_bins, breaks, by, ...)
    +
    + +
    +

    Arguments

    + + +
    forecast
    +

    A forecast object (a validated data.table with predicted and +observed values, see as_forecast()).

    + + +
    num_bins
    +

The number of bins in the PIT histogram. For sample-based +forecasts, the default is 10 bins. For quantile-based forecasts, the +default is one bin for each available quantile. +You can control the number of bins by supplying a number. This is fine for +sample-based PIT histograms, but may fail for quantile-based formats. In +this case it is preferred to supply explicit break points using the +breaks argument.

    + + +
    breaks
    +

    Numeric vector with the break points for the bins in the +PIT histogram. This is preferred when creating a PIT histogram based on +quantile-based data. Default is NULL and breaks will be determined by +num_bins. If breaks is used, num_bins will be ignored. +0 and 1 will always be added as left and right bounds, respectively.

    + + +
    by
    +

    Character vector with the columns according to which the +PIT values shall be grouped. If you e.g. have the columns 'model' and +'location' in the input data and want to have a PIT histogram for +every model and location, specify by = c("model", "location").

    + + +
    ...
    +

    Currently unused. You cannot pass additional arguments to scoring +functions via .... See the Customising metrics section below for +details on how to use purrr::partial() to pass arguments to individual +metrics.

    + + +
    integers
    +

How to handle integer forecasts (count data). This is based +on methods described by Czado et al. (2009). If "nonrandom" (default), the +function will use the non-randomised PIT method. If "random", it will use the +randomised PIT method. If "ignore", it will treat integer forecasts as if they +were continuous.

    + + +
    n_replicates
    +

The number of draws for the randomised PIT for discrete +predictions. Will be ignored if forecasts are continuous or integers is +not set to "random".

    + +
    +
    +

    Value

    +

    A data.table with density values for each bin in the PIT histogram.

    +
    +
    +

    References

    +

    Sebastian Funk, Anton Camacho, Adam J. Kucharski, Rachel Lowe, +Rosalind M. Eggo, W. John Edmunds (2019) Assessing the performance of +real-time epidemic forecasts: A case study of Ebola in the Western Area +region of Sierra Leone, 2014-15, doi:10.1371/journal.pcbi.1006785

    +
    +
    +

    See also

    + +
    + +
    +

    Examples

    +
    library("ggplot2")
    +
    +example <- as_forecast_sample(example_sample_continuous)
    +#>  Some rows containing NA values may be removed. This is fine if not
    +#>   unexpected.
    +result <- get_pit_histogram(example, by = "model")
    +ggplot(result,  aes(x = mid, y = density)) +
    +  geom_col() +
    +  facet_wrap(. ~ model) +
+  labs(x = "Quantile", y = "Density")
    +
    +
    +# example with quantile data
    +example <- as_forecast_quantile(example_quantile)
    +#>  Some rows containing NA values may be removed. This is fine if not
    +#>   unexpected.
    +result <- get_pit_histogram(example, by = "model")
    +ggplot(result,  aes(x = mid, y = density)) +
    +  geom_col() +
    +  facet_wrap(. ~ model) +
+  labs(x = "Quantile", y = "Density")
    +
    +
    +
    +
    + + +
    + + + + + + + diff --git a/dev/reference/index.html b/dev/reference/index.html index 948f5221..feec7079 100644 --- a/dev/reference/index.html +++ b/dev/reference/index.html @@ -349,10 +349,10 @@

    Evaluate forecastsget_pit() + get_pit_histogram() -
    Probability integral transformation (data.frame version)
    +
    Probability integral transformation histogram
    score() @@ -438,7 +438,7 @@

    Lower-level scoring functionspit_sample() + pit_histogram_sample()

    Probability integral transformation for counts
    @@ -515,12 +515,6 @@

    Functions for plotting an
    Plot heatmap of pairwise comparisons

    - plot_pit() - -
    -
    PIT histogram
    -
    - plot_quantile_coverage()
    diff --git a/dev/reference/pit_histogram_sample.html b/dev/reference/pit_histogram_sample.html new file mode 100644 index 00000000..94b752a9 --- /dev/null +++ b/dev/reference/pit_histogram_sample.html @@ -0,0 +1,203 @@ + +Probability integral transformation for counts — pit_histogram_sample • scoringutils + Skip to contents + + +
    +
    +
    + +
    +

Uses the probability integral transformation (PIT) (or a +randomised PIT for integer forecasts) to +assess the calibration of predictive Monte Carlo samples.

    +
    + +
    +

    Usage

    +
    pit_histogram_sample(
    +  observed,
    +  predicted,
    +  quantiles,
    +  integers = c("nonrandom", "random", "ignore"),
    +  n_replicates = NULL
    +)
    +
    + +
    +

    Arguments

    + + +
    observed
    +

    A vector with observed values of size n

    + + +
    predicted
    +

An n × N matrix of predictive samples, with n (number of rows) being +the number of data points and N (number of columns) the number of Monte +Carlo samples. Alternatively, predicted can be a vector of size n.

    + + +
    quantiles
    +

    A vector of quantiles between which to calculate the PIT.

    + + +
    integers
    +

How to handle integer forecasts (count data). This is based +on methods described by Czado et al. (2009). If "nonrandom" (default), the +function will use the non-randomised PIT method. If "random", it will use the +randomised PIT method. If "ignore", it will treat integer forecasts as if they +were continuous.

    + + +
    n_replicates
    +

The number of draws for the randomised PIT for discrete +predictions. Will be ignored if forecasts are continuous or integers is +not set to "random".

    + +
    +
    +

    Value

    +

    A vector with PIT histogram densities for the bins corresponding +to the given quantiles.

    +
    +
    +

    Details

    +

    Calibration or reliability of forecasts is the ability of a model to +correctly identify its own uncertainty in making predictions. In a model +with perfect calibration, the observed data at each time point look as if +they came from the predictive probability distribution at that time.

    +

    Equivalently, one can inspect the probability integral transform of the +predictive distribution at time t,

    +

    $$ +u_t = F_t (x_t) +$$

    +

    where \(x_t\) is the observed data point at time \(t \textrm{ in } t_1, +…, t_n\), n being the number of forecasts, and \(F_t\) is +the (continuous) predictive cumulative probability distribution at time t. If +the true probability distribution of outcomes at time t is \(G_t\) then the +forecasts \(F_t\) are said to be ideal if \(F_t = G_t\) at all times t. +In that case, the probabilities \(u_t\) are distributed uniformly.
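As a quick numerical illustration of this property (an illustrative sketch, not part of the package API; all names are made up for the example), drawing each observation from its own forecast distribution yields approximately uniform PIT values:

```r
# Illustrative sketch: with ideal forecasts F_t = G_t, the PIT values
# u_t = F_t(x_t) are standard uniform.
set.seed(1)
n <- 1000
mu <- rnorm(n)              # forecast means, one per time point
x <- rnorm(n, mean = mu)    # observations drawn from the forecast itself
u <- pnorm(x, mean = mu)    # u_t = F_t(x_t)
# u should be roughly uniform on [0, 1]
summary(u)
```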

    +

In the case of discrete nonnegative outcomes such as incidence counts, +the PIT is no longer uniform even when forecasts are ideal. +In that case, two methods are available, as described by Czado et al. (2009).

    +

By default, a nonrandomised PIT is calculated using the conditional +cumulative distribution function +$$ + F(u) = + \begin{cases} + 0 & \text{if } u < P_t(k_t - 1) \\ + (u - P_t(k_t - 1)) / (P_t(k_t) - P_t(k_t - 1)) & \text{if } P_t(k_t - 1) \leq u < P_t(k_t) \\ + 1 & \text{if } u \geq P_t(k_t) + \end{cases} +$$

    +

    where \(k_t\) is the observed count, \(P_t(x)\) is the predictive +cumulative probability of observing incidence \(k\) at time \(t\) and +\(P_t (-1) = 0\) by definition. +Values of the PIT histogram are then created by averaging over the \(n\) +predictions,

    +

$$ +\bar{F}(u) = \frac{1}{n} \sum_{t=1}^{n} F_t(u) +$$

    +

    And calculating the value at each bin between quantile \(q_i\) and quantile +\(q_{i + 1}\) as

    +

$$ +f_i = \frac{\bar{F}(q_{i+1}) - \bar{F}(q_i)}{q_{i+1} - q_i} +$$
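The nonrandomised construction can be sketched numerically as follows (illustrative code only, not the package implementation; the predictive CDF \(P_t\) is estimated from Monte Carlo samples and all names are made up):

```r
# Illustrative sketch of the nonrandomised PIT for counts
# (Czado et al. 2009); the predictive CDF P_t is estimated from samples.
set.seed(1)
observed <- rpois(50, lambda = 5)
predicted <- replicate(500, rpois(50, lambda = 5))   # 50 x 500 samples
P <- function(t, k) if (k < 0) 0 else mean(predicted[t, ] <= k)
# conditional CDF for forecast t, piecewise as in the formula above
F_cond <- function(u, t) {
  lo <- P(t, observed[t] - 1)
  hi <- P(t, observed[t])
  if (u < lo) 0 else if (u >= hi) 1 else (u - lo) / (hi - lo)
}
breaks <- seq(0, 1, by = 0.1)
# average over the n forecasts, then difference to get bin densities
Fbar <- vapply(breaks, function(u) {
  mean(vapply(seq_along(observed), function(t) F_cond(u, t), numeric(1)))
}, numeric(1))
density <- diff(Fbar) / diff(breaks)  # roughly 1 per bin if well calibrated
```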

    +

Alternatively, a randomised PIT can be used instead. In this case, the PIT is +$$ + u_t = P_t(k_t - 1) + v * (P_t(k_t) - P_t(k_t - 1)) +$$

    +

where \(v\) is standard uniform and independent of \(k\). The values of +the PIT histogram are then calculated by binning the \(u_t\) values as above.
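The randomised variant can be sketched in the same way (again an illustrative sketch with the predictive CDF estimated from samples, not the package implementation):

```r
# Illustrative sketch of the randomised PIT for counts; v is standard
# uniform and independent of the observed count k_t.
set.seed(1)
observed <- rpois(50, lambda = 5)
predicted <- replicate(500, rpois(50, lambda = 5))
P <- function(t, k) if (k < 0) 0 else mean(predicted[t, ] <= k)
v <- runif(length(observed))
u <- vapply(seq_along(observed), function(t) {
  lo <- P(t, observed[t] - 1)
  lo + v[t] * (P(t, observed[t]) - lo)
}, numeric(1))
# bin the u_t values into a histogram, as for the continuous PIT
hist(u, breaks = seq(0, 1, 0.1), plot = FALSE)$density
```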

    +
    +
    +

    References

    +

Claudia Czado, Tilmann Gneiting, Leonhard Held (2009) Predictive model +assessment for count data. Biometrics, 65(4), 1254-1261. +Sebastian Funk, Anton Camacho, Adam J. Kucharski, Rachel Lowe, +Rosalind M. Eggo, W. John Edmunds (2019) Assessing the performance of +real-time epidemic forecasts: A case study of Ebola in the Western Area +region of Sierra Leone, 2014-15, doi:10.1371/journal.pcbi.1006785

    +
    +
    +

    See also

    + +
    + +
    +

    Examples

    +
    
    +## continuous predictions
    +observed <- rnorm(20, mean = 1:20)
    +predicted <- replicate(100, rnorm(n = 20, mean = 1:20))
    +pit <- pit_histogram_sample(observed, predicted, quantiles = seq(0, 1, 0.1))
    +
    +## integer predictions
    +observed <- rpois(20, lambda = 1:20)
    +predicted <- replicate(100, rpois(n = 20, lambda = 1:20))
    +pit <- pit_histogram_sample(observed, predicted, quantiles = seq(0, 1, 0.1))
    +
    +## integer predictions, randomised PIT
    +observed <- rpois(20, lambda = 1:20)
    +predicted <- replicate(100, rpois(n = 20, lambda = 1:20))
    +pit <- pit_histogram_sample(
    +  observed, predicted, quantiles = seq(0, 1, 0.1),
    +  integers = "random", n_replicates = 30
    +)
    +
    +
    +
    + + +
    + + + + + + + diff --git a/dev/reference/scoring-functions-binary.html b/dev/reference/scoring-functions-binary.html index c57d3009..98776cf3 100644 --- a/dev/reference/scoring-functions-binary.html +++ b/dev/reference/scoring-functions-binary.html @@ -147,17 +147,17 @@

    Examplespredicted <- runif(n = 30, min = 0, max = 1) brier_score(observed, predicted) -#> [1] 0.050968633 0.057656155 0.286927444 0.067535387 0.036218014 0.323334572 -#> [7] 0.609439077 0.022618905 0.563870034 0.420850006 0.263388704 0.126243692 -#> [13] 0.815464255 0.065961231 0.551322692 0.655667703 0.170750321 0.601418670 -#> [19] 0.153762411 0.003327989 0.656657070 0.153676995 0.004340033 0.733833002 -#> [25] 0.003680076 0.010143612 0.317605349 0.405392737 0.369087629 0.177242828 +#> [1] 0.169810872 0.319983015 0.720333826 0.136189585 0.539816982 0.573052550 +#> [7] 0.210298251 0.005442733 0.940506004 0.209025175 0.132818958 0.775957259 +#> [13] 0.533583639 0.773922330 0.177404878 0.032952811 0.700006942 0.860101989 +#> [19] 0.010349776 0.035994321 0.423125196 0.650581640 0.408156615 0.150229898 +#> [25] 0.003399837 0.300913133 0.070701554 0.002211134 0.008862522 0.099258707 logs_binary(observed, predicted) -#> [1] 0.25587639 0.27459075 0.76712981 0.30093715 0.21110420 0.84077787 -#> [7] 1.51715864 0.16298472 1.38995224 1.04619879 0.71993087 0.43898241 -#> [13] 2.33335976 0.29682936 1.35677856 1.65932749 0.53310439 1.49393247 -#> [19] 0.49778684 0.05941962 1.66254231 0.49760766 0.06814922 1.94240004 -#> [25] 0.06258165 0.10615583 0.82911550 1.01253898 0.93528454 0.54645655 +#> [1] 0.53116635 0.83395161 1.88865474 0.46051080 1.32697839 1.41470349 +#> [7] 0.61356527 0.07663796 3.49981031 0.61100092 0.45325406 2.12766051 +#> [13] 1.31106851 2.11800405 0.54678893 0.20031742 1.81194692 2.62302230 +#> [19] 0.10728887 0.21037750 1.05119662 1.64292444 1.01852104 0.49036148 +#> [25] 0.06007715 0.79530283 0.30910680 0.04816419 0.09887158 0.37841454