---
title: "The Noisy Work of Uncertainty Visualisation Research: A Review"
author:
- name: Harriet Mason
url: https://harrietmason.netlify.app/
orcid: 0009-0007-4568-8215
email: [email protected]
affiliation:
- name: Monash University
department: Department of Econometrics and Business Statistics
city: Melbourne
country: Australia
- name: Dianne Cook
url: https://dicook.org
orcid: 0000-0002-3813-7155
email: [email protected]
affiliation:
- name: Monash University
department: Department of Econometrics and Business Statistics
city: Melbourne
country: Australia
- name: Sarah Goodwin
url: https://www.linkedin.com/in/smgoodwin/
orcid: 0000-0001-8894-8282
email: [email protected]
affiliation:
- name: Monash University
department: Department of Human Centred Computing
city: Melbourne
country: Australia
- name: Emi Tanaka
url: https://emitanaka.org/
orcid: 0000-0002-1455-259X
email: [email protected]
affiliation:
- name: The Australian National University
department: Biological Data Science Institute
city: Canberra
country: Australia
- name: Susan VanderPlas
url: https://srvanderplas.github.io
orcid: 0000-0002-3803-0972
email: [email protected]
affiliation:
- name: University of Nebraska–Lincoln
department: Statistics Department
city: Lincoln
country: United States
bibliography: paper.bib
abstract: Uncertainty visualisation is quickly becoming a hot topic in information visualisation. Existing reviews in the field take the definition and purpose of an uncertainty visualisation to be self-evident, which results in a large amount of conflicting information. This conflict largely stems from a conflation between uncertainty visualisations designed for decision making and those designed to prevent false conclusions. We coin the term "signal suppression" to describe a visualisation that is designed to prevent false conclusions, as the approach demands that the signal (i.e. the collective take-away of the estimates) is suppressed by the noise (i.e. the variance on those estimates). We argue that the current standards in visualisation suggest that uncertainty visualisations designed for decision making should not be considered uncertainty visualisations at all. Therefore, future work should focus on signal suppression. Effective signal suppression requires us to communicate the signal and the noise as a single "validity of signal" variable, and doing so proves to be difficult with current methods. We illustrate current approaches to uncertainty visualisation by showing how they would change the visual appearance of a choropleth map. These maps allow us to see why some methods succeed at signal suppression, while others fall short. Evaluating visualisations on how well they perform signal suppression also proves to be difficult, as it involves measuring the effect of noise, a variable we typically try to ignore. We suggest authors use qualitative studies or compare uncertainty visualisations to the relevant hypothesis tests.
date: last-modified
toc: false
number-sections: true
latex-clean: true
format:
jasa-pdf:
keep-tex: true
journal:
blinded: false
jasa-html: default
fig-valign: bottom
cap-location: bottom
editor_options:
chunk_output_type: console
---
```{r}
#| echo: false
#| message: false
#| warning: false
# load Libraries
library(tidyverse)
# devtools::install_github("lydialucchesi/Vizumap")
library(Vizumap)
library(RColorBrewer)
library(scales)
library(sf)
library(ggrepel)
# devtools::install_github("UrbanInstitute/urbnmapr")
library(urbnmapr)
library(flextable)
library(colorspace)
library(rgeos)
```
## Introduction
From entertainment choices to news articles to insurance plans, the modern citizen is so inundated with information in every aspect of their life that it can be overwhelming.
In the face of this overflow of information, tools that effectively reduce piles of information to simple and clear ideas become more valuable.
That is, we need tools that can sort the signal from the noise.
Among these summary tools, information visualisations are some of the most powerful, as they allow for quick and memorable communication and let us identify quirks in our data that we did not know to look for.
Datasets such as Anscombe's quartet [@anscombe] or the Datasaurus Dozen [@datasaurpkg] show cases where visual statistics highlight elements of the data that are invisible to typical summary statistics.
Something as simple as sketching a distribution before recalling statistics or making predictions can greatly increase the accuracy of those measures [@Hullman2018; @Goldstein2014].
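The gap between numerical and visual summaries is easy to demonstrate with the `anscombe` data frame that ships with base R. The following sketch (assuming the tidyverse is loaded, as in the setup chunk above) computes near-identical means and correlations for all four pairs, then reveals their very different structures by plotting.
```{r}
#| eval: false
# All four Anscombe pairs share (almost exactly) the same summary statistics
sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean_x = mean(x), mean_y = mean(y), cor_xy = cor(x, y))
})
# ...but plotting exposes four very different relationships
anscombe |>
  mutate(obs = row_number()) |>
  pivot_longer(cols = -obs,
               names_to = c(".value", "set"),
               names_pattern = "(x|y)(\\d)") |>
  ggplot(aes(x = x, y = y)) +
  geom_point() +
  facet_wrap(~set)
```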
"Uncertainty visualisation" is a relatively new field in research.
Early mentions of uncertainty visualisation start to appear in the late 1980s [@Ibrekk1987], with geospatial information visualisation literature from the early 1990s declaring this to be essential aspect of any information display [@MacEachren1992; @Carr1992].
@fig-ibrekk depicts an example of the uncertainty visualisations discussed in these early papers.
Despite kicking off the field, these papers did not define uncertainty visualisation.
This has led to a lack of consensus on what it means for a graphic to visualise uncertainty, an issue we will return to later.
Indeed, while the field is considered to be quite new, many of the graphics used for uncertainty visualisation have been around for much longer.
For example, box plots and histograms display variation, which becomes synonymous with uncertainty when they are used to depict the variation of an estimate.
Today, there is an abundance of publications on the topic, which makes it timely to construct a review of the field.
That is, now that there is an overwhelming amount of information, it is valuable to distil it into simple facts.
In fact, there have already been several reviews published but a central piece of discussion is missing.
```{r}
#| echo: false
#| message: false
#| warning: false
#| label: fig-ibrekk
#| fig-cap: "A replication of the uncertainty visualisations shown by @Ibrekk1987 in one of the earliest uncertainty visualisation experiments. Several visualisation methods that are now unpopular (such as the pie chart) are used throughout that paper."
#| fig-subcap:
#| - "Picture 1"
#| - "Picture 2"
#| - "Picture 3"
#| - "Picture 4"
#| - "Picture 5"
#| - "Picture 6"
#| - "Picture 7"
#| - "Picture 8"
#| - "Picture 9"
#| layout-ncol: 3
# Generate data
set.seed(1)
x=rnorm(1000, 8, 4)
ib_data <- tibble(x=ifelse(x<0, -x, x))
# Picture 1
p1 <- ib_data |>
summarise(avg = mean(x),
conf_95a = quantile(x, probs=c(0.025)),
conf_95b = quantile(x, probs=c(0.975))) |>
ggplot(aes(y="NA")) +
geom_point(aes(x=avg)) +
geom_errorbar(aes(xmin = conf_95a, xmax = conf_95b), width = 0.1) +
scale_x_continuous(name = "INCHES OF SNOW",
breaks=seq(0,19),
labels= ggplot2:::interleave(as.character(c(seq(0,18, 2), 19)), rep("", 11))[c(0:19, 21)],
limits=c(0,19)) +
theme_classic() +
theme(axis.line.y=element_blank(),
axis.title.y = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
aspect.ratio=1/10)
# Picture 2
p2 <- ib_data |>
mutate(x = ifelse(x>18, 18, x),
binx = cut(x, breaks=seq(0,18,2))) |>
group_by(binx) |>
summarise(n = n()) |>
mutate(Probability = n / sum(n)) |>
ggplot(aes(x=binx, y=Probability)) +
geom_col(fill="black", colour="white") +
scale_x_discrete(name = "INCHES OF SNOW",
labels= paste0(seq(0,16,2), sep = "-", seq(2,18,2))) +
scale_y_continuous(breaks = seq(0.00, 0.25, 0.05)) +
theme_classic() +
theme(aspect.ratio=0.33)
# Picture 3
p3 <- ib_data |>
mutate(x = ifelse(x>18, 18, x),
binx = cut(x,
breaks=seq(0,18,2),
labels= paste0(seq(0,16,2), sep = "-", seq(2,18,2)))) |>
group_by(binx) |>
summarise(n = n()) |>
mutate(Probability = n / sum(n),
csum = rev(cumsum(rev(Probability))),
pos = Probability/2 + lead(csum, 1),
pos = if_else(is.na(pos), Probability/2, pos)) |>
ggplot(aes(x="", y=Probability, fill=binx)) +
geom_bar(stat="identity", width=1) +
geom_text_repel(aes(y = pos, label = paste0(round(Probability*100), sep="", "%")),
size = 3, nudge_x = 0.6, show.legend = FALSE, segment.color = 'transparent') +
#geom_label(aes(label = paste0(round(Probability*100), sep="", "%")),
# position = position_stack(vjust = 0.5)) +
scale_fill_grey() +
coord_polar("y", start=0) +
labs(fill = "INCHES OF SNOW") +
theme_void() +
theme(aspect.ratio=1)
# Picture 4
p4 <- ib_data |>
ggplot(aes(x=x)) +
geom_density() +
scale_x_continuous(name = "INCHES OF SNOW",
breaks = seq(0,20,2),
labels= paste0(seq(0,20,2))) +
scale_y_continuous(name = "Probability density",
breaks = seq(0.00, 0.20, 0.02)) +
theme_classic() +
theme(aspect.ratio=0.33)
# Picture 5
p5 <- ib_data |>
ggplot(aes(y="", x=x)) +
geom_violin() +
scale_x_continuous(name = "INCHES OF SNOW",
breaks = seq(0,20,2),
labels= paste0(seq(0,20,2)),
limits=c(0,20)) +
theme_classic() +
theme(axis.line.y=element_blank(),
axis.title.y = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
aspect.ratio=0.4)
# Picture 6
set.seed(1)
x=rnorm(5000, 8, 4)
ib_data2 <- tibble(x=ifelse(x<0, -x, x)) |>
mutate(x=ifelse(x>=18, 18-rexp(5000,rate=0), x))
p6 <- ib_data2 |>
ggplot(aes(y="", x=x)) +
geom_jitter(size=0.05) +
scale_x_continuous(name = "INCHES OF SNOW",
breaks = seq(0,20,2),
labels= paste0(seq(0,20,2)),
limits=c(0,20)) +
theme_classic() +
theme(axis.line.y=element_blank(),
axis.title.y = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
aspect.ratio=0.1)
# Picture 7
p7 <- ib_data2 |>
arrange(x) |>
mutate(group = rep(1:50, each=100))|>
group_by(group) |>
summarise(x = max(x, na.rm=TRUE)) |>
add_row(group=c(0,51), x = c(0,20)) |>
ggplot(aes(x=x)) +
geom_linerange(ymin = 0.1, ymax = 1) +
geom_linerange(y=1, xmin = -0.03, xmax = 20.03)+
geom_linerange(y=0.1, xmin = -0.03, xmax = 20.03)+
scale_x_continuous(name = "INCHES OF SNOW",
breaks = seq(0,20,2),
labels= paste0(seq(0,20,2)),
limits=c(0,20)) +
scale_y_continuous(limits=c(0,1)) +
theme_classic() +
theme(axis.line.y=element_blank(),
axis.title.y = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
aspect.ratio=0.1)
# Picture 8
p8 <- ib_data |>
reframe(x = quantile(x, probs=c(0.25, 0.50, 0.75)))|>
add_row(x = c(0,20)) |>
arrange(x) |>
mutate(quantile = c("min", "q1", "med", "q3", "max")) |>
pivot_wider(names_from = quantile, values_from = x) |>
ggplot(aes(y="")) +
#geom_point(aes(x=med)) +
geom_errorbar(aes(y="", xmin = min, xmax = max), width = 0.2) +
geom_crossbar(aes(y="", x=med, xmin = q1, xmax = q3), width = 0.5) +
scale_x_continuous(name = "INCHES OF SNOW",
breaks=seq(0,20),
labels= ggplot2:::interleave(as.character(c(seq(0,20, 2))), rep("", 11))[1:21],
limits=c(0,20)) +
theme_classic() +
theme(axis.line.y=element_blank(),
axis.title.y = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
aspect.ratio=1/10)
# Picture 9
p9 <- ib_data |>
ggplot(aes(x)) +
stat_ecdf(geom = "step") +
scale_x_continuous(name = "INCHES OF SNOW",
breaks=seq(0,20),
labels= ggplot2:::interleave(as.character(c(seq(0,20, 2))), rep("", 11))[1:21],
limits=c(0,20)) +
scale_y_continuous(name = "Cumulative probability",
breaks=seq(0,1,0.1),
labels= seq(0,1,0.1),
limits=c(0,1)) +
theme_classic() +
theme(aspect.ratio=4/10)
# Display Plots
p1
p2
p3
p4
p5
p6
p7
p8
p9
```
Reviews on uncertainty visualisation rarely offer tried and tested rules for effective uncertainty visualisation, but rather, they comment on the *difficulties* faced when trying to summarise the field.
@Kinkeldey2014 found most experimental methods to be ad hoc, with no commonly agreed upon methodology, formalisations, or greater goal of describing general principles.
@Hullman2016 noticed there is a serious noise issue in the data coming from uncertainty visualisation experiments.
She commented on the prevalence of confounding variables that make it unclear what exactly caused a subject's poor performance on a particular set of questions.
Mistakes due to misunderstanding visualisations, misinterpreting questions, and incorrectly applying heuristics are all combined into a single error value.
@Spiegelhalter2017 commented that different plots are good for different things, and disagreed with the goal of identifying a universal best plot for all people and circumstances.
@Griethe2006 did not identify common themes, but instead listed the findings and opinions of a collection of papers.
@uncertchap2022 summarised several cognitive effects that repeatedly arise in uncertainty visualisation experiments, however these effects were each discussed in isolation as a list of considerations an author might make rather than an overarching theory of rules for effective uncertainty visualisation.
While these reviews are thorough in scope, none discuss how the existing literature contributes to the broader goal of uncertainty visualisation.
The problem faced by the literature is easily summarised with a famous quote by Henri Poincaré.
> "Science is built up of facts, as a house is built of stones; but an accumulation of facts is no more a science than a heap of stones is a house." - Henri Poincaré (1905)
That is to say, despite the wealth of reviews, the field of uncertainty visualisation remains a heap of stones.
There is a mountain of work that identifies common heuristics found in uncertainty visualisations, evaluates competing plot designs, or starts a theoretical discussion on a niche aspect of the field.
This is important work that needs to be done, but each of these papers offers up their own bespoke motivation and methodology, with little reference to the uncertainty visualisation papers outside their periphery.
This becomes even more difficult to manage when these studies are in conflict.
The field is in desperate need of a unifying theory that can tie this swath of research together.
This review attempts to address this issue by offering a novel perspective on the uncertainty visualisation problem.
That is, we are going to use the wealth of established stone to construct the foundations on which we can build a house.
This review is broken into several parts that each reflect a different approach to uncertainty visualisation.
First, we look at graphics that ignore uncertainty entirely and discuss why uncertainty should be included at all.
Second, we look at methods that consider uncertainty to be just another variable and discuss the characteristics of uncertainty that make it a unique visualisation problem.
Third, we look at methods that explicitly combine our estimate and its uncertainty and discuss the limitations of these approaches.
Fourth, we discuss methods that implicitly include uncertainty by depicting a sample in place of an estimate.
Finally, we discuss how uncertainty visualisations can be effectively evaluated.
When discussing each of these methods, we will repeatedly return to the *purpose* of uncertainty visualisation and the effectiveness of each approach in fulfilling that purpose.
### Spatial example
There are far too many uncertainty visualisations to exhaustively discuss them all.
Instead we focus on the changes made to a single plot, the choropleth map.
Due to the field's origins and focus in geospatial information visualisation, there have been a large number of suggested variations on the choropleth map that allow authors to include uncertainty.
Utilising a single example will help isolate the ideas we are trying to convey.
However, it is important to remember that even though we focus our discussion on the choropleth map, the theoretical approach we outline in this review is useful to all uncertainty visualisations regardless of their application.
Additionally, our examples focus on incorporating uncertainty through colour manipulation, as that is the key visual channel used in a choropleth map.
However, the methods we discuss go beyond variations in a colour palette.
Even though they are not explicitly shown, visualisations that depict uncertainty using layers such as position or shape, and more complicated graphics that incorporate animation or interactivity are also within the scope of this review.
We will use the choropleth map as a tool to clearly highlight the costs and benefits of each approach.
@fig-data shows the first five rows and the geographical boundaries of our data set.
The temperature variable was generated using a sine wave, that is $Temperature_i = 29 - 2\cdot|Latitude_i - \sin(2 \cdot Longitude_i)|$, where $Longitude_i$ and $Latitude_i$ are the longitude and latitude of the county's centroid standardised to have mean zero and unit variance.
Each county's variance is independently sampled from a uniform distribution.
In the low variance condition, the variances are drawn from a $U_{[0,2]}$ distribution, while in the high variance condition they are drawn from a $U_{[2,4]}$ distribution; the standard errors reported in @fig-data are the square roots of these variances.
As we are dealing with an average, the sampling distribution would be approximately normal by the central limit theorem, so each county temperature estimate is assumed to follow a $N(Temp_i, SE_{case,i}^2)$ distribution.
This is the data we will be using in our spatial uncertainty examples for the rest of the paper.
```{r}
#| eval: false
#| echo: false
# Get map data: do this once and save
my_map_data <- get_urbn_map("counties", sf = TRUE) |>
filter(state_name=="Iowa")
save(my_map_data, file="data/iowa_map.rda")
```
```{r}
#| eval: false
#| echo: false
# Get centroids once and save, because rgeos is deprecated
load("data/iowa_map.rda")
centroids <- as_tibble(gCentroid(as(my_map_data$geometry, "Spatial"), byid = TRUE))
my_map_data$cent_long <- centroids$x
my_map_data$cent_lat <- centroids$y
save(my_map_data, file="data/my_map_data.rda")
```
```{r}
#| echo: false
#| message: false
#| warning: false
#| label: fig-data
#| fig-cap: "The first 5 observations of the data used for the spatial uncertainty examples along with the boundaries of each county. The map boundaries are the Iowa county boundaries, however the 'temperature' data is not representative of the average temperature in Iowa. The temperature and standard error represent the average of the daily high temperature and the standard error of that average respectively."
#| fig-subcap:
#| - "Data Table"
#| - "Map Boundaries"
#| layout-nrow: 1
#| cap-location: "bottom"
# get data
load("data/my_map_data.rda")
# seed for sampling
set.seed(1997)
# data dimension for sampling
n <- dim(my_map_data)[1]
# Make palettes
longpal <- rev(sequential_hcl(13, palette = "YlOrRd"))
basecols <- longpal[3:10]
breaks <- 21:29
breakslong <- 18:32
names(basecols) <- seq(8)
names(longpal) <- -1:11
my_map_data <- my_map_data |>
mutate(temp = 29 - 2*abs(scale(cent_lat) - sin(2*(scale(cent_long)))[,1]), # trend
highvar = runif(n, min=2, max=4), # high variance
lowvar = runif(n, min=0, max=2), # low variance
count_id = row_number()) |>
pivot_longer(cols=highvar:lowvar, names_to = "variance_class", values_to = "variance") |>
# add bivariate classes to data
mutate(bitemp = cut(temp, breaks=breaks, labels=seq(8)),
bivar = cut(variance, breaks=0:4, labels=seq(4)),
biclass = paste(bitemp, bivar, sep="-"))|>
mutate(highlight = ifelse(count_id <= 5, TRUE, FALSE))
# Make nice example table
example_table <- my_map_data |>
mutate(variance = sqrt(variance)) |>
select(c(count_id, county_name, temp, variance_class, variance)) |>
as_tibble()|>
pivot_wider(id_cols=c(count_id, county_name, temp,),
names_from = variance_class,
values_from = variance) |>
head(5)|>
flextable() |>
set_caption(caption = "Average Daily High Temperatures of Iowa Counties") |>
add_header_row(colwidths = c(3, 2),
values = c("", "Standard Error")) |>
colformat_double(digits = 2) |>
set_header_labels(count_id = "ID",
county_name = "County",
temp = "Average Temperature (°C)",
highvar = "High",
lowvar = "Low") |>
add_footer_row(values = rep("..."), colwidths = 5) |>
theme_vanilla() |>
vline(i=c(1,2), j=3, part="header")|>
align(align = "left", part = "all") |>
bg(j = "temp",
bg = col_numeric(palette = brewer.pal(8, name = "Oranges"),
domain = c(21, 30)),
part = "body"
) |>
bg(j = c("highvar", "lowvar"),
bg = col_numeric(palette = brewer.pal(8, name = "Greens"),
domain = c(0, 3)),
part = "body"
)
# make blank map
example_map <- my_map_data |>
filter(variance_class=="lowvar") |>
ggplot() +
geom_sf(aes(geometry = geometry, fill=highlight)) +
scale_fill_manual(values=c("white", "#fbfba2")) +
geom_text(data=filter(my_map_data, highlight==TRUE), aes(x=cent_long, y=cent_lat, label=count_id), size=3) +
theme_void() +
theme(legend.position = 'none')
plot(example_table)
example_map
```
## Ignoring uncertainty
A good place to start is a deceptively straightforward question: why should we include uncertainty at all?
### The choropleth map
@fig-choropleth depicts a choropleth map of the counties of Iowa.
Each county is coloured according to an estimate of average daily temperature that was generated so that the values follow a clear spatial trend.
The variance of these estimates was simulated such that, in the low variance case, the trend accounts for most of the variance in the plot (so we should expect the trend to be visible), while in the high variance case there is more variance within each county than between all the counties, so we should expect the noise to overwhelm the spatial trend, at least in some capacity.
Is this aspect of the data, and the spatial trend it communicates, clear in the map?
Is the strength of the trend communicated through the visualisation?
```{r}
#| echo: false
#| message: false
#| warning: false
#| label: fig-choropleth
#| fig-cap: "Two choropleth maps that depict the counties of Iowa, where each county is coloured according to a simulated average temperature. Both maps depict a spatial trend, where counties closer to the centre of the map are hotter than counties on the edge of the map. In the low variance condition, the trend accounts for most of the variation in the data; in the high variance condition, the variance on the temperature estimate accounts for most of the variation. This distinction is not clear in the maps, as they both appear identical. The high variance condition displays a spatial trend that could simply be spurious, which means the plot is displaying a false conclusion."
#| fig-subcap:
#| - "Low Variance Data"
#| - "High Variance Data"
#| - "Choropleth Palette"
#| layout-ncol: 3
#| layout-valign: "bottom"
#| cap-location: "bottom"
# Choropleth Map
p1a <- my_map_data |>
filter(variance_class=="lowvar") |>
ggplot() +
geom_sf(aes(fill = bitemp,
geometry = geometry), colour=NA) +
scale_fill_manual(values = basecols) +
#scale_fill_gradientn(colours = basecols,
# values=breaks/limits[2],
# limits=limits) +
theme_void() +
theme(legend.position = "none")
p1b <- p1a %+% filter(my_map_data, variance_class=="highvar")
show_pal <- function (colours, borders = NULL, cex_label = 1, ncol = NULL, myxlab, breaks, textnudge, xlabx, xlaby, tsize=1.2) {
# Set dimensions of palette
n <- length(colours)
ncol <- ncol %||% ceiling(sqrt(length(colours)))
nrow <- ceiling(n/ncol)
# make matrix with null values (if not full)
colours <- c(colours, rep(NA, nrow * ncol - length(colours)))
colours <- matrix(colours, ncol = ncol, byrow = TRUE)
# set graphical parameters (?)
old <- par(pty = "s", mar = c(0, 0, 0, 0))
on.exit(par(old))
size <- max(dim(colours))
plot(c(0, size), c(0, -size), type = "n", xlab = "", ylab = "",
axes = FALSE)
rect(col(colours) - 1, -row(colours) + 1, col(colours), -row(colours),
col = colours, border = borders)
text(c(0,col(colours)) + textnudge, -c(1,row(colours))-0.25, breaks,
cex = 1, col = "black")
text(xlabx, xlaby, myxlab ,cex = tsize, col = "black")
}
p1a
p1b
show_pal(basecols, ncol=8, borders=NA, myxlab = "Temperature", breaks = 21:29, textnudge = c(0.2, 0.1, 0,0,0,0,0,-0.1,-0.2), xlabx= 4, xlaby=-1.75, tsize=1.5)
```
### Signal-suppression
Uncertainty visualisation is required for transparency.
The two choropleth maps that appear to be identical in @fig-choropleth highlight the issues with simply electing to ignore uncertainty.
This sentiment appears frequently in the uncertainty visualisation literature.
Some authors suggest uncertainty is important to include as it communicates the legitimacy (or illegitimacy) of the conclusion drawn from visual inference [@Correll2014; @Kale2018; @Griethe2006].
Some authors have said that uncertainty should be included to convey the degree of confidence or trust in the data [@Boukhelifa2012; @Zhao2023].
Some authors directly connect uncertainty visualisation to hypothesis testing as it ensures the validity of a statement [@Hullman2020a; @Griethe2006], but allows for a proportional level of trust that is more detailed than the binary results of a hypothesis test [@Correll2014; @Correll2018].
Some authors even go so far as to claim that failing to include uncertainty is akin to fraud or lying [@Hullman2020a; @Manski2020].
This consensus leads us to understand that uncertainty visualisation is motivated by the need for a sort of "visual hypothesis test".
A successful uncertainty visualisation would act as a "statistical hedge" for any inference we make using the graphic.
Since the purpose of a visualisation is to give a quick gist of the information [@Spiegelhalter2017], this hedging needs to be communicated visually without the need for complicated calculations.
If we refer to the conclusion we draw from a graphic as its "signal" and the variance that makes this signal harder to identify as the "noise", we can summarise the above information into three key requirements. A good uncertainty visualisation needs to:
1) Reinforce justified signals to encourage confidence in results
2) Hide signals that are overwhelmed by noise to prevent unjustified conclusions
3) Perform tasks 1) and 2) in a way that is proportional to the level of confidence in those conclusions.
As @fig-choropleth showed, visualisations that are unconcerned with uncertainty have no issue showing justified signals, but struggle with the display of unjustified signals.
Therefore, we call this approach to uncertainty visualisation "signal-suppression", since it primarily differentiates itself from the normal "noiseless" visualisation approach through criterion (2).
That is, the main difference between an uncertainty visualisation and a normal visualisation is that an uncertainty visualisation should prevent us from drawing unjustified conclusions.
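To make the "visual hypothesis test" framing concrete, the sketch below shows one way criterion (2) could be operationalised numerically. It is a hedged illustration only: the `suppress_signal()` helper, the test against the grand mean, and the cut-off are our own assumptions, not an established method from the literature.
```{r}
#| eval: false
# A sketch of signal-suppression as a visual hypothesis test: an estimate's
# deviation from the grand mean only counts as a justified signal when it is
# large relative to that estimate's standard error. Estimates that fail the
# test are pulled back to the grand mean, i.e. their signal is suppressed.
suppress_signal <- function(estimate, se, alpha = 0.05) {
  z <- (estimate - mean(estimate)) / se
  justified <- abs(z) > qnorm(1 - alpha / 2)
  ifelse(justified, estimate, mean(estimate))
}
```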
### Uncertainty as a signal
Uncertainty visualisation is not only motivated by signal-suppression, and we would be remiss if we did not mention these alternative approaches.
Some authors claim the purpose of uncertainty is to improve decision making [@Ibrekk1987; @uncertchap2022; @Hullman2016; @Cheong2016; @Boone2018; @Padilla2017].
Other authors do not describe uncertainty as important for decision making, but rather explicitly state it as a variable of importance in and of itself [@Blenkinsop2000].
While uncertainty can provide useful information in decision making, it is important to recognise that the "uncertainty" in these cases is not acting as uncertainty at all.
It is acting as signal.
This is obvious for the cases where we are explicitly interested in the variance or error, as we are literally trying to draw conclusions about a statistic that is used to describe uncertainty.
The same is true for visualisations made for decision making, but it is less overt.
This is easiest to understand with an example.
Imagine you are trying to decide if you want to bring an umbrella with you to work.
An umbrella is annoying to bring with you, so you only want to pack it if the chance of rain is greater than 10%.
Unfortunately, your weather prediction app only provides you with the predicted daily rainfall.
Therefore, your decision will be improved with the inclusion of uncertainty.
This is *not* because uncertainty in general is important for decision making, but because it gives you the tools required to calculate the *actual* statistic you are basing your decision on.
In this sense, uncertainty is no more special to decision making than weight is special to a body mass index calculation.
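To make the umbrella example concrete, the sketch below computes the decision statistic directly, assuming (hypothetically) that the app reported a point forecast and a standard error, and that the predictive distribution is normal; the numbers are made up.
```{r}
#| eval: false
# Hypothetical forecast: predicted daily rainfall (mm) and its standard error
pred_rain <- 0.8
se_rain <- 1.5
# The statistic the decision actually rests on: the chance of rain,
# approximated here as P(rainfall > 0) under a normal predictive distribution
p_rain <- 1 - pnorm(0, mean = pred_rain, sd = se_rain)
# Pack the umbrella only if the chance of rain exceeds 10%
pack_umbrella <- p_rain > 0.10
```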
This means the uncertainty visualisations that would perform the best in decision making would simply display the uncertainty statistic we are interested in, such as the variance, or probability of an event, using existing visualisation principles.
This is precisely what we observe in the literature.
@fig-exceed depicts an exceedance probability map that was designed as an alternative to the choropleth map to improve decision making under uncertainty [@Kuhnert2018; @Lucchesi2021].
A keen viewer may notice that the exceedance probability map is actually just a choropleth map, only the statistic being displayed has changed.
We are not sure it is productive to categorise this visualisation as an uncertainty visualisation.
```{r}
#| echo: false
#| message: false
#| warning: false
#| label: fig-exceed
#| fig-cap: "An exceedance probability map that depicts the counties of Iowa, where each county is coloured according to the probability that the average temperature exceeds 27°C. This map is a choropleth map where the variable of interest is a probability."
#| fig-subcap:
#| - "Low Variance Data"
#| - "High Variance Data"
#| - "Exceedance Probability Map Palette"
#| layout-ncol: 3
# quantile
prob_breaks <- seq(-0.1,1.1, length.out=9)
exeed_data <- my_map_data |>
as_tibble() |>
mutate(xprob = 1- pnorm(27, mean=temp, sd=sqrt(variance))) |>
mutate(xprob = cut(xprob, breaks=prob_breaks, labels=seq(8)))
# Exceed Prob Map
p2a <- exeed_data |>
filter(variance_class=="lowvar") |>
ggplot() +
geom_sf(aes(fill = xprob,
geometry = geometry), colour=NA) +
scale_fill_manual(values = basecols) +
theme_void() +
theme(legend.position = "none")
p2b <- p2a %+% filter(exeed_data, variance_class=="highvar")
p2a
p2b
show_pal(basecols, ncol=8, borders=NA, myxlab = "P(Temperature>27)", breaks = c(0, prob_breaks[2:8], 1), textnudge = c(0.2, 0.1, 0,0,0,0,0,-0.1,-0.2), xlabx= 4, xlaby=-1.75, tsize=1.2)
```
There seem to be two different definitions of uncertainty visualisation floating around in the literature.
The first considers *any* visualisation of error, variance, or probability to be an uncertainty visualisation.
The second believes an uncertainty visualisation is the output of a function that takes a normal visualisation as an input, and transforms it to include uncertainty information.
The former group believes the purpose of uncertainty visualisation is to provide signal about a distribution, while the latter believes it should act as noise that obfuscates a signal.
The lack of explicit distinction between these two motivations leaves the literature muddled and reviewers struggle to understand if uncertainty should be treated as a variable, as metadata, or as something else entirely [@Kinkeldey2014].
This disagreement creates constant contradictions in what the literature considers to be an uncertainty visualisation.
For example, @Leland2005 mentions that popular graphics, such as pie charts and bar charts, omit uncertainty, and @Wickham2011 suggests their product plot framework, which includes histograms and bar charts, should be extended to include uncertainty.
However, pie charts, bar charts and histograms have all been used in a significant number of uncertainty visualisation experiments [@Ibrekk1987; @Olston2002; @Zhao2023; @Hofmann2012].
If you view an uncertainty visualisation as a function applied to an existing graphic, then you would not see a pie chart or bar chart as uncertainty visualisations.
These charts simply have not yet had the uncertainty visualisation function applied to them.
If you view an uncertainty visualisation as any graphic that depicts a statistic then there are no limitations on which graphics can or cannot be uncertainty visualisations.
When we use the term uncertainty visualisation to refer to graphics that simply communicate a variance or probability, we are classifying visualisations by the data they display, not their visual features.
Graphics, just like statistics, are not defined by their input data.
A scatter plot that compares means and a scatter plot that compares variances are both scatter plots.
Given that there is no special class of visualisation for *other* statistics (such as the median or maximum), there is no reason to assume visualisations that simply depict a variance, error, or probability are special.
Some authors implicitly suggest that visualisations of variance or probability are differentiated due to the psychological heuristics involved in interpreting uncertainty [@Hullman2019].
While it is true that heuristics lead people to avoid uncertainty [@Spiegelhalter2017], there is no evidence that this psychological effect translates to issues with the visual representation of uncertainty.
Again, given that we do not make these same visual considerations for other variables that elicit distaste or irrational behaviour, there is no reason to assume this is what makes uncertainty visualisation so special.
This leads us to the conclusion that the visualisations made for the purpose of displaying information about uncertainty statistics are not uncertainty visualisations.
These graphics are just normal information visualisations, and authors can follow existing principles of graphical design.
We focus on the perspective that uncertainty visualisation serves to obfuscate signal, and an uncertainty visualisation is a variation on an existing graphic that gives it the ability to suppress false signals.
Of course, there is nothing wrong with explicitly visualising variance, error, bias, or any other statistic used to depict uncertainty as a signal.
Just like any other statistic, these metrics provide important and useful information for analysis and decisions.
However, there is no interesting visualisation challenge associated with these graphics, and they do not require any special visualisation techniques.
The uncertainty in these graphics is acting as a signal variable, and it should be treated as such.
## Visualising uncertainty as a variable
Upon hearing that uncertainty needs to be included for transparency, the solutions may seem obvious.
You may think "well, I will just add a dimension to my plot that includes uncertainty".
This is a reasonable approach.
The simplest way to add uncertainty to an existing graphic is to simply map uncertainty to an unused visual channel.
However, it is unclear if this approach is sufficient for our purposes.
### The bivariate map
@fig-bivariate depicts a variation of the choropleth map, where we have a two-dimensional colour palette.
In this graphic, temperature is still mapped to hue, but the variance is included by utilising colour saturation.
While these two maps *do* look visually different (which was not the case for the choropleth maps), the spatial trend is still clearly visible in both graphics.
This means the uncertainty *is technically* being communicated; however, the main message of the graphic is still the spatial trend (which may not exist).
The graphic did not suppress the invalid signal, so it is not performing signal-suppression as we would like.
At this point, it might be reasonable to ask, why?
Why is including the uncertainty as a variable insufficient to achieve signal-suppression, and what changes should we make to ensure signal-suppression occurs?
```{r}
#| echo: false
#| message: false
#| warning: false
#| label: fig-bivariate
#| fig-cap: "A bivariate map that depicts the counties of Iowa, where each county is coloured according to its average daily temperature and the variance of that temperature. This map is a choropleth map with a two-dimensional colour palette where temperature is represented by colour hue, and variance is represented by colour saturation. Even though uncertainty has been added to the graphic, the spatial trend is still clearly visible in the high variance case."
#| fig-subcap:
#| - "Low Variance Data"
#| - "High Variance Data"
#| - "Bivariate Palette"
#| layout-ncol: 3
#| layout-valign: "bottom"
# Bivariate Map
# Make bivariate palette
# Function to devalue by a certain amount
colsupress <- function(basecols, hue=1, sat=1, val=1) {
X <- diag(c(hue, sat, val)) %*% rgb2hsv(col2rgb(basecols))
hsv(pmin(X[1,], 1), pmin(X[2,], 1), pmin(X[3,], 1))
}
# recurvisely decrease value
v_val = 0.5
bivariatepal <- c(basecols,
colsupress(basecols, sat=v_val),
colsupress(colsupress(basecols, sat=v_val), sat=v_val),
colsupress(colsupress(colsupress(basecols, sat=v_val), sat=v_val), sat=v_val))
# establish levels of palette
names(bivariatepal) <- paste(rep(1:8, 4), "-" , rep(1:4, each=8), sep="")
# Bivariate maps
p2a <- my_map_data |>
filter(variance_class=="lowvar") |>
ggplot() +
geom_sf(aes(fill = biclass, geometry = geometry), colour=NA) +
scale_fill_manual(values = bivariatepal) +
theme_void() +
theme(legend.position = "none")
p2b <- p2a %+% filter(my_map_data, variance_class=="highvar")
show_pal2 <- function (colours, borders = NULL, cex_label = 1, ncol = NULL, myxlab, myylab, breaks, breaks2, tsize1=1.2, tsize2=1.2) {
# Set dimensions of palette
n <- length(colours)
ncol <- ncol %||% ceiling(sqrt(length(colours)))
nrow <- ceiling(n/ncol)
# make matrix with null values (if not full)
colours <- c(colours, rep(NA, nrow * ncol - length(colours)))
colours <- matrix(colours, ncol = ncol, byrow = TRUE)
# set graphical parameters (?)
old <- par(pty = "s", mar = c(0, 0, 0, 0))
on.exit(par(old))
size <- max(dim(colours))
plot(c(-1.5, size), c(0, -size), type = "n", xlab = "", ylab = "",
axes = FALSE)
rect(col(colours) - 1, -row(colours) + 1, col(colours), -row(colours),
col = colours, border = borders)
text(c(0,col(colours)[nrow,]) + c(0.2, 0.1, 0,0,0,0,0,-0.1,-0.2) , -4.5,
breaks, cex = 1, col = "black")
text(-0.25, -c(0,row(colours)[,ncol]) + c(-0.2, -0.1, 0, 0.1, 0.2),
breaks2, cex = 1, col = "black")
text(4, -5.5, myxlab ,cex = tsize1, col = "black")
text(x=-1.25,y=-2, myylab, srt=270, cex = tsize2, col = "black")
}
p2a
p2b
show_pal2(colours = bivariatepal, ncol=8, borders=NA, myxlab = "Temperature", myylab = "Variance", breaks = 21:29, breaks2 = 0:4)
```
### Why this approach may (or may not) work
The difficulty in incorporating uncertainty into a visualisation is frequently mentioned but seldom explained.
For example, @Hullman2016 commented that it is straightforward to show a value but much more complex to show uncertainty, without explaining why.
Many authors seem to believe uncertainty visualisation is a simple high-dimensional visualisation problem, as the difficulty comes from working out how to add uncertainty into already existing graphics [@Griethe2006].
While this is part of the problem in uncertainty visualisation, it is not the complete picture.
@fig-bivariate makes it clear that simply including uncertainty as a variable is insufficient to perform signal-suppression.
If we cannot treat uncertainty the same as we would any other variable, how should we treat it?
We need to understand what uncertainty actually *is*, in order to understand how to integrate it into a visualisation.
#### It's a variable... it's metadata... it's uncertainty?
Describing what uncertainty actually is turns out to be surprisingly hard.
Most authors simply avoid the problem and describe the characteristics of uncertainty, of which there are plenty.
Often, uncertainty is split using an endless stream of ever changing boundaries, such as whether the uncertainty is due to true randomness or a lack of knowledge [@Spiegelhalter2017; @Hullman2016; @utypo], if the uncertainty is in the attribute, spatial elements, or temporal element of the data [@Kinkeldey2014], whether the uncertainty is scientific (e.g. error) or human (e.g. disagreement among parties) [@Benjamin2018], if the uncertainty is random or systematic [@Sanyal2009], statistical or bounded [@Gschwandtnei2016; @Olston2002], recorded as accuracy or precision [@Griethe2006; @Benjamin2018], which stage of the data analysis pipeline the uncertainty comes from [@utypo], how quantifiable the uncertainty is [@Spiegelhalter2017; @utypo], etc.
There are enough qualitative descriptors of uncertainty to fill a paper, but, none of this is particularly helpful in understanding how to integrate it into a visualisation.
Rather than trying to define uncertainty by looking at the myriad ways in which it *does* appear in an analysis, we may find it easier to look at where it *does not*.
Descriptive statistics describe our sample as it is, summarising large datasets into an easily digestible format.
Descriptive statistics are not seen as the primary goal of modern statistics, however, this was not always the case.
In 19th century England, *positivism* was the popular philosophical approach to science (positivists included famous statisticians such as Francis Galton and Karl Pearson).
Practitioners of the approach believed statistics ended with descriptive statistics as science must be based on actual experience and observations [@Otsuka2023].
In order to make statements about population statistics, future values, or new observations we need to perform inference, which requires the assumption of the "uniformity of nature", that is, we need to assume that unobserved phenomena should be similar to observed phenomena [@Otsuka2023].
Positivists believed referencing the unobservable was bad science.
In other words, these scientists embraced descriptive statistics due to the inherent certainty that came with them.
Since uncertainty is non-existent in descriptive statistics, it is clear that uncertainty is a by-product of inference.
This history lesson illustrates what uncertainty actually is.
At several stages in a statistical analysis, we will violate the uniformity of nature assumption.
Each of these violations will impact the statistic we have calculated and push it further from the population parameter we wish to draw inference on.
Uncertainty is the amalgamation of these impacts.
If we do not violate the uniformity of nature assumption at any point in our analysis, we do not have any uncertainty.
This interpretation of uncertainty indicates that uncertainty is not a variable of importance in and of itself.
Uncertainty is metadata about our statistic that is required for valid inference.
This means uncertainty should not be visualised by itself and we should seek to display signal and uncertainty together as a "single integrated uncertain value" [@Kinkeldey2014].
This aspect of uncertainty visualisation makes it a uniquely difficult problem.
#### Visualising the "single integrated uncertain value"
Typically, when making visualisations, we want the visual channels to be separable.
That is, we don't want the data represented through one visual channel to interfere with the others [@Smart2019].
Mapping uncertainty and signal to separable channels allows them to be read separately, which does not align with the goal of communicating them as a single integrated channel.
Visualising uncertainty and signal separately allows the uncertainty information to simply be ignored, which is a pervasive issue in current uncertainty visualisation methods [@uncertchap2022].
We can see this problem in @fig-bivariate, as it sends the message "this data has a spatial trend and the estimates have a large variance" as we read the signal and the uncertainty separately.
This means effective uncertainty visualisation should be leveraging integrability.
That is, the visual channels of the uncertainty and the signal would need to be separately manipulable, but read as a single channel by the human brain.
While most visual aesthetics *are* separable, there are some variables that have been shown to be integrable, such as colour hue and brightness [@Vanderplas2020].
When visualising uncertainty using its own visual channel, we can also consider visual semiotics and make sure to map uncertainty to intuitive visual channels, such as mapping more uncertain values to lighter colours [@Maceachren2012].
Unfortunately relying on integrability may not give us the amount of control we want over our signal-suppression.
Without a strong understanding of how these visual channels collapse down into a single channel, relying on integrability could create unintended consequences such as displaying phantom signals or hiding justified signals.
Additionally, multi-dimensional colour palettes can make the graphics harder to read and hurt the accessibility of the plots [@Vanderplas2015].
There is another benefit to mapping uncertainty to saturation that is not directly related to integrability.
As saturation decreases, colours become harder to distinguish.
This means high uncertainty values are harder to differentiate than low uncertainty values.
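This loss of discriminability can be quantified. The sketch below, which assumes the `farver` package (a ggplot2 dependency) is available, uses the `colsupress()` helper defined in the bivariate map chunk above to desaturate two adjacent palette colours and compares their CIEDE2000 perceptual distance before and after.
```{r}
#| eval: false
library(farver)
# Two adjacent hues from the base palette, at full and reduced saturation
full_sat <- basecols[4:5]
low_sat <- colsupress(full_sat, sat = 0.25)
perceptual_dist <- function(pair) {
  compare_colour(decode_colour(pair[1]), decode_colour(pair[2]),
                 from_space = "rgb", method = "cie2000")
}
# The desaturated pair is separated by a much smaller perceptual distance,
# i.e. it is harder to tell apart
c(full = perceptual_dist(full_sat), low = perceptual_dist(low_sat))
```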
We can leverage this implicit feature of colour saturation by transforming the visual feature space ourselves.
## Combining uncertainty and signal in a transformed space
Instead of hoping that uncertainty might collapse signal values into a single dimension, we can do some of that work ourselves.
As a matter of fact, some uncertainty visualisation authors already have.
### Value Suppressing Uncertainty Palettes
The Value Suppressing Uncertainty Palette (VSUP) [@Correll2018] was designed with the intention of preventing high uncertainty values from being extracted from a map.
Since the palette was designed with the extraction of individual values in mind and it has only been tested on simple value extraction tasks [@Correll2018] or search tasks [@Ndlovu2023], it is unclear how effective the method is at suppressing broader insights such as spatial trends.
@fig-vsup is a visualisation of the Iowa temperature data using a VSUP to colour the counties.
The low uncertainty case still has a visible spatial trend, while the spatial trend in the high uncertainty map has functionally disappeared.
This means the VSUP has successfully suppressed the spatial trend in the data.
However, the spatial trend may not be the only signal of concern in our graphic.
Now we must return to the original signal-suppression criteria and ask ourselves if they have all been met.
Are all the justified signals reinforced, while all the unjustified signals are suppressed?
Is a graphic that performs perfect signal-suppression even possible?
```{r}
#| echo: false
#| message: false
#| warning: false
#| label: fig-vsup
#| fig-cap: "A map made with a VSUP. The counties of Iowa are coloured according to their average daily temperature and the variance in temperature. Similar to the bivariate map, temperature is mapped to hue while variance is mapped to saturation. Unlike the bivariate map, the colour space we are mapping our variables to has been transformed so that high variance estimates are harder to discern from each other. This map successfully reduces the visibility of the spatial trend in the high uncertainty case while maintaining the visibility of the spatial trend in the low uncertainty case."
#| fig-subcap:
#| - "Low Variance Data"
#| - "High Variance Data"
#| - "VSUP Palette"
#| layout-ncol: 3
#| layout-valign: "bottom"
# VSUP
# Function to combine colours for VSUP
colourblend <- function(basecols, p_length, nblend) {
X <- rgb2hsv(col2rgb(unique(basecols)))
v1 <- X[,seq(1,dim(X)[2], 2)]
v2 <- X[,seq(2,dim(X)[2], 2)]
if("matrix" %in% class(v1)){
# hue issue wrap around pt 1
v3 <- (v1+v2)
v3["h",] <- ifelse(abs(v1["h",]-v2["h",])>0.5, v3["h",]+1, v3["h",])
v3 <- v3/2
# hue issue wrap around pt 2
v3["h",] <- ifelse(v3["h",]>=1 , v3["h",]-1 ,v3["h",])
hsv(rep(v3[1,], each=nblend), rep(v3[2,], each=nblend), rep(v3[3,], each=nblend))
} else {
v3 <- (v1+v2)
v3["h"] <- ifelse(abs(v1["h"]-v2["h"])>0.5, v3["h"]+1, v3["h"])
v3 <- v3/2
v3["h"] <- ifelse(v3["h"]>=1 , v3["h"]-1 ,v3["h"])
rep(hsv(h=v3[1], s=v3[2], v=v3[3]), p_length)
}
}
VSUPfunc <- function(basecols, p_length, nblend){
colourblend(colsupress(basecols, sat=0.5), p_length, nblend)
}
# VSUP
p = length(basecols)
VSUP <- c(basecols,
VSUPfunc(basecols, p, 2),
VSUPfunc(VSUPfunc(basecols, p, 2), p, 4),
VSUPfunc(VSUPfunc(VSUPfunc(basecols, p, 2), p, 4), p, 8))
names(VSUP) <- paste(rep(1:8, 4), "-" , rep(1:4, each=8), sep="")
# VSUP maps
p3a <- my_map_data |>
filter(variance_class=="lowvar") |>
ggplot() +
geom_sf(aes(fill = biclass, geometry = geometry), colour=NA) +
scale_fill_manual(values = VSUP) +
theme_void() +
theme(legend.position = "none")
p3b <- p3a %+% filter(my_map_data, variance_class=="highvar")
p3a
p3b
show_pal2(colours = VSUP, ncol=8, borders=NA, myxlab = "Temperature", myylab = "Variance", breaks = 21:29, breaks2 = 0:4)
```
### What can and cannot be suppressed?
The methods used by the VSUP bring to light a slight problem with uncertainty visualisation.
Specifically, uncertainty and the purpose of visualisation are somewhat at odds with one another.
There are two primary motivations behind visualisation: communication and exploratory data analysis (EDA).
Communication involves identifying a signal we want to communicate and designing a visualisation that best conveys that, while EDA involves creating a versatile visualisation and using it to extract several signals.
If we are designing an uncertainty visualisation for communication then we can just suppress the specific signal we are seeking to communicate.
In the map example, we would consider @fig-vsup to be a success as the only signal we are concerned with is the spatial trend.
However, it is not uncommon for authors to express a desire for uncertainty visualisations that perform signal-suppression in visualisations made for EDA [@Sarma2024; @Griethe2006].
For uncertainty visualisation for EDA to work, we would need to assume that suppressing individual estimates using their variance should naturally extend to broader suppression of plot level insights.
Unfortunately, it is unclear how reliably this would work.
#### There is no uncertainty in EDA
Earlier we established that uncertainty is a by-product of inference, which means without inference, there is no uncertainty.
Often EDA is used to give us an understanding of our data and identify which signals are worth pursuing.
In this sense, EDA is the visual parallel to descriptive statistics, as it is performed without an explicit hypothesis, which means there is no inference, and by extension, there is no uncertainty.
Some authors recognise inference will always occur (in some shape or form) and believe uncertainty *should* be visualised, but do not specify *how* it would be visualised.
@Hullman2021 argued that there is no such thing as a "model-free" visualisation, therefore all visualisations require uncertainty as we are always performing inference.
While it is true that we can think of visualisations as containing implicit inferential properties, there are many potential inferences in any single visualisation.
This makes it a little difficult to ensure uncertainty is always included.
For example, if we have a visualisation that shows an average, we would need to identify if the signal suppression should be performed using the sampling variance or the sample variance [@Hofman2020].
The distribution we use depends on the inferential statistic, but until the viewer chooses one (a choice that is not easily observable), the particular variety of uncertainty that needs to be displayed cannot be calculated.
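The two candidate uncertainties can differ dramatically, as the following minimal sketch (with made-up data) shows: for a displayed average, the sample standard deviation and the standard error of the mean differ by a factor of $\sqrt{n}$, so suppressing with one hides far more signal than suppressing with the other.
```{r}
#| eval: false
set.seed(1)
x <- rnorm(100, mean = 25, sd = 2)  # hypothetical readings behind one average
sd(x)                 # sample sd: the spread of individual observations
sd(x) / sqrt(100)     # standard error: the spread of the displayed mean
```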
This means the ideal uncertainty visualisation should not only meet the signal-suppression requirements, but should also endeavour to be versatile enough to meet those requirements for all the signals displayed in the graphic.
#### The limitations of explicitly visualising uncertainty and signal
The lack of versatility of the VSUP is easy to see with a simple example.
Let's say we have a graphic that depicts a set of coefficients from a linear regression, where the value of each coefficient is shown using a single colour.
We want to know "Which of these coefficients are different from 0?" as well as "Which of these coefficients are different from each other?".
To answer these questions we do a series of $t$-tests on these estimates.
All of the individual $t$-tests fail to reject the null hypothesis that the coefficients are equal to 0.
We then make a visualisation that suppresses this signal and ensures that all of the estimates are visually indistinguishable from 0.
Next, we conduct two-sample $t$-tests and find that several of the estimates need to be visually distinguishable from each other.
The VSUP method must pick a single colour for each estimate, and these colours must be *either* visually distinguishable or indistinguishable from each other.
We cannot perform signal-suppression on both these signals simultaneously.
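A small numerical sketch (with made-up coefficients and standard errors) shows the conflict: no coefficient is individually distinguishable from 0, yet two of them are distinguishable from each other, so no single-colour encoding can satisfy both tests.
```{r}
#| eval: false
est <- c(b1 = 1.0, b2 = -1.0, b3 = 0.9)   # hypothetical coefficients
se <- c(0.6, 0.6, 0.6)                    # their standard errors
crit <- qnorm(0.975)
# Test 1: is each coefficient different from 0? All fail to reject,
# so all three should be visually indistinguishable from 0
abs(est / se) > crit
# Test 2: are b1 and b2 different from each other? This test rejects,
# so b1 and b2 should be visually distinguishable from one another
abs((est["b1"] - est["b2"]) / sqrt(se[1]^2 + se[2]^2)) > crit
```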
This example highlights a fundamental problem with the VSUP that extends to the bivariate map as well.
When we blend these colours, we need to decide at what level of *uncertainty* they should blend together.
Even though the bivariate map does not explicitly combine colour values at certain variance levels, the mapping of variance to colour saturation does this implicitly.
That is, at certain saturation values the colours in a bivariate map are imperceptibly different and appear as though they are mapped to the same value.
At this point, it is irrelevant whether or not the colours are technically different, they are the same colour in the human brain.
This is of course complicated by the fact that human colour perception varies at an individual level.
Some women are believed to have four different types of cone cells, which allows them to perceive a greater range of colours, while others have only one or two types of cone cells and have colour deficiencies [@simunovic2010].
For VSUP to function for all individuals, we must calibrate each plot to an individual's ability to perceive colour.
If we only use a single colour to express each signal-suppressed statistic, we will always need to decide which signals we suppress and which we do not.
This issue has already been raised in the literature.
Which hypotheses are suppressed and which are not largely depends on the method used to combine colours in the palette [@Kay2019].
The VSUP in @fig-vsup used the tree-based method of @Correll2018, but there are alternatives that are more appropriate for different hypotheses.
Uncertainty visualisation for EDA would be possible if we designed a plot in such a way that suppressing individual estimates using their variance would naturally extend to broader suppression of plot level insights.
This assumption is commonly made by visualisation researchers in normal visualisation experiments [@North2006].
If we could express the statistic of a cell using multiple colours, this limitation may disappear entirely.
## Implicitly Combining Uncertainty and Signal
Rather than trying to figure out how to combine signal and uncertainty into a single colour, we can just display a sample instead and allow the viewer to extract *both* the estimate and the variance.
### Pixel map
@fig-pixel displays a pixel map [@Lucchesi2021], which is a variation of the choropleth map where each area is divided up into several smaller areas, each coloured using draws from the larger area's (i.e. the county's) sampling distribution for average temperature.
The spatial trend is clearly visible in the low variance case, but it is only barely visible, and much harder to see, in the high variance case.
This means the graphic also achieves the third criterion for signal-suppression, i.e. our difficulty in seeing the trend is proportional to the level of uncertainty in the graphic.
```{r}
#| eval: false
#| echo: false
# Make + save pixel map (in case of deprecation)
# Low variance map
my_map_data_a <- my_map_data |>
filter(variance_class == "lowvar") |>
mutate(my_id = seq(n),
error = variance)
# quantile
q_a <- my_map_data_a |>
as_tibble() |>
mutate(bitemp=as.numeric(bitemp)) |>
with(data.frame(p0.05 = qnorm(0.05, mean=bitemp, sd=sqrt(variance)),
p0.25 = qnorm(0.25, mean=bitemp, sd=sqrt(variance)),
p0.5 = qnorm(0.5, mean=bitemp, sd=sqrt(variance)),
p0.75 = qnorm(0.75, mean=bitemp, sd=sqrt(variance)),
p0.95 = qnorm(0.95, mean=bitemp, sd=sqrt(variance))))
pixel_1a <- my_map_data_a |>
as.data.frame() |>
select(my_id, bitemp, error) |>
read.uv(estimate="bitemp", error="error")
pixel_2a <- my_map_data_a |> as("Spatial")
pix_a <- pixelate(pixel_2a, pixelSize = 70, id = "my_id")
pmap_a <- build_pmap(data = pixel_1a, distribution = "discrete", pixelGeo = pix_a, id = "my_id", border = pixel_2a, q=q_a)
p4a <- view(pmap_a) +
geom_path(
data = pmap_a$bord,
aes_string(x = 'long', y = 'lat', group = 'group'),
colour = "white"
) +
scale_fill_gradientn(colours = longpal) +
scale_colour_gradientn(colours = longpal) +
theme(legend.position="none")
# High variance
my_map_data_b <- my_map_data |>
filter(variance_class == "highvar") |>
mutate(my_id = seq(n),
error = variance)
q_b <- my_map_data_b |>
as_tibble() |>
mutate(bitemp=as.numeric(bitemp)) |>
with(data.frame(p0.05 = qnorm(0.05, mean=bitemp, sd=sqrt(variance)),
p0.25 = qnorm(0.25, mean=bitemp, sd=sqrt(variance)),