organicml-evaluation.Rmd

---
title: "Organic ML VGG16 Evaluation"
output:
  md_document:
    variant: markdown_github
---

## Libraries

```{r error=FALSE, warning=FALSE}
library(keras)
library(tibble)
library(dplyr)
library(readr)
```

## Recreate generator for evaluation images

```{r}
batch_size <- 10
validation_dir <- "data/validation"

validation_datagen <- image_data_generator(rescale = 1/255)

validation_generator <- flow_images_from_directory(
    validation_dir,
    validation_datagen,
    target_size = c(256, 256),
    batch_size = batch_size,
    shuffle = FALSE,
    class_mode = "binary"
  )
```

## Reload and Evaluate

Reload and evaluate the best model that was saved.

```{r  error=FALSE, warning=FALSE}
network <- load_model_hdf5("organicml_checkpoint.h5")
network %>% evaluate(validation_generator)
```

## Make a vector of all output from the model

Present each batch of images from the generator (it has already been iterated over once) and obtain the output of the sigmoid activation function that is the last layer of the network.

```{r}
predicted <- c()
actual <- c()
network_input <- c()
i <- 0
while(TRUE) {
  batch <- generator_next(validation_generator)
  inputs_batch <- batch[[1]]
  labels_batch <- batch[[2]]
  predicted <- c(predicted, round(network %>% predict(inputs_batch)))
  actual <- c(actual, labels_batch)
  network_input <- c(network_input, inputs_batch)
  i <- i + 1
  if (i * batch_size >= validation_generator$samples)
    break
}
```

## Compute the error rate

Assemble a dataframe that lines up predictions alongside the actual labels and the input image filename.

```{r}
evaluation_df <- tibble(
    image_filename = validation_generator$filepaths,
    actual = actual,
    predicted = predicted
  )
```

Compute accuracy, true positive rate (tpr), true negative rate (tnr), false negative rate (fnr), and false positive rate (fpr). Using formulas from [https://en.wikipedia.org/wiki/Sensitivity_and_specificity](https://en.wikipedia.org/wiki/Sensitivity_and_specificity).

```{r}
network_performance <- evaluation_df %>%
  summarize(
    accuracy = sum(actual == predicted) / n(),
    tpr = sum(actual == 1 & predicted == 1) / sum(actual == 1),
    tnr = sum(actual == 0 & predicted == 0) / sum(actual == 0),
    fpr = sum(actual == 0 & predicted == 1) / sum(actual == 0),
    fnr = sum(actual == 1 & predicted == 0) / sum(actual == 1)
  )
knitr::kable(network_performance)
```

## Create a csv of failed classification attempts

```{r}
evaluation_df %>%
  filter(actual != predicted) %>%
  write_csv("organicml-evaluation-failures.csv")
```