README.Rmd

---
output:
  md_document:
    variant: markdown_github
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README/README-"
)
```

rp.flm.test
===========

[![Travis-CI Build Status](https://travis-ci.org/egarpor/rp.flm.test.svg?branch=master)](https://travis-ci.org/egarpor/rp.flm.test)
[![License](https://img.shields.io/badge/license-GPL%20v3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)

## Overview

Software companion for the paper *Goodness-of-fit tests for the functional linear model based on randomly projected empirical processes* (Cuesta-Albertos, García-Portugués, Febrero-Bande and González-Manteiga, 2019). It implements the proposed tests and allows to replicate the empirical results presented.

## Install

```{r,eval=FALSE}
# install.packages("devtools")
library(devtools)
install_github("egarpor/rp.flm.test")
```

Alternatively, see function `rp.flm.test` in the [`fda.usc`](http://cran.r-project.org/web/packages/fda.usc/) library (Febrero-Bande and Oviedo de la Fuente, 2012) for versions above `1.3.1`.

## Usage

### Simulated data

```{r, simulated, message = FALSE, warning = FALSE, fig.asp = 1, fig.align = 'center'}
# Load package
library(rp.flm.test)

# Generate data
set.seed(345678)
t <- seq(0, 1, l = 101)
n <- 100
X <- r.ou(n = n, t = t, alpha = 2, sigma = 0.5)
beta0 <- fdata(mdata = cos(2 * pi * t) - (t - 0.5)^2, argvals = t,
               rangeval = c(0,1))
Y1 <- inprod.fdata(X, beta0) + rnorm(n, sd = 0.1)
Y2 <- log(norm.fdata(X)) + rnorm(n, sd = 0.1)

# Do not reject FLM
rp.test1 <- rp.flm.test(X.fdata = X, Y = Y1, verbose = FALSE)
rp.test1$p.values.fdr

# Reject FLM
rp.test2 <- rp.flm.test(X.fdata = X, Y = Y2, verbose = FALSE)
rp.test2$p.values.fdr

# Estimations of beta
plot(beta0, main = "", ylab = expression(beta(t) * ", " * hat(beta)(t)))
lines(rp.test1$beta.est, col = 2)
lines(rp.test2$beta.est, col = 3)

# Simple hypothesis: do not reject beta = beta0
rp.flm.test(X.fdata = X, Y = Y1, beta0.fdata = beta0, 
            verbose = FALSE)$p.values.fdr

# Simple hypothesis: reject beta = beta0^2
rp.flm.test(X.fdata = X, Y = Y1, beta0.fdata = beta0^2, 
            verbose = FALSE)$p.values.fdr

# Increasing n.proj
rp.flm.test(X.fdata = X, Y = Y1, n.proj = 3, verbose = FALSE)$p.values.fdr
rp.flm.test(X.fdata = X, Y = Y1, n.proj = 5, verbose = FALSE)$p.values.fdr

# Increasing B
rp.flm.test(X.fdata = X, Y = Y1, B = 1e3, verbose = FALSE)$p.values.fdr
rp.flm.test(X.fdata = X, Y = Y1, B = 5e3, verbose = FALSE)$p.values.fdr
```

### Tecator dataset

```{r, tecator, message = FALSE, warning = FALSE, fig.asp = 1, fig.align = 'center'}
# Load package
library(rp.flm.test)

# Load data
data(tecator)
absorp <- tecator$absorp.fdata
ind <- 1:215 # sometimes ind <- 1:129 is considered as training dataset
x <- absorp[ind, ]
y <- tecator$y$Fat[ind]

# Composite hypothesis
rp.tecat <- rp.flm.test(X.fdata = x, Y = y, verbose = FALSE, B = 1e4)
rp.tecat$p.values.fdr

# Simple hypothesis
zero <- fdata(mdata = rep(0, length(x$argvals)), argvals = x$argvals,
              rangeval = x$rangeval)
rp.flm.test(X.fdata = x, Y = y, beta0.fdata = zero, verbose = FALSE, B = 1e4)

# With derivatives
rp.tecat.1 <- rp.flm.test(X.fdata = fdata.deriv(x, 1), Y = y, verbose = FALSE,
                          B = 1e4)
rp.tecat.1$p.values.fdr
rp.tecat.2 <- rp.flm.test(X.fdata = fdata.deriv(x, 2), Y = y, verbose = FALSE,
                          B = 1e4)
rp.tecat.2$p.values.fdr
```

### AEMET dataset

```{r, aemet, message = FALSE, warning = FALSE, fig.asp = 1, fig.align = 'center'}
# Load package
library(rp.flm.test)

# Load data
data(aemet)
wind.speed <- apply(aemet$wind.speed$data, 1, mean)
temp <- aemet$temp

# Remove the 5% of the curves with less depth (i.e. 4 curves)
par(mfrow = c(1, 1))
res.FM <- depth.FM(temp, draw = TRUE)
qu <- quantile(res.FM$dep, prob = 0.05)
l <- which(res.FM$dep <= qu)
lines(aemet$temp[l], col = 3)

# Data without outliers
wind.speed <- wind.speed[-l]
temp <- temp[-l]

# Composite hypothesis
rp.aemet <- rp.flm.test(X.fdata = temp, Y = wind.speed, verbose = FALSE, 
                        B = 1e4)
rp.aemet$p.values.fdr

# Simple hypothesis
zero <- fdata(mdata = rep(0, length(temp$argvals)), argvals = temp$argvals,
              rangeval = temp$rangeval)
rp.flm.test(X.fdata = temp, Y = wind.speed, beta0.fdata = zero, verbose = FALSE,
            B = 1e4)
```

## Reproducibility of Cuesta-Albertos et al. (2019)

The directory [`/simulation`](https://github.com/egarpor/data-gofflm/tree/master/simulation) in the [data-gofflm](https://github.com/egarpor/data-gofflm) repository contains the scripts used in the simulation study of the aforementioned paper, as well as their `.RData` outputs. Those files are not downloaded when installing `rp.flm.test`.

## References

Cuesta-Albertos, J. A., García-Portugués, E., Febrero-Bande, M. and González-Manteiga, W. (2019). Goodness-of-fit tests for the functional linear model based on randomly projected empirical processes. *Annals of Statistics*, 47(1):439-467. [doi:10.1214/18-AOS1693](https://doi.org/10.1214/18-AOS1693)

Febrero-Bande, M. and Oviedo de la Fuente, M. (2012). Statistical Computing in Functional Data Analysis: The R Package fda.usc. *Journal of Statistical Software*, 51(4), 1-28. [doi:10.18637/jss.v051.i04](https://doi.org/10.18637/jss.v051.i04)