permaverse/nevada

Overview

The nevada package (NEtwork-VAlued Data Analysis) is an R package for the statistical analysis of network-valued data, in which each statistical unit of a sample is itself a network. The package provides several matrix representations of networks, so that network-valued data can be transformed into matrix-valued data. It also provides several distances between matrices to quantify how far apart two networks are, and several test statistics for testing equality in distribution between samples of networks using exact permutation testing procedures. The permutation scheme is carried out by the flipr package, which also provides a number of test statistics based on inter-point distances that play nicely with network-valued data. Most of the implementation is in C++, and the matrix of inter- and intra-sample distances is pre-computed, which alleviates the computational burden often associated with permutation tests.
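To illustrate why pre-computing the distance matrix helps, here is a minimal base-R sketch of a permutation test that only shuffles sample labels and never recomputes a distance. It is not nevada's or flipr's implementation: the helper `perm_test_from_dist()`, the toy statistic (mean inter-sample distance minus mean intra-sample distance), the random graph generator and the Monte Carlo approximation with `B` random permutations are all assumptions made for illustration.

```r
# Hypothetical helper (not part of nevada): Monte Carlo permutation test
# computed from a pre-computed distance matrix D, where the first n1 rows
# belong to sample 1 and the remaining rows to sample 2.
perm_test_from_dist <- function(D, n1, B = 999L) {
  n <- nrow(D)
  stopifnot(is.matrix(D), ncol(D) == n, n1 >= 2L, n - n1 >= 2L)
  # Toy statistic: mean inter-sample distance minus mean intra-sample distance.
  stat <- function(idx1) {
    idx2 <- setdiff(seq_len(n), idx1)
    d11 <- D[idx1, idx1]
    d22 <- D[idx2, idx2]
    intra <- mean(c(d11[upper.tri(d11)], d22[upper.tri(d22)]))
    mean(D[idx1, idx2]) - intra
  }
  observed <- stat(seq_len(n1))
  # Only the labels are permuted; D is looked up, never recomputed.
  permuted <- replicate(B, stat(sample.int(n, n1)))
  # Adding 1 to numerator and denominator keeps the p-value above 0.
  (1 + sum(permuted >= observed)) / (B + 1)
}

# Toy data: two samples of 8 random 6-vertex graphs with different densities.
set.seed(42)
random_adjacency <- function(n_vertices, p) {
  A <- matrix(0, n_vertices, n_vertices)
  A[upper.tri(A)] <- rbinom(n_vertices * (n_vertices - 1) / 2, 1, p)
  A + t(A)
}
graphs <- c(
  replicate(8, random_adjacency(6, 0.2), simplify = FALSE),
  replicate(8, random_adjacency(6, 0.8), simplify = FALSE)
)

# Pre-compute all pairwise Frobenius distances once.
n <- length(graphs)
D <- matrix(0, n, n)
for (i in seq_len(n)) for (j in seq_len(n)) {
  D[i, j] <- sqrt(sum((graphs[[i]] - graphs[[j]])^2))
}

pvalue <- perm_test_from_dist(D, n1 = 8L)
pvalue
```

The cost of each permutation is a few matrix look-ups, so the number of permutations can grow without recomputing any network distance.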

Installation

You can install the latest stable version of nevada from CRAN with:

install.packages("nevada")

Or you can install the development version from GitHub with:

# install.packages("remotes")
remotes::install_github("astamm/nevada")

Usage

Example 1

In this first example, we compare two populations of networks generated according to two different models (Watts-Strogatz and Barabási-Albert). We use the adjacency matrix representation of networks, the Frobenius distance to compare individual networks, and the combination of Student-like and Fisher-like statistics based on inter-point distances to summarize information and perform the permutation test.

sample_size <- 10L
num_vertices <- 10L
smallworld_params <- list(n_dim = 1L, dim_size = num_vertices, order = 4L, p_rewire = 0.15)
barabasi_albert_params <- list(power = 1L, n = num_vertices)

withr::with_seed(1234, {
  x <- nevada::nvd(
    sample_size = sample_size,
    model = "smallworld", 
    !!!smallworld_params
  )
  y <- nevada::nvd(
    sample_size = sample_size,
    model = "barabasi_albert", 
    !!!barabasi_albert_params
  )
})
ℹ Calling the `tidygraph::play_smallworld()` function with the following arguments:
  n_dim: 1
  dim_size: 10
  order: 4
  p_rewire: 0.15
  loops: FALSE
  multiple: FALSE
Calling the `tidygraph::play_barabasi_albert()` function with the following arguments:
  power: 1
  n: 10
  growth: 1
  growth_dist: NULL
  use_out: FALSE
  appeal_zero: 1
  directed: TRUE
  method: psumtree

Each network in these samples has 10 vertices, as set by num_vertices above. One can wonder whether there is a difference between the distributions that generated these two samples (which there is, given the models that we used). The test2_global() function provides an answer to this question:

t1_global <- nevada::test2_global(x, y, seed = 1234)
t1_global$pvalue
[1] 0.0009962984

The p-value is very small, leading to the conclusion that we should reject the null hypothesis of equal distributions.
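For readers unfamiliar with the distance used here, the following base-R sketch shows the textbook Frobenius distance between the adjacency matrices of two small graphs. The helper `frobenius_distance()` and the two toy graphs are assumptions for illustration; nevada computes its distances internally and its implementation may differ in details.

```r
# Adjacency matrices of two undirected graphs on 4 vertices.
path <- matrix(0, 4, 4)
path[cbind(1:3, 2:4)] <- 1        # edges 1-2, 2-3, 3-4
path <- path + t(path)            # symmetrize
cycle <- path
cycle[1, 4] <- cycle[4, 1] <- 1   # add edge 1-4 to close the cycle

# Hypothetical helper: square root of the sum of squared entry-wise differences.
frobenius_distance <- function(A, B) sqrt(sum((A - B)^2))

frobenius_distance(path, cycle)
```

The two graphs differ by a single undirected edge, which appears twice in a symmetric adjacency matrix, so the distance is sqrt(2). Base R's `norm(path - cycle, type = "F")` returns the same value.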

Although this is a toy example, we can define a partition of the vertices and try to localize the differences along it:

# Assign the 10 vertices to 5 groups of 2 consecutive vertices each.
partition <- rep(1:5, each = 2L)

The test2_local() function then tests for differences within each element of the partition and between each pair of elements:

t1_local <- nevada::test2_local(x, y, partition, seed = 1234)
t1_local
$intra
# A tibble: 5 × 3
  E     pvalue truncated
  <chr>  <dbl> <lgl>    
1 P1     0.425 TRUE     
2 P2     0.425 TRUE     
3 P3     0.425 TRUE     
4 P4     0.425 TRUE     
5 P5     0.425 TRUE     

$inter
# A tibble: 10 × 4
   E1    E2      pvalue truncated
   <chr> <chr>    <dbl> <lgl>    
 1 P1    P2    0.000996 FALSE    
 2 P1    P3    0.000996 FALSE    
 3 P1    P4    0.000996 FALSE    
 4 P1    P5    0.000996 FALSE    
 5 P2    P3    0.000996 FALSE    
 6 P2    P4    0.000996 FALSE    
 7 P2    P5    0.000996 FALSE    
 8 P3    P4    0.000996 FALSE    
 9 P3    P5    0.000996 FALSE    
10 P4    P5    0.000996 FALSE    
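The shape of this output follows from the partition: with 5 elements there are 5 within-element comparisons and choose(5, 2) = 10 between-element comparisons. The base-R sketch below only enumerates those comparisons. It does not run any test, and the labels `P1` to `P5` are simply copied from the output above.

```r
# Element labels as they appear in the output above.
groups <- paste0("P", 1:5)

# One intra comparison per element of the partition.
intra <- data.frame(E = groups)

# One inter comparison per unordered pair of distinct elements.
pairs <- t(combn(groups, 2))
inter <- data.frame(E1 = pairs[, 1], E2 = pairs[, 2])

nrow(intra)
nrow(inter)
```

The row counts (5 and 10) and the pair ordering match the `$intra` and `$inter` tibbles printed above.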

Example 2

In this second example, we compare two populations of networks generated according to the same model (Watts-Strogatz). As before, we use the adjacency matrix representation of networks, the Frobenius distance to compare individual networks, and the combination of Student-like and Fisher-like statistics based on inter-point distances to summarize information and perform the permutation test.

withr::with_seed(1234, {
  x <- nevada::nvd(
    sample_size = sample_size,
    model = "smallworld", 
    !!!smallworld_params
  )
  y <- nevada::nvd(
    sample_size = sample_size,
    model = "smallworld", 
    !!!smallworld_params
  )
})
ℹ Calling the `tidygraph::play_smallworld()` function with the following arguments:
  n_dim: 1
  dim_size: 10
  order: 4
  p_rewire: 0.15
  loops: FALSE
  multiple: FALSE
Calling the `tidygraph::play_smallworld()` function with the following arguments:
  n_dim: 1
  dim_size: 10
  order: 4
  p_rewire: 0.15
  loops: FALSE
  multiple: FALSE

Again, one can wonder whether there is a difference between the distributions that generated these two samples (there is none, since both samples were drawn from the same model). The test2_global() function provides an answer to this question:

t2 <- nevada::test2_global(x, y, seed = 1234)
t2$pvalue
[1] 0.9190782

The p-value is larger than 5% or even 10%, so we fail to reject the null hypothesis of equal distributions at these significance levels.