Split an adjacency matrix into train and test adjacency matrices using either data thinning (Poisson or Gaussian edges) or data fission (Bernoulli edges).

Usage

split_network(
  A,
  distribution,
  epsilon = 0.5,
  gamma = NULL,
  allow_self_loops = TRUE,
  is_directed = TRUE,
  tau = NULL
)

Arguments

A

The square adjacency matrix to be split.

distribution

The distribution that the edges of the adjacency matrix follow. Acceptable distributions are `"gaussian"`, `"poisson"`, or `"bernoulli"`.

epsilon

Controls how much information is allocated to the train network versus the test network. For Gaussian and Poisson networks, this must be strictly between 0 and 1; a larger value of `epsilon` puts more information in the train network. For Bernoulli networks, this argument is an alias for `gamma`.

gamma

For Bernoulli networks, controls how much information is allocated to the train network versus the test network. This must be strictly between 0 and 0.5. A larger value of `gamma` leaves less information in the train network and more in the test network.

allow_self_loops

A logical indicating whether the network allows self-loops (edges pointing from a node to itself). Defaults to `TRUE`. If set to `FALSE`, the values along the diagonal of the adjacency matrix are ignored.

is_directed

A logical indicating whether the network is directed. Defaults to `TRUE`. If set to `FALSE`, only the values in the upper triangular portion of the matrix are used.

tau

For networks with Gaussian edges only, this parameter indicates the known common standard deviation (square root of the variance) of the edges in the network.
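To build intuition for `epsilon` and `tau`, the following is a minimal base-R sketch of the standard data thinning constructions for Poisson and Gaussian entries. It is an illustration under stated assumptions, not the code inside `split_network()`: the variable names are hypothetical, and the function may handle self-loops, undirected networks, or the form of the returned pieces differently.

```r
set.seed(1)
n <- 5
epsilon <- 0.3  # share of information assigned to the train piece

# Poisson thinning: each observed count is divided binomially, so the
# train piece receives a fraction epsilon of the count on average.
A_pois <- matrix(rpois(n^2, lambda = 4), nrow = n)
Atr_pois <- matrix(rbinom(n^2, size = A_pois, prob = epsilon), nrow = n)
Ate_pois <- A_pois - Atr_pois

# Gaussian thinning: requires the known common standard deviation tau.
# Given A, the train piece is drawn around epsilon * A with variance
# epsilon * (1 - epsilon) * tau^2; the test piece is the remainder.
tau <- 5
A_gauss <- matrix(rnorm(n^2, mean = 10, sd = tau), nrow = n)
Atr_gauss <- matrix(
  rnorm(n^2, mean = epsilon * A_gauss, sd = tau * sqrt(epsilon * (1 - epsilon))),
  nrow = n
)
Ate_gauss <- A_gauss - Atr_gauss
```

In both constructions the two pieces sum back to the original matrix, and a larger `epsilon` shifts more of the signal into the train piece. The Gaussian case shows why `tau` must be supplied: the noise added to form the train piece is scaled by it.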

Value

A list with two elements named `"Atr"` and `"Ate"`, which are the train and test networks, respectively.

Examples

# Split a simulated Gaussian adjacency matrix
A_gaussian <- matrix(rnorm(n = 10^2, mean = 10, sd = 5), nrow = 10)
gaussian_split <- split_network(A_gaussian, "gaussian", epsilon = 0.3, tau = 5)
A_gaussian_tr <- gaussian_split$Atr
A_gaussian_te <- gaussian_split$Ate

# Split a simulated Bernoulli adjacency matrix with gamma = 0.25
A_bernoulli <- matrix(rbinom(n = 10^2, size = 1, prob = 0.5), nrow = 10)
bernoulli_split <- split_network(A_bernoulli, "bernoulli", gamma = 0.25)
A_bernoulli_tr <- bernoulli_split$Atr
A_bernoulli_te <- bernoulli_split$Ate
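
One way to see why a larger `gamma` leaves less information in the train network is the randomization commonly used in Bernoulli data fission, where each entry of the matrix is flipped independently with probability `gamma`. The base-R sketch below illustrates only that flipping step. It assumes this construction and uses hypothetical variable names; it is not taken from `split_network()`, which may build its train and test networks differently.

```r
set.seed(1)
n <- 10
gamma <- 0.25  # probability that an entry is flipped

A <- matrix(rbinom(n^2, size = 1, prob = 0.5), nrow = n)

# Indicator of which entries get flipped (1 = flip, 0 = keep)
flip <- matrix(rbinom(n^2, size = 1, prob = gamma), nrow = n)

# Exclusive-or of A and flip: flipped entries switch between 0 and 1
A_noisy <- abs(A - flip)

# Observed share of entries that differ from A (close to gamma on average)
mean(A_noisy != A)
```

As `gamma` approaches 0.5, the noisy copy becomes nearly independent of `A`, which is consistent with `gamma` being restricted to values strictly below 0.5.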