Conduct inference using `Ate` for the selected target of inference: a linear combination of connectivity parameters based upon estimated communities. Note that communities should be estimated using the `Atr` matrix produced from the `split_network()` function prior to using this function.

Usage

infer_network(
  Ate,
  u,
  communities,
  distribution,
  epsilon = 0.5,
  gamma = NULL,
  Atr = NULL,
  allow_self_loops = TRUE,
  is_directed = TRUE,
  tau = NULL
)

Arguments

Ate

The test adjacency matrix produced from the `split_network()` function which will be used to conduct inference.

u

The linear combination vector (or matrix) which specifies which connectivity parameters should be considered when constructing the selected target of inference. This input should have norm 1. If `is_directed` is set to `FALSE`, then the matrix version of `u` must be upper triangular.
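As a minimal sketch of preparing `u`, the snippet below rescales a matrix of raw weights to unit norm and checks that it is upper triangular. It assumes that "norm 1" refers to the Euclidean (Frobenius) norm of the entries; `normalize_u()` is a hypothetical helper written for this illustration and is not part of the package.

```r
# Hypothetical helper (not part of the package): rescale the weights so that
# the sum of squared entries equals 1. Assumes "norm 1" means the
# Euclidean (Frobenius) norm.
normalize_u <- function(u) {
  u / sqrt(sum(u^2))
}

# Raw weights for K = 3 estimated communities: equal weight on the
# within-community connectivity of communities 1 and 2.
u_raw <- matrix(0, nrow = 3, ncol = 3)
u_raw[1, 1] <- 1
u_raw[2, 2] <- 1

u_unit <- normalize_u(u_raw)

# Squared norm of the rescaled weights
u_norm_sq <- sum(u_unit^2)

# TRUE when every entry below the diagonal is zero, as required for the
# matrix form of "u" when is_directed = FALSE
u_is_upper <- all(u_unit[lower.tri(u_unit)] == 0)
```

The rescaled `u_unit` could then be passed as the `u` argument in place of a hand-normalized matrix.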

communities

A vector or matrix which specifies the estimated communities. In vector form, if `Ate` is of size `n` x `n`, this should be a vector of length `n` whose ith element is the number of the community that the ith node belongs to. For example, `communities[2] = 3` would indicate that the 2nd node belongs to the 3rd estimated community. In matrix form, this should be a matrix of size `n` x `K`, where `K` is the number of estimated communities. The matrix should contain only 0s and 1s, where `communities[i, k] = 1` indicates that the ith node belongs to the `k`th community. Each node may belong to only one community, so each row of this matrix should contain exactly one 1.
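The two accepted forms of `communities` carry the same information. The following base R sketch, with illustrative variable names that are not part of the package, converts a vector of community labels into the equivalent `n` x `K` membership matrix and back. It assumes the labels are the integers 1 through `K`.

```r
# Vector form: the ith element is the community label of the ith node
communities_vec <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)

n <- length(communities_vec)  # number of nodes
K <- max(communities_vec)     # number of communities (labels assumed 1..K)

# Matrix form: an n x K matrix of 0s and 1s with exactly one 1 per row
communities_mat <- matrix(0, nrow = n, ncol = K)
communities_mat[cbind(seq_len(n), communities_vec)] <- 1

# Each node belongs to exactly one community
stopifnot(all(rowSums(communities_mat) == 1))

# Recover the vector form from the matrix form
communities_back <- as.vector(communities_mat %*% seq_len(K))
```

Either `communities_vec` or `communities_mat` describes the same assignment of nodes to communities.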

distribution

The distribution that the edges of the adjacency matrix follow. Acceptable distributions are `"gaussian"`, `"poisson"`, or `"bernoulli"`.

epsilon

The parameter controlling the amount of information allocated to the train network versus the test network. For Gaussian and Poisson networks, this must be between 0 and 1 (non-inclusive). A larger value of epsilon indicates more information in the train network. For Bernoulli networks, this input is an alias to the `gamma` parameter.

gamma

For Bernoulli networks, the parameter controlling the amount of information allocated to the train network versus the test network. This must be between 0 and 0.5 (non-inclusive). A larger value of `gamma` indicates less information in the train network, and more in the test network.

Atr

The train adjacency matrix produced from the `split_network()` function. This is only necessary when the network has edges which follow the Bernoulli distribution.

allow_self_loops

A logical indicating whether the network allows self loops (edges pointing from a node to itself). By default this parameter is set to `TRUE`. If this is set to `FALSE`, then values along the diagonal of the adjacency matrix will be ignored.

is_directed

A logical indicating whether the network is directed. By default this parameter is set to `TRUE`. If this is set to `FALSE`, then only the values in the upper triangular portion of the adjacency matrix will be used.

tau

For networks with Gaussian edges only, this parameter indicates the known common standard deviation (square root of the variance) of the edges in the network.

Value

A list with two elements labeled `"estimate"` and `"estimate_variance"`, which contain the estimate for the selected target of inference and an estimate of the variance of the estimator, respectively.
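The returned list can be turned into a confidence interval using a normal approximation, as the examples on this page do with `qnorm()`. The sketch below wraps that calculation in a hypothetical helper, `ci_from_inference()`, which is not part of the package. It is applied to a made-up list with the same two element names so that it runs without fitting anything.

```r
# Hypothetical helper (not part of the package): normal-approximation
# confidence interval from a list with elements "estimate" and
# "estimate_variance".
ci_from_inference <- function(result, level = 0.90) {
  # Two-sided critical value, e.g. qnorm(0.95) for a 90% interval
  z <- stats::qnorm(1 - (1 - level) / 2)
  margin_of_error <- z * sqrt(result$estimate_variance)
  c(lower = result$estimate - margin_of_error,
    upper = result$estimate + margin_of_error)
}

# Made-up values for illustration only, not output from infer_network()
mock_result <- list(estimate = 10, estimate_variance = 0.25)
ci_90 <- ci_from_inference(mock_result, level = 0.90)
```

Passing a different `level`, such as `0.95`, changes only the critical value used for the margin of error.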

Examples

# ==============================
# == Gaussian network example ==
# ==============================
# (Poisson networks proceed nearly identically except that the parameter
#  tau is not necessary to input into split_network() or infer_network())

# First, split a simulated Gaussian adjacency matrix
A_gaussian <- matrix(stats::rnorm(n = 10^2, mean = 10, sd = 5), nrow = 10)
gaussian_split <- split_network(A = A_gaussian, distribution = "gaussian",
                               epsilon = 0.3, tau = 5)
A_gaussian_tr <- gaussian_split$Atr
A_gaussian_te <- gaussian_split$Ate

# Estimate communities from the train matrix via spectral clustering
# with K=3 communities
if (requireNamespace("nett", quietly = TRUE)) {
  communities_estimate <- nett::spec_clust(A_gaussian_tr, K = 3)
} else {
  communities_estimate <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
}

# This particular "u" vector will specify that we want to conduct inference
# for the mean connectivity within the first estimated community.
# Note that "u" is of length 9, because we have 3 estimated communities.
# (This should always be of length K^2)
u_vector <- c(1, 0, 0, 0, 0, 0, 0, 0, 0)

# We can also specify "u" in matrix form.
u_matrix <- matrix(c(1, 0, 0,
                     0, 0, 0,
                     0, 0, 0), nrow = 3)

# Conduct inference for the selected target (mean connectivity within the
# first estimated community)
gaussian_inference <-
    infer_network(Ate = A_gaussian_te, u = u_matrix,
                  communities = communities_estimate,
                  distribution = "gaussian",
                  epsilon = 0.3, tau = 5)

# Produce a 90% confidence interval for the target of inference
margin_of_error <- sqrt(gaussian_inference$estimate_variance) * qnorm(0.95)
ci_upper_bound <- gaussian_inference$estimate + margin_of_error
ci_lower_bound <- gaussian_inference$estimate - margin_of_error

# ===============================
# == Bernoulli network example ==
# ===============================

# First, split a simulated Bernoulli adjacency matrix with gamma=0.10
A_bernoulli <- matrix(stats::rbinom(n = 10^2, size = 1, prob = 0.5), nrow = 10)
bernoulli_split <- split_network(A = A_bernoulli, distribution = "bernoulli",
                                gamma = 0.10)
A_bernoulli_tr <- bernoulli_split$Atr
A_bernoulli_te <- bernoulli_split$Ate

# Estimate communities from the train matrix via spectral clustering
# with K=3 communities
if (requireNamespace("nett", quietly = TRUE)) {
  communities_estimate <- nett::spec_clust(A_bernoulli_tr, K = 3)
} else {
  communities_estimate <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
}

# This particular "u" vector will specify that we want to conduct inference
# for the mean connectivity within the first estimated community.
# Note that "u" is of length 9, because we have 3 estimated communities.
# (This should always be of length K^2)
u_vector <- c(1, 0, 0, 0, 0, 0, 0, 0, 0)

# We can also specify "u" in matrix form.
u_matrix <- matrix(c(1, 0, 0,
                     0, 0, 0,
                     0, 0, 0), nrow = 3)

# Conduct inference for the selected target (mean connectivity within the
# first estimated community)
bernoulli_inference <-
    infer_network(Ate = A_bernoulli_te, u = u_matrix,
                  communities = communities_estimate,
                  distribution = "bernoulli",
                  gamma = 0.10, Atr = A_bernoulli_tr)

# Produce a 90% confidence interval for the target of inference
margin_of_error <- sqrt(bernoulli_inference$estimate_variance) * qnorm(0.95)
ci_upper_bound <- bernoulli_inference$estimate + margin_of_error
ci_lower_bound <- bernoulli_inference$estimate - margin_of_error