Generate multivariate binary data

Randomly generate a list of two matrices containing multivariate binary data.

Usage

genMVBinaryData(
  n = c(30, 30),
  d = 2000,
  null_hyp = TRUE,
  r = 0.3,
  epsilon = 0.2,
  sigma = c(0.3, 0.1),
  gamma = 0.3,
  p0 = 0.1
)

Arguments

n: Vector of length 2 containing group size (i.e. number of samples) for each group. Default value is (30, 30).
d: Number of variables (dimension) of the data to be generated. Default value is 2000.
null_hyp: Boolean indicating whether group means should be the same (i.e. null hypothesis is TRUE) or different (i.e. null hypothesis is FALSE). Default value is TRUE.
r: Mean of the distribution of \(U_{ij} \sim Ber(r)\). See details below. Increase r to increase the amount of correlation among the d variables. Default value is 0.3.
epsilon: Used in mixture model that generates the probability vectors. See details below. Sparsity can be increased by decreasing epsilon and vice versa. Default value is 0.2.
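sigma: Vector of length 2 giving, for each group, the upper bound of the uniform distribution used to generate the probability vectors. See details below. Default value is (0.3, 0.1).
gamma: Mean of the distribution of \(Z_i \sim Ber(\gamma)\). See details below. Default value is 0.3.
p0: Baseline probability used in the mixture model that generates the probability vectors. See details below. Default value is 0.1.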

Value

X: List of two matrices containing the generated datasets, one per group. The matrix for group c has n[c] rows and d columns.

p: The probability vectors used to generate the two datasets.

null_hyp: Value of the null_hyp parameter.

r: Value of the r parameter.

epsilon: Value of the epsilon parameter.

Details

The \((i,j)^{th}\) entry of the \(c^{th}\) matrix is \(X_{cij} = (1 - U_{ij})Y_{cij} + U_{ij}Z_{i}\) where

\(U_{ij} \sim Ber(r)\),
\(Z_i \sim Ber(\gamma)\),
\(Y_{cij} \sim Ber(p_{jc})\) where
- \(p_{jc} = (1 - \beta)p_{0} + \beta h_c\), with \(p_0\) given by the p0 argument
- \(\beta \sim Ber(\epsilon)\)
- \(h_c \sim Uniform(0,\sigma_c)\)
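
The model above can be sketched in a few lines of base R. This is an illustration of the formulas only, not the package's implementation: the helper name sim_group is hypothetical, and the sketch assumes that \(\beta\) and \(h_c\) are drawn independently for each variable j, that U and Z are drawn separately for each group, and it ignores the null_hyp switch.

```r
set.seed(1)  # only to make this sketch reproducible

# Hypothetical helper (not part of the package): simulate one group's matrix.
sim_group <- function(n_c, d, r, epsilon, sigma_c, gamma, p0) {
  # Probability vector p_{jc}: mixture of the baseline p0 and a uniform draw.
  beta <- rbinom(d, 1, epsilon)   # assumption: one beta per variable j
  h    <- runif(d, 0, sigma_c)    # assumption: one h per variable j
  p    <- (1 - beta) * p0 + beta * h

  # Y[i, j] ~ Ber(p[j]). matrix() fills column by column,
  # so each p[j] is repeated n_c times to cover column j.
  Y <- matrix(rbinom(n_c * d, 1, rep(p, each = n_c)), nrow = n_c, ncol = d)
  U <- matrix(rbinom(n_c * d, 1, r), nrow = n_c, ncol = d)
  Z <- rbinom(n_c, 1, gamma)

  # Z is recycled down the columns, so every entry in row i uses Z[i].
  (1 - U) * Y + U * Z
}

# Small example using the documented defaults, but with d = 5 for readability.
n     <- c(30, 30)
sigma <- c(0.3, 0.1)
X1 <- sim_group(n[1], d = 5, r = 0.3, epsilon = 0.2, sigma_c = sigma[1],
                gamma = 0.3, p0 = 0.1)
X2 <- sim_group(n[2], d = 5, r = 0.3, epsilon = 0.2, sigma_c = sigma[2],
                gamma = 0.3, p0 = 0.1)
```

Because every entry in row i shares the same \(Z_i\) whenever \(U_{ij} = 1\), a larger r makes the d variables more strongly correlated, which matches the description of the r argument. If U, Y and Z are independent, each entry has marginal mean \((1 - r)p_{jc} + r\gamma\).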

Examples

binData <- genMVBinaryData(null_hyp = TRUE)$X

# Check the dimension of each matrix:
lapply(binData, dim)
#> [[1]]
#> [1]   30 2000
#> 
#> [[2]]
#> [1]   30 2000
#>
