Generate multivariate binary data
Randomly generate a list of two matrices containing multivariate binary data.
Arguments
- n
Vector of length 2 giving the size (i.e. number of samples) of each group. Default value is (30, 30).
- d
Number of variables (dimension) of the data to be generated. Default value is 2000.
- null_hyp
Logical indicating whether the group means should be the same (i.e. the null hypothesis is TRUE) or different (i.e. the null hypothesis is FALSE). Default value is TRUE.
- r
Mean of the distribution \(U_{ij} \sim Ber(r)\). See Details below. Increase r to increase the amount of correlation among the d variables. Default value is 0.3.
- epsilon
Used in the mixture model that generates the probability vectors. See Details below. Decreasing epsilon increases sparsity, and increasing it reduces sparsity. Default value is 0.2.
- sigma
Vector of length 2 giving the upper bounds of the uniform distributions used to generate the probability vectors, one per group. See Details below. Default value is (0.3, 0.1).
- gamma
Mean of the distribution \(Z_i \sim Ber(\gamma)\). See Details below. Default value is 0.3.
- p0
Baseline probability \(p_0\) in the mixture model that generates the probability vectors. See Details below. Default value is 0.1.
Value
X
: List of two matrices, of dimensions n[1] by d and n[2] by d, containing the generated datasets.
p
: The probability vectors used to generate the two datasets.
null_hyp
: Value of the null_hyp parameter.
r
: Value of the r parameter.
epsilon
: Value of the epsilon parameter.
Details
The \((i,j)\)th entry of the \(c\)th matrix is \(X_{cij} = (1 - U_{ij})Y_{icj} + U_{ij}Z_{i}\). Here \(i\) indexes samples (rows) and \(j\) indexes variables (columns). The components are:
- \(U_{ij} \sim Ber(r)\),
- \(Z_i \sim Ber(\gamma)\),
- \(Y_{icj} \sim Ber(p_{jc})\), with \(p_{jc} = (1 - \beta)p_{0} + \beta h_c\),
- \(\beta \sim Ber(\epsilon)\), and
- \(h_c \sim Uniform(0,\sigma_c)\).
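The first two moments of this model show how the arguments act. This derivation rests on two assumptions that the description above does not state. First, \(U_{ij}\), \(U_{ik}\), \(Y_{icj}\), \(Y_{ick}\) and \(Z_i\) are mutually independent. Second, \(p_{jc}\) is held fixed for the first two lines, and \(\beta\) is independent of \(h_c\) for the third.

Under these assumptions, the only dependence between two variables \(j \neq k\) of the same sample comes from the shared term \(U_{ij}U_{ik}Z_i^2\), and \(Z_i^2 = Z_i\):

```latex
\begin{aligned}
E[X_{cij}] &= (1 - r)\,p_{jc} + r\gamma, \\
\operatorname{Cov}(X_{cij}, X_{cik}) &= r^{2}\gamma(1 - \gamma), \qquad j \neq k, \\
E[p_{jc}] &= (1 - \epsilon)\,p_{0} + \epsilon\,\frac{\sigma_c}{2}.
\end{aligned}
```

The covariance grows with r, which matches the description of r above. The fraction of variables whose probability departs from \(p_0\) is \(\epsilon\) on average, so a smaller epsilon gives sparser probability vectors.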
See also
- Plunkett, A. & Park, J. (2017). Two-sample tests for sparse high-dimensional binary data. Communications in Statistics - Theory and Methods, 46(22), 11181-11193.
- Park, J. & Davis, J. (2011). Estimating and testing conditional sums of means in high dimensional multivariate binary data. Journal of Statistical Planning and Inference, 141, 1021-1030.
Examples
binData <- genMVBinaryData(null_hyp = TRUE)$X
# Check the dimension of each matrix:
lapply(binData, dim)
#> [[1]]
#> [1] 30 2000
#>
#> [[2]]
#> [1] 30 2000
#>
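The generative model in Details can also be simulated directly in base R. This is a hypothetical sketch, not the package's implementation, and it makes several assumptions:
- the function name sim_mv_binary_sketch is invented;
- \(\beta\) and \(h_c\) are drawn once per variable;
- U and Z are drawn independently for each group;
- null_hyp is not modelled, because Details does not say how it changes the probability vectors.

```r
# Hypothetical sketch of the Details model; not the package's own code.
sim_mv_binary_sketch <- function(n = c(30, 30), d = 2000, r = 0.3,
                                 epsilon = 0.2, sigma = c(0.3, 0.1),
                                 gamma = 0.3, p0 = 0.1) {
  # Assumption: beta_j ~ Ber(epsilon) is drawn once per variable, shared by both groups
  beta <- rbinom(d, size = 1, prob = epsilon)
  X <- vector("list", 2)
  p <- vector("list", 2)
  for (g in 1:2) {
    # Assumption: h ~ Uniform(0, sigma_g) is drawn once per variable
    h <- runif(d, min = 0, max = sigma[g])
    # Mixture: baseline p0 unless beta_j = 1, in which case use h_j
    p[[g]] <- (1 - beta) * p0 + beta * h
    # Y_igj ~ Ber(p_jg); rep(..., each = n[g]) makes column j use p[[g]][j]
    Y <- matrix(rbinom(n[g] * d, size = 1, prob = rep(p[[g]], each = n[g])),
                nrow = n[g], ncol = d)
    # Assumption: U and Z are drawn independently for each group
    U <- matrix(rbinom(n[g] * d, size = 1, prob = r), nrow = n[g], ncol = d)
    Z <- rbinom(n[g], size = 1, prob = gamma)
    # X_gij = (1 - U_ij) Y_igj + U_ij Z_i; Z recycles down columns, so row i gets Z[i]
    X[[g]] <- (1 - U) * Y + U * Z
  }
  list(X = X, p = p)
}

set.seed(1)
sim <- sim_mv_binary_sketch()
lapply(sim$X, dim)  # two matrices of 30 rows and 2000 columns
```

Setting r = 1 in this sketch makes every entry of a row equal to that row's \(Z_i\), which is the extreme case of the correlation that r controls.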