Generate multinomial data
genMultinomialData.Rd
Generate two sets of multinomially distributed vectors using rmultinom. Useful for hypothesis testing simulations. Three experiments with different probability vectors (of length \(k\)) are available, in addition to user-specified probabilities p:
- Experiment 1: \(p_{1i} = \frac{1/i^\alpha}{\sum_{j=1}^k 1/j^\alpha}\). When the null_hyp parameter is FALSE, the probability vector for the 2nd group is generated by switching the positions of the 1st and \(m\)-th entries.
- Experiment 2: \(p_{1i} = 1/k\). When the null_hyp parameter is FALSE, \(p_{2i} = 0\) for \(i \in \{1, \dots, b\}\) and \(p_{2,b+1} = \sum_{i=1}^{b+1} p_{1i} = (b+1)/k\).
- Experiment 3: \(p_{1i} = 1/k\). When the null_hyp parameter is FALSE, \(p_{2i} = 0\) for \(i \in \{1, \dots, b\}\) and \(p_{2i} = 1/(k - b)\) for \(i > b\).
In experiments 2 and 3, \(b\) is the number of zeroed entries (the numzero argument).
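The three experiments described above can be sketched in base R as follows. This is only an illustration of the formulas, not the package's actual implementation; the helper name make_probs and its argument b (standing for \(b\) in the formulas) are hypothetical.

```r
# Hypothetical helper, not part of the documented package: returns a 2 x k
# matrix of probabilities (row 1 = group 1, row 2 = group 2).
make_probs <- function(expID, k, alpha = 0.45, m = 1000, b = 50,
                       null_hyp = TRUE) {
  if (expID == 1) {
    w <- 1 / seq_len(k)^alpha        # unnormalised weights 1 / i^alpha
    p1 <- w / sum(w)
  } else {
    p1 <- rep(1 / k, k)              # experiments 2 and 3: uniform
  }
  p2 <- p1                           # under the null, both groups agree
  if (!null_hyp) {
    if (expID == 1) {
      p2[c(1, m)] <- p1[c(m, 1)]     # swap the 1st and m-th entries
    } else if (expID == 2) {
      p2[seq_len(b)] <- 0
      p2[b + 1] <- sum(p1[seq_len(b + 1)])   # equals (b + 1) / k
    } else {
      p2[seq_len(b)] <- 0
      p2[(b + 1):k] <- 1 / (k - b)   # spread the mass evenly over i > b
    }
  }
  rbind(p1, p2)
}

# Small example: experiment 2 under the alternative, k = 10 and b = 3
P <- make_probs(expID = 2, k = 10, b = 3, null_hyp = FALSE)
rowSums(P)   # both rows should sum to 1
P[2, 4]      # mass collected on entry b + 1
```

Checking that each row sums to 1 is a quick sanity test that the alternative only moves probability mass around.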
Usage
genMultinomialData(
null_hyp = TRUE,
p = NULL,
k = 2000,
n = c(8000, 8000),
sample_size = 30,
expID = 1,
alpha = 0.45,
m = 1000,
numzero = 50,
...
)
Arguments
- null_hyp
logical; if TRUE, generate data using the same distribution. Default value is TRUE.
- p
An optional 2 by \(k\) matrix specifying the probabilities of the \(k\) categories for each of the two groups. Each row of p must sum to 1. If defined, all remaining parameters in the function definition are ignored. Default value is NULL.
- k
integer representing dimension (number of categories). Default 2000.
- n
Vector of length 2 giving the size parameter of each group's multinomial distribution, i.e. the total number of objects distributed over the \(k\) bins in each draw. Default is c(8000, 8000).
- sample_size
integer specifying the number of random vectors to generate for each of the two groups. Default is 30.
- expID
Experiment number (1, 2, or 3). Default is 1.
- alpha
Number between 0 and 1. Used for experiment 1. Default is 0.45.
- m
integer between 2 and \(k\). Used in experiment 1 for the alternative hypothesis. Default is 1000.
- numzero
integer between 1 and \(k\)-1. Used in experiments 2 and 3 for the alternative hypothesis. Default is 50.
- ...
Additional parameters.
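To illustrate how p, n, and sample_size fit together, here is a minimal base R sketch of the sampling step. It assumes each group's data is drawn with one rmultinom call and stored with one random vector per row, which is consistent with the 30 by 2000 dimensions shown in the Examples but is not confirmed to be the package's internal code. The small values of k, n, and sample_size are chosen only to keep the output readable.

```r
set.seed(1)  # arbitrary seed, only so this sketch is reproducible

k <- 5
n <- c(100, 100)
sample_size <- 4
p <- rbind(rep(1 / k, k),                # group 1: uniform probabilities
           c(0.4, 0.3, 0.1, 0.1, 0.1))   # group 2: user-chosen probabilities

# rmultinom() returns a k x sample_size matrix (one draw per column),
# so transpose to get one random vector per row.
X <- lapply(1:2, function(g) {
  t(rmultinom(sample_size, size = n[g], prob = p[g, ]))
})

lapply(X, dim)     # each element is sample_size x k
rowSums(X[[1]])    # every row sums to n[1]
```

Each row's counts sum to the corresponding entry of n, which is the "total number of objects" the n argument describes.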
Examples
#Generate data when the null hypothesis is FALSE:
X <- genMultinomialData(FALSE)
#Dimension of the two generated datasets:
lapply(X, dim)
#> [[1]]
#> [1] 30 2000
#>
#> [[2]]
#> [1] 30 2000
#>
#Proportion of entries less than 5 in the first dataset:
sum(X[[1]]<5)/(nrow(X[[1]])*ncol(X[[1]]))
#> [1] 0.6975333