Generate multinomial data
Generate two sets of multinomially distributed vectors using
rmultinom. Useful for hypothesis testing simulations. Three different
experiments with different probability vectors (of length \(k\)) are
available, in addition to a user-specified probability matrix p:
Experiment 1: \(p_{1i} = \frac{1/i^\alpha}{\sum_{j=1}^k 1/j^\alpha}\). When the null_hyp parameter is FALSE, the probability vector for the 2nd group is generated by swapping the 1st and \(m\)-th entries.
Experiment 2: \(p_{1i} = 1/k\). When the null_hyp parameter is FALSE, \(p_{2i} = 0\) for \(i \in \{1, \dots, b\}\) and \(p_{2,b+1} = \sum_{i=1}^{b+1} p_{1i} = (b+1)/k\).
Experiment 3: \(p_{1i} = 1/k\). When the null_hyp parameter is FALSE, \(p_{2i} = 0\) for \(i \in \{1, \dots, b\}\) and \(p_{2i} = 1/(k - b)\) for \(i > b\).
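The three constructions above can be sketched in a few lines of R. This is only an illustration of the formulas, not the package's own code: make_probs is a hypothetical helper, and the argument b is assumed to play the role of numzero. For experiment 2 the entries after position \(b+1\) are assumed to stay at \(1/k\), which is what makes the vector sum to 1.

```r
# Hypothetical helper (not part of the package): builds a 2 x k matrix whose
# rows are the probability vectors of the two groups, following the formulas
# above. Assumption: b corresponds to the numzero argument.
make_probs <- function(expID = 1, k = 2000, alpha = 0.45, m = 1000, b = 50,
                       null_hyp = TRUE) {
  stopifnot(expID %in% 1:3)
  if (expID == 1) {
    w <- 1 / seq_len(k)^alpha        # unnormalised weights 1 / i^alpha
    p1 <- w / sum(w)
  } else {
    p1 <- rep(1 / k, k)              # uniform for experiments 2 and 3
  }
  p2 <- p1                           # under the null both groups share p1
  if (!null_hyp) {
    if (expID == 1) {
      p2[c(1, m)] <- p1[c(m, 1)]     # swap the 1st and m-th entries
    } else if (expID == 2) {
      p2[seq_len(b)] <- 0            # first b categories get probability 0
      p2[b + 1] <- sum(p1[seq_len(b + 1)])  # their mass moves to entry b + 1
    } else {
      p2[seq_len(b)] <- 0
      p2[(b + 1):k] <- 1 / (k - b)   # uniform over the remaining categories
    }
  }
  rbind(p1, p2)
}

# Small example: experiment 2 with k = 10 categories and b = 3 zeroed entries
P <- make_probs(expID = 2, k = 10, b = 3, null_hyp = FALSE)
rowSums(P)   # both rows should sum to 1 (up to floating point error)
```

A matrix built this way has the 2 by \(k\) shape that the p argument expects.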
Usage
genMultinomialData(
  null_hyp = TRUE,
  p = NULL,
  k = 2000,
  n = c(8000, 8000),
  sample_size = 30,
  expID = 1,
  alpha = 0.45,
  m = 1000,
  numzero = 50,
  ...
)
Arguments
- null_hyp
logical; if TRUE, generate data using the same distribution. Default value is TRUE.
- p
An optional 2 by \(k\) matrix specifying the probabilities of the \(k\) categories for each of the two groups. Each row of p must sum to 1. If defined, all remaining parameters in the function definition are ignored. Default value is NULL.
- k
Integer giving the dimension (number of categories). Default is 2000.
- n
Vector of length 2 giving the size parameter of each group's multinomial distribution, i.e. the total number of objects placed into the \(k\) bins in the usual multinomial experiment. Default is c(8000, 8000).
- sample_size
Integer specifying the number of random vectors to generate for each of the two groups. Default is 30.
- expID
Experiment number (1, 2, or 3). Default is 1.
- alpha
Number between 0 and 1. Used for experiment 1. Default is 0.45.
- m
Integer between 2 and \(k\). Used in experiment 1 for the alternative hypothesis. Default is 1000.
- numzero
Integer between 1 and \(k - 1\). Used in experiments 2 and 3 for the alternative hypothesis. Default is 50.
- ...
Additional parameters.
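To make the roles of n and sample_size concrete, here is a minimal base R sketch of how two groups of multinomial vectors can be drawn with rmultinom. It is an assumption about the general approach, not the package's implementation. The one-sample-per-row layout is chosen to match the 30 by 2000 dimensions reported in the Examples, and the values of k, n, sample_size and p below are made up for illustration.

```r
set.seed(1)                      # for reproducibility of the sketch

k <- 5                           # number of categories (illustrative)
n <- c(100, 200)                 # objects per draw for group 1 and group 2
sample_size <- 4                 # number of random vectors per group

# A 2 x k probability matrix; each row sums to 1
p <- rbind(rep(1 / k, k),
           c(0.4, 0.3, 0.1, 0.1, 0.1))

# rmultinom returns a k x sample_size matrix (one draw per column),
# so transpose to get one draw per row
X <- lapply(1:2, function(g) {
  t(rmultinom(sample_size, size = n[g], prob = p[g, ]))
})

lapply(X, dim)                   # each element is sample_size x k
rowSums(X[[1]])                  # every row of group 1 sums to n[1]
```

Each row is one multinomial experiment, so its entries always add up to the corresponding element of n.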
Examples
#Generate data when the null hypothesis is FALSE:
X <- genMultinomialData(FALSE)
#Dimension of the two generated datasets:
lapply(X, dim)
#> [[1]]
#> [1] 30 2000
#>
#> [[2]]
#> [1] 30 2000
#>
#Proportion of entries less than 5 in the first dataset:
sum(X[[1]]<5)/(nrow(X[[1]])*ncol(X[[1]]))
#> [1] 0.6975333