Perform the neighborhood test for multinom.test
Performs the two-sample test for two multinomial vectors, testing \(H_0:\) the underlying multinomial probability vectors are within some neighborhood of one another vs. \(H_1:\) they are not.
Arguments
- x, y
Integer vectors (or matrices or dataframes containing multiple integer vector observations as rows).
x and y must be the same type and dimension. If x and y are matrices (or dataframes), the \(i^{th}\) row of x will be tested against the \(i^{th}\) row of y for all \(i\) in 1..nrow(x). Alternatively, x can be a list of two vectors, matrices, or dataframes to be compared. In this case, y is NULL by default.
- delta
A number (or vector) greater than 0.
Value
The statistic from multinom.test and its
associated p_delta (returned as pvalue_delta), where p_delta
\(= 1 - pnorm(T - delta)\) and \(T\) is the statistic.
If x and y are two dimensional (that is, they are matrices
or dataframes with more than one row) and/or delta is a vector,
then a matrix will be returned where the \((i,j)^{th}\) entry will be the
p_delta associated with the \(i^{th}\) rows of x and
y and the \(j^{th}\) entry of the delta vector.
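As a rough illustration of the formula above, the following base-R sketch computes p_delta for one or more statistics and one or more delta values. The helper name `p_delta_sketch` is hypothetical and not part of the package; it only mirrors the documented formula \(1 - pnorm(T - delta)\). The input statistic is the one printed in the Examples output, and the delta values are chosen purely for illustration.

```r
# Hypothetical helper (not part of the package): applies the documented
# formula p_delta = 1 - pnorm(T - delta) to every (statistic, delta) pair.
p_delta_sketch <- function(statistic, delta) {
  stopifnot(is.numeric(statistic), is.numeric(delta), all(delta > 0))
  # outer() builds a length(statistic) x length(delta) matrix, so entry (i, j)
  # pairs the i-th statistic with the j-th delta.
  # lower.tail = FALSE returns the upper-tail probability 1 - pnorm(.) directly.
  outer(statistic, delta, function(t, d) pnorm(t - d, lower.tail = FALSE))
}

# Statistic taken from the Examples output; the delta values are illustrative.
p_vals <- p_delta_sketch(statistic = 26.32967, delta = c(10, 26.32967, 60))
print(p_vals)
```

A delta equal to the statistic gives a p_delta of 0.5, and larger deltas push p_delta toward 1, which is consistent with the delta=60 result shown in the Examples.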
Details
In testing the equality of parameters from two populations
(as in multinom.test),
it frequently happens that the null hypothesis is rejected even though the estimates
of effect sizes are close to each other. Such differences may be so small
that the parameters are not considered different in practice. A neighborhood test
is useful in this situation.
See also
multinom.test, vignette("multinomial-neighborhood-test-vignette")
Amanda Plunkett & Junyong Park (2018), Two-Sample Test for Sparse High Dimensional Multinomial Distributions, TEST, https://doi.org/10.1007/s11749-018-0600-8
Examples
# Load the twoNewsGroups dataset
data(twoNewsGroups)
# Sample two sets of 200 documents from the sci.med newsGroup (to simulate
# the null hypothesis being TRUE). For each of the two groups, sum the
# 200 term frequency vectors together. They will be the two vectors that
# we test.
num_docs <- 200
vecs2test <- vector("list", 2)  # empty list to hold the two summed vectors
row_ids <- 1:nrow(twoNewsGroups$sci.med)
group_1 <- sample(row_ids, num_docs)
group_2 <- sample(row_ids[-group_1], num_docs)
vecs2test[[1]] <- twoNewsGroups$sci.med[group_1,] |>
  colSums() |>
  matrix(nrow=1)
vecs2test[[2]] <- twoNewsGroups$sci.med[group_2,] |>
  colSums() |>
  matrix(nrow=1)
# Test the null that the two vectors come from the same distribution
# (i.e. the same news group)
vecs2test |> multinom.test()
#> $statistic
#> [1] 26.32967
#>
#> $pvalue
#> [1] 0
#>
# The above test likely produced a significant p-value, meaning that we would
# reject the null. However, the difference isn't very interesting. Instead,
# test that the differences are within some neighborhood:
vecs2test |> multinom.neighborhood.test(delta=60)
#> $statistic
#> [1] 26.32967
#>
#> $pvalue_delta
#> [,1]
#> [1,] 1
#>