Perform the neighborhood test for multinom.test
Performs the two-sample test for two multinomial vectors, testing \(H_0:\) the underlying multinomial probability vectors are within some neighborhood of one another vs. \(H_1:\) they are not.
Arguments
- x, y
Integer vectors (or matrices or dataframes containing multiple integer vector observations as rows). x and y must be the same type and dimension. If x and y are matrices (or dataframes), the \(i^{th}\) row of x will be tested against the \(i^{th}\) row of y for all \(i\) in 1..nrow(x). Alternatively, x can be a list of two vectors, matrices, or dataframes to be compared. In this case, y is NULL by default.
- delta
A number (or vector) greater than 0.
Value
The statistic \(T\) from multinom.test and its associated p_delta, where p_delta \(= 1 - pnorm(T - delta)\).
If x and y are two dimensional (that is, they are matrices or dataframes with more than one row) and/or delta is a vector, then a matrix will be returned where the \((i,j)^{th}\) entry will be the p_delta associated with the \(i^{th}\) rows of x and y and the \(j^{th}\) entry of the delta vector.
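The formula above can be illustrated in base R without calling the package. This is a minimal sketch, not the package's implementation: the names stat and delta are illustrative, the first statistic is taken from the Examples output, and the other two statistics are made-up values used only to show the shape of the result.

```r
# Statistics for three row pairs: the first is the value shown in the
# Examples output; the other two are made-up values for illustration.
stat <- c(26.32967, 1.5, -0.4)

# A vector of neighborhood sizes (each must be greater than 0).
delta <- c(1, 30, 60)

# Entry (i, j) is 1 - pnorm(stat[i] - delta[j]), computed with the
# upper tail of the standard normal for numerical stability.
p_delta <- outer(stat, delta, function(t, d) pnorm(t - d, lower.tail = FALSE))

dim(p_delta)
p_delta
```

Each row corresponds to one pair of observations and each column to one value of delta. Within a row, p_delta can only increase as delta grows, so a wider neighborhood makes rejection less likely.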
Details
In testing the equality of parameters from two populations (as in multinom.test), it frequently happens that the null hypothesis is rejected even though the estimates of effect sizes are close to each other. In such cases the differences may be so small that the parameters are not considered different in practice. A neighborhood test is useful in this situation.
See also
multinom.test, vignette("multinomial-neighborhood-test-vignette")
Amanda Plunkett & Junyong Park (2018), Two-Sample Test for Sparse High Dimensional Multinomial Distributions, TEST, https://doi.org/10.1007/s11749-018-0600-8
Examples
# Load the twoNewsGroups dataset
data(twoNewsGroups)
# Sample two sets of 200 documents from the sci.med newsGroup (to simulate
# the null hypothesis being TRUE). For each of the two groups, sum the
# 200 term frequency vectors together. They will be the two vectors that
# we test.
num_docs <- 200
# Preallocate a list of length two to hold the summed vectors
vecs2test <- vector("list", 2)
row_ids <- 1:nrow(twoNewsGroups$sci.med)
group_1 <- sample(row_ids, num_docs)
group_2 <- sample(row_ids[-group_1], num_docs)
vecs2test[[1]] <- twoNewsGroups$sci.med[group_1,] |>
  colSums() |>
  matrix(nrow=1)
vecs2test[[2]] <- twoNewsGroups$sci.med[group_2,] |>
  colSums() |>
  matrix(nrow=1)
# Test the null that the two vectors come from the same distribution
# (i.e. the same news group)
vecs2test |> multinom.test()
#> $statistic
#> [1] 26.32967
#>
#> $pvalue
#> [1] 0
#>
# The above test likely produced a significant p-value, meaning that we would
# reject the null. However, the difference is too small to be of practical
# interest. Instead, test that the differences are within some neighborhood:
vecs2test |> multinom.neighborhood.test(delta=60)
#> $statistic
#> [1] 26.32967
#>
#> $pvalue_delta
#> [,1]
#> [1,] 1
#>