A Simple and Adaptive Two-Sample Test in High Dimensions
by Professor Ming-Yen Cheng
Abstract: Testing the equality of two means is a fundamental inference problem. The conventional Hotelling's test performs poorly or becomes inapplicable for high-dimensional data, which are now commonly encountered. Several modifications have been proposed to address this challenging issue and have been shown to perform well. However, most of them rely on asymptotic normality of the null distributions of their test statistics, which inevitably requires strong assumptions on the covariance structure.
We study this issue thoroughly and propose an L2-norm based test that works under much milder conditions, even when the number of observations is smaller than the dimension. Specifically, to cope with possible non-normality of the null distribution, we employ the Welch-Satterthwaite approximation. We prove an upper bound on the approximation error and demonstrate that this approximation is preferable to the normal approximation. Simple ratio-consistent estimators for the parameters in the Welch-Satterthwaite approximation are given.
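The abstract does not state the test statistic or the approximation in symbols, so the following is only a sketch under assumed notation: sample sizes n_1 and n_2, sample means \bar{x}_1 and \bar{x}_2, a common p x p covariance matrix \Sigma with eigenvalues \lambda_1, ..., \lambda_p, and Gaussian data. Under these assumptions, a Welch-Satterthwaite approximation replaces the chi-square mixture by a scaled chi-square variable with the same mean and variance:

```latex
% Assumed form of an L2-norm statistic (notation not taken from the abstract)
T = \frac{n_1 n_2}{n_1 + n_2}\,\lVert \bar{x}_1 - \bar{x}_2 \rVert^2 ,
\qquad
T \overset{d}{=} \sum_{r=1}^{p} \lambda_r A_r
\quad \text{under } H_0, \qquad A_r \overset{\text{i.i.d.}}{\sim} \chi^2_1 .

% Welch-Satterthwaite: approximate T by \beta \chi^2_d, matching two moments
\beta d = \operatorname{tr}(\Sigma), \qquad
2\beta^2 d = 2\operatorname{tr}(\Sigma^2)
\;\Longrightarrow\;
\beta = \frac{\operatorname{tr}(\Sigma^2)}{\operatorname{tr}(\Sigma)}, \qquad
d = \frac{\operatorname{tr}^2(\Sigma)}{\operatorname{tr}(\Sigma^2)} .
```

In practice \beta and d are unknown and must be estimated from the data; the specific ratio-consistent estimators mentioned above are not reproduced here.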
Our test remains applicable when the null distribution is asymptotically normal. More importantly, unlike existing tests based on asymptotic normality, our test adapts to singularity or near singularity of the unknown covariance matrix, which is common in high dimensions and is the main cause of non-normality. The approximate and asymptotic powers are also investigated.
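To illustrate why a Welch-Satterthwaite approximation can adapt to near singularity, here is a minimal Python sketch. It assumes the covariance eigenvalues are known and uses standard two-moment matching of a chi-square mixture by a scaled chi-square variable beta * chi2_d; the function name and the two example spectra are hypothetical, and this is not the estimation procedure of the talk.

```python
import math


def welch_satterthwaite(eigenvalues):
    """Match the mean and variance of sum_r lambda_r * chi2_1 by beta * chi2_d.

    `eigenvalues` are the (assumed known) eigenvalues of the covariance matrix.
    Returns the scale beta and the approximate degrees of freedom d.
    """
    tr1 = sum(eigenvalues)                        # tr(Sigma)
    tr2 = sum(lam * lam for lam in eigenvalues)   # tr(Sigma^2)
    beta = tr2 / tr1
    d = tr1 * tr1 / tr2
    return beta, d


def chi2_skewness(d):
    """Skewness of a chi-square variable with d degrees of freedom."""
    return math.sqrt(8.0 / d)


p = 1000

# Well-conditioned case: identity covariance, so d equals p and the
# scaled chi-square is close to normal (small skewness).
beta_id, d_id = welch_satterthwaite([1.0] * p)

# Nearly singular case (hypothetical): one dominant eigenvalue, the rest tiny.
# Here d is close to 1, so the null distribution is far from normal.
spiked = [float(p)] + [0.01] * (p - 1)
beta_sp, d_sp = welch_satterthwaite(spiked)

print(f"identity: beta={beta_id:.3f}, d={d_id:.1f}, skew={chi2_skewness(d_id):.3f}")
print(f"spiked:   beta={beta_sp:.3f}, d={d_sp:.3f}, skew={chi2_skewness(d_sp):.3f}")
```

The degrees of freedom d act as an effective dimension: when d is large the scaled chi-square is nearly normal, and when d is small it stays strongly skewed, which a normal approximation cannot capture.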
Simulation studies and a real data application show that our test outperforms a number of existing tests in terms of size control, while the powers are comparable when their sizes are comparable.