On testing for a difference in two high-dimensional mean vectors.
A common problem in multivariate statistical analysis involves testing for differences in the mean vectors from two populations with equal covariance matrices. This problem is considered well-posed when the data dimension does not exceed the sum of the two sample sizes minus two, in which case the traditional Hotelling's T² test can be applied. When the data dimension exceeds the sum of the sample sizes minus two, the pooled sample covariance matrix is singular, and nontraditional tests must be formulated. Using Monte Carlo simulations, we first contrast the powers of five hypothesis tests for two high-dimensional means that have been proposed in the statistical literature. We then examine the efficacy of linear dimension reduction derived from the singular value decomposition of the total data matrix and explore its effect on the powers of the five tests when they are conducted on the dimension-reduced data. Finally, we propose a new test for the difference in two high-dimensional mean vectors that combines aspects of the random subspaces and cluster subspaces tests to improve test power.
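
To make the singularity issue and the dimension-reduction step concrete, the following is a minimal sketch rather than the procedure used in the study. It forms the pooled sample covariance matrix, computes the classical Hotelling statistic, and projects both samples onto the leading right singular vectors of the stacked data matrix. The function names, the column-centering of the stacked matrix before the decomposition, and the choice of k retained directions are assumptions made for illustration. Because the projection is estimated from the data, the usual F reference distribution should be regarded as approximate at best for the reduced data.

```python
import numpy as np


def pooled_covariance(x, y):
    """Pooled sample covariance of two samples (rows are observations)."""
    n1, n2 = x.shape[0], y.shape[0]
    s1 = np.atleast_2d(np.cov(x, rowvar=False))
    s2 = np.atleast_2d(np.cov(y, rowvar=False))
    return ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)


def hotelling_t2(x, y):
    """Return (T^2, F statistic); F has (p, n1 + n2 - p - 1) degrees of freedom."""
    n1, p = x.shape
    n2 = y.shape[0]
    if p > n1 + n2 - 2:
        # The pooled covariance matrix is singular in this regime.
        raise ValueError("dimension exceeds n1 + n2 - 2")
    diff = x.mean(axis=0) - y.mean(axis=0)
    s = pooled_covariance(x, y)
    t2 = (n1 * n2) / (n1 + n2) * diff @ np.linalg.solve(s, diff)
    f_stat = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    return t2, f_stat


def svd_reduce(x, y, k):
    """Project both samples onto the top-k right singular vectors of the
    column-centered stacked data matrix (centering is an assumption here)."""
    total = np.vstack([x, y])
    centered = total - total.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k].T  # p x k matrix with orthonormal columns
    return x @ basis, y @ basis


# Illustrative data: 10 observations per group in 50 dimensions.
rng = np.random.default_rng(0)
x = rng.normal(size=(10, 50))
y = rng.normal(size=(10, 50)) + 0.5

# The rank is bounded by n1 + n2 - 2 = 18 < 50, so the matrix is singular.
print(np.linalg.matrix_rank(pooled_covariance(x, y)))

# After reduction to k = 5 dimensions the classical statistic is computable.
x_red, y_red = svd_reduce(x, y, k=5)
print(hotelling_t2(x_red, y_red))
```

In this sketch the same projection is applied to both samples, so a difference in the original mean vectors is preserved only to the extent that it lies in the span of the retained singular vectors.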