Topic on the statistical analysis of high-dimensional data.

Access rights

Worldwide access
Access changed 8/2/2024

Abstract

High-dimensional genomic data can provide deep insight into biological processes. However, conventional statistical methods typically cannot be applied directly to genomic data sets because the number of markers commonly exceeds the sample size, rendering the sample covariance matrix singular. Here, we examine three scenarios involving high-dimensional genomic data: reordering the principal components of multi-class data based on alternative criteria, comparing tests of two population means on high-dimensional data, and correcting for systematic batch effects in microarray data. All three investigations overcome issues of dimensionality and use principal components for dimension reduction, visualization, or statistical analysis. First, we use alternatively ordered principal components to produce low-dimensional models for visualization; second, we compare five high-dimensional tests of two means and describe a principal-component alternative to Hotelling's T² test; and finally, we use principal component reduction of microarray data to visualize existing batch effects between cohorts. Overall, we explore solutions to the analysis of high-dimensional genomic data through principal components analysis or other adaptations chosen to meet the desired analytic objectives.
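
The dimensionality problem described in the abstract can be illustrated with a minimal sketch. It is not taken from the work itself: the simulated data, the sizes n, p, and k, and the helper names centered_rank and pca_scores are all assumptions chosen for illustration. The sketch shows that with more markers than samples the sample covariance matrix has rank at most n - 1, so it cannot be inverted, while the covariance of the first k principal component scores has full rank.

```python
import numpy as np

def centered_rank(X):
    """Rank of the column-centered data matrix.

    The sample covariance S = Xc.T @ Xc / (n - 1) has the same rank as Xc,
    so this is also the rank of S.
    """
    Xc = X - X.mean(axis=0)
    return int(np.linalg.matrix_rank(Xc))

def pca_scores(X, k):
    """Project centered data onto its first k principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

# Hypothetical sizes: n samples, p markers (p > n), k retained components.
rng = np.random.default_rng(0)
n, p, k = 10, 50, 3
X = rng.normal(size=(n, p))  # simulated stand-in for genomic measurements

rank_full = centered_rank(X)     # at most n - 1, so the p x p covariance is singular
Z = pca_scores(X, k)             # n x k matrix of principal component scores
rank_reduced = centered_rank(Z)  # the k x k covariance of Z is full rank

print(f"covariance is {p} x {p} with rank {rank_full}")
print(f"reduced covariance is {k} x {k} with rank {rank_reduced}")
```

Statistics that need an inverse covariance matrix, such as Hotelling's T², are undefined on the full data in this setting but can be computed on the reduced scores. How many components to keep, and in what order, is the kind of choice the abstract's first two investigations address.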

Keywords

High-dimensional data. PCA.
