# Topics in dimension reduction and missing data in statistical discrimination.

## Access rights

Access changed 7/16/12

## Abstract

This dissertation comprises four chapters. In the first chapter, we define the concept of linear dimension reduction, review several popular linear dimension reduction procedures, discuss the background research used in chapters two and three, and briefly outline the contents of the dissertation.

In chapter two, we derive a linear dimension reduction (LDR) procedure for statistical discriminant analysis of multiple multivariate skew-normal (MSN) populations. First, we define the MSN distribution and give several applications of its use. We also provide marginal and conditional properties of the MSN random vector. Then, we state and prove several lemmas used in a series of theorems that present our LDR procedure for MSN populations under various parameter configurations. Lastly, we illustrate our LDR method for multiple MSN distributions with three examples.

In the third chapter, we define the multivariate singular skew-normal (MSSN) distribution and rigorously prove its existence. Next, we state and prove distributional properties for linear combinations of an MSSN random vector and for its marginal and conditional random variables. Then, we state and prove several lemmas used in deriving our LDR transformation for multiple MSSN distributions with assorted parameter combinations, followed by several theorems concerning the formulation of our LDR technique. Finally, we illustrate the effectiveness of our LDR technique for multiple MSSN classes with two examples.

In chapter four, we compare two statistical linear discrimination procedures when the training data sets from two multivariate normal populations, with unequal means but equal covariance matrices, contain monotone missing data. We derive the maximum likelihood estimators (MLEs) of the partitioned population means and the common covariance matrix in an appendix. We then contrast two classifiers: a linear combination discriminant function derived from Chung and Han (C-H) (2000), and a linear classifier based on the MLEs from two multivariate normal training samples with identical monotone missing data in one or more features. We perform two Monte Carlo simulations with various parameter configurations to compare the effectiveness of the MLE and C-H classifiers as the correlation between features in the population covariance matrix increases. Finally, we compare the two competing classifiers using parametric bootstrap estimates of the expected error rates for a subset of the well-known Iris data.
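The idea behind an MLE-based linear classifier with monotone missing training data can be illustrated with a minimal sketch. It assumes only two features: the first is always observed, and the second is missing (`None`) for some training cases in both samples, which is a monotone pattern. It further assumes the data are missing at random, the two populations share a covariance matrix, and the prior probabilities are equal. The estimates use the standard factored-likelihood approach for monotone data: the marginal of the complete feature is estimated from all cases, and the regression of the incomplete feature on it is estimated from the complete cases only. This is not necessarily the dissertation's own derivation, and it does not implement the Chung and Han (2000) classifier. The function names `mle_monotone` and `linear_classifier` are hypothetical.

```python
# Minimal sketch (hypothetical helpers, not the dissertation's code).
# Assumptions: two features; feature 1 always observed, feature 2 missing
# (None) for some cases; missing at random; common covariance; equal priors.

def mle_monotone(samples):
    """samples: two lists of (x1, x2) pairs, where x2 may be None.
    Returns (means, cov): means[g] = [mu1, mu2], cov is a 2x2 nested list."""
    # Step 1: marginal of x1 from every observation (MLE divisor n).
    mu1 = [sum(x for x, _ in s) / len(s) for s in samples]
    n_all = sum(len(s) for s in samples)
    s11 = sum((x - mu1[g]) ** 2
              for g, s in enumerate(samples) for x, _ in s) / n_all

    # Step 2: regress x2 on x1 using complete cases only, with a separate
    # intercept for each population and a common slope b.
    comp = [[(x, y) for x, y in s if y is not None] for s in samples]
    xbar = [sum(x for x, _ in c) / len(c) for c in comp]
    ybar = [sum(y for _, y in c) / len(c) for c in comp]
    sxy = sum((x - xbar[g]) * (y - ybar[g])
              for g, c in enumerate(comp) for x, y in c)
    sxx = sum((x - xbar[g]) ** 2 for g, c in enumerate(comp) for x, _ in c)
    b = sxy / sxx
    a = [ybar[g] - b * xbar[g] for g in range(2)]
    n_comp = sum(len(c) for c in comp)
    tau2 = sum((y - a[g] - b * x) ** 2
               for g, c in enumerate(comp) for x, y in c) / n_comp

    # Step 3: map (mu1, s11, a, b, tau2) back to the means and covariance.
    means = [[mu1[g], a[g] + b * mu1[g]] for g in range(2)]
    s12 = b * s11
    s22 = tau2 + b * b * s11
    return means, [[s11, s12], [s12, s22]]


def linear_classifier(means, cov):
    """Return a rule assigning x to population 0 or 1 (equal priors)."""
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    inv = [[cov[1][1] / det, -cov[0][1] / det],
           [-cov[1][0] / det, cov[0][0] / det]]
    d = [means[0][i] - means[1][i] for i in range(2)]
    # w = Sigma^{-1} (mu_0 - mu_1)
    w = [inv[i][0] * d[0] + inv[i][1] * d[1] for i in range(2)]
    mid = [(means[0][i] + means[1][i]) / 2 for i in range(2)]

    def classify(x):
        # Linear discriminant score relative to the midpoint of the means.
        score = sum(w[i] * (x[i] - mid[i]) for i in range(2))
        return 0 if score > 0 else 1

    return classify


if __name__ == "__main__":
    # Made-up toy training data; the last case in each sample lacks x2.
    train = [
        [(0.0, 0.0), (1.0, 1.5), (2.0, 1.0), (1.0, None)],
        [(4.0, 5.0), (5.0, 4.5), (6.0, 6.0), (5.0, None)],
    ]
    means, cov = mle_monotone(train)
    clf = linear_classifier(means, cov)
    print(clf((1.0, 1.0)), clf((5.0, 5.0)))  # prints: 0 1
```

Estimating the mean of the first feature from all cases, rather than discarding incomplete cases, is what distinguishes this from a complete-case analysis. Extending the sketch to more features or to several missing-data blocks would require matrix regressions instead of the scalar slope used here.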