Selected topics in statistical discriminant analysis.
Ounpraseuth, Songthip T.
This dissertation consists of three selected topics in statistical discriminant analysis: dimension reduction, regularization methods, and imputation methods.

In Chapter 2 we first derive a new linear dimension-reduction method to determine a low-dimensional hyperplane that preserves or nearly preserves the separation of the individual populations and the Bayes probability of misclassification. Next, we derive a new low-dimensional representation-space approach for multiple high-dimensional multivariate normal populations. Third, we develop a linear dimension-reduction method for quadratic discriminant analysis when the class population parameters must be estimated. Using a Monte Carlo simulation with several different parameter configurations, we compare our new methodology with two competing linear dimension-reduction procedures for statistical discrimination in terms of expected error rates. We find that under certain conditions, our new dimension-reduction method yields superior results for a majority of the configurations we consider. In addition, we determine that in several configurations, classification performance is actually enhanced by our new feature-reduction method when the sample size is sufficiently small relative to the original feature-space dimension.

In Chapter 3 we compare and contrast the efficacy of seven regularization methods for the quadratic discriminant function under a variety of parameter configurations. In particular, we use the expected error rate to assess the efficacy of these regularized quadratic discriminant functions. A two-parameter family of regularized class covariance-matrix estimators derived by Friedman (1989) yields superior classification results relative to its six competitors for the configurations, training-sample sizes, and original feature dimensions examined here.
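Friedman's (1989) two-parameter family shrinks each class covariance estimate toward the pooled covariance (parameter lambda) and then toward a multiple of the identity (parameter gamma). The sketch below illustrates that double-shrinkage idea in a minimal form; Friedman's paper weights the pooling step by class sample sizes, which is omitted here, and the function name is our own:

```python
import numpy as np

def friedman_rda_cov(S_k, S_pooled, lam, gamma):
    """Two-parameter regularized covariance estimator in the spirit of
    Friedman (1989). lam in [0, 1] shrinks the class covariance S_k toward
    the pooled covariance; gamma in [0, 1] further shrinks the result
    toward (trace/p) * I. Sketch only: the sample-size weighting in the
    pooling step of Friedman's estimator is omitted."""
    p = S_k.shape[0]
    S_lam = (1.0 - lam) * S_k + lam * S_pooled
    return (1.0 - gamma) * S_lam + gamma * (np.trace(S_lam) / p) * np.eye(p)
```

At the corners of the (lambda, gamma) grid this recovers familiar special cases: (0, 0) gives quadratic discriminant analysis with the raw class covariance, (1, 0) gives the pooled covariance of linear discriminant analysis, and gamma = 1 gives a spherical (ridge-like) estimate.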
Finally, in Chapter 4 we consider the statistical classification problem for two multivariate normal populations with equal covariance matrices when the training samples contain observations missing at random. That is, we analyze the effect of missing-at-random data on Anderson's linear discriminant function. We use a Monte Carlo simulation to examine the expected probabilities of misclassification under several single and multiple imputation methods. The seven missing-data methods are: complete observation, mean substitution, expectation maximization, regression, predictive mean matching, propensity score, and Markov chain Monte Carlo (MCMC). The regression, predictive mean matching, and propensity score multiple imputation approaches are, in general, superior to the other methods for the configurations and training-sample sizes we consider.
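As a rough illustration of the pipeline studied in this chapter, the sketch below pairs the simplest imputation method (mean substitution) with Anderson's W statistic for two populations with a common covariance matrix. This is a minimal sketch under assumed equal priors and a given pooled covariance estimate; the function names are our own:

```python
import numpy as np

def mean_impute(X):
    """Single imputation by mean substitution: replace each NaN with the
    mean of the observed values in its column (feature)."""
    col_means = np.nanmean(X, axis=0)
    X = X.copy()
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

def anderson_ldf(x, xbar1, xbar2, S_pooled):
    """Anderson's W statistic for two normal populations with equal
    covariance matrices: W(x) = (xbar1 - xbar2)' S^{-1} (x - (xbar1 + xbar2)/2).
    Classify x to population 1 if W(x) > 0 (equal priors assumed)."""
    d = xbar1 - xbar2
    mid = 0.5 * (xbar1 + xbar2)
    return d @ np.linalg.solve(S_pooled, x - mid)
```

In a simulation like the one described above, one would impute the incomplete training sample, compute the class means and pooled covariance from the imputed data, and then estimate the expected misclassification probability of the resulting rule on fresh test observations.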