Selected topics in high-dimensional statistical learning.

dc.contributor.advisor: Young, Dean M.
dc.contributor.author: Ramey, John A.
dc.contributor.department [en_US]: Statistical Sciences.
dc.contributor.schools [en_US]: Baylor University. Dept. of Statistical Sciences.
dc.date.accessioned: 2012-11-29T16:19:48Z
dc.date.available: 2012-11-29T16:19:48Z
dc.date.copyright: 2012-08
dc.date.issued: 2012-11-29
dc.description.abstract [en_US]: Advances in microarray technology have enabled researchers to measure the expression levels of thousands of genes simultaneously, yielding increasingly large and complex data sets. However, because of the cost and time required to obtain individual observations, the sample sizes of the resulting data sets are often much smaller than the number of gene expressions measured. Hence, due to the curse of dimensionality [Bellman, 1961], analyzing these data sets with classic multivariate statistical methods is challenging and, at times, impossible. Consequently, numerous supervised and unsupervised learning methods have been proposed to improve upon classic methods. In Chapter 2 we formulate a clustering stability evaluation method based on decision-theoretic principles to assess the quality of clusters proposed by a clustering algorithm used to identify cancer subtypes for diagnosis. Using three artificial data sets and a well-known microarray data set from Khan et al. [2001], we demonstrate that our proposed clustering-evaluation method is better suited for comparing clustering algorithms and offers superior interpretability compared with the figure of merit (FOM) method from Yeung, Haynor, and Ruzzo [2001] and the cluster stability evaluation method from Hennig [2007]. In Chapter 3 we investigate model selection for the regularized discriminant analysis (RDA) classifier proposed by Friedman [1989]. Using four small-sample, high-dimensional data sets, we compare the classification performance of RDA models selected with five conditional error-rate estimators to that of models selected with the leave-one-out (LOO) error-rate estimator, which Friedman [1989] recommended for RDA model selection. We recommend the 10-fold cross-validation (CV) estimator and the bootstrap CV estimator from Fu, Carroll, and Wang [2005] for model selection with the RDA classifier.
In Chapters 4 and 5 we consider the diagonal linear discriminant analysis (DLDA) classifier, the shrinkage-based DLDA (SDLDA) classifier from Pang, Tong, and Zhao [2009], and the shrinkage-mean-based DLDA (SmDLDA) classifier from Tong, Chen, and Zhao [2012]. We propose four alternative classifiers and, using six well-known microarray data sets, demonstrate that they often outperform these diagonal classifiers because they preserve off-diagonal classificatory information by nearly simultaneously diagonalizing the sample covariance matrix of each class.
dc.description.degree [en_US]: Ph.D.
dc.identifier.uri: http://hdl.handle.net/2104/8513
dc.language.iso [en_US]: en_US
dc.publisher [en]:
dc.rights [en_US]: Baylor University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. Contact librarywebmaster@baylor.edu for inquiries about permission.
dc.rights.accessrights [en_US]: Worldwide access
dc.subject [en_US]: Supervised learning.
dc.subject [en_US]: Unsupervised learning.
dc.subject [en_US]: Clustering.
dc.subject [en_US]: Clustering stability.
dc.subject [en_US]: Clustering evaluation.
dc.subject [en_US]: Classification.
dc.subject [en_US]: Naive Bayes classifier.
dc.subject [en_US]: Regularized discriminant analysis.
dc.subject [en_US]: Diagonal discriminant analysis.
dc.subject [en_US]: Error-rate estimation.
dc.subject [en_US]: Gene expression data.
dc.subject [en_US]: Microarray data.
dc.title [en_US]: Selected topics in high-dimensional statistical learning.
dc.type [en_US]: Thesis
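
Chapters 4 and 5 start from the DLDA classifier, which applies the linear discriminant rule with only the diagonal of the pooled sample covariance matrix, so each feature contributes independently (a Gaussian naive Bayes rule with variances shared across classes). Below is a minimal pure-Python sketch of that standard rule. The names dlda_fit and dlda_predict and the toy data are hypothetical; the variance and mean shrinkage of SDLDA and SmDLDA and the four classifiers proposed in the dissertation are not implemented. The sketch assumes more observations than classes and a nonzero pooled variance for every feature.

```python
import math


def dlda_fit(X, y):
    """Estimate class means, class priors, and pooled per-feature variances."""
    classes = sorted(set(y))
    n, p = len(X), len(X[0])
    means, priors = {}, {}
    ss = [0.0] * p  # within-class sums of squares, feature by feature
    for k in classes:
        rows = [x for x, c in zip(X, y) if c == k]
        mu = [sum(col) / len(rows) for col in zip(*rows)]
        for x in rows:
            for j in range(p):
                ss[j] += (x[j] - mu[j]) ** 2
        means[k], priors[k] = mu, len(rows) / n
    # Only the diagonal of the pooled covariance matrix is kept.
    variances = [s / (n - len(classes)) for s in ss]
    return means, variances, priors


def dlda_predict(model, x):
    """Assign x to the class with the smallest diagonal discriminant score."""
    means, variances, priors = model

    def score(k):
        dist = sum((xj - mj) ** 2 / vj for xj, mj, vj in zip(x, means[k], variances))
        return dist - 2.0 * math.log(priors[k])

    return min(means, key=score)


X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = dlda_fit(X, y)
print(dlda_predict(model, [0.5, 0.5]), dlda_predict(model, [5.5, 5.5]))  # 0 1
```

Because the score is a sum of per-feature terms, any correlation between features is ignored; this is the off-diagonal information that the abstract says the proposed classifiers aim to preserve.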

Files

Original bundle

john_ramey_phd.pdf (606.68 KB, Adobe Portable Document Format): Dissertation
john_ramey_copyright-availability.pdf (76.05 KB, Adobe Portable Document Format): Copyright and Availability Form

License bundle

license.txt (1.87 KB, Item-specific license agreed to upon submission)