PG-means: learning the number of clusters in data.
dc.contributor.advisor | Hamerly, Gregory James, 1977- | |
dc.contributor.author | Feng, Yu. | |
dc.contributor.department | Computer Science. | en |
dc.contributor.other | Baylor University. Dept. of Computer Science. | en |
dc.date.accessioned | 2007-03-19T14:52:48Z | |
dc.date.available | 2007-03-19T14:52:48Z | |
dc.date.copyright | 2006-12 | |
dc.date.issued | 2007-03-19T14:52:48Z | |
dc.description | Includes bibliographical references (p. 50-52). | en |
dc.description.abstract | This thesis presents PG-means, a novel algorithm that automatically determines the number of clusters in a classical Gaussian mixture model. PG-means applies efficient statistical hypothesis tests to one-dimensional projections of the data and the model to determine whether the examples are well represented by the model. The statistical test is thus applied to the entire model at once, not on a per-cluster basis. We show that this method works well in difficult cases such as overlapping clusters, eccentric clusters, and high-dimensional clusters. PG-means also works well on non-Gaussian clusters and on data with many true clusters. Further, the new approach provides a much more stable estimate of the number of clusters than current methods. | en |
dc.description.degree | M.S. | en |
dc.description.statementofresponsibility | by Yu Feng. | en |
dc.format.extent | vii, 52 p. : ill. | en |
dc.format.extent | 193840 bytes | |
dc.format.extent | 1477879 bytes | |
dc.format.mimetype | application/pdf | |
dc.format.mimetype | application/pdf | |
dc.identifier.uri | http://hdl.handle.net/2104/5021 | |
dc.language.iso | en_US | en |
dc.rights | Baylor University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. Contact librarywebmaster@baylor.edu for inquiries about permission. | en |
dc.rights.accessrights | Worldwide access | en |
dc.subject | Algorithms. | en |
dc.subject | Computer network architecture. | en |
dc.title | PG-means: learning the number of clusters in data. | en |
dc.type | Thesis | en |
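
The projection test described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the abstract does not name the hypothesis test, so a Kolmogorov-Smirnov statistic is assumed here, and the function names, the number of projections, and the asymptotic 5% critical value `1.36 / sqrt(n)` are all illustrative choices. The sketch projects both the data and a fitted Gaussian mixture onto a direction `p` (each component's mean becomes `p·mu` and its variance `pᵀ Σ p`), then compares the empirical CDF of the projected data with the CDF of the projected mixture.

```python
import math
import random


def normal_cdf(x, mean, var):
    """CDF of a one-dimensional Gaussian with the given mean and variance."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))


def project_model(weights, means, covs, p):
    """Project each mixture component onto direction p.

    Returns a list of (weight, projected mean, projected variance).
    """
    d = len(p)
    comps = []
    for w, mu, cov in zip(weights, means, covs):
        m = sum(p[i] * mu[i] for i in range(d))
        v = sum(p[i] * cov[i][j] * p[j] for i in range(d) for j in range(d))
        comps.append((w, m, v))
    return comps


def ks_statistic(points, weights, means, covs, p):
    """KS distance between projected data and projected mixture (assumed test)."""
    comps = project_model(weights, means, covs, p)
    xs = sorted(sum(pi * xi for pi, xi in zip(p, x)) for x in points)
    n = len(xs)
    d_max = 0.0
    for i, x in enumerate(xs):
        f = sum(w * normal_cdf(x, m, v) for w, m, v in comps)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d_max = max(d_max, abs(f - i / n), abs(f - (i + 1) / n))
    return d_max


def random_unit_vector(d, rng):
    """Draw a direction uniformly at random on the unit sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def model_fits(points, weights, means, covs, n_projections=12, rng=None):
    """Accept the whole model only if every random projection passes.

    Assumption: the asymptotic 5% KS critical value is used, which ignores
    the fact that the model parameters were estimated from the same data.
    """
    rng = rng or random.Random(0)
    critical = 1.36 / math.sqrt(len(points))
    d = len(points[0])
    return all(
        ks_statistic(points, weights, means, covs, random_unit_vector(d, rng))
        <= critical
        for _ in range(n_projections)
    )
```

In a cluster-count search along these lines, one would fit a mixture with k components, call `model_fits`, and increase k whenever some projection is rejected. Because each test covers the full mixture, a poor fit in any one cluster can reject the model as a whole.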