PG-means: learning the number of clusters in data.

Date

2006-12

Authors

Feng, Yu.

Access rights

Worldwide access

Abstract

This thesis presents PG-means, a novel algorithm that automatically determines the number of clusters in a classical Gaussian mixture model. PG-means applies efficient statistical hypothesis tests to one-dimensional projections of the data and the model to determine whether the examples are well represented by the model. As a result, the statistical test applies to the entire model at once, not just on a per-cluster basis. We show that this method works well in difficult cases such as overlapping clusters, eccentric clusters, and high-dimensional clusters. PG-means also works well on non-Gaussian clusters and on data with many true clusters. Further, the new approach provides a much more stable estimate of the number of clusters than current methods.
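
The projection test described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not name the hypothesis test, so a plain one-sample Kolmogorov-Smirnov test from SciPy is assumed here, and the function names, the `alpha` threshold, and the way the mixture parameters are passed in are all hypothetical. The sketch relies on one standard fact: projecting a Gaussian mixture onto a unit vector `w` gives a one-dimensional Gaussian mixture with means `w·mu_k` and variances `wᵀ Σ_k w`.

```python
import numpy as np
from scipy.stats import kstest, norm


def projected_mixture_cdf(x, weights, means_1d, stds_1d):
    """CDF of a 1-D Gaussian mixture, evaluated at each point in x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))[:, None]
    # Weighted sum of the component CDFs (broadcast over components).
    return np.sum(weights * norm.cdf(x, loc=means_1d, scale=stds_1d), axis=1)


def model_fits_projection(X, weights, means, covs, direction, alpha=0.05):
    """Test the whole mixture against the data along one projection.

    X: (n, d) data; weights: (k,); means: (k, d); covs: (k, d, d).
    Returns True when the KS test does NOT reject the model at level alpha.
    Assumption: a plain KS test; the thesis may use a different test
    or adjusted critical values.
    """
    w = direction / np.linalg.norm(direction)
    data_1d = X @ w                      # project the data
    means_1d = means @ w                 # project each component mean
    # Projected variance of each component: w^T Sigma_k w.
    stds_1d = np.sqrt(np.einsum("i,kij,j->k", w, covs, w))
    result = kstest(data_1d, projected_mixture_cdf,
                    args=(weights, means_1d, stds_1d))
    return result.pvalue >= alpha
```

In a full procedure one would presumably repeat this test over several random directions and add a cluster whenever any projection rejects the model; that outer loop is omitted here because the abstract does not specify it.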

Description

Includes bibliographical references (p. 50-52).

Keywords

Algorithms, Computer network architecture.