Faster k-means clustering.

Drake, Jonathan, 1989-

Files

jonathan_drake_masters.pdf (1.02 MB)

jonathan_drake_copyright-availability.pdf (795.37 KB)

Date

2013-08

Authors

Drake, Jonathan, 1989-

Access rights

Worldwide access

Abstract

The popular k-means algorithm is used to discover clusters in vector data automatically. We present three accelerated algorithms that compute exactly the same clusters much faster than the standard method. First, we redesign Hamerly’s algorithm to use k heaps to avoid checking distance bounds for all n points, with little empirical gain. Second, we use an adaptive number of distance bounds to avoid redundant calculations (Drake and Hamerly 2012). Experiments show the superior performance of adaptive k-means in medium dimension (20 ≤ d ≤ 200) on uniform random data. Finally, we reformulate the triangle inequality to constrain the search space for a point’s nearest center to an annular region centered at the origin. For uniform random data, annulus k-means is competitive with or much faster than other algorithms in low dimension (d < 20), and it outperforms other algorithms on five of six naturally-clustered, real-world datasets tested (d ≤ 74).
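The annular constraint mentioned in the abstract can be illustrated with a short sketch. This is not the thesis implementation, only a minimal illustration under stated assumptions: `upper` is assumed to be any valid upper bound on the distance from a point to its nearest center (for example, the distance to its currently assigned center), and the function names `annulus_candidates` and `nearest_center` are hypothetical. By the triangle inequality, any center c with ‖x − c‖ ≤ upper must satisfy |‖c‖ − ‖x‖| ≤ upper, so only centers whose norms fall in that band need an exact distance computation.

```python
import numpy as np


def annulus_candidates(x, centers, upper):
    """Indices of centers whose norm lies within `upper` of the norm of x.

    Assumption: `upper` is a valid upper bound on the distance from x to
    its nearest center. Centers outside this band cannot be the nearest.
    """
    x_norm = np.linalg.norm(x)
    center_norms = np.linalg.norm(centers, axis=1)
    # Sort centers by norm so the annulus becomes a contiguous range.
    order = np.argsort(center_norms)
    sorted_norms = center_norms[order]
    lo = np.searchsorted(sorted_norms, x_norm - upper, side="left")
    hi = np.searchsorted(sorted_norms, x_norm + upper, side="right")
    return order[lo:hi]


def nearest_center(x, centers, upper):
    """Exact nearest center, searching only the annular candidate set."""
    cand = annulus_candidates(x, centers, upper)
    dists = np.linalg.norm(centers[cand] - x, axis=1)
    return int(cand[np.argmin(dists)])


# Illustrative data (hypothetical): three centers in the plane.
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 3.0]])
x = np.array([0.0, 2.5])
upper = np.linalg.norm(x - centers[0])  # bound from an assumed current assignment
candidates = annulus_candidates(x, centers, upper)
nearest = nearest_center(x, centers, upper)
```

In this sketch the center norms are re-sorted on every call for brevity; a practical implementation would presumably sort them once per iteration and reuse the order for all points.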

Keywords

Machine learning, Clustering

URI

http://hdl.handle.net/2104/8826

Collections

Electronic Theses and Dissertations
Theses/Dissertations - Computer Science
