Faster k-means clustering.

dc.contributor.advisorHamerly, Gregory James, 1977-
dc.contributor.authorDrake, Jonathan, 1989-
dc.contributor.departmentComputer Science.en_US
dc.contributor.schoolsBaylor University. Dept. of Computer Science.en_US
dc.date.accessioned2013-09-24T14:16:31Z
dc.date.available2013-09-24T14:16:31Z
dc.date.copyright2013-08
dc.date.issued2013-09-24
dc.description.abstractThe popular k-means algorithm is used to discover clusters in vector data automatically. We present three accelerated algorithms that compute exactly the same clusters much faster than the standard method. First, we redesign Hamerly’s algorithm to use k heaps to avoid checking distance bounds for all n points, with little empirical gain. Second, we use an adaptive number of distance bounds to avoid redundant calculations (Drake and Hamerly 2012). Experiments show the superior performance of adaptive k-means in medium dimension (20 ≤ d ≤ 200) on uniform random data. Finally, we reformulate the triangle inequality to constrain the search space for a point’s nearest center to an annular region centered at the origin. For uniform random data, annulus k-means is competitive with or much faster than other algorithms in low dimension (d < 20), and it outperforms other algorithms on five of six naturally-clustered, real-world datasets tested (d ≤ 74).en_US
dc.description.degreeM.S.en_US
dc.identifier.urihttp://hdl.handle.net/2104/8826
dc.language.isoen_USen_US
dc.publisheren
dc.rightsBaylor University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. Contact librarywebmaster@baylor.edu for inquiries about permission.en_US
dc.rights.accessrightsWorldwide accessen_US
dc.subjectMachine learning.en_US
dc.subjectClustering.en_US
dc.titleFaster k-means clustering.en_US
dc.typeThesisen_US

Files

Original bundle

Now showing 1 - 2 of 2
Loading...
Thumbnail Image
Name:
jonathan_drake_masters.pdf
Size:
1.02 MB
Format:
Adobe Portable Document Format
Description:
Thesis
No Thumbnail Available
Name:
jonathan_drake_copyright-availability.pdf
Size:
795.37 KB
Format:
Adobe Portable Document Format
Description:
Copyright availability form

License bundle

Now showing 1 - 1 of 1
No Thumbnail Available
Name:
license.txt
Size:
1.87 KB
Format:
Item-specific license agreed upon to submission
Description: