Show simple item record

dc.contributor.advisor	Hamerly, Gregory James, 1977-
dc.contributor.author	Drake, Jonathan, 1989-
dc.date.accessioned	2013-09-24T14:16:31Z
dc.date.available	2013-09-24T14:16:31Z
dc.date.copyright	2013-08
dc.date.issued	2013-09-24
dc.identifier.uri	http://hdl.handle.net/2104/8826
dc.description.abstract	The popular k-means algorithm is used to discover clusters in vector data automatically. We present three accelerated algorithms that compute exactly the same clusters much faster than the standard method. First, we redesign Hamerly’s algorithm to use k heaps to avoid checking distance bounds for all n points, with little empirical gain. Second, we use an adaptive number of distance bounds to avoid redundant calculations (Drake and Hamerly 2012). Experiments show the superior performance of adaptive k-means in medium dimension (20 ≤ d ≤ 200) on uniform random data. Finally, we reformulate the triangle inequality to constrain the search space for a point’s nearest center to an annular region centered at the origin. For uniform random data, annulus k-means is competitive with or much faster than other algorithms in low dimension (d < 20), and it outperforms other algorithms on five of six naturally-clustered, real-world datasets tested (d ≤ 74).	en_US
dc.language.iso	en_US	en_US
dc.publisher		en
dc.rights	Baylor University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. Contact librarywebmaster@baylor.edu for inquiries about permission.	en_US
dc.subject	Machine learning.	en_US
dc.subject	Clustering.	en_US
dc.title	Faster k-means clustering.	en_US
dc.type	Thesis	en_US
dc.description.degree	M.S.	en_US
dc.rights.accessrights	Worldwide access	en_US
dc.contributor.department	Computer Science.	en_US
dc.contributor.schools	Baylor University. Dept. of Computer Science.	en_US
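The abstract's annulus idea can be sketched briefly. This is not the thesis's implementation (the function name and structure here are illustrative only, and a real k-means loop would maintain bounds across iterations and sort center norms once per iteration): by the triangle inequality, dist(x, c) ≥ | ‖x‖ − ‖c‖ |, so any center that could beat the current best for point x must have a norm inside the annulus [‖x‖ − r, ‖x‖ + r], where r is the current best distance.

```python
import bisect
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def nearest_center_annulus(x, centers):
    """Find the index of and distance to x's nearest center,
    pruning candidates whose norms fall outside the annulus
    [||x|| - r, ||x|| + r] implied by the triangle inequality."""
    norm_x = math.sqrt(sum(v * v for v in x))
    # Sorted (norm, index) pairs; in a full k-means iteration this
    # would be computed once for all points, not per query.
    norms = sorted(
        (math.sqrt(sum(v * v for v in c)), i) for i, c in enumerate(centers)
    )
    # Arbitrary initial candidate gives an initial radius r.
    best_i = 0
    r = dist(x, centers[0])
    # Only centers with norm in [norm_x - r, norm_x + r] can be closer,
    # since dist(x, c) >= |norm_x - norm(c)|.
    lo = bisect.bisect_left(norms, (norm_x - r, -1))
    hi = bisect.bisect_right(norms, (norm_x + r, len(centers)))
    for _, i in norms[lo:hi]:
        d = dist(x, centers[i])
        if d < r:
            r, best_i = d, i
    return best_i, r
```

A further refinement (which the sketch omits for clarity) is to re-narrow the annulus as r shrinks during the scan, skipping even more distance computations.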

