Show simple item record

dc.contributor.advisor: Hamerly, Gregory James, 1977-
dc.creator: Shrestha Khimbaja, Sumit, 1994-
dc.date.accessioned: 2021-01-28T15:36:31Z
dc.date.available: 2021-01-28T15:36:31Z
dc.date.created: 2020-12
dc.date.issued: 2020-10-22
dc.date.submitted: December 2020
dc.identifier.uri: https://hdl.handle.net/2104/11192
dc.description.abstract: Clustering is a crucial branch of machine learning that groups input data into clusters based on the features of the data, without training labels. K-means clustering is a widely used iterative technique that partitions data into k clusters by repeatedly minimizing a criterion function. The standard k-means algorithm, Lloyd’s algorithm, is simple and easy to implement, but it performs a lot of unnecessary work when computing the distances between the data points and the centers. This increases the runtime of the algorithm and makes it unsuitable for real-life applications. Previous work on optimizing k-means has used the triangle inequality to generate geometric bounds for the data points, which are then used to avoid unnecessary distance computations. Another approach to accelerating k-means is to use only a subset of the data, instead of the entire dataset, in each iteration. Because less data is used in each computation, this method also produces clustering results faster. In this thesis, we propose an algorithm that accelerates k-means by using geometric bounds on mini-batches. We combine the triangle-inequality-based optimization with the mini-batch approach by running k-means on mini-batches for multiple iterations and using the geometric bounds within each mini-batch to reduce the number of distance computations. The results show a large speedup in runtime over existing optimization techniques while still producing good-quality clusters.
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: Clustering. K-means. Mini-batch.
dc.title: Optimizing k-means clustering using mini-batches and distance bounds.
dc.type: Thesis
dc.rights.accessrights: Worldwide access
dc.type.material: text
thesis.degree.name: M.S.
thesis.degree.department: Baylor University. Dept. of Computer Science.
thesis.degree.grantor: Baylor University
thesis.degree.level: Masters
dc.date.updated: 2021-01-28T15:36:32Z
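
The two ideas named in the abstract, triangle-inequality pruning and mini-batch updates, can be illustrated with a minimal sketch. This is not the thesis's algorithm, which this record does not describe in detail. It assumes a basic center-to-center bound (if d(c_best, c_j) >= 2 * d(x, c_best), then c_j cannot be closer to x than c_best) and a standard per-center running-mean mini-batch update. The function names `assign_with_pruning` and `minibatch_step` are hypothetical and chosen for illustration only.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as sequences of floats."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_with_pruning(points, centers):
    """Assign each point to its nearest center, skipping distance
    computations that the triangle inequality proves unnecessary.

    Returns (labels, number of point-to-center distances computed).
    """
    k = len(centers)
    # Center-to-center distances, computed once per call.
    cc = [[dist(centers[i], centers[j]) for j in range(k)] for i in range(k)]
    labels, computed = [], 0
    for x in points:
        best, best_d = 0, dist(x, centers[0])
        computed += 1
        for j in range(1, k):
            # Triangle inequality: d(x, c_j) >= d(c_best, c_j) - d(x, c_best).
            # If d(c_best, c_j) >= 2 * d(x, c_best), then c_j is no closer
            # than the current best, so its distance need not be computed.
            if cc[best][j] >= 2.0 * best_d:
                continue
            d = dist(x, centers[j])
            computed += 1
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, computed


def minibatch_step(points, centers, counts, batch_size, rng):
    """One mini-batch update (illustrative assumption, not the thesis's method).

    Samples a batch, assigns it with pruning, then moves each assigned
    center toward the point with a per-center learning rate 1 / count.
    Mutates `centers` and `counts`; returns the distances computed.
    """
    batch = rng.sample(points, batch_size)
    labels, computed = assign_with_pruning(batch, centers)
    for x, j in zip(batch, labels):
        counts[j] += 1
        eta = 1.0 / counts[j]
        centers[j] = [(1 - eta) * c + eta * xi for c, xi in zip(centers[j], x)]
    return computed
```

A caller would typically initialize `counts` to zeros, pass a seeded `random.Random` instance as `rng`, and call `minibatch_step` repeatedly. The pruning pays off most when centers are far apart relative to each point's distance to its nearest center.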

