Geometric methods of accelerating triangle-inequality-based k-means.
One of the most frequent ways how to cluster data is k-means. The standard way of solving the problem is iterative Lloyd's algorithm. This algorithm performs many redundant calculations. Elkan's and Hamerly's algorithms, the heap algorithm and many others eliminate this redundancy by maintaining a set of upper and lower bounds. The goal of this thesis is to further improve the runtime of those algorithms. Namely we improve the way how those bounds are maintained between iterations. By tighter updates of lower bounds we can further decrease the number of distance calculations. The other improvements stated in the thesis include elimination of centroids from the innermost loop of the algorithms when the bounds do not help. The common property of those two proposals is that they require only calculations that are done once per iteration. We also solve a problem that is left as open in the heap algorithm.