A fast seeding technique for the k-means algorithm.


The k-means algorithm is one of the most popular clustering techniques because of its speed and simplicity: it is easy to understand and to implement. Its first step is to choose k initial cluster centers, and the way these centers are chosen has a great effect on both the speed of k-means and the quality of the clustering it produces. One of the most popular seeding techniques is k-means++ initialization, but this method needs k passes over the dataset. The goal of this thesis is to propose a new seeding technique that chooses the initial centers much faster than k-means++.
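
For reference, the cost of k-means++ seeding can be seen in a minimal sketch of the standard procedure (this is the baseline, not the technique proposed in this thesis). The names kmeanspp_seed and sq_dist are illustrative assumptions, and points are assumed to be tuples of floats. Each new center triggers a full scan of the dataset to refresh the nearest-center distances, which is where the roughly k passes come from.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seed(points, k, rng=None):
    """Choose k initial centers with standard k-means++ (D^2 sampling).

    Illustrative baseline only; names and signature are assumptions.
    """
    rng = rng or random.Random()
    # The first center is drawn uniformly at random.
    centers = [rng.choice(points)]
    # d2[i] holds the squared distance from points[i] to its nearest center.
    d2 = [sq_dist(p, centers[0]) for p in points]  # one full pass
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            idx = rng.randrange(len(points))
        else:
            # Sample the next center with probability proportional to d2.
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(points[idx])
        # Another full pass over the data to refresh nearest-center distances.
        d2 = [min(d, sq_dist(p, points[idx])) for p, d in zip(points, d2)]
    return centers
```

Calling kmeanspp_seed(points, k, random.Random(0)) returns k points drawn from the input; a faster seeding method has to avoid repeating the distance scan once per center.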



k-means. Seeding. Clustering.