The K-means algorithm is one of the simplest and fastest clustering algorithms and has been in use for more than four decades. One of its limitations is that the number of clusters must be specified in advance. It is also sensitive to the random initialization of the cluster centers. This paper proposes a heuristic that initializes the cluster centers and estimates the number of clusters as a discrete value. The method successfully estimates the number of clusters and initializes many of the cluster centers when the clusters are dense and well separated. In each iteration, the method selects a new cluster center: the point that is most dissimilar from the previously chosen points. The proposed algorithm is tested on various synthetic data sets, and the results are encouraging.
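The selection rule described above (each new center is the point most dissimilar from those already chosen) can be illustrated with a minimal farthest-first sketch. This is not the paper's algorithm: the abstract does not state how the first point is chosen, which dissimilarity measure is used, or when the method stops adding centers. The sketch assumes Euclidean distance, starts from the point farthest from the data centroid, and stops when the distance to the next candidate falls below a fraction of the previous one. The function name `farthest_first_centers` and the parameter `drop_ratio` are hypothetical.

```python
import math


def farthest_first_centers(points, drop_ratio=0.5, max_centers=None):
    """Pick initial centers one at a time, each the point farthest from
    all centers chosen so far.

    Assumptions (not from the paper): Euclidean distance, the starting
    point, and the `drop_ratio` stopping rule.
    """
    if not points:
        return []

    # Assumed starting point: the point farthest from the data centroid.
    dim = len(points[0])
    centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    first = max(points, key=lambda p: math.dist(p, centroid))
    centers = [first]

    # min_dist[i] is the distance from points[i] to its nearest chosen center.
    min_dist = [math.dist(p, first) for p in points]
    prev_gap = None
    limit = max_centers or len(points)

    while len(centers) < limit:
        # Candidate: the point most dissimilar from all chosen centers.
        idx = max(range(len(points)), key=min_dist.__getitem__)
        gap = min_dist[idx]

        # Assumed stopping rule: a sharp drop in the gap suggests the
        # candidate lies inside a cluster that already has a center.
        if gap == 0 or (prev_gap is not None and gap < drop_ratio * prev_gap):
            break

        centers.append(points[idx])
        prev_gap = gap
        min_dist = [min(m, math.dist(p, points[idx]))
                    for m, p in zip(min_dist, points)]

    return centers
```

The number of returned centers serves as the estimate of the number of clusters, and the centers themselves can seed a standard K-means run. Under this stopping rule the sketch always returns at least two centers when the data contain two distinct points, so it cannot detect a single cluster, and it is only expected to behave well when clusters are dense and well separated.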
National Institute of Technology Goa, Farmagudi, Goa, India