Year 2016 | Volume 130 | Issue 1 | Pages 78-82
Article title

HK-Means: A Heuristic Approach to Initialize and Estimate the Number of Clusters in Biological Data

Languages of publication
EN
Abstracts
EN
The K-means algorithm is one of the simplest and fastest clustering algorithms and has been in use for more than four decades. One limitation of the algorithm is that the number of clusters must be specified in advance; another is its sensitivity to the random initialization of the cluster centers. This paper proposes a heuristic that initializes the cluster centers and estimates the number of clusters as a discrete value. In each iteration the method selects a new cluster center, namely the point that is most dissimilar from the previously chosen points. For clusters that are dense and well separated, the method estimates the number of clusters and successfully initializes many of the cluster centers. The proposed algorithm was tested on various synthetic data sets, and the results are encouraging.
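The selection step described in the abstract (each new center is the point most dissimilar from the points already chosen) can be illustrated with a minimal Python sketch. This is not the authors' HK-Means implementation: the abstract does not say how the first point is picked, which dissimilarity measure is used, or how the number of clusters is decided, so the function name `farthest_first_centers`, the `first_index` parameter, the Euclidean distance, and the fixed `max_centers` cap are all assumptions made for illustration.

```python
import numpy as np

def farthest_first_centers(points, max_centers, first_index=0):
    """Pick centers one at a time; each new center is the point whose
    distance to its nearest already-chosen center is largest.

    Assumptions (not from the paper): Euclidean distance, an arbitrary
    starting point `first_index`, and a fixed cap `max_centers`.
    """
    points = np.asarray(points, dtype=float)
    chosen = [first_index]
    # Distance from every point to its nearest chosen center so far.
    nearest = np.linalg.norm(points - points[first_index], axis=1)
    gaps = []  # distance of each newly added center to the earlier ones
    while len(chosen) < max_centers:
        candidate = int(np.argmax(nearest))
        if nearest[candidate] == 0.0:
            break  # every remaining point coincides with a chosen center
        gaps.append(float(nearest[candidate]))
        chosen.append(candidate)
        dist_to_new = np.linalg.norm(points - points[candidate], axis=1)
        nearest = np.minimum(nearest, dist_to_new)
    return points[chosen], gaps

# Three small, well-separated groups of two points each (made-up data).
data = [[0, 0], [0, 1], [10, 0], [11, 0], [0, 20], [1, 20]]
centers, gaps = farthest_first_centers(data, max_centers=4)
print(centers)
print(gaps)
```

On this toy data the first three selected points fall in three different groups, and the recorded distance drops sharply once a fourth point is taken from an already covered group. Reading such a drop as a hint about the number of clusters is only one possible stopping rule and is an assumption here; the abstract does not state the rule the paper uses.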
Contributors
author
  • National Institute of Technology Goa, Farmagudi, Goa, India
author
  • National Institute of Technology Goa, Farmagudi, Goa, India
author
  • National Institute of Technology Goa, Farmagudi, Goa, India
Identifiers
YADDA identifier
bwmeta1.element.bwnjournal-article-appv130n1019kz