A Tradeoff Between Accuracy and Speed for K-Means Seed Determination

Authors:	Farzaneh Khorasani, Morteza Mohammadi Zanjireh, Mahdi Bahaghighat, Qin Xin

Affiliation:	1 Ecole de Technologie Supérieure, Montréal, QC H3C 1K3, Canada; 2 Faculty of Exact and Applied Sciences, University of Oran 1, Oran, 31000, Algeria; 3 Higher Institute of Sciences and Technologies, University of Gafsa, Gafsa, 2100, Tunisia; 4 AL-Lith Computer College, Umm Al-Qura University, Al-Lith, 28434, Saudi Arabia

Abstract:	With the sharp increase in information volume, analyzing and retrieving this vast amount of data is more essential than ever. One of the main techniques that helps in this regard is clustering. Clustering aims to group objects so that all objects within a cluster have similar features while objects in different clusters are as distinct as possible. One of the most widely used clustering algorithms, with well-proven performance in different applications, is the k-means algorithm. The main problem of the k-means algorithm is that its performance is directly affected by the selection of the initial clusters. Lack of attention to this crucial issue has consequences such as creating empty clusters and increasing the convergence time. Besides, selecting appropriate initial seeds can reduce cluster inconsistency. In this paper, we present a new method to determine the initial seeds of the k-means algorithm in order to improve accuracy and decrease the number of iterations. The method considers the average distance between objects to determine the initial seeds, and attempts to provide a proper tradeoff between the accuracy and speed of the clustering algorithm. The experimental results show that our proposed approach outperforms the Chithra method by 1.7% and 2.1% in terms of clustering accuracy on the Wine and Abalone datasets, respectively. Furthermore, the results indicate that, compared with the Reverse Nearest Neighbor (RNN) search approach, the proposed method has a higher convergence speed.
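The abstract does not spell out the seeding procedure, so the following is only an illustrative sketch of how an average-distance criterion could drive seed selection before running standard k-means (Lloyd's algorithm). It is not the authors' method. The function names `average_pairwise_distance`, `pick_seeds`, and `kmeans`, and the rule "accept a point as a seed if it lies at least the average pairwise distance from every seed chosen so far", are assumptions made for this example.

```python
import math
from itertools import combinations


def average_pairwise_distance(points):
    """Mean Euclidean distance over all unordered pairs of points."""
    pairs = list(combinations(points, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def pick_seeds(points, k):
    """Hypothetical seeding rule (an assumption, not the paper's algorithm).

    Scan the points in order and accept a point as a seed when it is at
    least the average pairwise distance away from every seed chosen so far.
    If fewer than k seeds are found, fill up with the remaining point that
    is farthest from its nearest seed.
    """
    if not 1 <= k <= len(set(points)):
        raise ValueError("k must be between 1 and the number of distinct points")
    threshold = average_pairwise_distance(points)
    seeds = []
    for p in points:
        if len(seeds) == k:
            break
        if all(math.dist(p, s) >= threshold for s in seeds):
            seeds.append(p)
    while len(seeds) < k:
        remaining = [p for p in points if p not in seeds]
        seeds.append(
            max(remaining, key=lambda p: min(math.dist(p, s) for s in seeds))
        )
    return seeds


def kmeans(points, seeds, max_iter=100):
    """Plain Lloyd's k-means; returns (centers, clusters, iterations used)."""
    centers = [tuple(s) for s in seeds]
    clusters = []
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        # An empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            return centers, clusters, iteration
        centers = new_centers
    return centers, clusters, max_iter


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    seeds = pick_seeds(data, k=2)
    centers, clusters, iterations = kmeans(data, seeds)
    print(seeds, centers, iterations)
```

Points are passed as tuples so they can be compared and deduplicated. Because the seeds already sit in different dense regions, the loop on well-separated data tends to stop after very few iterations, which is the kind of effect on iteration count that the abstract refers to.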

Keywords:	Data clustering; k-means algorithm; information retrieval; outlier detection; clustering accuracy; unsupervised learning

Source:	Computer Systems Science and Engineering
