首页 | 本学科首页   官方微博 | 高级检索  
     

基于距离不等式的K-medoids聚类算法
引用本文:余冬华,郭茂祖,刘扬,任世军,刘晓燕,刘国军.基于距离不等式的K-medoids聚类算法[J].软件学报,2017,28(12):3115-3128.
作者姓名:余冬华  郭茂祖  刘扬  任世军  刘晓燕  刘国军
作者单位:哈尔滨工业大学 计算机科学与技术学院, 黑龙江 哈尔滨 150001,哈尔滨工业大学 计算机科学与技术学院, 黑龙江 哈尔滨 150001;北京建筑大学 电气与信息工程学院, 北京 100044,哈尔滨工业大学 计算机科学与技术学院, 黑龙江 哈尔滨 150001,哈尔滨工业大学 计算机科学与技术学院, 黑龙江 哈尔滨 150001,哈尔滨工业大学 计算机科学与技术学院, 黑龙江 哈尔滨 150001,哈尔滨工业大学 计算机科学与技术学院, 黑龙江 哈尔滨 150001
基金项目:国家自然科学基金(61571164,61571163,61671188,61671189,QC2014C071)
摘    要:本文研究加速K-medoids聚类算法,首先以PAM(Partitioning Around Medoids)、TPAM(Triangular Inequality Elimination Criteria PAM)算法为基础,给出两个加速引理,并基于中心点之间距离不等式提出两个新加速定理.同时,以On+K2)额外内存空间开销辅助引理、定理的结合而提出加速SPAM(Speed Up PAM)聚类算法,使得K-medoids聚类算法复杂度由OKn-K2)降低至O((n-K2).在实际及人工模拟数据集上的实验结果表明,相对PAM、TPAM、FKMEDOIDS(Fast K-medoids)等参考算法均有改进,运行时间比PAM至少提升0.828倍.

关 键 词:数据挖掘  聚类算法  K-medoids  距离不等式
收稿时间:2016/6/18 0:00:00
修稿时间:2016/10/26 0:00:00

K-Medoids Clustering Algorithm Based on Distance Inequality
YU Dong-Hu,GUO Mao-Zu,LIU Yang,REN Shi-Jun,LIU Xiao-Yan and LIU Guo-Jun.K-Medoids Clustering Algorithm Based on Distance Inequality[J].Journal of Software,2017,28(12):3115-3128.
Authors:YU Dong-Hu  GUO Mao-Zu  LIU Yang  REN Shi-Jun  LIU Xiao-Yan and LIU Guo-Jun
Affiliation:School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China,School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China;School of Electrical Engineering and Information Technique, Beijing University of Civil Engineering and Architecture, Beijing 100044, China,School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China,School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China,School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China and School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Abstract:In this paper, we research the speed up K-medoids clustering algorithm. Firstly, two acceleration lemmas are given based on partitioning around medoids (PAM) and triangular inequality elimination criteria partitioning around medoids (TPAM) algorithms. And two new acceleration theorems are proposed based on distance inequality between center points. Combining the lemma with the theorem with the aid of additional memory space O(n+K2), the speed up partitioning around medoids (SPAM) algorithm is proposed, which reduce the time complexity from O(K(n-K)2) to O((n-K)2). Experimental results on both real-world and artificial datasets show that the SPAM algorithm outperforms PAM, TPAM and FKEMDOIDS approaches in terms of running time with at least 0.828 times corresponding to PAM.
Keywords:data mining  clustering algorithm  K-medoids  distance inequality
点击此处可从《软件学报》浏览原始摘要信息
点击此处可从《软件学报》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号