
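An Algorithm for Clustering of Outliers Based on Key Attribute Subspace
Cite as: Jin Yifu, Zhu Qingsheng, Xing Yongkang. An Algorithm for Clustering of Outliers Based on Key Attribute Subspace [J]. Journal of Computer Research and Development (计算机研究与发展), 2007, 44(4): 651-659.
Authors: Jin Yifu (金义富)  Zhu Qingsheng (朱庆生)  Xing Yongkang (邢永康)
Affiliation: College of Computer Science and Engineering, Chongqing University, Chongqing 400044; School of Information Science and Technology, Zhanjiang Normal College, Zhanjiang 524048; College of Computer Science and Engineering, Chongqing University, Chongqing 400044
Funding: National Natural Science Foundation of China; Natural Science Foundation of Chongqing
Abstract: Outlier discovery and analysis is an important part of data mining. Existing outlier mining algorithms concentrate on how to detect outlying objects and lack effective methods for explaining and analyzing the outlier sets they find. By analyzing the sources and characteristics of outliers and drawing on rough set theory, the paper defines the concept of outlying partition similarity and proposes COKAS, an algorithm for clustering outliers based on key attribute subspaces. The algorithm reveals the subspace characteristics of the outliers, yields further extended knowledge, and also aids understanding of the data set as a whole. Experiments on two multidimensional data sets show that the algorithm has good adaptability and effectiveness.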

Keywords: outlier set  outlying partition similarity  key attribute subspace  clustering
Revised: 01 12 2005

An Algorithm for Clustering of Outliers Based on Key Attribute Subspace
Jin Yifu, Zhu Qingsheng, Xing Yongkang. An Algorithm for Clustering of Outliers Based on Key Attribute Subspace [J]. Journal of Computer Research and Development, 2007, 44(4): 651-659.
Authors: Jin Yifu  Zhu Qingsheng  Xing Yongkang
Affiliation: College of Computer Science and Engineering, Chongqing University, Chongqing 400044; School of Information Science and Technology, Zhanjiang Normal College, Zhanjiang 524048
Abstract: Discovering and analyzing outlying observations is an important part of data mining. Outliers may contain crucial information, so in some applications, such as detecting credit card fraud in finance, calling fraud in telecommunications, network intrusion, and disease diagnosis, finding them matters more than finding general patterns. Existing outlier mining algorithms focus on detecting and identifying outliers, but the study of outliers covers both mining them and analyzing why they are exceptional. Research on explaining and analyzing outliers currently lags slightly behind outlier mining technology. A full analysis of outliers inevitably requires a great deal of domain knowledge from the application field. Nevertheless, further insight into outliers can be gained by studying how the data set is distributed in attribute space. By analyzing the origin and features of outliers and applying rough set theory, the paper defines the concept of outlying partition similarity and proposes an algorithm for clustering outliers based on key attribute subspace (COKAS). The approach provides extended knowledge about the identified outliers and improves understanding of the whole data set. Experimental results on real multidimensional data sets show that the algorithm is scalable and efficient.
Keywords: outlier  outlying partition similarity  key attribute subspace  clustering
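
Note: The abstract says the approach draws on rough set theory but does not give the definitions. As a minimal sketch of the standard rough-set building block only, and not of COKAS or of outlying partition similarity as the paper defines them, the Python below partitions a set of outliers into indiscernibility classes over a chosen attribute subset: objects land in the same block when they agree on every selected attribute. The function name `indiscernibility_partition`, the sample records, and the attribute names are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def indiscernibility_partition(objects, attributes):
    """Group object ids whose values agree on every attribute in `attributes`.

    `objects` maps an id to a dict of (already discretized) attribute values.
    Returns the blocks of the partition as a sorted list of sorted id lists.
    """
    blocks = defaultdict(list)
    for obj_id, record in objects.items():
        # Objects with the same value tuple are indiscernible on `attributes`.
        key = tuple(record[a] for a in attributes)
        blocks[key].append(obj_id)
    return sorted(sorted(block) for block in blocks.values())


# Hypothetical outlier records, illustrative only (not data from the paper).
outliers = {
    "o1": {"amount": "high", "hour": "night", "region": "A"},
    "o2": {"amount": "high", "hour": "night", "region": "B"},
    "o3": {"amount": "low", "hour": "day", "region": "A"},
}

print(indiscernibility_partition(outliers, ["amount", "hour"]))
print(indiscernibility_partition(outliers, ["region"]))
```

Different attribute subsets induce different partitions of the same outliers, which is the kind of subspace-dependent grouping a partition-based similarity could compare. How COKAS selects key attributes and measures similarity is specified in the paper itself, not here.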
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.