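Detection of top-n global outliers in datasets based on hierarchical clustering
Cite this article: LIANG Binmei. Detection of top-n global outliers in datasets based on hierarchical clustering[J]. Computer Engineering and Applications, 2012, 48(9): 101-103,107
Author: LIANG Binmei
Affiliation: College of Mathematics and Information Science, Guangxi University, Nanning 530004; College of Computer Science, Sichuan University, Chengdu 610065
Funding: Guangxi University Scientific Research Fund (No. XJZ100258).
Abstract: The presence of outliers makes data mining results inaccurate or even wrong. Existing outlier detection algorithms still fall short in generality, effectiveness, user-friendliness, and performance on large high-dimensional datasets. To address this, an effective global outlier detection method is proposed. The method performs agglomerative hierarchical clustering, uses the clustering tree and the distance matrix to judge visually how isolated the data are, and determines the number of outliers. Outlying data points are then removed without supervision, working from the top of the clustering tree downward. Simulation experiments on several datasets show that the method effectively identifies the top n global outliers with the highest degree of isolation, is applicable to datasets of different shapes, is efficient and user-friendly, and is suitable for outlier detection in large high-dimensional datasets.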

Keywords: outlier detection  hierarchical clustering  data mining

Detection of top-n global outliers in datasets based on hierarchical clustering
LIANG Binmei. Detection of top-n global outliers in datasets based on hierarchical clustering[J]. Computer Engineering and Applications, 2012, 48(9): 101-103,107
Author: LIANG Binmei
Affiliation: 1. College of Mathematics and Information Science, Guangxi University, Nanning 530004, China; 2. College of Computer Science, Sichuan University, Chengdu 610065, China
Abstract: The existence of outliers often leads to inaccurate or even wrong results in data mining. Existing outlier detection algorithms still need improvement in versatility, effectiveness, user-friendliness, and performance on large, high-dimensional databases. An effective global outlier detection method is proposed. Agglomerative hierarchical clustering is performed, and the degree of isolation of the data is judged visually from the clustering tree and the distance matrix, which determines the number of outliers. The outliers are then identified without supervision, working from the top of the clustering tree downward. Experimental results show that the method effectively detects the top-n global outliers and is applicable to datasets of various shapes. They also show that the algorithm is efficient, user-friendly, and applicable to outlier detection in large, high-dimensional databases.
Keywords:outlier detection  hierarchical clustering  data mining
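
The abstract describes the approach only in outline, so the following is a rough sketch of the general idea, not the paper's algorithm. It assumes SciPy's `linkage` and `to_tree`, single linkage, and a simple hypothetical rule: at each split of the clustering tree, the smaller branch is treated as the more isolated one. The function name `top_n_outliers` and the stopping rule are illustrative assumptions, and `n` is taken as given, whereas the abstract says it is chosen by inspecting the clustering tree and distance matrix.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree


def top_n_outliers(points, n, method="single"):
    """Peel up to n points off the top of the clustering tree (illustrative sketch)."""
    # Agglomerative hierarchical clustering; the linkage method is an assumption.
    node = to_tree(linkage(points, method=method))
    outliers = []
    # Walk from the root downward: at each split, treat the smaller branch
    # as the more isolated one and keep descending into the larger branch.
    while len(outliers) < n and not node.is_leaf():
        left, right = node.get_left(), node.get_right()
        if left.get_count() <= right.get_count():
            small, large = left, right
        else:
            small, large = right, left
        if len(outliers) + small.get_count() > n:
            break  # the smaller branch holds more points than are still wanted
        outliers.extend(small.pre_order())  # indices of the points in that branch
        node = large
    return sorted(outliers)


# Toy data: a tight 4x5 grid (indices 0..19) plus two distant points (20 and 21).
grid = [[0.1 * i, 0.1 * j] for i in range(4) for j in range(5)]
points = np.array(grid + [[10.0, 10.0], [-8.0, 7.0]])
result = top_n_outliers(points, 2)
```

On this toy data the two distant points are the last to merge into the tree, so they are the first to be peeled off from the top. With a different linkage method or a different rule for choosing the isolated branch, the ranking could change.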
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.