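Two-part outlier detection algorithm based on clustering
Cite this article: 任建华, 高立明. Two-part outlier detection algorithm based on clustering[J]. 计算机工程与应用, 2016, 52(20): 98-102
Authors: 任建华, 高立明
Affiliation: College of Electronics and Information Engineering, Liaoning Technical University, Huludao, Liaoning 125105
Abstract: Most existing outlier detection algorithms require the number of outliers to be set in advance, and they also lack the ability to handle non-uniformly distributed data sets. To address these problems, a two-part outlier detection algorithm based on clustering is proposed. The algorithm first uses the DBSCAN clustering algorithm to generate a set of suspected outliers. It then prunes the data set with a pruning strategy and applies an outlier detection algorithm based on an improved distance to generate a ranked set of the most likely outliers. Finally, the outlier set is determined by the intersection of the two sets. The algorithm does not require the number of outliers to be set in advance, has relatively high accuracy and detection efficiency, and is insensitive to the distribution of the data set. Experimental results on data sets show that the algorithm identifies outliers efficiently and accurately.

Keywords: outlier detection; distance; DBSCAN algorithm; pruning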

Two-part outlier detection algorithm based on clustering
REN Jianhua, GAO Liming. Two-part outlier detection algorithm based on clustering[J]. Computer Engineering and Applications, 2016, 52(20): 98-102
Authors: REN Jianhua, GAO Liming
Affiliation: College of Electronics and Information Engineering, Liaoning Technical University, Huludao, Liaoning 125105, China
Abstract: Most existing outlier detection algorithms require the number of outliers to be preset, and they lack the ability to detect outliers in non-uniform data sets. To address these problems, this paper proposes a two-part outlier detection algorithm based on clustering. The algorithm first uses the DBSCAN clustering algorithm to produce a set of suspected outliers. A pruning strategy is then used to prune the data set, and an outlier detection algorithm based on an improved distance is used to produce a ranked set of the points most likely to be outliers. Finally, the outlier set is determined by the intersection of the two sets. The algorithm does not need the number of outliers to be preset, achieves relatively high accuracy and detection efficiency, and is not sensitive to the distribution of the data set. Experimental results on data sets show that the algorithm can identify outliers efficiently and accurately.
Keywords: outlier detection; distance; DBSCAN algorithm; pruning
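
The pipeline described in the abstract (DBSCAN suspects, pruning, distance-based ranking, intersection) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not define the pruning strategy or the improved distance, so the sketch assumes that pruning discards dense (core) points and that the outlier score is the mean distance to the k nearest neighbours. The function names and the parameters `eps`, `min_pts`, `k`, and `top_n` are hypothetical. `top_n` only caps the length of the ranked list; the intersection decides how many outliers are reported.

```python
import math


def neighbors(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan_noise(points, eps, min_pts):
    """Stage 1: run a plain DBSCAN and return the indices labelled as noise."""
    UNSEEN, NOISE = None, -1
    labels = [UNSEEN] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbors(points, i, eps)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            reachable = neighbors(points, j, eps)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point, keep expanding
        cluster += 1
    return {i for i, label in enumerate(labels) if label == NOISE}


def ranked_candidates(points, eps, min_pts, k, top_n):
    """Stage 2: prune dense points, then rank the rest by mean k-NN distance."""
    scores = {}
    for i, p in enumerate(points):
        # Assumed pruning rule: a core point is treated as a sure inlier.
        if len(neighbors(points, i, eps)) >= min_pts:
            continue
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        # Assumed score (stand-in for the paper's improved distance).
        scores[i] = sum(dists[:k]) / k
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n]


def detect_outliers(points, eps, min_pts, k, top_n):
    """Final step: keep ranked candidates that DBSCAN also flagged as noise."""
    suspects = dbscan_noise(points, eps, min_pts)
    ranking = ranked_candidates(points, eps, min_pts, k, top_n)
    return [i for i in ranking if i in suspects]


# Two small clusters plus two isolated points (indices 10 and 11).
points = [
    (0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
    (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
    (5, 5), (20, 0),
]
outliers = detect_outliers(points, eps=1.5, min_pts=3, k=2, top_n=5)
print(outliers)
```

The result is a list of point indices ordered from most to least isolated under the assumed score. In practice a library DBSCAN and a spatial index would replace the quadratic neighbour searches used here for brevity.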