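An Outlier Detection Method Based on Probability
Cite as:ZHANG Yue, LIU Jie, LI Hang. An Outlier Detection Method Based on Probability[J]. Computer Engineering, 2013, 39(3): 46-50, 55.
Authors:ZHANG Yue  LIU Jie  LI Hang
Affiliation:Software College, Shenyang Normal University, Shenyang 110034, China
Funding:National Natural Science Foundation of China (60970112)
Abstract:Most existing outlier detection methods require the number of outliers to be set in advance, and an inaccurate setting lowers detection accuracy. To address this problem, a probability-based outlier detection method is proposed. It combines the density-based DBSCAN algorithm with a variance computed about the median to cluster the data set under test, extracts the suspicious outliers that fall in no cluster, and analyzes them to determine the final outliers. The data handled by the method are linearly independent of the time factor, neither the number of outliers nor the number of clusters has to be set in advance, and the method is robust to noisy data. Experimental results on the IRIS test data set show that the method identifies outliers effectively.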

Keywords:outlier  probability  median  DBSCAN algorithm  variance  clustering
Received:2012-04-18

An Outlier Detection Method Based on Probability
ZHANG Yue, LIU Jie, LI Hang. An Outlier Detection Method Based on Probability[J]. Computer Engineering, 2013, 39(3): 46-50, 55.
Authors:ZHANG Yue  LIU Jie  LI Hang
Affiliation:(Software College, Shenyang Normal University, Shenyang 110034, China)
Abstract:Most existing outlier detection algorithms require the number of outliers to be specified in advance, and an inaccurate value can greatly reduce detection accuracy. To address this problem, an outlier detection method based on probability is proposed. The method combines the DBSCAN algorithm with a variance-from-median computation to cluster the data set under test, and extracts the suspicious outliers that do not belong to any cluster. These suspicious outliers are then checked against the definition of an outlier to determine the final outliers. The method is insensitive to noisy data, the data it processes are independent of the time scale, and it does not require the number of outliers or the number of clusters to be set. Experimental results on the IRIS data set show that the method detects outliers effectively.
Keywords:outlier  probability  median  DBSCAN algorithm  variance  clustering
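
The abstract names the two ingredients (DBSCAN clustering and a variance taken about the median) but gives no formulas, so the following is only a rough sketch of how such a two-stage pipeline might look, not the authors' algorithm. The parameters `eps`, `min_pts`, and `k`, the helper names, and in particular the confirmation rule in `confirm_outliers` are assumptions made for illustration.

```python
import math
from statistics import median

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point; -1 marks noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)   # j is a core point: keep expanding
    return labels

def spread_about_median(values):
    """Median and mean squared deviation from it (a variance about the median)."""
    m = median(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)

def confirm_outliers(points, labels, k=3.0):
    """Hypothetical rule: a noise point is confirmed as an outlier if, for every
    cluster, its distance to the cluster's coordinate-wise median center exceeds
    the members' median distance plus k times the root of their spread."""
    clusters = {}
    for p, lab in zip(points, labels):
        if lab != -1:
            clusters.setdefault(lab, []).append(p)
    limits = []
    for members in clusters.values():
        center = tuple(median(c) for c in zip(*members))
        m, var = spread_about_median([math.dist(p, center) for p in members])
        limits.append((center, m + k * math.sqrt(var)))
    return [i for i, lab in enumerate(labels)
            if lab == -1 and all(math.dist(points[i], c) > lim for c, lim in limits)]

# Toy data (made up for this sketch): two tight groups and one distant point.
sample = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
          (2.5, 9.0)]
sample_labels = dbscan(sample, eps=0.5, min_pts=3)
print(sample_labels)                            # [0, 0, 0, 0, 1, 1, 1, 1, -1]
print(confirm_outliers(sample, sample_labels))  # [8]
```

Neither the number of clusters nor the number of outliers is passed in, which matches the property claimed in the abstract. The density parameters `eps` and `min_pts` still have to be chosen, and the abstract does not say how the paper selects them.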
This article is indexed in Wanfang Data and other databases.