
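Local Distance Anomaly Detection Method Based on Density-Biased Sampling
Cite this article: FU Pei-Guo, HU Xiao-Hui. Local distance anomaly detection method based on density-biased sampling[J]. Journal of Software, 2017, 28(10): 2625-2639.
Authors: FU Pei-Guo, HU Xiao-Hui
Affiliations: University of Chinese Academy of Sciences, Beijing 100049; Science and Technology on Integrated Information System Laboratory (Institute of Software, Chinese Academy of Sciences), Beijing 100190, Science and Technology on Integrated Information System Laboratory (Institute of Software, Chinese Academy of Sciences), Beijing 100190
Funding: National Natural Science Foundation of China (U1435220); National High Technology Research and Development Program of China (863 Program) (2012AA011206).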
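Abstract: Anomaly detection is an important research area in data mining. Existing anomaly detection methods based on distance or on the nearest-neighbor concept take too long to run when applied to massive, high-dimensional data. Many improved methods raise computational efficiency, but their detection quality is poor. This paper therefore proposes a local-distance anomaly detection algorithm based on density-biased sampling. First, a density-biased probabilistic sampling method draws a sample from the dataset to be examined. Then a local-distance-based local outlier detection method is applied to the sample to compute a local outlier factor for each sampled point. The resulting factor is both the local outlier factor of the point within the sample and an approximation of its global outlier factor over the full dataset. The points are then ranked by their local outlier factors; the larger a point's factor, the more likely it is an outlier. Experimental results show that, compared with existing algorithms, this algorithm achieves higher detection accuracy with less computation time, performs well on data of various dimensionalities and sizes, and scales well.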

Keywords: anomaly detection  local outlier factor  local distance  density-biased sampling  SLDOF algorithm
Received: 2015/7/15
Revised: 2016/9/7
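The abstract names the first step only as density-biased probabilistic sampling and does not give the density estimator or the sampling probabilities. The sketch below is an illustration under stated assumptions, not the authors' method: it estimates density with a uniform grid, keeps a point whose grid cell holds n points with probability proportional to n^(e-1), and scales the probabilities so the expected sample size is about m. The function `density_biased_sample` and its parameters `m`, `e`, `bins` and `seed` are hypothetical.

```python
from collections import Counter

import numpy as np


def density_biased_sample(X, m, e=0.5, bins=10, seed=0):
    """Return indices of roughly m rows of X, drawn with density bias.

    Hypothetical sketch: density is estimated by counting points per cell of a
    uniform grid (an assumption, not taken from the paper). A point in a cell
    holding n points is kept with probability c * n**(e - 1), so the expected
    number of points drawn from that cell is proportional to n**e. With e < 1,
    sparse cells are over-represented relative to uniform sampling.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    # Map each point to an integer cell index per dimension, clamping the max.
    cells = np.minimum(((X - lo) / span * bins).astype(int), bins - 1)
    keys = [tuple(row) for row in cells.tolist()]
    counts = Counter(keys)  # number of points in each occupied cell
    n = np.array([counts[k] for k in keys], dtype=float)
    # Choose c so that sum(p) = c * sum_cells(n_cell**e) = m before clipping.
    c = m / sum(float(v) ** e for v in counts.values())
    p = np.minimum(1.0, c * n ** (e - 1.0))
    keep = rng.random(len(X)) < p
    return np.flatnonzero(keep)
```

With e = 1 every point gets the same probability m/N and the sketch reduces to uniform sampling; smaller values of e shift more of the sample toward sparse cells. A uniform grid degrades in high dimensions, where almost every point falls into its own cell and the sampling becomes nearly uniform, so a practical implementation would likely need a different density estimate.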

Anomaly Detection Algorithm Based on the Local Distance of Density-Based Sampling Data
FU Pei-Guo and HU Xiao-Hui. Anomaly Detection Algorithm Based on the Local Distance of Density-Based Sampling Data[J]. Journal of Software, 2017, 28(10): 2625-2639.
Authors: FU Pei-Guo and HU Xiao-Hui
Affiliation: University of Chinese Academy of Sciences, Beijing 100049, China; Science and Technology on Integrated Information System Laboratory (Institute of Software, The Chinese Academy of Sciences), Beijing 100190, China and Science and Technology on Integrated Information System Laboratory (Institute of Software, The Chinese Academy of Sciences), Beijing 100190, China
Abstract: Anomaly detection is an important research area in data mining. Current outlier mining approaches based on distance or on nearest neighbors take too long to run on massive, high-dimensional data. Many improved algorithms have been proposed to raise efficiency, but their detection quality is poor. This paper therefore presents a new anomaly detection algorithm based on the local distance of density-based sampling data. First, a density-based probabilistic sampling method is applied to the data to be examined to obtain a subset. Then a local-distance-based local outlier detection method computes the outlier factor of each object in the subset. Because the sample is drawn with density bias, this factor serves both as the local outlier factor within the subset and as an approximation of the global outlier factor over the whole dataset. The objects in the subset are then ranked by this factor: the higher a point's score, the higher its degree of outlierness. Experimental results show that, compared with existing algorithms, this algorithm achieves higher detection accuracy with less computation time, and it remains efficient and scalable across data of various dimensionalities and sizes.
Keywords: anomaly detection  outlier factor of local set  local distance  density-based sampling  SLDOF algorithm
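The abstract does not define the local-distance outlier factor it computes on the sample. Assuming it follows the common LDOF formulation (a point's mean distance to its k nearest neighbours divided by the mean pairwise distance among those neighbours), the scoring and ranking step could be sketched as follows. The functions `ldof_scores` and `rank_outliers` and the default `k` are hypothetical, not the paper's code.

```python
import numpy as np


def ldof_scores(S, k=5):
    """LDOF-style local distance outlier factor for each row of S (a sketch).

    For a point p with k nearest neighbours N_p (excluding p itself):
      knn_dist(p)   = mean distance from p to the points in N_p
      inner_dist(p) = mean pairwise distance among the points in N_p
      factor(p)     = knn_dist(p) / inner_dist(p)
    """
    S = np.asarray(S, dtype=float)
    n = len(S)
    if not 2 <= k < n:
        raise ValueError("need 2 <= k < number of points")
    # Full pairwise Euclidean distance matrix: O(n^2) memory, fine for a sample.
    diff = S[:, None, :] - S[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    scores = np.empty(n)
    for i in range(n):
        order = np.argsort(D[i])
        nbrs = order[order != i][:k]  # k nearest neighbours, excluding i
        knn_dist = D[i, nbrs].mean()
        # Sum over ordered pairs i != j; the zero diagonal adds nothing.
        inner = D[np.ix_(nbrs, nbrs)].sum() / (k * (k - 1))
        if inner > 0:
            scores[i] = knn_dist / inner
        else:
            # Degenerate case: all neighbours coincide.
            scores[i] = np.inf if knn_dist > 0 else 0.0
    return scores


def rank_outliers(S, k=5):
    """Indices of S sorted from most to least anomalous (largest factor first)."""
    return np.argsort(-ldof_scores(S, k))
```

Applied to the rows selected by a sampling step, the top-ranked indices are the outlier candidates. Because the factor is a ratio of distances, it does not change when the data are uniformly rescaled. Building the full distance matrix is only practical because scoring runs on a sample rather than on the whole dataset.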