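Research on Local Outlier Mining Algorithms
Cite this article:XUE An-Rong, JU Shi-Guang, HE Wei-Hua, CHEN Wei-He. Research on local outlier mining algorithms [J]. Chinese Journal of Computers (计算机学报), 2007, 30(8): 1455-1463.
Authors:XUE An-Rong  JU Shi-Guang  HE Wei-Hua  CHEN Wei-He
Affiliation:School of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013
Funding:National Natural Science Foundation of China; Natural Science Foundation of Jiangsu Higher Education Institutions; Natural Science Foundation of Jiangsu Province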
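Abstract:Outliers can be divided into global outliers and local outliers. In many cases, mining local outliers is more meaningful than mining global outliers. Existing outlier mining algorithms based on the local outlier degree have two main limitations: their detection accuracy depends on user-supplied parameters, and their computational complexity is high. This paper proposes dividing object attributes into inherent attributes and context attributes, using the context attributes to determine an object's neighborhood and the inherent attributes to compute its outlier degree, thereby overcoming these limitations. Taking spatial data as an example, the spatial attributes are separated from the non-spatial attributes: the spatial attributes determine the spatial neighborhood, the non-spatial attributes are used to compute the spatial outlier degree, and a spatial outlier mining algorithm is designed on this basis. Experimental results show that the proposed algorithm depends little on the user, achieves high detection accuracy, scales well, and runs efficiently.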

Keywords:outlier detection  local outlier factor  R*-tree  data mining  spatial outlier  trimmed mean
Revised:2007-03-05

Study on Algorithms for Local Outlier Detection
XUE An-Rong,JU Shi-Guang,HE Wei-Hua,CHEN Wei-He.Study on Algorithms for Local Outlier Detection[J].Chinese Journal of Computers,2007,30(8):1455-1463.
Authors:XUE An-Rong  JU Shi-Guang  HE Wei-Hua  CHEN Wei-He
Affiliation:School of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013
Abstract:Outlier detection has attracted much attention recently. There are two kinds of outliers: global outliers and local outliers. In many scenarios, detecting local outliers is more valuable than detecting global outliers. To mine local outliers, it is more meaningful to assign each object a degree of being an outlier. Representative existing algorithms for this problem are compared in detail, and their disadvantages are pointed out, such as poor efficiency and detection accuracy that depends on user-supplied parameters. In general, the attributes of a data object can be categorized as inherent attributes and context attributes: the inherent attributes characterize the data object, while the context attributes embody the relationship between the data object and its neighbors. The context attributes are not intrinsic to the data object. To overcome the disadvantages mentioned above, this paper proposes using the context attributes to determine the object's neighborhood and the inherent attributes to compute the outlier score. For spatial data, the attributes comprise non-spatial dimensions and spatial dimensions. The spatial attributes provide a location index for the data object. The neighborhood in Euclidean space plays a very important role in the analysis of spatial data. The spatial attributes are used to determine the spatial neighborhood, and the non-spatial dimensions are used to compute the spatial outlier score. This paper also proposes a novel measure, the spatial local outlier factor (SLOF), which captures the local behavior of a datum in its spatial neighborhood. The experimental results show that the proposed SLOF algorithm outperforms the other existing algorithms in detection accuracy, user dependency, scalability, and efficiency.
Keywords:outlier detection  local outlier factor  R-tree  data mining  spatial outlier  trimmed mean
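
The separation described in the abstract (spatial attributes choose the neighborhood, non-spatial attributes produce the score) can be illustrated with a small Python sketch. This is not the paper's SLOF definition, which the abstract does not state. The function names, the brute-force k-nearest-neighbor search, the parameters `k` and `trim`, and the score itself (absolute deviation from the trimmed mean of the neighbors' values) are all assumptions chosen only to show the idea.

```python
import math


def k_nearest(points, i, k):
    """Indices of the k points spatially closest to points[i] (brute force)."""
    xi, yi = points[i]
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.hypot(points[j][0] - xi, points[j][1] - yi))
    return others[:k]


def trimmed_mean(values, trim=1):
    """Mean after dropping the `trim` smallest and `trim` largest values."""
    ordered = sorted(values)
    if len(ordered) > 2 * trim:
        ordered = ordered[trim:len(ordered) - trim]
    return sum(ordered) / len(ordered)


def neighborhood_scores(points, values, k=4, trim=1):
    """Illustrative score (an assumption, not SLOF): for each object,
    the absolute difference between its non-spatial value and the
    trimmed mean of the values of its k spatial neighbors."""
    scores = []
    for i in range(len(points)):
        # Spatial (context) attributes decide who the neighbors are.
        neighbors = k_nearest(points, i, k)
        # Non-spatial (inherent) attributes decide the score.
        center = trimmed_mean([values[j] for j in neighbors], trim)
        scores.append(abs(values[i] - center))
    return scores


# Hypothetical data: a 3x3 grid where only the middle cell deviates.
points = [(x, y) for x in range(3) for y in range(3)]
values = [10.0] * 9
values[4] = 50.0  # the object at (1, 1)

scores = neighborhood_scores(points, values)
print(scores)
```

In this toy grid the object at (1, 1) receives the largest score. Its neighbors are not penalized, because the trimmed mean discards the single extreme value in each of their neighborhoods.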
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.