
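An Incremental Outlier Mining Algorithm for Massive Data Based on Grid and Density
Cite this article: Zhang Jing, Sun Zhihui, Yang Ming, Ni Weiwei, Yang Yidong. An Incremental Outlier Mining Algorithm for Massive Data Based on Grid and Density[J]. Journal of Computer Research and Development, 2011, 48(5): 823-830.
Authors: Zhang Jing  Sun Zhihui  Yang Ming  Ni Weiwei  Yang Yidong
Affiliations: 1. Department of Computer Science and Engineering, Southeast University, Nanjing 210096; School of Electrical and Information Engineering, Jiangsu University, Zhenjiang, Jiangsu 212001
2. Department of Computer Science and Engineering, Southeast University, Nanjing 210096
3. School of Computer Science and Technology, Nanjing Normal University, Nanjing 210097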
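Abstract: Handling massive, high-dimensional data has become a major task and challenge in the design of outlier detection algorithms. Targeting the characteristics of massive data, this paper proposes IGDLOF, an incremental outlier mining algorithm based on grid and density. The basic idea is as follows: seven-tuple information kept for each grid cell reduces the dimensionality and volume of the data, and incremental updating reduces memory requirements. Representative points are used to filter out the corresponding bulk of the data, and points are screened first and only then given an approximate density computation, which reduces the amount of computation and lowers the complexity of the algorithm. Tests on real and simulated datasets show that the incremental IGDLOF algorithm maintains the same accuracy as the LOF algorithm while significantly improving execution efficiency.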
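The grid idea in the abstract can be illustrated with a small sketch. This is not the IGDLOF algorithm: the abstract does not list the fields of the seven-tuple, so the class below keeps only a per-cell count and coordinate sums as a hypothetical stand-in. The names `GridSummary`, `cell_width`, and `dense_threshold` are assumptions made for illustration, and the filter ignores neighboring cells, which the paper's neighbor-grid definition presumably handles.

```python
from math import floor


class GridSummary:
    """Per-cell count and coordinate sums (hypothetical stand-in for the seven-tuple)."""

    def __init__(self, cell_width, dense_threshold):
        self.cell_width = cell_width            # assumed uniform cell size
        self.dense_threshold = dense_threshold  # assumed minimum count for a dense cell
        self.count = {}                         # cell key -> number of points
        self.sums = {}                          # cell key -> per-dimension coordinate sums

    def cell_of(self, point):
        """Map a point to the integer key of the cell that contains it."""
        return tuple(floor(x / self.cell_width) for x in point)

    def insert(self, point):
        """Incremental update: only the receiving cell's summary changes."""
        key = self.cell_of(point)
        self.count[key] = self.count.get(key, 0) + 1
        sums = self.sums.setdefault(key, [0.0] * len(point))
        for i, x in enumerate(point):
            sums[i] += x
        return key

    def is_dense(self, key):
        return self.count.get(key, 0) >= self.dense_threshold

    def representative(self, key):
        """Centroid of a non-empty cell, used here as its representative point."""
        n = self.count[key]
        return tuple(s / n for s in self.sums[key])

    def candidates(self, points):
        """Keep only points in sparse cells; points in dense cells are filtered out."""
        return [p for p in points if not self.is_dense(self.cell_of(p))]


grid = GridSummary(cell_width=1.0, dense_threshold=3)
points = [(0.1, 0.1), (0.2, 0.3), (0.5, 0.5), (5.5, 5.5)]
for p in points:
    grid.insert(p)
outlier_candidates = grid.candidates(points)
```

Only the points returned by `candidates` would need a density computation in this sketch; new points arriving later change a single cell's summary, so the stored raw data never has to be rescanned to reclassify a cell.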

Keywords: massive data  grid  density  outlier mining  incremental  LOF algorithm

Fast Incremental Outlier Mining Algorithm Based on Grid and Capacity
Zhang Jing, Sun Zhihui, Yang Ming, Ni Weiwei, Yang Yidong. Fast Incremental Outlier Mining Algorithm Based on Grid and Capacity[J]. Journal of Computer Research and Development, 2011, 48(5): 823-830.
Authors:Zhang Jing  Sun Zhihui  Yang Ming  Ni Weiwei  Yang Yidong
Abstract: Outlier mining is an important branch of data mining. It has been widely applied in industrial and financial settings, including intrusion detection systems (IDS) and credit card fraud detection. Dealing with massive, high-dimensional data has become a central task and challenge for outlier detection algorithms. Based on definitions of density and grid, a fast incremental outlier mining algorithm is proposed. It introduces seven-tuple grid information to reduce the volume and dimensionality of the data, and uses incremental updates to reduce memory requirements. Dense grids, sparse grids, and neighbor grids are defined so that computation can operate conveniently on grid cells. Appropriate representative points are used to filter the bulk of the data, and an approximate method is adopted to reduce computation and lower the complexity of the algorithm. Experiments are performed on different initial and incremental datasets, and the results report detection rate, false alarm rate, precision, and average running time. Tests on real and simulated datasets show that the proposed algorithm maintains the same accuracy as the LOF algorithm while significantly improving execution efficiency.
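For reference, the LOF baseline that the abstract compares against can be sketched in a few lines. This is a plain, quadratic-time version of the standard local outlier factor definition, not the paper's grid-accelerated method. The function name `lof_scores` is an assumption, and the sketch assumes all points are distinct and that `k` is smaller than the number of points.

```python
from math import dist


def lof_scores(points, k):
    """Local outlier factor of every point, computed by brute force."""
    n = len(points)
    k_distance = []  # distance from each point to its k-th nearest neighbor
    neighbors = []   # (distance, index) pairs within the k-distance, ties included
    for i in range(n):
        ranked = sorted((dist(points[i], points[j]), j) for j in range(n) if j != i)
        kd = ranked[k - 1][0]
        k_distance.append(kd)
        neighbors.append([(d, j) for d, j in ranked if d <= kd])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(i, j) = max(k-distance(j), d(i, j)).
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], d) for d, j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean ratio of the neighbors' densities to the point's own density.
    return [
        sum(lrd[j] for _, j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]


square_plus_far = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (10.0, 10.0)]
scores = lof_scores(square_plus_far, k=2)
```

In this toy input the four corners of the unit square score close to 1, while the isolated point scores far above 1. The all-pairs distance computation is the cost that grid-based filtering aims to avoid on massive datasets.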
Keywords: massive datasets  grid  density  outlier mining  increment  LOF algorithm
This article is indexed in Wanfang Data (万方数据) and other databases.