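GridOF: An Efficient Outlier Detection Algorithm for Large-Scale Datasets
Cite this article: LI Cun-Hua, SUN Zhi-Hui. GridOF: An Efficient Outlier Detection Algorithm for Large-Scale Datasets[J]. Journal of Computer Research and Development, 2003, 40(11): 1586-1592.
Authors: LI Cun-Hua  SUN Zhi-Hui
Affiliations: 1. Department of Computer Science and Engineering, Southeast University, Nanjing 210018; Department of Computer Science, Huaihai Institute of Technology, Lianyungang 222005
2. Department of Computer Science and Engineering, Southeast University, Nanjing 210018
Funding: National Natural Science Foundation of China (7997009); Natural Science Foundation of the Jiangsu Provincial Department of Education (02KJB520012)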
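Abstract: Outlier detection is an important technique in knowledge discovery in databases, but existing outlier detection algorithms are unsatisfactory in both time and space efficiency when applied to large datasets. Based on an analysis of how outliers are distributed in a dataset, and building on a grid partition of the data space, this paper studies approximate density computation at the level of data hypercubes together with a strategy for filtering out the dense main body of the data. It gives a method that replaces the costly point-to-point computation of density function values with a simple adjusted approximation. The resulting outlier detection algorithm, GridOF, significantly reduces time and space complexity while keeping sufficient detection accuracy, and is both applicable and effective for outlier detection on large-scale datasets.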

Keywords: outlier detection  adjusted approximation  GridOF algorithm

GridOF: An Efficient Outlier Detection Algorithm for Very Large Datasets
LI Cun-Hua and SUN Zhi-Hui. GridOF: An Efficient Outlier Detection Algorithm for Very Large Datasets[J]. Journal of Computer Research and Development, 2003, 40(11): 1586-1592.
Authors: LI Cun-Hua and SUN Zhi-Hui
Abstract: Identifying rare instances in datasets can lead to the discovery of unexpected and useful knowledge. However, existing outlier detection algorithms are inefficient on large datasets. Following a detailed discussion of the features of outliers in datasets, a novel grid-based algorithm called GridOF is presented. It first filters out crowded grid cells and then finds outliers by computing an adjusted mean approximation of the density function. While keeping desirable outlier detection accuracy, the algorithm is efficient in both space and time usage. Experimental results also demonstrate the promising applicability of this approach.
Keywords: outlier detection  adjusted mean approximation  GridOF algorithm
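
The two-stage idea in the abstract (filter out crowded grid cells, then score the remaining points with a cheap cell-level density approximation) can be illustrated with a minimal Python sketch. This is not the paper's GridOF algorithm: the function name `grid_outlier_scores`, the parameters `cell_width` and `dense_threshold`, and the scoring rule (the inverse of the mean point count over a cell and its adjacent cells) are all assumptions made for illustration, and the paper's adjusted mean approximation may differ.

```python
from collections import Counter
from itertools import product


def grid_outlier_scores(points, cell_width, dense_threshold):
    """Return {point index: score} for points that lie in sparse cells.

    Hypothetical sketch: the score is 1 / (mean point count over the
    point's cell and its adjacent cells). This stands in for a
    point-to-point density estimate and is not the paper's formula.
    """
    # Map each point to the integer coordinates of its hypercube cell.
    cells = [tuple(int(x // cell_width) for x in p) for p in points]
    counts = Counter(cells)
    dim = len(points[0])
    # Offsets covering the cell itself and every adjacent cell.
    offsets = list(product((-1, 0, 1), repeat=dim))

    scores = {}
    for idx, cell in enumerate(cells):
        # Stage 1: skip points in crowded cells (the dense main body).
        if counts[cell] >= dense_threshold:
            continue
        # Stage 2: approximate local density from neighboring cell counts.
        total = sum(
            counts.get(tuple(c + o for c, o in zip(cell, off)), 0)
            for off in offsets
        )
        density = total / len(offsets)  # total >= 1, so density > 0
        scores[idx] = 1.0 / density
    return scores


# A 5 x 5 cluster inside one unit cell, plus one isolated point.
points = [(0.1 + 0.2 * i, 0.1 + 0.2 * j) for i in range(5) for j in range(5)]
points.append((5.5, 5.5))
scores = grid_outlier_scores(points, cell_width=1.0, dense_threshold=5)
```

In this toy run the 25 clustered points are discarded in the first stage without any distance computation, and only the isolated point (index 25) receives a score. The cost is dominated by one pass over the points plus a constant number of cell lookups per surviving point, which is the kind of saving over pairwise density computation that the abstract describes.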
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.