
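Random projection algorithm for outlier mining technology research (对随机投影算法的离群数据挖掘技术研究)
Cite this article: LI Qiao, ZHOU Yinglian, HUANG Sheng, MA Xiang. Random projection algorithm for outlier mining technology research [J]. Computer Engineering and Applications, 2013(24): 122-129.
Authors: LI Qiao  ZHOU Yinglian  HUANG Sheng  MA Xiang
Affiliation: School of Information Science and Engineering, Hunan International Economics University, Changsha 410205
Funding: 2011 Scientific Research Project of the Hunan Provincial Department of Education (No. 11C0784).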
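Abstract: Outlier mining in d-dimensional point sets is currently one of the active research topics in data mining. Existing outlier mining methods based on distance or nearest-neighbor concepts perform poorly on high-dimensional data. To address this, the angle-based outlier factor is applied to high-dimensional outlier mining, and a new outlier mining scheme based on a random projection algorithm is proposed that can estimate the angle-based outlier factor of all data points in near-linear time. The method can also be run in a parallel environment for parallel speedup. The approximation quality is analyzed theoretically to guarantee the reliability of the algorithm. Experiments on synthetic and real data sets show that the method is efficient and scalable on very high-dimensional data sets.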

Keywords: outlier data mining  angle  random projection algorithm  near-linear time  reliability  efficiency

Random projection algorithm for outlier mining technology research
LI Qiao, ZHOU Yinglian, HUANG Sheng, MA Xiang. Random projection algorithm for outlier mining technology research [J]. Computer Engineering and Applications, 2013(24): 122-129.
Authors: LI Qiao  ZHOU Yinglian  HUANG Sheng  MA Xiang
Affiliation: School of Information Science and Engineering, Hunan International Economics University, Changsha 410205, China
Abstract: Outlier mining in d-dimensional point sets is currently one of the hot areas of data mining. Current outlier mining approaches based on distance or nearest neighbors perform poorly on high-dimensional data. To solve this problem, this paper investigates the use of the angle-based outlier factor in mining high-dimensional outliers. It proposes a novel random projection-based technique that is able to estimate the angle-based outlier factor for all data points in time near-linear in the size of the data. The approach is also suitable for a parallel environment, where it can achieve a parallel speedup. A theoretical analysis of the approximation quality is given to guarantee the reliability of the algorithm. Experiments on synthetic and real-world data sets demonstrate that the approach is efficient and scalable to very large high-dimensional data sets.
Keywords:outlier data mining  angle  random projection algorithm  near-linear time  reliability  efficiency
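
The abstract does not define the angle-based outlier factor, so the following is only a minimal illustrative sketch under one common assumption: the factor of a point is taken to be the variance of the angles it forms with all pairs of other points, and a low variance suggests an outlier. This is a brute-force baseline (cubic in the number of points), not the near-linear random projection estimator the paper proposes; the function name `angle_variance_scores` and the sample data are hypothetical.

```python
import math
from itertools import combinations


def angle_variance_scores(points):
    """Brute-force variance of angles per point (assumed definition of the
    angle-based outlier factor; O(n^3 * d) time, for illustration only)."""
    scores = []
    for i, p in enumerate(points):
        others = [q for j, q in enumerate(points) if j != i]
        angles = []
        for a, b in combinations(others, 2):
            # Difference vectors from p to each of the two other points.
            u = [x - y for x, y in zip(a, p)]
            v = [x - y for x, y in zip(b, p)]
            nu = math.sqrt(sum(x * x for x in u))
            nv = math.sqrt(sum(x * x for x in v))
            if nu == 0.0 or nv == 0.0:
                continue  # a duplicate of p: the angle is undefined
            cos = sum(x * y for x, y in zip(u, v)) / (nu * nv)
            cos = max(-1.0, min(1.0, cos))  # guard against rounding error
            angles.append(math.acos(cos))
        if len(angles) < 2:
            scores.append(0.0)
            continue
        mean = sum(angles) / len(angles)
        scores.append(sum((t - mean) ** 2 for t in angles) / len(angles))
    return scores


# Hypothetical data: a 3x3 grid cluster plus one far-away point.
points = [(float(x), float(y)) for x in range(3) for y in range(3)]
points.append((10.0, 10.0))

scores = angle_variance_scores(points)
# The isolated point sees the whole cluster within a narrow cone,
# so its angle variance should be the smallest.
print(scores.index(min(scores)))
```

The cubic cost of this baseline is the motivation the abstract gives for a near-linear approximation.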
This article is indexed in databases including VIP (维普).