
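Information-Theoretic Outlier Mining for High-Dimensional Massive Data
Cite this article: ZHANG Jing, SUN Zhi-hui, SONG Yu-qing, NI Wei-wei, YAN Yan-hua. Information-theoretic outlier mining for high-dimensional massive data [J]. Computer Science, 2011, 38(7): 148-151, 161.
Authors: ZHANG Jing  SUN Zhi-hui  SONG Yu-qing  NI Wei-wei  YAN Yan-hua
Affiliations: 1. Department of Computer Science and Engineering, Southeast University, Nanjing 210096; School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212001
2. Department of Computer Science and Engineering, Southeast University, Nanjing 210096
3. School of Computer Science and Telecommunications Engineering, Jiangsu University, Zhenjiang 212001
Funding: This work was supported by the National Natural Science Foundation of China (40871176, 60841003).
Abstract: Outlier mining on high-dimensional massive datasets suffers from the "curse of dimensionality". To address this, an information-theoretic outlier mining algorithm for high-dimensional massive data is proposed. The algorithm applies attribute selection to remove redundant attributes and reduce dimensionality. It uses information entropy as the criterion for identifying outliers, which avoids the drawbacks of distance-based and density-based measures. Experimental results on real datasets show that the algorithm is effective and feasible for outlier mining in high-dimensional massive data, and that its efficiency and accuracy are markedly improved.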

Keywords: Outlier mining  Information theory  Attribute selection  Entropy  Mutual information

Outlier Mining of the High-dimension Datasets Based on Information Theory
ZHANG Jing,SUN Zhi-hui,SONG Yu-qing,NI Wei-wei,YAN Yan-hua.Outlier Mining of the High-dimension Datasets Based on Information Theory[J].Computer Science,2011,38(7):148-151,161.
Authors:ZHANG Jing  SUN Zhi-hui  SONG Yu-qing  NI Wei-wei  YAN Yan-hua
Affiliation:(Department of Computer Science and Engineering, Southeast University, Nanjing 210096, China);(College of Electronic and Information Engineering,Jiangsu University,Zhenjiang 212001,China);(College of Computer Science and Telecommunications Engineering,Jiangsu University,Zhenjiang 212001,China)
Abstract:The "curse of dimensionality" degrades the effectiveness of many existing outlier mining algorithms. To address this problem, an outlier mining algorithm for high-dimensional, large datasets based on information theory is proposed. The algorithm draws on the information-theoretic concepts of entropy and mutual information: it ranks attributes by entropy weights derived from estimated mutual information values, performs feature selection on that ranking, and eliminates redundant attributes to reduce dimensionality. Using information entropy as the measure for identifying outliers avoids the drawbacks of distance and density metrics. Experimental results on real datasets indicate that the algorithm is effective and feasible for outlier mining in high-dimensional massive data, and that its efficiency and accuracy are significantly improved.
Keywords:Outlier mining  Information theory  Feature selection  Entropy  Mutual information
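
The abstract names two ingredients: attribute selection driven by entropy and mutual information, and an entropy-based measure for judging outliers. The Python sketch below illustrates both on discrete data. It is not the paper's algorithm, whose details the abstract does not give. The greedy redundancy filter, the normalized mutual information threshold, the surprisal-based score, and the function names `select_attributes` and `outlier_scores` are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete columns."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def select_attributes(columns, threshold=0.9):
    """Greedy redundancy filter (an assumption, not the paper's procedure).

    Attributes are visited in order of decreasing entropy. An attribute is
    dropped if it is constant, or if its mutual information with an already
    kept attribute, divided by its own entropy, reaches the threshold.
    """
    order = sorted(columns, key=lambda name: entropy(columns[name]), reverse=True)
    kept = []
    for name in order:
        h = entropy(columns[name])
        if h == 0:
            continue  # a constant attribute carries no information
        redundant = any(
            mutual_information(columns[name], columns[k]) / h >= threshold
            for k in kept
        )
        if not redundant:
            kept.append(name)
    return kept


def outlier_scores(columns, kept):
    """Score each record by summed surprisal -log2 p(value) over kept attributes.

    Rare values contribute many bits, so a higher score means more outlying.
    This score is an illustrative stand-in for the paper's entropy measure.
    """
    n = len(next(iter(columns.values())))
    scores = [0.0] * n
    for name in kept:
        counts = Counter(columns[name])
        for i, v in enumerate(columns[name]):
            scores[i] += -math.log2(counts[v] / n)
    return scores


# Hypothetical toy data: attribute "b" duplicates "a" under different labels.
columns = {
    "a": ["x", "x", "x", "x", "x", "y"],
    "b": ["p", "p", "p", "p", "p", "q"],
    "c": ["u", "u", "u", "v", "v", "v"],
}
kept = select_attributes(columns)
scores = outlier_scores(columns, kept)
print(kept)
print(scores.index(max(scores)))
```

Because the score depends only on value frequencies, it needs no distance or density computation, which is the property the abstract attributes to an entropy-based measure.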
This article is indexed in Wanfang Data and other databases.