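An Outlier Mining Algorithm for High-Dimensional Data with Mixed Attributes
Cite this article:李庆华,李新,蒋盛益.一种面向高维混合属性数据的异常挖掘算法[J].计算机应用,2005,25(6):1353-1356.
Authors:李庆华 (LI Qing-hua)  李新 (LI Xin)  蒋盛益 (JIANG Sheng-yi)
Affiliation:1. School of Computer Science and Technology, Huazhong University of Science and Technology; 2. Department of Computer Science, Hengyang Normal University
Funding:National Natural Science Foundation of China (Grant No. 60273075)
Abstract:Anomaly detection is one of the most fundamental problems in data mining, with wide applications in fraud detection, weather forecasting, customer classification, and intrusion detection. Motivated by the needs of network intrusion detection, this paper proposes a new outlier mining algorithm based on clustering of mixed-attribute data. Building on the observation that outliers are, by nature, the rare points in a data set, it also gives new definitions of data similarity and of the degree of outlyingness. The proposed algorithm has linear time complexity. Experiments on the KDDCUP99 and Wisconsin Prognosis Breast Cancer data sets show that the algorithm offers near-linear time complexity and good scalability while detecting the outliers in the data reasonably well.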

Keywords:anomaly detection  clustering  data mining
Article ID:1001-9081(2005)06-1353-04

New approach for outlier detection in high dimensional dataset with mixed attributes
LI Qing-hua,LI Xin,JIANG Sheng-yi.New approach for outlier detection in high dimensional dataset with mixed attributes[J].Journal of Computer Applications,2005,25(6):1353-1356.
Authors:LI Qing-hua  LI Xin  JIANG Sheng-yi
Affiliation:1. Department of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China; 2. Department of Computer Science, Hengyang Normal University, Hengyang, Hunan 421008, China
Abstract:The outlier detection problem has important applications in fraud detection, weather prediction, customer segmentation, and intrusion detection. Many recent algorithms use concepts of proximity to find outliers based on their relationship to the rest of the data. In this paper we propose a new clustering-based algorithm for detecting outliers in high-dimensional domains with mixed attributes, together with a new method for measuring the similarity and outlyingness of objects. The proposed algorithm achieves near-linear performance. Experimental results on the KDDCUP99 and Wisconsin Breast Cancer datasets show that the algorithm is not only effective and scalable but also achieves reasonably good accuracy.
Keywords:outlier detection  clustering  data mining
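
The abstract does not spell out the similarity or outlyingness definitions, so the following is only an illustrative sketch of the general idea it describes: cluster mixed-attribute records in a single pass, then treat records that land in rare (small) clusters as outlier candidates. Everything here is an assumption made for illustration and not the authors' method: the names `Cluster`, `one_pass_cluster`, and `flag_outliers`, the distance (mean absolute difference to the cluster mean for numeric attributes plus one minus the in-cluster value frequency for categorical attributes), the `radius` and `min_fraction` parameters, and the toy records. Numeric attributes are assumed to be pre-scaled to [0, 1].

```python
from collections import Counter


class Cluster:
    """Running summary of a cluster: size, numeric sums, categorical value counts."""

    def __init__(self, n_num, n_cat):
        self.size = 0
        self.num_sum = [0.0] * n_num
        self.cat_counts = [Counter() for _ in range(n_cat)]
        self.members = []  # indices of the records assigned to this cluster

    def add(self, idx, num, cat):
        self.size += 1
        for i, x in enumerate(num):
            self.num_sum[i] += x
        for i, v in enumerate(cat):
            self.cat_counts[i][v] += 1
        self.members.append(idx)

    def distance(self, num, cat):
        """Assumed mixed-attribute distance, averaged over all attributes."""
        d = 0.0
        for i, x in enumerate(num):
            d += abs(x - self.num_sum[i] / self.size)  # gap to cluster mean
        for i, v in enumerate(cat):
            d += 1.0 - self.cat_counts[i][v] / self.size  # rarity of the value
        return d / (len(num) + len(cat))


def one_pass_cluster(records, radius):
    """Scan the records once; join the nearest cluster or open a new one."""
    clusters = []
    for idx, (num, cat) in enumerate(records):
        best = min(clusters, key=lambda c: c.distance(num, cat), default=None)
        if best is None or best.distance(num, cat) > radius:
            best = Cluster(len(num), len(cat))
            clusters.append(best)
        best.add(idx, num, cat)
    return clusters


def flag_outliers(clusters, n_records, min_fraction):
    """Return indices of records that fall in clusters holding a small share of the data."""
    flagged = []
    for c in clusters:
        if c.size / n_records < min_fraction:
            flagged.extend(c.members)
    return sorted(flagged)


# Toy records: (numeric attributes, categorical attributes). Hypothetical data.
records = [
    ((0.10, 0.20), ("tcp", "http")),
    ((0.12, 0.22), ("tcp", "http")),
    ((0.11, 0.19), ("tcp", "http")),
    ((0.13, 0.21), ("tcp", "smtp")),
    ((0.95, 0.90), ("icmp", "ecr_i")),
]
clusters = one_pass_cluster(records, radius=0.3)
outliers = flag_outliers(clusters, len(records), min_fraction=0.25)
print(outliers)  # [4]
```

Each record is compared only against the cluster summaries, so the cost of the scan grows linearly with the number of records as long as the number of clusters stays small. That is one plausible reading of the near-linear behaviour claimed in the abstract, though the sketch makes no attempt to reproduce the paper's results.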
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.