
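A Data Stream Outlier Detection Algorithm Based on k-Means Partitioning
Cite this article:Ni Weiwei, Lu Jieping, Chen Geng, Sun Zhihui. A Data Stream Outlier Detection Algorithm Based on k-Means Partitioning[J]. Journal of Computer Research and Development, 2006, 43(9): 1639-1643.
Authors:Ni Weiwei  Lu Jieping  Chen Geng  Sun Zhihui
Affiliation:College of Computer Science and Engineering, Southeast University, Nanjing 210096
Funding:National Natural Science Foundation of China; Specialized Research Fund for the Doctoral Program of Higher Education; Natural Science Foundation of Jiangsu Province
Abstract:Outlier discovery is an important aspect of data mining research. Outlier mining in data streams is a particularly hard part of it, because the data are dynamic, cannot be re-read, and arrive in large volumes. This paper proposes a data stream outlier detection algorithm based on k-means partitioning. The stream is first divided into partitions, and k-means clustering is applied to each partition to produce intermediate clustering results (a set of mean reference points). Potential outliers are then identified among these mean reference points according to the definition of an outlier. Theoretical analysis and experimental results show that the algorithm handles data stream outlier detection effectively and is feasible.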

Keywords:data mining  outlier detection  mean reference point  aggregation
Received:2005-11-13
Revised:2006-04-25

An Efficient Data Stream Outliers Detection Algorithm Based on k-Means Partitioning
Ni Weiwei,Lu Jieping,Chen Geng,Sun Zhihui.An Efficient Data Stream Outliers Detection Algorithm Based on k-Means Partitioning[J].Journal of Computer Research and Development,2006,43(9):1639-1643.
Authors:Ni Weiwei  Lu Jieping  Chen Geng  Sun Zhihui
Affiliation:College of Computer Science and Engineering, Southeast University, Nanjing 210096
Abstract:Outlier detection is an important issue in data mining. Finding outliers in data streams is difficult because streams are dynamic, can be read in only one pass, and carry large volumes of data. This paper proposes DSOKP, a data stream outlier detection algorithm based on k-means partitioning. It applies k-means clustering to each partition of the data stream to generate a set of mean reference points, and then picks out the potential outliers of each period according to the definition of outliers. Theoretical analysis and experimental results indicate that DSOKP is effective and efficient.
Keywords:data mining  outliers detection  mean reference point  clustering
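
The pipeline outlined in the abstract (partition the stream, run k-means on each partition, keep the resulting mean reference points, then look for outliers among them) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' DSOKP implementation. The abstract does not state the outlier definition, so the code assumes a simple distance-based one: a reference point is flagged when the total weight of the reference points within `radius` of it is below `min_weight`. All function names, the parameters `partition_size`, `radius` and `min_weight`, and the farthest-point initialisation are hypothetical choices made for the example.

```python
import math
from itertools import islice


def partitions(stream, size):
    """Yield consecutive chunks of `size` points; the stream is read once."""
    it = iter(stream)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def assign(points, centroids):
    """Group points by their nearest centroid."""
    groups = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
        groups[nearest].append(p)
    return groups


def mean_reference_points(points, k, iters=20):
    """Run k-means on one partition; return [(centroid, weight), ...].

    Assumption: deterministic farthest-point initialisation, chosen only
    to keep the example reproducible.
    """
    centroids = [points[0]]
    while len(centroids) < min(k, len(points)):
        centroids.append(max(
            points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iters):
        groups = assign(points, centroids)
        updated = [tuple(sum(xs) / len(g) for xs in zip(*g)) if g else c
                   for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    groups = assign(points, centroids)
    # The weight is the number of points summarised by the centroid.
    return [(c, len(g)) for c, g in zip(centroids, groups) if g]


def potential_outliers(refs, radius, min_weight):
    """Assumed definition: flag reference points with little nearby weight."""
    flagged = []
    for c, w in refs:
        support = sum(w2 for c2, w2 in refs if math.dist(c, c2) <= radius)
        if support < min_weight:
            flagged.append((c, w))
    return flagged


def detect_potential_outliers(stream, partition_size, k, radius, min_weight):
    """Summarise each partition, then search the summaries for outliers."""
    refs = []
    for part in partitions(stream, partition_size):
        refs.extend(mean_reference_points(part, k))
    return potential_outliers(refs, radius, min_weight)


# Toy stream: two dense groups plus one isolated point in the second partition.
group_a = [(0.1 * i, 0.0) for i in range(5)]
group_b = [(10.0 + 0.1 * i, 10.0) for i in range(5)]
toy_stream = group_a + group_b + group_a + group_b[:4] + [(50.0, 50.0)]
print(detect_potential_outliers(toy_stream, partition_size=10, k=3,
                                radius=2.0, min_weight=3))
```

Only the reference points are retained after a partition has been processed, which is what lets the sketch respect the one-pass constraint: the raw points of earlier partitions are never revisited. In the toy stream, the isolated point at (50, 50) is the only reference point whose neighbourhood weight falls below the threshold.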
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.