Finding Outliers in Distributed Data Streams Based on Kernel Density Estimation
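Cite this article: Yang Yidong, Sun Zhihui, Zhang Jing. Finding Outliers in Distributed Data Streams Based on Kernel Density Estimation[J]. Journal of Computer Research and Development (计算机研究与发展), 2005, 42(9): 1498-1504.
Authors: Yang Yidong (杨宜东)  Sun Zhihui (孙志挥)  Zhang Jing (张净)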
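Affiliations: 1. Department of Computer Science and Engineering, Southeast University, Nanjing 210096
2. Department of Computer Science and Engineering, Southeast University, Nanjing 210096; School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212001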
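Funding: National Natural Science Foundation of China (70371015); Research Fund for the Doctoral Program of Higher Education of China, Ministry of Education (20040286009)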
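Abstract: Mining algorithms for data streams are receiving increasing attention. For distributed data stream environments, this paper proposes an outlier detection algorithm for distributed data streams based on kernel density estimation. The algorithm treats the data stream on each distributed node as a subset of the global data stream and, through communication between the distributed nodes and a central node, maintains a density estimate of the global data stream's distribution. Each distributed node uses this estimate to detect outliers in its local data stream, which yields the set of outliers with respect to the global data stream. The interaction between nodes and the details of the outlier detection algorithm are discussed. Experiments verify the applicability and effectiveness of the algorithm.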
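Why a central node can maintain a global density estimate from per-node information can be seen from the standard kernel density estimator. The following is a sketch under assumptions not stated in the abstract: every node uses the same kernel $K$ and bandwidth $h$ on $d$-dimensional data, node $i$ has seen $n_i$ points $x_{i1},\dots,x_{in_i}$, there are $k$ nodes, and $N=\sum_{i=1}^{k} n_i$. Under these assumptions the estimate over the union of all streams is exactly a weighted mixture of the local estimates, so each node only needs to share its local estimate and its count:

```latex
\hat f_i(x) = \frac{1}{n_i h^d} \sum_{j=1}^{n_i} K\!\left(\frac{x - x_{ij}}{h}\right),
\qquad
\hat f(x) = \frac{1}{N h^d} \sum_{i=1}^{k} \sum_{j=1}^{n_i} K\!\left(\frac{x - x_{ij}}{h}\right)
          = \sum_{i=1}^{k} \frac{n_i}{N}\, \hat f_i(x).
```

A point $x$ on any node could then be flagged as a global outlier when $\hat f(x) < \tau$ for some density threshold $\tau$; the threshold rule is an illustrative assumption, and the paper's actual criterion may differ.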

Keywords: distributed data streams  outlier detection  kernel density estimation
Received: 2005-02-28
Revised: 2005-02-28; 2005-05-27

Finding Outliers in Distributed Data Streams Based on Kernel Density Estimation
Yang Yidong,Sun Zhihui,Zhang Jing.Finding Outliers in Distributed Data Streams Based on Kernel Density Estimation[J].Journal of Computer Research and Development,2005,42(9):1498-1504.
Authors:Yang Yidong  Sun Zhihui  Zhang Jing
Abstract: Applications based on the data stream model have become increasingly common, and data mining over data streams, including clustering and classification, has become an active research field. This paper presents an algorithm for outlier detection in distributed data streams. The data stream on each distributed node is treated as a subset of the global data stream, which comprises the data on all distributed nodes. Because of the network traffic this would generate, it is infeasible to send all data to a central node for detection. Instead, by exchanging distribution information between the distributed nodes and the central node, the algorithm maintains a density estimate for the union of all streams, and each distributed node uses this estimate to detect global outliers in its local stream. The communication schedule and the details of outlier detection are also discussed. Experimental results demonstrate the applicability and effectiveness of the approach.
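The node/central-node scheme in the abstract can be illustrated with a minimal one-dimensional sketch. It is not the authors' algorithm: the Gaussian kernel, the fixed bandwidth, the use of reservoir sampling to keep kernel centres on each node, and the density-threshold outlier rule are all assumptions chosen for illustration, and every name (`LocalNode`, `GlobalModel`, `is_outlier`, `BANDWIDTH`, `SAMPLE_SIZE`, `DENSITY_THRESHOLD`) is hypothetical. Each node sends its sample and stream count; the central node weights each local estimate by its share of the total count; nodes then score points against the merged estimate.

```python
import math
import random
from dataclasses import dataclass, field

# Hypothetical parameters, not taken from the paper.
BANDWIDTH = 0.5           # Gaussian kernel bandwidth h
SAMPLE_SIZE = 200         # kernel centres kept per node
DENSITY_THRESHOLD = 1e-3  # points whose estimated density falls below this are flagged


def gaussian_kernel(u: float) -> float:
    """Standard normal density, used as the KDE kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


@dataclass
class LocalNode:
    """A distributed node keeping a uniform reservoir sample of its own stream."""
    sample: list = field(default_factory=list)
    count: int = 0  # number of stream items seen so far
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def observe(self, x: float) -> None:
        # Reservoir sampling (Algorithm R): keep a uniform sample of the stream.
        self.count += 1
        if len(self.sample) < SAMPLE_SIZE:
            self.sample.append(x)
        else:
            j = self.rng.randrange(self.count)
            if j < SAMPLE_SIZE:
                self.sample[j] = x

    def summary(self):
        # Message sent to the central node: kernel centres plus local stream size.
        return list(self.sample), self.count


class GlobalModel:
    """Central node: merges local summaries into one weighted mixture density."""

    def __init__(self, summaries):
        self.parts = [(centres, n) for centres, n in summaries if centres]
        self.total = sum(n for _, n in self.parts)

    def density(self, x: float) -> float:
        # f(x) = sum_i (n_i / N) * f_i(x), where f_i is the KDE over node i's sample.
        f = 0.0
        for centres, n in self.parts:
            local = sum(gaussian_kernel((x - c) / BANDWIDTH) for c in centres)
            local /= len(centres) * BANDWIDTH
            f += (n / self.total) * local
        return f


def is_outlier(model: GlobalModel, x: float) -> bool:
    """Run on a distributed node: flag x if its global density estimate is too low."""
    return model.density(x) < DENSITY_THRESHOLD


if __name__ == "__main__":
    rng = random.Random(42)
    nodes = [LocalNode() for _ in range(3)]
    for i, node in enumerate(nodes):
        for _ in range(1000):
            node.observe(rng.gauss(i, 1.0))  # node i's stream is centred at i
    model = GlobalModel(node.summary() for node in nodes)
    print(is_outlier(model, 1.0), is_outlier(model, 15.0))  # False True
```

Sampling caps the cost of each message at `SAMPLE_SIZE` values regardless of stream length, which matches the abstract's motivation of avoiding shipping all raw data to the central node; when and how often summaries are resent is left out here, since the paper's communication schedule is not described in the abstract.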
Keywords:distributed data streams  outlier detection  kernel density estimation
This article is indexed in databases including CNKI, VIP (Weipu), and Wanfang Data.