首页 | 本学科首页   官方微博 | 高级检索  
     

数据流环境下基于距离的离群点检测算法
引用本文:祝一帆,安云哲,夏秀峰.数据流环境下基于距离的离群点检测算法[J].计算机系统应用,2023,32(1):214-223.
作者姓名:祝一帆  安云哲  夏秀峰
作者单位:沈阳航空航天大学 计算机学院, 沈阳 110136
基金项目:国家自然科学基金(61702344)
摘    要:面向滑动窗口的连续离群点检测问题是数据流管理领域中的重要问题.该问题在信用卡欺诈检测、网络入侵防御,地质灾害预警等诸多领域发挥着重要作用.现有算法大多需要利用范围查询判断对象之间的位置关系,而范围查询的查询代价大,无法满足实时性要求.本文提出基于滑动窗口模型下的查询处理框架GBEH(grid-based excepted heap).首先,它以网格为基础构建索引GQBI(grid queue based index)管理数据流.该索引一方面维护数据流之间的位置关系,另一方面利用队列维护数据流的时序关系.其次, GBEH提出离群点检测算法PBH(priority based heap).该算法利用查询范围与网格单元格的相交面积计算该单元格中包含于查询范围对象数目的数学期望,并以此为基础构建基于小顶堆执行范围查询,从而有效降低范围查询代价,实现高效检测.理论分析和实验验证GBEH的高效性和稳定性.

关 键 词:数据流  离群点  基于距离  对象维护
收稿时间:2022/4/28 0:00:00
修稿时间:2022/6/1 0:00:00

Distance-based Outlier Detection Algorithm in Data Streams
ZHU Yi-Fan,AN Yun-Zhe,XIA Xiu-Feng.Distance-based Outlier Detection Algorithm in Data Streams[J].Computer Systems& Applications,2023,32(1):214-223.
Authors:ZHU Yi-Fan  AN Yun-Zhe  XIA Xiu-Feng
Affiliation:College of Computer Science, Shenyang Aerospace University, Shenyang 110136, China
Abstract:The detection of continuous outliers for sliding windows is an important problem in data stream management, which plays an important role in many fields such as credit card fraud detection, network intrusion prevention, and early warning for geological hazards. Most of the existing algorithms require the use of the range query to determine the positional relationship between objects, but the cost of the range query is usually high, which cannot meet real-time requirements. Therefore, this study proposes the grid-based excepted heap (GBEH), a query processing framework based on sliding windows. Specifically, GBEH proposes a grid queue based index (GQBI) on the basis of the grid to manage data streams, which maintains the positional relationship between data streams and the temporal relationship of data streams. Furthermore, GBEH proposes an outlier detection algorithm, namely, the priority based heap. This algorithm calculates the mathematical expectation of the number of objects in the cell that is included in the query range by use of the intersection area of the query range and the cell and on this basis, establishes an execution range query based on the min-heap. In this way, it effectively reduces the cost of range queries and achieves efficient detection. Theoretical analysis and experiments verify the efficiency and stability of GBEH.
Keywords:data stream  outlier  distance-based  objects maintenance
点击此处可从《计算机系统应用》浏览原始摘要信息
点击此处可从《计算机系统应用》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号