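Grid-based data analysis methods process data in units of grid cells, which avoids point-to-point computation between data objects and greatly improves analysis efficiency. However, traditional grid-based methods treat each cell independently during analysis and ignore the coupling relationships between cells, which reduces accuracy. This work applies grids to anomaly detection in data streams without treating cells independently. Instead, it takes the coupling between cells into account and proposes GCStream-OD, a data stream outlier detection algorithm based on grid coupling. The algorithm uses grid coupling to express the correlation between data stream objects precisely and uses a pruning strategy to improve efficiency. Experimental results on five real-world data sets show that GCStream-OD achieves high detection quality and efficiency.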

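The presence of outliers makes data mining results inaccurate or even wrong. Existing outlier detection algorithms remain imperfect in generality, effectiveness, user-friendliness, and performance on large high-dimensional data sets. To address this, an effective global outlier detection method is proposed. The method performs agglomerative hierarchical clustering and uses the cluster tree and the distance matrix to judge visually how isolated the data are and to determine the number of outliers. Outlying data points are then removed in an unsupervised manner, working top-down through the cluster tree. Simulation experiments on multiple data sets show that the method effectively identifies the top n most isolated global outliers, handles data sets of different shapes, is efficient and user-friendly, and is suitable for outlier detection on large high-dimensional data sets.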

Spatial outliers represent locations which are significantly different from their neighborhoods even though they may not be significantly different from the entire population. Identification of spatial outliers can lead to the discovery of unexpected, interesting, and implicit knowledge, such as local instability. In this paper, we first provide a general definition of S-outliers for spatial outliers. This definition subsumes the traditional definitions of spatial outliers. Second, we characterize the computation structure of spatial outlier detection methods and present scalable algorithms. Third, we provide a cost model of the proposed algorithms. Finally, we experimentally evaluate our algorithms using a Minneapolis-St. Paul (Twin Cities) traffic data set.

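To break down "data silos", promote the cross-departmental integration of meteorological data, and realize the full scientific value of those data, a cross-departmental distribution and processing workflow for meteorological data was designed. The design targets the disorderly sharing methods, low level of sharing, and lagging management that characterize data exchange between the Hainan provincial meteorological department and external departments. It relies on a dedicated network line and starts from the types and formats of the meteorological data required. Building on the data environment of the meteorological big data cloud platform, the work integrates various data resources and, using a B/S architecture, the SSH framework, distributed storage, and data caching, establishes a cross-departmental, cross-industry meteorological data sharing platform. The platform is now in operational use. It provides stable and efficient real-time data sharing services for the cross-departmental, cross-industry use of meteorological data in Hainan Province and has achieved good results.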

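Traditional outlier mining algorithms cannot effectively mine outliers in data streams. To handle the unbounded input and dynamic change that characterize data streams, a new distance-based outlier mining algorithm for data streams is proposed. Using Hoeffding's theorem and the central limit theorem for independent, identically distributed variables, the algorithm dynamically detects changes in the probability distribution of the stream and uses the detection results to adapt the sliding window size for outlier mining. Experimental results show that the algorithm effectively mines outliers in data streams on both synthetic data sets and the real-world KDD-CUP99 data set.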
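
The abstract above does not give the algorithm's details, so the following is only a rough sketch of how a Hoeffding bound can drive an adaptive sliding window. The function names, the split of the window into an older and a newer half, and the parameters `value_range`, `delta`, and `min_size` are all assumptions made for illustration. They are not taken from the paper, and the central limit theorem part is not reproduced.

```python
from math import log, sqrt


def hoeffding_bound(value_range, n, delta):
    """Two-sided Hoeffding bound: with probability at least 1 - delta, the mean of
    n i.i.d. samples confined to an interval of width value_range lies within
    this distance of the true mean."""
    return value_range * sqrt(log(2.0 / delta) / (2.0 * n))


def distribution_changed(older, newer, value_range, delta=0.05):
    """Flag a change when the two sample means differ by more than the sum of
    their Hoeffding bounds (a conservative test, assumed for this sketch)."""
    gap = abs(sum(older) / len(older) - sum(newer) / len(newer))
    threshold = (hoeffding_bound(value_range, len(older), delta)
                 + hoeffding_bound(value_range, len(newer), delta))
    return gap > threshold


def adapt_window(window, value_range, min_size=20, delta=0.05):
    """Return the window to keep: only the newer half after a detected change,
    otherwise the whole window (hypothetical adaptation rule)."""
    if len(window) < 2 * min_size:
        return window  # too few samples for a meaningful comparison
    half = len(window) // 2
    older, newer = window[:half], window[half:]
    if distribution_changed(older, newer, value_range, delta):
        return newer
    return window
```

Under these assumptions the window shrinks when the stream's mean shifts and is otherwise left to grow as new elements arrive. A distance-based detector would then run on whatever window `adapt_window` returns.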

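Based on an analysis of current distance-based outlier mining algorithms, an integrated outlier mining framework based on self-organizing maps (SOM) is proposed. The framework is extensible, predictable, interactive, adaptive, and concise. Experimental results show that SOM-based outlier mining is effective.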

The ever-increasing volume of spatial data has greatly challenged our ability to extract useful but implicit knowledge from them. As an important branch of spatial data mining, spatial outlier detection aims to discover the objects whose non-spatial attribute values are significantly different from the values of their spatial neighbors. These objects, called spatial outliers, may reveal important phenomena in a number of applications including traffic control, satellite image analysis, weather forecast, and medical diagnosis. Most existing spatial outlier detection algorithms focus mainly on identifying single attribute outliers and could potentially misclassify normal objects as outliers when their neighborhoods contain real spatial outliers with very large or small attribute values. In addition, many spatial applications contain multiple non-spatial attributes which should be processed together to identify outliers. To address these two issues, we formulate the spatial outlier detection problem in a general way, design two robust detection algorithms, one for single attribute and the other for multiple attributes, and analyze their computational complexities. Experiments were conducted on a real-world data set, West Nile virus data, to validate the effectiveness of the proposed algorithms.

To address the difficulty of accurate online detection in the massive control-process data collected in real time under the strongly noisy conditions of the process industries, an online method for detecting anomalous data in industrial control processes is proposed, based on an autoregressive hidden Markov model (ARHMM) with self-learned order. The algorithm fits the time series with an autoregressive (AR) model and uses a hidden Markov model (HMM) as the detection tool, which avoids the need to preset a detection threshold as in traditional detection methods, and it takes the traditional...

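During transmission, meteorological data may be lost, duplicated, or corrupted. To analyze meteorological problems accurately and monitor them in real time, a real-time transmission monitoring system for meteorological data based on Hadoop is designed. The hardware design combines a JTAG debugging circuit with a data collector and configures four basic monitoring modules: wind speed and direction, rain and snow, lightning, and temperature and humidity. A Hadoop architecture is then built according to the Hadoop connection topology, and the meteorological data are preprocessed. On this basis, the database mechanism is matched with the business information forms to implement the application's real-time transmission monitoring function, which, together with the hardware, completes the system design. Experimental results show that the system effectively monitors four kinds of meteorological problems (wind, rain, snow, and lightning) in real time and supports accurate analysis of how they manifest.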

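Because interference from the external environment in disaster prevention areas makes the monitoring of observation station service data inaccurate, a monitoring system for the service data of meteorological observation stations in disaster prevention areas is designed. TS910 measurement-and-control communication equipment collects the service data of the regional disaster prevention meteorological observation stations, and early-warning information is obtained in combination with on-site video images. The alarm subsystem uses disaster warning indicator lights of different colors, and a defense subsystem built from a control center and a detection engine resists external attacks and avoids interference from the external environment. An ArcGIS module processes the business data. Based on the designed monitoring and early-warning functions, a 3D visualization workflow is designed using the WebSocket protocol, and the results are displayed on the computer in real time. The cold-air level is used as the indicator for predicting future weather trends and serves as the subject of the verification experiments. The experimental results show that the weather data monitored by the system agree with the actual results, indicating accurate monitoring.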

In this paper, we develop a novel framework, called Monitoring Vehicle Outliers based on a Clustering technique (MVOC), for monitoring vehicle outliers caused by complex vehicle states. Vehicle outlier monitoring is a method for continuously checking current vehicle conditions. Most previous monitoring methods perform simple operations that depend on uncomplicated analyses or on the expected lifetimes of vehicle components. However, many serious vehicle outliers, such as the vehicle turning off during a drive, result from complex vehicle states influenced by correlated components. Unlike the previous methods, which consider simple components, the proposed method monitors current vehicle conditions based on more complex and varied vehicle states, using a clustering technique. We cluster the vehicle data and then analyze the generated clusters together with information on vehicle outliers caused by complex correlations among vehicle components, which lets us learn vehicle information in more detail. To facilitate MVOC, we also propose related techniques, such as sampling cluster data with representative attributes and deciding cluster characteristics on the basis of relations between vehicle data and states. We then demonstrate, through various experiments, the performance of our approach at monitoring vehicle outliers on the basis of real complex correlations between outliers and vehicle data. Experimental results show that the proposed method can not only monitor complex outliers by predicting the possibility of their occurrence in advance but also outperform a standard technique. Moreover, we establish the statistical significance of the results through significance tests.

In this work, we focus on distance-based outliers in a metric space, where the status of an entity as to whether it is an outlier is based on the number of other entities in its neighborhood. In recent years, several solutions have tackled the problem of distance-based outliers in data streams, where outliers must be mined continuously as new elements become available. An interesting research problem is to combine the streaming environment with massively parallel systems to provide scalable stream-based algorithms. However, none of the previously proposed techniques refer to a massively parallel setting. Our proposal fills this gap and investigates the challenges in transferring state-of-the-art techniques to Apache Flink, a modern platform for intensive streaming analytics. We thoroughly present the technical challenges encountered and the alternatives that may be applied, of which a micro-clustering-based one is the most efficient. We show speed-ups of up to 2.27 times over advanced non-parallel solutions, by using just an ordinary four-core machine and a real-world dataset. When moving to a three-machine cluster, due to less contention, we manage to achieve both better scalability in terms of the window slide size and the data dimensionality, and even higher speed-ups, e.g., by a factor of more than 11X. Overall, our results demonstrate that outlier mining can be achieved in an efficient and scalable manner. The resulting techniques have been made publicly available as open-source software.

Anomaly detection is considered an important data mining task, aiming at the discovery of elements (known as outliers) that show significant deviation from the expected case. More specifically, given a set of objects, the problem is to return the suspicious objects that deviate significantly from the typical behavior. As in the case of clustering, the application of different criteria leads to different definitions of an outlier. In this work, we focus on distance-based outliers: an object x is an outlier if there are fewer than k objects lying at distance at most R from x. The problem poses significant challenges when a stream-based environment is considered, where data arrive continuously and outliers must be detected on the fly. A few research works have studied the problem of continuous outlier detection. However, none of these proposals meets the requirements of modern stream-based applications, for the following reasons: (i) they demand a significant storage overhead, (ii) their efficiency is limited, and (iii) they lack flexibility in the sense that they assume a single configuration of the k and R parameters. In this work, we propose new algorithms for continuous outlier monitoring in data streams, based on sliding windows. Our techniques reduce the required storage overhead, are more efficient than previously proposed techniques, and offer significant flexibility with regard to the input parameters. Experiments performed on real-life and synthetic data sets verify our theoretical study.
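
The distance-based definition in this abstract (fewer than k objects within distance R) can be written down directly. The sketch below is a naive baseline that recomputes every neighbor count after each arrival, assuming Euclidean distance and a count-based sliding window. It is not the storage-efficient algorithm the abstract proposes, and the function names are made up for illustration.

```python
from collections import deque
from math import dist


def distance_outliers(points, k, R):
    """Return the indices of points that have fewer than k other points
    within Euclidean distance R."""
    outliers = []
    for i, p in enumerate(points):
        neighbors = sum(1 for j, q in enumerate(points) if j != i and dist(p, q) <= R)
        if neighbors < k:
            outliers.append(i)
    return outliers


def sliding_window_outliers(stream, window_size, k, R):
    """After each arrival, yield the outliers among the last window_size points."""
    window = deque(maxlen=window_size)  # the oldest point expires automatically
    for point in stream:
        window.append(point)
        current = list(window)
        yield [current[i] for i in distance_outliers(current, k, R)]
```

Each slide costs a number of distance computations quadratic in the window size, which is the kind of overhead that continuous monitoring algorithms aim to avoid.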

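A spatial outlier is a spatial point that is discontinuous with its neighbors, or one that deviates so far from the other observations that it appears to have been generated by a different mechanism. Spatial outlier detection is widely used in transportation, ecology, public safety, health care, earthquake and tsunami monitoring, and other fields. Traditional methods that judge outliers from a single non-spatial attribute value are prone to misjudgment. Considering multiple attributes, the authors propose using the spatial dimensions to determine neighborhood relationships and the non-spatial dimensions to define the distance function, and they use the Mahalanobis distance to detect outliers, which yields a new spatial outlier detection algorithm. The time complexity of the algorithm is also analyzed. Simulation experiments show that the algorithm effectively discovers outliers in large-scale spatial data.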
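
One possible reading of this scheme, sketched under several assumptions: neighbors are the k spatially nearest objects, each object is compared with the mean attribute vector of its neighbors, and the Mahalanobis distance is computed over the resulting difference vectors. The restriction to two non-spatial attributes, the parameter `k`, and both function names are choices made for this example. The abstract does not specify them.

```python
from math import dist, sqrt


def neighborhood_differences(locations, attributes, k):
    """For each object, subtract the mean attribute vector of its k nearest
    spatial neighbors from its own two-attribute vector."""
    diffs = []
    for i, loc in enumerate(locations):
        others = [j for j in range(len(locations)) if j != i]
        nearest = sorted(others, key=lambda j: dist(loc, locations[j]))[:k]
        mean_a = sum(attributes[j][0] for j in nearest) / k
        mean_b = sum(attributes[j][1] for j in nearest) / k
        diffs.append((attributes[i][0] - mean_a, attributes[i][1] - mean_b))
    return diffs


def mahalanobis_2d(vectors):
    """Mahalanobis distance of each 2-D vector from the sample mean."""
    n = len(vectors)
    mx = sum(v[0] for v in vectors) / n
    my = sum(v[1] for v in vectors) / n
    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((v[0] - mx) ** 2 for v in vectors) / (n - 1)
    syy = sum((v[1] - my) ** 2 for v in vectors) / (n - 1)
    sxy = sum((v[0] - mx) * (v[1] - my) for v in vectors) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("singular covariance matrix")
    distances = []
    for x, y in vectors:
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] * inverse(cov) * [dx dy]^T with the 2x2 inverse written out.
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(sqrt(max(d2, 0.0)))
    return distances
```

Objects whose distance from `mahalanobis_2d(neighborhood_differences(...))` exceeds a chosen threshold would be reported as spatial outliers. The threshold is left to the caller, since the abstract does not state one.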

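Because of environmental interference and transient sensor failures, UAV wind-field measurement data contain large outliers or runs of consecutive outliers, which degrade measurement accuracy. Based on the characteristics of UAV wind measurement, and combining the strengths of the Kalman filter, the strong tracking filter, and an outlier-resistant correction algorithm, the conditions under which each of these filters should be used are derived from an analysis of filter divergence trends. An outlier-resistant, divergence-suppressing filtering algorithm is then constructed and verified experimentally. The results show that the algorithm effectively overcomes the adverse effects of outliers on filtering, has good outlier resistance and tracking ability, maintains filtering accuracy, and is suitable for UAV wind-field measurement.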
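
To illustrate just the outlier-resistance idea, here is a minimal one-dimensional Kalman filter that skips the update whenever the innovation is implausibly large. This is a common gating heuristic and only a stand-in. The constant-state model and the parameters `q`, `r`, and `gate` are assumptions, and the strong tracking filter and the switching conditions described in the abstract are not reproduced.

```python
def robust_kalman_1d(measurements, q=1e-4, r=0.1, gate=3.0):
    """Scalar Kalman filter with innovation gating.

    q: process noise variance, r: measurement noise variance,
    gate: rejection threshold in innovation standard deviations
    (all three are illustrative values, not taken from the paper).
    """
    x, p = measurements[0], 1.0       # initial state estimate and its variance
    estimates = []
    for z in measurements:
        p += q                        # predict step for a constant-state model
        innovation = z - x
        s = p + r                     # innovation variance
        if abs(innovation) <= gate * s ** 0.5:
            gain = p / s              # Kalman gain
            x += gain * innovation
            p *= 1.0 - gain
        # Otherwise z is treated as an outlier and the prediction is kept.
        estimates.append(x)
    return estimates
```

Gating alone has a known weakness: after a genuine abrupt change it can keep rejecting valid measurements until the predicted variance has grown enough. That kind of divergence is what a strong tracking filter is meant to counter.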

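This paper proposes a wavelet-based anomaly detection algorithm for time series data. First, a wavelet transform decomposes the time-domain series into different frequency components, and the properties of the low-frequency signal are used to shorten the data to be processed. The density-based LOF anomaly detection method is then applied to the transformed data to mine anomalies. Finally, experiments on the tobacco leaf purchase data series of a tobacco company demonstrate the effectiveness of the algorithm.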
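
A compact sketch of the two stages, assuming a single-level Haar wavelet for the low-frequency approximation and a plain implementation of the standard LOF definition on the resulting one-dimensional coefficients. The abstract names neither the wavelet nor the number of neighbors `k`, so both are assumptions here, as are the function names.

```python
from math import sqrt


def haar_approximation(series):
    """One level of the Haar wavelet transform, keeping only the low-frequency
    coefficients, so the output is half as long as the input."""
    if len(series) % 2:
        series = series[:-1]  # drop the last sample so the values pair up
    return [(series[i] + series[i + 1]) / sqrt(2) for i in range(0, len(series), 2)]


def lof_scores(values, k):
    """Local outlier factor of each value in a 1-D list (requires len(values) > k).
    Duplicate values can produce inf or nan scores; this sketch does not handle them."""
    n = len(values)
    k_dist, neighbors = [], []
    for i in range(n):
        by_distance = sorted((abs(values[i] - values[j]), j) for j in range(n) if j != i)
        k_dist.append(by_distance[k - 1][0])
        # The k-neighborhood keeps every point no farther than the k-th neighbor.
        neighbors.append([j for d, j in by_distance if d <= k_dist[i]])

    def reach_dist(i, j):
        # Reachability distance of point i with respect to point j.
        return max(k_dist[j], abs(values[i] - values[j]))

    lrd = []  # local reachability density
    for i in range(n):
        mean_reach = sum(reach_dist(i, j) for j in neighbors[i]) / len(neighbors[i])
        lrd.append(1.0 / mean_reach if mean_reach > 0 else float("inf"))
    return [sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
            for i in range(n)]
```

Calling `lof_scores(haar_approximation(series), k)` scores the shortened series. Scores well above 1 mark coefficients whose local density is much lower than that of their neighbors.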

The region quadtree is a very popular hierarchical data structure for the representation of binary images (regional data) and it is heavily used at the physical level of many spatial databases. Random sampling algorithms obtain approximate answers of aggregate queries on these databases efficiently. In the present report, we examine how four different sampling methods are applied to specific quadtree implementations (to the most widely used linear implementations). In addition, we examine how two probabilistic models (a parametric model of random images and a model of random trees) can be used for analysing the cost of these methods.

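Data mining focuses mainly on discovering regular patterns, but outliers have significant analytical value in fraud analysis and security, and outlier detection has become an important part of data mining. This paper comprehensively analyzes and summarizes how typical conventional data mining algorithms for clustering, classification, and association rule analysis handle outliers. It discusses BIRCH, CURE, Chameleon, DBSCAN, and shared-nearest-neighbor clustering algorithms, as well as outlier detection techniques based on imbalanced classification and on infrequent patterns. It also presents an outlier detection method that uses the K-nearest-neighbor algorithm and reports test results.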

