Similar documents
1.
Like other spatial data mining techniques, anomaly detection succeeds or fails depending on how the neighborhood is defined. In this paper, we argue that the anomalous behavior of spatial objects in a neighborhood can be truly captured only when the neighborhood definition takes into account both (a) spatial autocorrelation (similar behavior of nearby objects due to proximity) and (b) spatial heterogeneity (distinct behavior of nearby objects due to differences in the underlying processes in the region). Our approach begins by generating micro neighborhoods around spatial objects that encompass all the information about each object. We selectively merge these into macro neighborhoods, based on spatial relationships (accounting for autocorrelation) and inferential relationships (accounting for heterogeneity). In these macro neighborhoods, we then identify (i) spatio-temporal outliers, where individual sensor readings are anomalous, (ii) spatial outliers, where the entire sensor is an anomaly, and (iii) spatio-temporally coalesced outliers, where a group of spatio-temporal outliers in the macro neighborhood are separated by a small time lag, indicating the traversal of the anomaly. We demonstrate the effectiveness of our approach to neighborhood formation and anomaly detection with experimental results on (i) water monitoring and (ii) highway traffic monitoring sensor datasets. We also compare our results with those of an existing approach to spatial anomaly detection.
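Because the abstract turns on how neighborhoods are defined, a baseline helps. The classic neighborhood-difference test flags an object whose value departs from the mean of its neighbors. The sketch below is that simple baseline, not the authors' micro/macro-neighborhood method. It assumes a fixed, hypothetical neighbor list per sensor and a z-score cutoff `z_threshold` on the standardized differences.

```python
from statistics import mean, pstdev


def spatial_outliers(values, neighbors, z_threshold=2.0):
    """Flag objects whose value differs strongly from their neighborhood average.

    values: dict mapping object id -> measured value
    neighbors: dict mapping object id -> list of neighboring object ids
    """
    # Difference between each object's value and the mean of its neighbors.
    diffs = {
        obj: values[obj] - mean([values[n] for n in neighbors[obj]])
        for obj in values
    }
    mu = mean(list(diffs.values()))
    sigma = pstdev(list(diffs.values()))
    if sigma == 0:
        return []
    # Standardize the differences and keep objects beyond the threshold.
    return sorted(obj for obj, d in diffs.items() if abs(d - mu) / sigma > z_threshold)


readings = [10, 11, 10, 30, 11, 10, 11, 10]
values = dict(enumerate(readings))
# Hypothetical neighborhood: sensors placed along a line, so adjacent sensors are neighbors.
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < len(readings)] for i in values}
print(spatial_outliers(values, neighbors))  # [3]
```

In this sketch the neighborhood is fixed in advance. Replacing the `neighbors` dict with a different neighborhood definition changes which sensors are flagged, and that sensitivity is the one the abstract addresses.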

2.
Exploring spatial datasets with histograms (total citations: 2; self-citations: 0; citations by others: 2)
As online spatial datasets grow in both number and sophistication, it becomes increasingly difficult for users to decide whether a dataset is suitable for their tasks, especially when they have no prior knowledge of it. In this paper, we propose browsing as an effective and efficient way to explore the content of a spatial dataset. Browsing lets users see the size of a result set before the query is evaluated at the database, thereby avoiding zero-hit/mega-hit queries and saving time and resources. Although the technique underlying browsing is similar to range query aggregation and selectivity estimation, spatial dataset browsing poses some unique challenges. We identify a set of spatial relations that browsing applications need to support, namely the contains, contained and overlap relations. We prove a lower bound on the storage required to answer queries about the contains relation accurately at a given resolution. We then present three storage-efficient approximation algorithms, which we believe are the first to estimate query results for these spatial relations. We evaluate these algorithms on both synthetic and real-world datasets and show that they provide highly accurate estimates for datasets with various characteristics. Recommended by: Sunil Prabhakar. Work supported by NSF grants IIS 02-23022 and CNF 04-23336. An earlier version of this paper appeared in the 17th International Conference on Data Engineering (ICDE 2001).
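A naive exact structure makes the storage question concrete. The minimal sketch below assumes axis-aligned rectangles, a uniform grid and grid-aligned query windows. It keys each rectangle by the grid cells of its two corners, so it can count the rectangles lying inside a query window exactly at grid resolution. The cost is that its storage can grow with the square of the number of cells. It is not one of the paper's three approximation algorithms, and the class `CornerHistogram` is hypothetical.

```python
from collections import Counter


class CornerHistogram:
    """Exact counts of rectangles keyed by the grid cells of their corners."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.counts = Counter()

    def _cell(self, x, y):
        # Map a coordinate to its integer grid cell.
        return (int(x // self.cell_size), int(y // self.cell_size))

    def add(self, x1, y1, x2, y2):
        # Key each rectangle by its lower-left and upper-right corner cells.
        self.counts[(self._cell(x1, y1), self._cell(x2, y2))] += 1

    def count_contained(self, qx1, qy1, qx2, qy2):
        """Number of rectangles whose corner cells both lie within the query's cells."""
        lo = self._cell(qx1, qy1)
        hi = self._cell(qx2, qy2)

        def inside(cell):
            return lo[0] <= cell[0] <= hi[0] and lo[1] <= cell[1] <= hi[1]

        return sum(n for (ll, ur), n in self.counts.items() if inside(ll) and inside(ur))


hist = CornerHistogram(cell_size=10)
hist.add(1, 1, 8, 8)      # fits in cell (0, 0)
hist.add(2, 2, 25, 15)    # spans cells (0, 0) to (2, 1)
hist.add(31, 31, 38, 38)  # fits in cell (3, 3)
print(hist.count_contained(0, 0, 19, 19))  # 1
print(hist.count_contained(0, 0, 39, 39))  # 3
```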

3.
This study presents a new way of implementing two local anomaly detectors for hyperspectral images. Most local anomaly detectors are implemented on spatial windows of the image, because a local area of the scene fits a single statistical model better than the global data does. These detectors rely on linear projections. However, they are ill-suited when the hyperspectral data lie on nonlinear manifolds in spectral space. As multivariate data, hyperspectral image datasets can be regarded as low-dimensional manifolds embedded in a high-dimensional spectral space. In real environments, nonlinear spectral mixing occurs frequently, so these manifolds may be nonlinear. In that case, traditional local anomaly detectors, being based on linear projections, cannot distinguish weak anomalies from the background. In this article, we adopt concepts from local linear manifold learning: the anomaly detectors apply the linear projection within windows defined in spectral space. Performance is assessed by comparing the proposed detectors with the classic spatial-window local detectors on hyperspectral remote-sensing images. The results indicate that the proposed algorithms are promising for improving the detection of weak anomalies and reducing false alarms.
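Classic local detectors model the background inside a spatial window with a single Gaussian. The local RX (Reed-Xiaoli) detector is the best-known example. The abstract does not name its two detectors, so the pure-Python sketch below is only a reference baseline of that classic spatial-window type, not the proposed spectral-space method. It scores a pixel spectrum by its Mahalanobis distance from the mean and covariance of hypothetical background samples. A small `ridge` term is added as an assumption, for numerical stability.

```python
def mean_vector(samples):
    """Per-band mean of a list of spectra."""
    n = len(samples)
    return [sum(s[k] for s in samples) / n for k in range(len(samples[0]))]


def covariance(samples, mu, ridge=1e-6):
    """Sample covariance matrix; `ridge` (an assumption) keeps it invertible."""
    n, d = len(samples), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for s in samples:
        for i in range(d):
            for j in range(d):
                cov[i][j] += (s[i] - mu[i]) * (s[j] - mu[j]) / (n - 1)
    for i in range(d):
        cov[i][i] += ridge
    return cov


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rx_score(pixel, background):
    """Mahalanobis distance of `pixel` from the local background distribution."""
    mu = mean_vector(background)
    diff = [p - q for p, q in zip(pixel, mu)]
    return sum(d * w for d, w in zip(diff, solve(covariance(background, mu), diff)))


# Hypothetical 2-band background window around the pixel under test.
background = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2), (0.9, 0.8)]
print(rx_score(mean_vector(background), background))  # ~0: pixel equals the background mean
print(rx_score((3.0, 3.0), background) > rx_score((1.1, 1.0), background))  # True
```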

4.
Dong Lin, Shu Hong, Li Sha. Application Research of Computers, 2013, 30(8): 2330-2333
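To simplify the preprocessing step of spatial frequent pattern mining and improve mining efficiency, this paper proposes FISA (fast intersect spatial Apriori), a mining algorithm that takes spatial vector and raster layers directly as input. The algorithm uses layer intersection and area computation to count the support of predicate sets, and from this mines frequent predicate sets and association rules. Compared with transaction-based spatial association rule mining algorithms, FISA does not require the spatial data to be converted into transactions beforehand, and every result has a corresponding layer, which makes the results easy to visualize. Compared with other mining algorithms based on spatial analysis, FISA supports both the vector and raster formats of spatial data and introduces a fast intersection method to ensure scalability. Experimental results show that the algorithm can mine frequent patterns directly from spatial data efficiently and correctly.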
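FISA's key move, as described, is to compute the support of a predicate set by intersecting layers and measuring area, instead of counting transactions. A minimal sketch of that idea on raster layers follows. It assumes every predicate is a same-sized 0/1 grid and that support is the fraction of cells where all predicates in the set hold. The layer names are hypothetical, and the paper's fast intersection method and vector-layer handling are not reproduced.

```python
from itertools import combinations


def support(layers, names):
    """Fraction of raster cells where every predicate layer in `names` is true."""
    grids = [layers[n] for n in names]
    rows, cols = len(grids[0]), len(grids[0][0])
    # Cell-by-cell intersection: a cell counts only if all layers are true there.
    hits = sum(
        1 for r in range(rows) for c in range(cols) if all(g[r][c] for g in grids)
    )
    return hits / (rows * cols)


def frequent_predicate_sets(layers, min_support):
    """Apriori-style search where support is an intersected area, not a transaction count."""
    frequent = {}
    candidates = [frozenset([name]) for name in sorted(layers)]
    k = 1
    while candidates:
        level = {s: support(layers, sorted(s)) for s in candidates}
        level = {s: v for s, v in level.items() if v >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-sets, then keep candidates whose subsets are all frequent.
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        candidates = [
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        ]
    return frequent


# Hypothetical 2 x 3 raster layers (1 = predicate holds in that cell).
layers = {
    "high_slope": [[1, 1, 0], [1, 0, 0]],
    "forest": [[1, 1, 1], [1, 0, 0]],
    "near_river": [[0, 0, 1], [0, 0, 1]],
}
result = frequent_predicate_sets(layers, min_support=0.4)
print(result[frozenset({"forest", "high_slope"})])  # 0.5
```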

5.
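Existing distance-based outlier detection algorithms are inefficient on large-scale data. To address this, this paper proposes a distributed outlier detection algorithm based on clustering and indexing (DODCI). First, a clustering method partitions the large dataset into clusters. Next, an index for each cluster is built in parallel at the nodes of the distributed environment. Finally, outlier detection runs iteratively at each node using two optimization strategies and two pruning rules. On synthetic datasets and a preprocessed KDD CUP dataset, the algorithm is nearly an order of magnitude faster than Orca and iDOoR when the data volume is large. Theoretical and experimental analysis shows that the algorithm can effectively improve the efficiency of outlier detection in large-scale data.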
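DODCI speeds up distance-based outlier detection. One common definition ranks each point by the distance to its k-th nearest neighbor and reports the top n. The brute-force sketch below implements that definition with a cutoff rule in the spirit of the nested-loop pruning used by Orca. It assumes Euclidean distance and runs on a single machine, so none of DODCI's clustering, indexing or distribution is shown.

```python
import heapq
import math


def top_n_outliers(points, k=2, n=1):
    """Return the n points with the largest distance to their k-th nearest neighbor.

    Nested-loop search with cutoff pruning: a point's running k-NN distance only
    shrinks as more neighbors are seen, so once it drops below the weakest score
    in the current top n, the point cannot be a top-n outlier.
    """
    top = []  # min-heap of (score, index) for the current top-n outliers
    for i, p in enumerate(points):
        cutoff = top[0][0] if len(top) == n else 0.0
        neigh = []  # max-heap (negated) of the k smallest distances seen so far
        pruned = False
        for j, q in enumerate(points):
            if i == j:
                continue
            d = math.dist(p, q)
            if len(neigh) < k:
                heapq.heappush(neigh, -d)
            elif d < -neigh[0]:
                heapq.heapreplace(neigh, -d)
            if len(neigh) == k and -neigh[0] < cutoff:
                pruned = True
                break
        if pruned or len(neigh) < k:
            continue
        score = -neigh[0]
        if len(top) < n:
            heapq.heappush(top, (score, i))
        elif score > top[0][0]:
            heapq.heapreplace(top, (score, i))
    return sorted(top, reverse=True)


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(top_n_outliers(points, k=2, n=1))  # one entry: the point at index 4, (10, 10)
```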

6.
Networks of dynamic systems, including social networks, the World Wide Web, climate networks, and biological networks, can be highly clustered. Detecting clusters, or communities, in such dynamic networks is an emerging area of research; however, less work has been done on detecting community-based anomalies. While there has been some previous work on detecting anomalies in graph-based data, none of these approaches has considered an important property of evolutionary networks: their community structure. In this work, we present an approach to uncover community-based anomalies in evolutionary networks characterized by overlapping communities. We develop a parameter-free and scalable algorithm, using a proposed representative-based technique, to detect all six possible types of community-based anomalies: grown, shrunken, merged, split, born, and vanished communities. We detail the underlying theory required to guarantee the correctness of the algorithm. We measure the performance of the algorithm by comparing it with a non-representative-based algorithm on synthetic networks, where our algorithm achieves a runtime speedup of 11 to 46 times over this baseline. We have also applied our algorithm to two real-world evolutionary networks, Food Web and Enron Email, and detected significant and informative community-based anomaly dynamics in both.
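The six anomaly types listed above can be illustrated with a simple matching rule between two snapshots of a network's communities. This is a hypothetical sketch, not the authors' representative-based algorithm. Communities are matched across time when their Jaccard similarity reaches an assumed threshold `match`, and each event is labelled from the matching pattern and the change in size.

```python
def jaccard(a, b):
    return len(a & b) / len(a | b)


def community_events(before, after, match=0.3):
    """Label community changes between two snapshots (hypothetical overlap rule).

    before, after: dicts mapping community id -> set of member nodes.
    Returns a list of (event, ids_before, ids_after) tuples.
    """
    links = {
        (b, a)
        for b, bm in before.items()
        for a, am in after.items()
        if jaccard(bm, am) >= match
    }
    events = []
    for b in before:
        succ = sorted(a for (x, a) in links if x == b)
        if not succ:
            events.append(("vanished", [b], []))
        elif len(succ) > 1:
            events.append(("split", [b], succ))
    for a in after:
        pred = sorted(b for (b, y) in links if y == a)
        if not pred:
            events.append(("born", [], [a]))
        elif len(pred) > 1:
            events.append(("merged", pred, [a]))
        else:
            b = pred[0]
            # One-to-one matches only: compare sizes to detect growth or shrinkage.
            if sum(1 for (x, _) in links if x == b) == 1:
                if len(after[a]) > len(before[b]):
                    events.append(("grown", [b], [a]))
                elif len(after[a]) < len(before[b]):
                    events.append(("shrunken", [b], [a]))
    return events


# Hypothetical snapshots: node ids are arbitrary integers.
before = {"A": {1, 2, 3, 4}, "B": {5, 6, 7}, "C": {8, 9},
          "D": {10, 11, 12, 13}, "E": {30, 31, 32}, "F": {40, 41}}
after = {"A2": {1, 2, 3, 4, 14}, "X": {5, 6, 7, 8, 9}, "D1": {10, 11},
         "D2": {12, 13}, "E2": {30, 31}, "N": {20, 21}}
events = community_events(before, after)
print([e[0] for e in events])  # ['split', 'vanished', 'grown', 'merged', 'shrunken', 'born']
```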

7.
8.
Comparing, clustering and merging ellipsoids are problems that arise in various applications, e.g., anomaly detection in wireless sensor networks (WSNs) and motif-based patterned fabrics. We develop a theory underlying three measures of similarity that can be used to find groups of similar ellipsoids in p-space. Clusters of ellipsoids are suggested by dark blocks along the diagonal of a reordered dissimilarity image (RDI). The RDI is built with the recursive iVAT algorithm, using any of the three (dis)similarity measures as input, and serves two functions: (i) it is used to visually assess and estimate the number of possible clusters in the data; and (ii) it offers a means of comparing the three similarity measures. Finally, we apply the single linkage and CLODD clustering algorithms to three two-dimensional data sets, using each of the three dissimilarity matrices as input. Two data sets are synthetic, and the third is a set of real WSN data with one known second-order node anomaly. We conclude that focal distance is the best measure of elliptical similarity, that iVAT images are a reliable basis for estimating cluster structure in sets of ellipsoids, and that single linkage can successfully extract the indicated clusters.
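The reordered dissimilarity image comes from VAT-style reordering. The minimal sketch below assumes a symmetric dissimilarity matrix has already been computed, for instance from one of the ellipsoid measures. It implements the basic VAT ordering: start at an endpoint of the largest dissimilarity, then repeatedly append the unvisited object nearest to the visited set. For the iVAT step it computes minimax path distances with a Floyd-Warshall-style closure rather than the recursive formula the abstract refers to. As we understand iVAT, the two give the same transformed values, but only the recursive version has the efficiency the abstract relies on.

```python
def vat_order(d):
    """VAT ordering for a symmetric dissimilarity matrix d (list of lists)."""
    n = len(d)
    # Start from one endpoint of the largest dissimilarity.
    start, _ = max(
        ((r, c) for r in range(n) for c in range(n)), key=lambda rc: d[rc[0]][rc[1]]
    )
    order = [start]
    remaining = set(range(n)) - {start}
    while remaining:
        # Prim-like step: append the unvisited object nearest to any visited one.
        _, nearest = min((d[v][u], u) for v in order for u in remaining)
        order.append(nearest)
        remaining.remove(nearest)
    return order


def minimax_transform(d):
    """iVAT-style transform: replace each entry by its minimax path distance."""
    n = len(d)
    m = [list(row) for row in d]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Best path via k: its largest hop is the larger of the two legs.
                m[i][j] = min(m[i][j], max(m[i][k], m[k][j]))
    return m


def reorder(d, order):
    """Permute rows and columns of d into the given order (the image's matrix)."""
    return [[d[i][j] for j in order] for i in order]


# Hypothetical 1-D points forming two groups; dissimilarity is absolute difference.
xs = [0, 10, 1, 11, 0.5]
d = [[abs(a - b) for b in xs] for a in xs]
order = vat_order(d)
print(order)  # [0, 4, 2, 1, 3]
image = reorder(minimax_transform(d), order)
```

In `image`, the first three rows and columns (one group) and the last two (the other group) form the two low-valued diagonal blocks that would appear dark in the RDI.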

9.
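For anomaly detection systems that use support vector machine (SVM) classification, a method for controlling recall and precision is proposed. The method uses a genetic algorithm to optimize feature selection and the training model together: each chromosome encodes a feature selection and a training model, and the fitness is a combination of recall and precision computed with the ξα-estimate method. Setting one parameter of this combination, η, controls the balance between recall and precision. Experiments on standard anomaly detection data show that as η increases, recall increases while precision decreases, so users can control recall and precision by choosing the value of η.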
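The abstract does not spell out how recall and precision are combined into the fitness. As a hypothetical illustration of how one parameter η can move the balance, the sketch below assumes a weighted sum η·recall + (1 - η)·precision and picks the score threshold that maximizes it. The genetic search over features and SVM training models, and the ξα-estimate, are left out.

```python
def recall_precision(scores, labels, threshold):
    """Recall and precision when scores >= threshold are flagged as anomalies."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


def fitness(recall, precision, eta):
    # Hypothetical combination: eta weights recall, (1 - eta) weights precision.
    return eta * recall + (1 - eta) * precision


def best_threshold(scores, labels, eta):
    """Threshold that maximizes the eta-weighted fitness."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: fitness(*recall_precision(scores, labels, t), eta))


# Hypothetical detector scores and ground-truth labels (1 = anomaly).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0]
for eta in (0.2, 0.8):
    t = best_threshold(scores, labels, eta)
    print(eta, t, recall_precision(scores, labels, t))
```

With η = 0.2 the chosen threshold is 0.8 (recall 0.5, precision 1.0). With η = 0.8 it drops to 0.3 (recall 1.0, precision about 0.67), which mirrors the trend the abstract reports.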

10.
Identifying anomalies, especially weak anomalies, in constantly changing targets is more difficult than in stable targets. In this article, we borrow dynamics metrics and propose the concept of a dynamics signature (DS) in a multi-dimensional feature space to efficiently distinguish abnormal events from the normal behavior of a variable star. A corresponding dynamics criterion is proposed to check whether a star's current state is anomalous. Based on the DS concept, we develop a highly optimized DS algorithm that can automatically detect anomalies in real time in high-cadence sky survey data covering millions of stars. Microlensing, a typical anomaly in astronomical observation, is used to evaluate the proposed DS algorithm. Two datasets are used to evaluate its practical performance: a parameterized sinusoidal dataset containing 262,440 light curves and a dataset based on real variable stars containing 462,996 light curves. Experimental results show that our DS algorithm is highly accurate, sensitive enough to detect weak microlensing events at very early stages, and fast enough to process 176,000 stars in less than 1 s on a commodity computer.

11.
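Recognition of geochemical anomalies is an important basis for metallogenic prediction, and it is essentially a classification problem on imbalanced data. The main difficulty in anomaly recognition is handling high-dimensional data; manifold learning reduces dimensionality through nonlinear methods. This paper proposes an anomaly recognition algorithm based on manifold learning. It reduces dimensionality with manifold learning and combines this with the AdaCost technique to improve classification performance on imbalanced data. Simulation experiments on data from a tin-copper polymetallic deposit show that the algorithm delineates regional geochemical anomalies more accurately, offering a new approach to metallogenic prediction and evaluation.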

12.
A clustering-based unsupervised anomaly detection method (total citations: 2; self-citations: 0; citations by others: 2)
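Unsupervised anomaly detection methods cannot detect sudden large-scale attacks. To address this, a clustering-based unsupervised anomaly detection model is proposed. From the results of several clusterers, the model selects the partition with the smallest Davies-Bouldin (DB) index. It then classifies each cluster using the minimum and maximum intra-cluster distances, thereby identifying attacks. Experiments show that the model clearly improves the detection rate and reduces the false alarm rate.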
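The model-selection step described above can be sketched directly: compute the Davies-Bouldin index of each candidate partition and keep the smallest. This minimal pure-Python version assumes Euclidean distance and the usual definition. Cluster scatter is the mean distance to the centroid. The index is the average, over clusters, of the worst ratio of summed scatters to centroid separation. The candidate labelings are made up, and the later classification of clusters by minimum and maximum intra-cluster distance is not shown.

```python
import math


def centroid(points):
    return [sum(c) / len(points) for c in zip(*points)]


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a partition with at least two clusters (lower is better)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    cents = {lab: centroid(ps) for lab, ps in clusters.items()}
    # Scatter: average distance of a cluster's points to its centroid.
    scatter = {
        lab: sum(math.dist(p, cents[lab]) for p in ps) / len(ps)
        for lab, ps in clusters.items()
    }
    labs = list(clusters)
    total = 0.0
    for i in labs:
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in labs
            if j != i
        )
    return total / len(labs)


def select_partition(points, candidate_labelings):
    """Pick the labeling with the smallest DB index."""
    return min(candidate_labelings, key=lambda labels: davies_bouldin(points, labels))


# Hypothetical data: two well-separated groups and two candidate partitions.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = [0, 0, 0, 1, 1, 1]
bad = [0, 1, 0, 1, 0, 1]
print(select_partition(points, [bad, good]) == good)  # True
```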

13.
14.
Anomaly detection in large populations is a challenging but highly relevant problem. It is essentially a multi-hypothesis problem, with one hypothesis for every division of the systems into normal and anomalous ones. The number of hypotheses grows exponentially with the number of systems, so approximate solutions become a necessity for any problem of practical interest. In this paper, we take an optimization approach to this multi-hypothesis problem. We first show that it is equivalent to a non-convex combinatorial optimization problem, and then relax it to a convex optimization problem that can be solved distributively on the systems and stays computationally tractable as the number of systems increases. An interesting property of the proposed method is that, under certain conditions, it can be shown to give exactly the same result as the combinatorial multi-hypothesis problem, so the relaxation is tight.

15.
Polygons provide natural representations for many types of geospatial objects, such as countries, buildings, and pollution hotspots. Thus, polygon-based data mining techniques are particularly useful for mining geospatial datasets. In this paper, we propose a polygon-based clustering and analysis framework for mining multiple geospatial datasets that have inherently hidden relations. In this framework, polygons are first generated from multiple geospatial point datasets by using a density-based contouring algorithm called DCONTOUR. Next, a density-based clustering algorithm called Poly-SNN with novel dissimilarity functions is employed to cluster polygons to create meta-clusters of polygons. Finally, post-processing analysis techniques are proposed to extract interesting patterns and user-guided summarized knowledge from meta-clusters. These techniques employ plug-in reward functions that capture a domain expert’s notion of interestingness to guide the extraction of knowledge from meta-clusters. The effectiveness of our framework is tested in a real-world case study involving ozone pollution events in Texas. The experimental results show that our framework can reveal interesting relationships between different ozone hotspots represented by polygons; it can also identify interesting hidden relations between ozone hotspots and several meteorological variables, such as outdoor temperature, solar radiation, and wind speed.

16.
Real-time anomaly detection involves a continuous stream of operational and labelled data and must satisfy several resource and latency requirements. Traditional solutions to the problem rely heavily on well-defined features and prior supervised knowledge, and most techniques use hand-crafted rules derived from known conditions. While successful in controlled situations, these rules assume that good data are available for detecting anomalies, which means they will fail to generalise beyond known scenarios. To investigate these issues, current literature is examined for solutions that can detect known and unknown anomalous instances while functioning as an out-of-the-box approach to efficient decision-making. The applicability of the isolation forest to engineering applications is discussed using the Aero-Propulsion System Simulation dataset as a benchmark, on which it is shown to outperform other unsupervised distance-based approaches. The authors have also carried out real-time experiments on an unmanned aerial vehicle to highlight further applications of the method. Finally, some conclusions are drawn concerning its simplicity and robustness in handling diagnostic problems.
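For readers unfamiliar with the isolation forest, a compact pure-Python sketch of the standard algorithm follows. Each tree isolates points with random axis-parallel splits. A point's score is 2^(-E[h]/c(ψ)), where E[h] is its mean path length and c(ψ) normalizes by the expected path length for the subsample size ψ. The tree count, subsample size and seed are assumptions, and the sketch needs at least two data points. In practice, an existing implementation such as scikit-learn's `IsolationForest` would be the more likely choice.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def avg_path_length(n):
    """Average path length of an unsuccessful binary-search-tree search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2 * (math.log(n - 1) + EULER_GAMMA) - 2 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = rng.uniform(lo, hi)  # random split value within the current range
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([p for p in points if p[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[dim] >= split], depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    if "split" not in node:
        # Leaf: add the expected remaining depth for the points it still holds.
        return depth + avg_path_length(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def isolation_forest(data, n_trees=100, sample_size=64, seed=0):
    """Build the forest and return a scoring function (scores near 1 are anomalous)."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]

    def score(x):
        mean_h = sum(path_length(x, t) for t in trees) / len(trees)
        return 2 ** (-mean_h / avg_path_length(psi))

    return score


# Hypothetical data: a 7 x 7 grid of normal points plus one far-away point.
data = [(i * 0.1, j * 0.1) for i in range(7) for j in range(7)] + [(8.0, 8.0)]
score = isolation_forest(data, seed=1)
print(score((8.0, 8.0)) > score((0.3, 0.3)))  # True
```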

17.
Two exploratory data analysis techniques, the comap and the quad plot, are shown to have both strengths and shortcomings when analysing spatial multivariate datasets. A hybrid of these two techniques is proposed: the quad map, which is shown to overcome the outlined shortcomings when applied to a dataset containing weather information for disaggregate incidents of urban fires. Like the quad plot, the quad map uses Polya models to articulate the underlying assumptions behind histograms. The Polya model formalises the situation in which past fire incident counts are computed and displayed in (multidimensional) histograms as appropriate assessments of conditional probability, providing valuable diagnostics such as posterior variance, i.e., sensitivity to new information. Finally, we discuss how new technologies, in particular Online Analytical Processing (OLAP) and Geographical Information Systems (GISs), offer potential for automating exploratory spatial data analysis techniques such as the quad map.
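The Polya-model reading of a histogram can be written out in a few lines. Assume a symmetric Dirichlet prior with pseudo-count α per bin; the abstract does not state the prior. Then the posterior for bin k is Beta(a_k, A - a_k), with a_k = α + n_k and A = Σ a_k. Its mean is a_k / A and its variance is mean · (1 - mean) / (A + 1). The sketch below computes both.

```python
def polya_posterior(counts, alpha=1.0):
    """Posterior mean and variance of each bin probability under a symmetric Dirichlet prior.

    With posterior parameters a_k = alpha + n_k and A = sum(a_k), each bin's
    probability is Beta(a_k, A - a_k): mean a_k / A, variance mean * (1 - mean) / (A + 1).
    """
    a = [alpha + n for n in counts]
    total = sum(a)
    means = [ak / total for ak in a]
    variances = [m * (1 - m) / (total + 1) for m in means]
    return means, variances


# Hypothetical incident counts in two histogram bins.
means, variances = polya_posterior([8, 2])
print(means)  # [0.75, 0.25]
_, more_data_variances = polya_posterior([80, 20])
print(more_data_variances[0] < variances[0])  # True
```

Bins with few recorded incidents keep a larger posterior variance, i.e. they are more sensitive to new information, which is the diagnostic the abstract mentions.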

18.
The motivation for regional association rule mining and scoping is driven by the facts that global statistics seldom provide useful insight and that most relationships in spatial datasets are geographically regional, rather than global. Furthermore, when using traditional association rule mining, regional patterns frequently fail to be discovered due to insufficient global confidence and/or support. In this paper, we systematically study this problem and address the unique challenges of regional association mining and scoping: (1) region discovery: how to identify interesting regions from which novel and useful regional association rules can be extracted; (2) regional association rule scoping: how to determine the scope of regional association rules. We investigate the duality between regional association rules and regions where the associations are valid: interesting regions are identified to seek novel regional patterns, and a regional pattern has a scope of a set of regions in which the pattern is valid. In particular, we present a reward-based region discovery framework that employs a divisive grid-based supervised clustering for region discovery. We evaluate our approach in a real-world case study to identify spatial risk patterns from arsenic in the Texas water supply. Our experimental results confirm and validate research results in the study of arsenic contamination, and our work leads to the discovery of novel findings to be further explored by domain scientists.
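The motivating observation is that a rule can be too weak globally yet strong within a region. This is easy to reproduce by computing support and confidence per region. The sketch below uses hypothetical well records tagged with a region, not data from the arsenic study. It applies the standard definitions: support is the fraction of records satisfying both sides, and confidence is the fraction of antecedent records that also satisfy the consequent. The reward-based region discovery itself is not implemented.

```python
def support_confidence(records, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent."""
    n = len(records)
    has_x = [r for r in records if antecedent(r)]
    has_xy = [r for r in has_x if consequent(r)]
    support = len(has_xy) / n if n else 0.0
    confidence = len(has_xy) / len(has_x) if has_x else 0.0
    return support, confidence


def regional_rules(records, antecedent, consequent, min_conf):
    """Regions in which the rule reaches min_conf, with their (support, confidence)."""
    result = {}
    for region in sorted({r["region"] for r in records}):
        subset = [r for r in records if r["region"] == region]
        s, c = support_confidence(subset, antecedent, consequent)
        if c >= min_conf:
            result[region] = (s, c)
    return result


def deep(r):
    return r["deep"]


def high_arsenic(r):
    return r["high_as"]


# Hypothetical well records.
records = (
    [{"region": "north", "deep": True, "high_as": True}] * 3
    + [{"region": "north", "deep": True, "high_as": False}]
    + [{"region": "south", "deep": True, "high_as": False}] * 4
    + [{"region": "south", "deep": False, "high_as": False}] * 2
)
print(support_confidence(records, deep, high_arsenic))  # (0.3, 0.375)
print(regional_rules(records, deep, high_arsenic, min_conf=0.7))  # {'north': (0.75, 0.75)}
```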

19.
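The log-clustering-based anomaly detection method LogCluster handles only a single type of log. To address this, an improved LogCluster-based log anomaly detection method, SW-LogCluster, is proposed. A sliding window divides the logs into log sequences, and the resulting sequences are vectorized for feature extraction. As a result, the method can detect logs both with and without identifiers, extending the scope of the original method. Experimental results show that SW-LogCluster can detect all types of unstructured logs, effectively extending the applicability of LogCluster.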
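The sliding-window step can be sketched as follows. It assumes the log lines have already been parsed into event template ids (the ids below are hypothetical) and that each window is represented as a vector of event counts. The abstract does not give the window size, step or exact vector form used by SW-LogCluster.

```python
def sliding_windows(events, size, step):
    """Split a sequence of event ids into (possibly overlapping) fixed-size windows."""
    return [events[i:i + size] for i in range(0, max(len(events) - size, 0) + 1, step)]


def count_vectors(windows, vocabulary):
    """Represent each window as a vector of event counts over a fixed vocabulary."""
    index = {e: k for k, e in enumerate(vocabulary)}
    vectors = []
    for w in windows:
        v = [0] * len(vocabulary)
        for e in w:
            v[index[e]] += 1
        vectors.append(v)
    return vectors


events = ["E1", "E2", "E1", "E3", "E2", "E1"]  # hypothetical parsed event ids
windows = sliding_windows(events, size=3, step=2)
print(count_vectors(windows, sorted(set(events))))  # [[2, 1, 0], [1, 1, 1]]
```

Because windows are cut from the raw event stream rather than grouped by an identifier field, the same procedure applies to logs that carry no identifiers.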

20.
This paper addresses the problem of modelling behaviour captured in surveillance videos for online normal behaviour recognition and anomaly detection. A novel framework is developed for automatic behaviour profiling and online anomaly sampling/detection without any manual labelling of the training dataset. The framework consists of the following key components: (1) A compact and effective behaviour representation method is developed based on discrete scene event detection. The similarity between behaviour patterns is measured by modelling each pattern using a Dynamic Bayesian Network (DBN). (2) The natural grouping of behaviour patterns is discovered through a novel spectral clustering algorithm with unsupervised model selection and feature selection on the eigenvectors of a normalised affinity matrix. (3) A composite generative behaviour model is constructed that is capable of generalising from a small training set to accommodate variations in unseen normal behaviour patterns. (4) A run-time accumulative anomaly measure is introduced to detect abnormal behaviour, while normal behaviour patterns are recognised, once sufficient visual evidence has become available, using an online Likelihood Ratio Test (LRT) method. This ensures robust and reliable anomaly detection and normal behaviour recognition in the shortest possible time. The effectiveness and robustness of our approach are demonstrated through experiments using noisy and sparse datasets collected from both indoor and outdoor surveillance scenarios. In particular, a behaviour model trained on an unlabelled dataset is shown to be superior to one trained on the same but labelled dataset in detecting anomalies in unseen video. The experiments also suggest that our online LRT-based behaviour recognition approach is advantageous over the commonly used Maximum Likelihood (ML) method in differentiating ambiguities among different behaviour classes observed online.
