20 similar articles found (search time: 0 ms)
1.
In massive spatio-temporal datasets, anomalies that deviate from the global or local distributions are not merely useless noise; they may indicate significant changes, surprising patterns, and meaningful insights. For this reason, the detection of spatio-temporal anomalies has become an important research topic in spatio-temporal data mining. For spatio-temporal flow data (e.g., traffic flow data), existing anomaly detection methods cannot handle the embedded dynamic characteristics. This paper therefore proposes constructing dynamic neighbourhoods to detect anomalies in spatio-temporal flow data (called spatio-temporal flow anomalies). In this approach, the dynamic spatio-temporal flow is first modelled based on the real-time attribute values of the flow data, e.g., the velocity of vehicles. The dynamic neighbourhoods are then constructed by considering attribute similarity in the spatio-temporal flow. On this basis, global and local anomalies are detected by employing the idea of the G* statistic, and the problem of multiple hypothesis testing is further addressed to control the false discovery rate. The effectiveness and practicality of the proposed approach are demonstrated through comparative experiments on traffic flow data from the central road network of central London for both weekdays and weekends.
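The abstract above mentions controlling the false discovery rate across many hypothesis tests but does not name the procedure used. One common choice is the Benjamini-Hochberg step-up procedure; the sketch below assumes that choice and assumes a p-value has already been computed for each location (for example, from a G*-style statistic). The function name `benjamini_hochberg` and the default `alpha` are illustrative, not taken from the paper.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected.

    Illustrative sketch of the Benjamini-Hochberg step-up procedure;
    the paper's actual FDR procedure is not specified in the abstract.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k such that p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected
```

Because the procedure is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value further down the ranking meets its threshold.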
2.
When monitoring safety levels in deep pit foundations using sensors, anomalies (e.g., highly correlated variables) and noise (e.g., high dimensionality) exist in the extracted time series data, impacting the ability to assess risks. Our research aims to address the following question: How can we detect anomalies and de-noise monitoring data from sensors in real time to improve its quality and use it to assess geotechnical safety risks? In addressing this research question, we develop a hybrid smart data approach that integrates Extended Isolation Forest and Variational Mode Decomposition models to detect anomalies and de-noise data effectively. We validate our smart data approach using real-life data obtained from sensors during the construction of a deep pit foundation. Our smart data approach can detect anomalies with a root mean square error and signal-to-noise ratio of 0.0389 and 24.09, respectively. Consequently, our smart data approach can effectively pre-process data, enabling improved decision-making and the management of safety risks.
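The abstract above reports a root mean square error and a signal-to-noise ratio without defining them. The sketch below uses the standard textbook definitions, with the SNR expressed in decibels as the ratio of reference signal power to residual power. Whether the paper uses exactly these definitions is an assumption, and the function names `rmse` and `snr_db` are illustrative.

```python
import math


def rmse(reference, estimate):
    """Root mean square error between a reference series and an estimate."""
    n = len(reference)
    squared_error = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return math.sqrt(squared_error / n)


def snr_db(reference, estimate):
    """Signal-to-noise ratio in decibels (assumed definition).

    Signal power is taken from the reference series; noise power is the
    power of the residual (reference minus estimate). The call fails with
    a ZeroDivisionError if the two series are identical.
    """
    signal_power = sum(r ** 2 for r in reference)
    noise_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10 * math.log10(signal_power / noise_power)
```

Under these definitions, a lower RMSE and a higher SNR both indicate that the de-noised series stays closer to the reference.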
3.
4.
Ji Zhang Qigang Gao Hai Wang Hua Wang 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2011,15(6):1195-1215
In this paper, we study the problem of anomaly detection in wireless network streams. We have developed a new technique, called
Stream Projected Outlier deTector (SPOT), to deal with the problem of anomaly detection from multi-dimensional or high-dimensional
data streams. We conduct a detailed case study of SPOT in this paper by deploying it for anomaly detection from a real-life
wireless network data stream. Since this wireless network data stream is unlabeled, a validating method is thus proposed to
generate the ground-truth results in this case study for performance evaluation. Extensive experiments are conducted and the
results demonstrate that SPOT is effective in detecting anomalies from wireless network data streams and outperforms existing
anomaly detection methods.
5.
Chen Change Loy Tao Xiang 《Pattern recognition》2011,44(1):117-132
This paper aims to address the problem of anomaly detection and discrimination in complex behaviours, where anomalies are subtle and difficult to detect owing to the complex temporal dynamics and correlations among multiple objects’ behaviours. Specifically, we decompose a complex behaviour pattern according to its temporal characteristics or spatial-temporal visual contexts. The decomposed behaviour is then modelled using a cascade of Dynamic Bayesian Networks (CasDBNs). In contrast to existing standalone models, the proposed behaviour decomposition and cascade modelling offer a distinct advantage in simplicity for complex behaviour modelling. Importantly, the decomposition and cascade structure map naturally to the structure of complex behaviour, allowing for more effective detection of subtle anomalies in surveillance videos. Comparative experiments using both indoor and outdoor data are carried out to demonstrate that, in addition to the novel capability of discriminating different types of anomalies, the proposed framework outperforms existing methods in detecting durational anomalies in complex behaviours and subtle anomalies that are difficult to detect when objects are viewed in isolation.
6.
Modelling spatio-temporal environmental data (total citations: 1, self-citations: 0, citations by others: 1)
Jussi Rasinmäki 《Environmental Modelling & Software》2003,18(10):877-Technology
A conceptual model for environmental data is presented with special emphasis on the ability to store spatio-temporal references of the data. Other aspects of the model are the ability to handle hierarchical data and semantics of the measurements. The model was tested with an implementation on an object-relational database management system. As a part of the test implementation, a forestry data set covering 75 years and 4900 hectares was loaded onto the database.
7.
An effective data structure that copes with rapidly changing data is one of the most important requirements for spatio-temporal data. Spatio-temporal data have special relationships in regard to spatial and temporal values. Both types of data are complex in terms of their numerous attributes and the changes exhibited over time. A data model that can improve the performance of data storage and query response in a spatio-temporal system is needed. The structure of the relationships between spatio-temporal data mimics the biological structure of hair, which has a ‘Root’ (spatial values) and a ‘Shaft’ (temporal values) and undergoes growth. This paper aims to show the mathematical formulation of a Hair-Oriented Data Model (HODM) for spatio-temporal data and to demonstrate the model's performance by measuring storage size and query response time. The experiment was conducted using more than 178,000 records of climate change spatio-temporal data, implemented in an object-relational database using nested tables. The data structure and operations are implemented by SQL statements that are related to the concepts of object-relational databases. The performance of file storage and query execution is compared against a tabular, normalized entity-relationship model using various types of queries. The results show that HODM has a lower storage size and a faster query response time for all studied types of spatio-temporal queries. The significance of the work is elaborated through comparison with generic data models. The experimental results showed that the proposed data model is easier to develop and more efficient.
8.
Jose R. Rios Viqueira Nikos A. Lorentzos 《The VLDB Journal The International Journal on Very Large Data Bases》2007,16(2):179-200
An SQL extension is formalized for the management of spatio-temporal data, i.e. of spatial data that evolves with respect
to time. The extension is dedicated to applications such as topography, cartography, and cadastral systems, hence it considers
discrete changes both in space and in time. It is based on the rigid formalization of data types and of SQL constructs. Data types are defined in terms of time and
spatial quanta. The SQL constructs are defined in terms of a kernel of few relational algebra operations, composed of the well-known operations of the 1NF model and of two more, Unfold and Fold. In conjunction with previous work, it enables the uniform management of 1NF structures that may contain not only spatio-temporal
but also either purely temporal or purely spatial or conventional data. The syntax and semantics of the extension are fully
consistent with the SQL:2003 standard.
9.
Existing studies on episode mining mainly concentrate on the discovery of (global) frequent episodes in sequences. However, frequent episodes are not suited for data streams because they do not capture the dynamic nature of the streams. This paper focuses on detecting dynamic changes in frequencies of episodes over time-evolving streams. We propose an efficient method for the online detection of abrupt emerging episodes and abrupt submerging episodes over streams. Experimental results on synthetic data show that the proposed method can effectively detect the defined patterns and meet the strict requirements of stream processing, such as one-pass, real-time update and return of results, plus limited time and space consumption. Experimental results on real data demonstrate that the patterns detected by our method are natural and meaningful. The proposed method has wide applications in stream monitoring and analysis as the discovered patterns indicate dynamic emergences/disappearances of noteworthy events/phenomena hidden in the streams.
10.
The research presented in this paper supports the identification of common subexpressions as candidates for potential materialized views that form the basis of multiple query optimization in a loosely-coupled distributed system where query expressions access heterogeneous data sources, including relations and data-centric XML. This paper introduces a unifying mixed multigraph formalism to represent SQL, XQuery, and LINQ queries in a common query graph model and a heuristics-based algorithm to detect common subexpressions. The identified common subexpressions represent an opportunity for defining a materialized view to avoid repeating computation. The common subexpressions may access only relations, only XML, or a combination of relations and XML. The mixed multigraph model and the heuristic rules presented in this paper have distinct advantages over the existing approaches that consider only relational or XML data sources individually. The mixed multigraph model can represent SQL, XQuery, and LINQ queries in a single graph model, and the heuristic rules are designed to consider the identical and subsumed conditions at the same time. A prototype implementation of the algorithm illustrates the applicability of the approach using various examples from the research literature as well as scenarios over a Criminal Justice enterprise that include common subexpressions across relational and XML data sources.
11.
12.
《Journal of Visual Languages and Computing》2007,18(3):255-279
Spatio-temporal data sets are often very large and difficult to analyze and display. Since they are fundamental for decision support in many application contexts, recently a lot of interest has arisen toward data-mining techniques to filter out relevant subsets of very large data repositories as well as visualization tools to effectively display the results. In this paper we propose a data-mining system to deal with very large spatio-temporal data sets. Within this system, new techniques have been developed to efficiently support the data-mining process, address the spatial and temporal dimensions of the data set, and visualize and interpret results. In particular, two complementary 3D visualization environments have been implemented. One exploits Google Earth to display the mining outcomes combined with a map and other geographical layers, while the other is a Java3D-based tool for providing advanced interactions with the data set in a non-geo-referenced space, such as displaying association rules and variable distributions.
13.
14.
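An ontology consists of a set of related terms in a specific information domain together with the relationships among those terms. It is semantically rich metadata through which information about the underlying database can be obtained. Based on an existing geographic database and a previously created domain ontology for geographic information, an application ontology database suited to geographic datasets is built. Rule knowledge is specified using description logic, associations are established between the spatial database and the ontology database and among the ontology bases, and an ontology-driven spatio-temporal data query method is proposed. When a spatio-temporal entity object is queried, logical operations are carried out in the ontology database to obtain the query result, which is then returned. The feasibility and effectiveness of the method are verified using tobacco planting queries in a digital tobacco system as an example.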
15.
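Under normal operation, network traffic exhibits a degree of periodicity and stability; anomalous traffic breaks this regularity and causes abnormal fluctuations. A method is proposed for detecting network anomalies using a sliding window over NetFlow time series: a time-series anomaly detection algorithm identifies abnormal fluctuations in network traffic, enabling real-time, efficient detection of and alerting on anomalous traffic. Because an anomaly that has already been detected continues to generate alerts and affects subsequent anomaly detection, two methods for damping anomalies are also proposed. Experimental results show that the method can effectively detect network anomalies.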
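The abstract above describes sliding-window anomaly detection on a traffic time series together with damping of already-detected anomalies, but it gives no algorithmic detail. The sketch below is one simple way such a scheme could look, not the paper's method. It assumes a z-score test against a trailing window and damps a flagged point by storing the window mean in its place; the function name `detect_anomalies` and the `window` and `threshold` parameters are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(series, window=5, threshold=3.0):
    """Flag points far from the trailing-window mean (illustrative sketch).

    A point is flagged when it lies more than `threshold` population
    standard deviations from the mean of the previous `window` values.
    """
    history = deque(maxlen=window)
    flags = []
    for value in series:
        if len(history) < window:
            # Warm-up: not enough history yet, so treat the point as normal.
            flags.append(False)
            history.append(value)
            continue
        mu = mean(history)
        sigma = pstdev(history)
        # A perfectly flat window (sigma == 0) never flags in this sketch.
        is_anomaly = sigma > 0 and abs(value - mu) > threshold * sigma
        flags.append(is_anomaly)
        # Damping (assumption): keep a flagged value out of the baseline by
        # storing the window mean instead, so it does not skew later tests.
        history.append(mu if is_anomaly else value)
    return flags
```

Without the damping step, a single large spike would inflate the window's standard deviation and could mask anomalies that follow shortly after it.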
16.
Donghwan Kim Alain Laraque Raphael M. Tshimanga Ting Yuan Hahn Chul Jung 《International journal of remote sensing》2017,38(23):7021-7040
Synthetic aperture radar (SAR) backscattering coefficients have been used in previous studies to distinguish vegetation types, to monitor flood conditions, and to assess soil moisture variations over wetlands. Here, we attempted to estimate spatio-temporal water level variations over the central Congo mainstem covered with aquatic plants using the backscattering coefficients from the Advanced Land Observing Satellite (ALOS) Phased Array type L-band Synthetic Aperture Radar (PALSAR) Scanning SAR (ScanSAR) images and water levels from Envisat altimetry data. First, permanent open water, forest, macrophytes, and herbaceous plants were classified over the central Congo Basin based on statistics of the backscattering coefficient values. Second, we generated multi-temporal water level maps over part of the Congo mainstem based on the relationship between Envisat altimetry-derived river-level changes and PALSAR ScanSAR backscattering coefficient variations. Finally, the water level maps were validated with Ice, Cloud and land Elevation Satellite (ICESat) altimetry-derived water levels. We obtained an overall root mean square difference (RMSD) of 67.27 cm at the 100-m scale resolution of PALSAR ScanSAR. Our study shows that we can obtain reasonable estimates of water levels of rivers covered with seasonally floating or emergent macrophytes from backscattering coefficients. Furthermore, it is expected that the generated water level maps can be used as a ‘true’ data set to perform pre-launch study of the Surface Water Ocean Topography (SWOT) mission to be launched in 2021.
17.
Publishing and sharing open government data in Linked Data format provides many opportunities in terms of data aggregation/integration and creation of information mashups. Statistical data, which contain various performance indicators and their evolution through time, are an example of data that can be used as the foundation for policy prediction, planning and adjustments, and can be re-used in different applications. However, because Linked Data is a relatively new field, there is currently a lack of tools that enable efficient exploration and analysis of linked geospatial statistical datasets. Therefore, the ESTA-LD (Exploratory Spatio-Temporal Analysis) tool was developed to address some of the Linked statistical Data management issues, such as crossing the statistical and the geographical dimensions, producing statistical maps, visualizing different measures, and comparing statistical indicators of different regions through time. This paper discusses the modeling approach that was adopted so that the published data conform to the established standards for representing statistical, spatial and temporal data in Linked Data format. The main contribution is related to the delivery of state-of-the-art open-source tools for retrieval, quality assessment, exploration and analysis of statistical Linked Data that is made available through a SPARQL endpoint.
18.
19.
20.
Deepak Agarwal 《Knowledge and Information Systems》2007,11(1):29-44
We consider the problem of detecting anomalies in data that arise as multidimensional arrays with each dimension corresponding to the levels of a categorical variable. In typical data mining applications, the number of cells in such arrays is usually large. Our primary focus is detecting anomalies by comparing information at the current time to historical data. Naive approaches advocated in the process control literature do not work well in this scenario due to the multiple testing problem: performing multiple statistical tests on the same data produces an excessive number of false positives. We use an empirical Bayes method which works by fitting a two-component Gaussian mixture to deviations at the current time. The approach is scalable to problems that involve monitoring a massive number of cells and fast enough to be potentially useful in many streaming scenarios. We show the superiority of the method relative to a naive “per component error rate” procedure through simulation. A novel feature of our technique is the ability to suppress deviations that are merely the consequence of sharp changes in the marginal distributions. This research was motivated by the need to extract critical application information and business intelligence from the daily logs that accompany large-scale spoken dialog systems. We illustrate our method on one such system.
Deepak Agarwal received his Ph.D. in statistics in 2001 from the University of Connecticut, Storrs. He was a research staff member at AT&T Research Labs from 2001 to 2005 and is currently a Senior Research Scientist at Yahoo! Research. His main research interests are in the areas of time series analysis, anomaly detection, social networks, and hierarchical Bayesian models. He received the best application paper award at the SIAM Data Mining Conference in 2004 and has served on several program committees and panels. He has published several papers both in statistics and data mining.