Similar Documents
1.
Little work has been reported in the literature to support k-nearest neighbor (k-NN) searches/queries in hybrid data spaces (HDS). An HDS is composed of a combination of continuous and non-ordered discrete dimensions. This combination presents new challenges in data organization and search ordering. In this paper, we present an algorithm for k-NN searches using a multidimensional index structure in hybrid data spaces. We examine the concept of search stages and use the properties of an HDS to derive a new search heuristic that greatly reduces the number of disk accesses in the initial stage of searching. Further, we present a performance model for our algorithm that estimates the cost of performing such searches. Our experimental results demonstrate the effectiveness of our algorithm and the accuracy of our performance estimation model.
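The abstract does not spell out the distance used in a hybrid data space. A minimal sketch, assuming an L1 distance over the continuous dimensions plus a 0/1 mismatch count over the non-ordered discrete ones (the names `hybrid_distance` and `knn` are hypothetical), shows why ordering-based pruning is awkward here: a discrete dimension contributes only 0 or 1, with no notion of one value being "between" two others. The linear scan stands in for the paper's index-based search, which it does not reproduce.

```python
import heapq
from typing import List, Sequence, Tuple

# A point in a hybrid data space: (continuous coordinates, discrete labels).
Point = Tuple[Sequence[float], Sequence[str]]


def hybrid_distance(a: Point, b: Point) -> float:
    """Assumed metric: L1 over continuous dims + 1 per mismatched discrete dim."""
    continuous = sum(abs(x - y) for x, y in zip(a[0], b[0]))
    discrete = sum(1 for x, y in zip(a[1], b[1]) if x != y)
    return continuous + discrete


def knn(query: Point, data: Sequence[Point], k: int) -> List[int]:
    """Linear-scan k-NN: indices of the k points closest to `query`."""
    return heapq.nsmallest(k, range(len(data)),
                           key=lambda i: hybrid_distance(query, data[i]))


data = [([0.0, 0.0], ["red"]), ([0.1, 0.0], ["blue"]),
        ([5.0, 5.0], ["red"]), ([0.2, 0.1], ["red"])]
print(knn(([0.0, 0.0], ["red"]), data, k=2))  # [0, 3]
```

Point 1 is geometrically the second closest but loses to point 3 because of its mismatched label.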

2.
A review on time series data mining   Total citations: 5, self-citations: 0, citations by others: 5
Time series are an important class of temporal data objects and can easily be obtained from scientific and financial applications. A time series is a collection of observations made chronologically. Time series data are characteristically large in size, high-dimensional, and in need of continuous updating. Moreover, time series data, which are numerical and continuous in nature, are always considered as a whole rather than as individual numerical fields. The increasing use of time series data has initiated a great deal of research and development in the field of data mining. Owing to its complexity, the abundant research on time series data mining in the last decade could hamper the entry of interested researchers. In this paper, a comprehensive review of existing time series data mining research is given. The work is categorized into representation and indexing, similarity measures, segmentation, visualization, and mining. State-of-the-art research issues are also highlighted. The primary objective of this paper is to serve as a glossary that gives interested researchers an overall picture of current time series data mining development and helps them identify potential research directions for further investigation.

3.
We introduce a new representation for time series, the Multiresolution Vector Quantized (MVQ) approximation, along with a distance function. Similar to the Discrete Wavelet Transform, MVQ keeps both local and global information about the data. However, instead of keeping low-level time series values, it maintains high-level feature information (key subsequences), facilitating the introduction of more meaningful similarity measures. The method is fast and scales linearly with database size and dimensionality. Unlike previous methods, the vast majority of which use the Euclidean distance, MVQ uses a multiresolution/hierarchical distance function. In our experiments, the proposed technique consistently outperforms the other major methods.

4.
We study the problem of searching for similar patterns in time series data with variable-length queries. Recently, a multi-resolution indexing technique (MRI) was proposed in (Kahveci and Singh, in proceedings of the international conference on data engineering, pp. 273–282, 2001; Kahveci and Singh, IEEE Trans Knowl Data Eng 16(4):418–433, 2004) to address this problem; it uses compression as an additional step to reduce the index size. In this paper, we propose an alternative technique, called compact MRI (CMRI), which uses the adaptive piecewise constant approximation (APCA) representation for dimensionality reduction and occupies much less space without requiring compression. We implemented both MRI and CMRI and conducted extensive experiments to evaluate and compare their performance on real stock data as well as synthetic data. Our results indicate that CMRI provides much better precision, ranging from 0.75 to 0.89 on real data and from 0.80 to 0.95 on synthetic data, whereas for MRI these ranges are 0.16 to 0.34 and 0.03 to 0.65, respectively. Compared to sequential scan, CMRI is 4–30 times faster, and the number of disk I/Os it requires is close to minimal. In terms of storage utilization, CMRI occupies 1% of the memory occupied by MRI. These results and analysis show CMRI to be an efficient and scalable indexing technique for large time series databases.
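CMRI relies on APCA, which approximates a series by a small number of constant segments of varying length, each stored as a (mean, right endpoint) pair. The sketch below builds such segments with a simple greedy bottom-up merge (repeatedly merging the adjacent pair that adds the least squared error). That construction is an assumption for illustration; it is neither the wavelet-based construction of the original APCA paper nor CMRI's own code.

```python
from typing import List, Tuple

Segment = Tuple[float, float, int, int]  # (sum, sum of squares, count, right endpoint)


def _sse(seg: Segment) -> float:
    """Squared error of approximating a segment by its mean."""
    total, total_sq, n, _ = seg
    return total_sq - total * total / n


def _merge(a: Segment, b: Segment) -> Segment:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2], b[3])


def apca(series: List[float], m: int) -> List[Tuple[float, int]]:
    """Greedy bottom-up APCA sketch: m segments as (mean, right endpoint)."""
    segs: List[Segment] = [(x, x * x, 1, i) for i, x in enumerate(series)]
    while len(segs) > m:
        # Merge the adjacent pair whose union adds the least squared error.
        best = min(range(len(segs) - 1),
                   key=lambda i: _sse(_merge(segs[i], segs[i + 1]))
                   - _sse(segs[i]) - _sse(segs[i + 1]))
        segs[best:best + 2] = [_merge(segs[best], segs[best + 1])]
    return [(total / n, right) for total, _, n, right in segs]


print(apca([1, 1, 1, 5, 5, 5], 2))  # [(1.0, 2), (5.0, 5)]
```

Because segment lengths adapt to the data, flat regions cost one segment regardless of their length, which is what keeps an APCA-based index compact.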

5.
Similarity search and detection is a central problem in time series data processing and management. Most approaches to this problem have been developed around the notion of dynamic time warping, and several dimensionality reduction techniques have been proposed to improve the efficiency of similarity searches. Given the continuously increasing number of time series data sources and the critical nature of the real-world applications that use such data, we believe there is a pressing demand for similarity detection in time series that is both accurate and fast. Our proposal is to define a concise yet feature-rich representation of time series on which dynamic time warping can be applied for effective and efficient similarity detection. We present the Derivative time series Segment Approximation (DSA) representation model, which features an original combination of derivative estimation, segmentation, and segment approximation to provide both high sensitivity in capturing the main trends of time series and data compression. We extensively compare DSA with state-of-the-art similarity methods and dimensionality reduction techniques in clustering and classification frameworks. Experimental evidence from effectiveness and efficiency tests on various datasets shows that DSA is well suited to supporting both accurate and fast similarity detection.
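A rough sketch of the DSA pipeline, assuming the derivative estimator popularized by derivative DTW (Keogh and Pazzani) and a segmentation rule that splits wherever the derivative changes sign. Each segment is then summarized as (length, angle of its mean slope). The splitting rule and the summary are assumptions; the abstract does not give DSA's exact definitions.

```python
import math
from typing import List, Tuple


def derivative(x: List[float]) -> List[float]:
    """Keogh-Pazzani derivative estimate; endpoints copy their neighbours (len >= 3)."""
    d = [((x[i] - x[i - 1]) + (x[i + 1] - x[i - 1]) / 2) / 2
         for i in range(1, len(x) - 1)]
    return [d[0]] + d + [d[-1]]


def _sign(v: float) -> int:
    return (v > 0) - (v < 0)


def segment_approximation(x: List[float]) -> List[Tuple[int, float]]:
    """Split where the derivative's sign changes; keep (length, angle of mean slope)."""
    d = derivative(x)
    segments, start = [], 0
    for i in range(1, len(d) + 1):
        if i == len(d) or _sign(d[i]) != _sign(d[start]):
            chunk = d[start:i]
            segments.append((len(chunk), math.atan(sum(chunk) / len(chunk))))
            start = i
    return segments


print(segment_approximation([0, 1, 2, 3, 2, 1]))  # one rising, one falling segment
```

DTW can then be run over the (length, angle) sequences instead of the raw points, which is where the compression pays off.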

6.
Dynamic time warping (DTW) is a powerful technique for time series similarity search. However, its performance on large-scale data is unsatisfactory because of its high computational cost and the fact that it cannot be indexed directly. Lower-bounding DTW is an effective solution to this problem. In this paper, we explain the existing lower-bound functions from a unified perspective and show that they are only special cases within our framework. We then propose a group of lower-bound functions for DTW and compare their performance through extensive experiments. The experimental results show that the new methods are better than the existing ones in most cases, and a theoretical explanation of the results is also given. We further implement an index structure based on the new lower-bound function, and experiments with it lead to a similar conclusion.
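As a concrete example of the existing lower bounds that the paper generalizes, the sketch below computes LB_Keogh, which sums each candidate point's distance to the upper/lower envelope of the query within a Sakoe-Chiba band of width `r`, alongside a band-constrained DTW with squared point costs. For equal-length series and the same `r`, every candidate point is matched to at least one query point inside its window, so LB_Keogh never exceeds DTW and can discard candidates cheaply. The paper's own new lower-bound functions are not reproduced here.

```python
from typing import List, Sequence


def dtw(q: Sequence[float], c: Sequence[float], r: int) -> float:
    """DTW within a Sakoe-Chiba band of width r, squared point cost, no final sqrt."""
    n, m = len(q), len(c)
    inf = float("inf")
    D: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(m, i + r) + 1):
            cost = (q[i - 1] - c[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def lb_keogh(q: Sequence[float], c: Sequence[float], r: int) -> float:
    """Sum of squared distances from each c[i] to the query's envelope at i."""
    total = 0.0
    for i, x in enumerate(c):
        window = q[max(0, i - r): i + r + 1]
        upper, lower = max(window), min(window)
        if x > upper:
            total += (x - upper) ** 2
        elif x < lower:
            total += (x - lower) ** 2
    return total


q = [0, 1, 2, 3, 2, 1, 0]
c = [0, 0, 1, 2, 3, 2, 1]
print(lb_keogh(q, c, 1) <= dtw(q, c, 1))  # True
```

LB_Keogh costs O(n) per candidate versus O(nr) for banded DTW, so DTW only needs to run on candidates whose bound falls below the current best distance.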

7.
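An accelerated time warping algorithm for time series based on early termination   Total citations: 3, self-citations: 0, citations by others: 3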
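The dynamic time warping (DTW) distance is an important distance measure for time series similarity search, but its exact computation is a performance bottleneck. To address this problem, a method named EA_DTW is proposed to accelerate the exact computation of the DTW distance. When computing the value of each cell in the cumulative distance matrix, the method checks whether it exceeds a threshold; once it does, the computation of the remaining related cells is terminated early. A theoretical analysis of the EA_DTW procedure is also given. Comparative experiments show that EA_DTW improves the computational efficiency of DTW, and the gain is more pronounced when the threshold is small relative to the DTW distance.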
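The early-termination idea can be sketched as follows, assuming squared point costs and a caller-supplied `threshold` (for example, the best distance found so far in a nearest-neighbor search). Because costs are non-negative, a cell whose cumulative value exceeds the threshold cannot lie on a path that ends within it, so it is pruned; if a whole row is pruned, the computation stops. This is an illustrative reconstruction, not the authors' EA_DTW code.

```python
import math
from typing import List, Sequence


def ea_dtw(q: Sequence[float], c: Sequence[float], threshold: float) -> float:
    """Exact DTW (squared cost) if it is <= threshold, otherwise math.inf."""
    m = len(c)
    prev: List[float] = [0.0] + [math.inf] * m  # row 0 of the cumulative matrix
    for qi in q:
        cur = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            best = min(prev[j], cur[j - 1], prev[j - 1])
            if best == math.inf:
                continue  # unreachable or pruned neighbourhood
            val = (qi - c[j - 1]) ** 2 + best
            # Costs are non-negative, so a cell above the threshold can never lie
            # on a path whose total stays within it: prune the cell.
            if val <= threshold:
                cur[j] = val
        if min(cur) == math.inf:
            return math.inf  # every cell in this row was pruned: terminate early
        prev = cur
    return prev[m]


print(ea_dtw([0, 0, 0], [2, 2, 2], 100.0))  # 12.0
print(ea_dtw([0, 0, 0], [2, 2, 2], 5.0))    # inf
```

Pruning never changes the result when the true distance is within the threshold, since every cell on the optimal path has a cumulative value no larger than the final distance. A smaller threshold prunes more cells, which matches the abstract's observation about small thresholds.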

8.
Liu Kun, Wu Shaochun. Computer Engineering and Design, 2007, 28(16): 3998-4000, 4003
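Time series patterns occur in many fields, and the representation, storage, and querying of temporal patterns is one of the important tasks of time series data mining. This paper analyzes the characteristics of earthquake precursor time series patterns and represents them using the semi-structured language XML together with piecewise linear representation. On this basis, it proposes methods for storing and querying earthquake precursor time series patterns in three different environments: Java, PL/SQL, and the command line. The approach both ensures efficient storage and querying of time series patterns and supports their processing on different platforms, thereby further serving earthquake prediction.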
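A minimal sketch of the two ingredients named in the abstract: a sliding-window piecewise linear representation (PLR) and an XML encoding of the resulting segments. The error criterion, the `max_error` parameter, and the `pattern`/`segment` element and attribute names are hypothetical; the paper's actual schema is not given in the abstract.

```python
import xml.etree.ElementTree as ET
from typing import List, Sequence, Tuple


def _max_dev(x: Sequence[float], s: int, e: int) -> float:
    """Largest vertical gap between x[s..e] and the line joining x[s] and x[e]."""
    slope = (x[e] - x[s]) / (e - s)
    return max(abs(x[i] - (x[s] + slope * (i - s))) for i in range(s, e + 1))


def plr(x: Sequence[float], max_error: float) -> List[Tuple[int, int]]:
    """Sliding-window PLR: extend each segment while the fit stays within max_error."""
    segments, start = [], 0
    while start < len(x) - 1:
        end = start + 1
        while end + 1 < len(x) and _max_dev(x, start, end + 1) <= max_error:
            end += 1
        segments.append((start, end))
        start = end  # consecutive segments share an endpoint
    return segments


def to_xml(x: Sequence[float], segments: List[Tuple[int, int]]) -> str:
    """Serialize segments as XML (hypothetical element/attribute names)."""
    root = ET.Element("pattern")
    for s, e in segments:
        ET.SubElement(root, "segment", start=str(s), end=str(e),
                      v0=str(x[s]), v1=str(x[e]))
    return ET.tostring(root, encoding="unicode")


series = [0, 1, 2, 3, 2, 1, 0]
print(plr(series, 0.1))  # [(0, 3), (3, 6)]
print(to_xml(series, plr(series, 0.1)))
```

Storing only segment endpoints keeps the XML small, and the same document can be parsed from Java, loaded via PL/SQL, or processed with command-line tools.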

9.
Optimal algorithms for the online time series search problem   Total citations: 1, self-citations: 0, citations by others: 1
In the online time series search problem introduced by El-Yaniv et al. (2001) [1], a player observes prices one by one over time and must select exactly one of the prices on its arrival, without knowledge of future prices, aiming to maximize the selected price. In this paper, we extend the problem by introducing a profit function. Considering the two cases in which the search duration is either known or unknown beforehand, we propose an optimal deterministic algorithm for each. The models and results in this paper generalize those of El-Yaniv et al. (2001) [1].
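For context, the base problem of El-Yaniv et al. with prices known to lie in [m, M] is solved by the reservation price policy: accept the first price that is at least sqrt(mM), and if none arrives within a known duration, accept the last price. This achieves a competitive ratio of sqrt(M/m). The sketch shows only that baseline policy; the paper's profit-function extension is not modeled.

```python
import math
from typing import Iterable, Optional


def reservation_price_search(prices: Iterable[float], m: float, M: float) -> Optional[float]:
    """Baseline policy for prices known to lie in [m, M] over a known horizon."""
    reservation = math.sqrt(m * M)
    last = None
    for p in prices:
        if p >= reservation:
            return p  # irrevocable: accept the first price at or above sqrt(mM)
        last = p
    return last  # horizon reached without a good price: forced to accept the last one


print(reservation_price_search([2, 3, 5, 10], m=1, M=16))  # 5
print(reservation_price_search([2, 3, 1], m=1, M=16))      # 1
```

The geometric mean balances the two failure modes: accepting too early when a price near M later appears, and never accepting and being stuck with a price near m.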

10.
Coping with time series cases is becoming an important issue in applications of case-based reasoning in medical care. This paper develops a knowledge discovery approach to discovering significant sequences for depicting symbolic time series cases. The input is a case library containing time series cases consisting of consecutive discrete patterns. The proposed approach is able to find, in the given case library, all qualified sequences that are non-redundant and indicative. Such a sequence is termed a key sequence. It is shown that the key sequences discovered are highly valuable in case characterization, capturing important properties while ignoring random trivialities. The main idea is to transform an original (lengthy) time series into a more concise representation in terms of the detected occurrences of key sequences. Four alternative ways of developing case indexes based on key sequences are suggested and discussed in detail. These indexes are simply vectors of numbers that are easily usable when matching two time series cases for case retrieval. Preliminary experimental results reveal that such case indexes, by utilizing key sequence information, lead to substantial performance improvement for the underlying case-based reasoning system.
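One simple way to turn key sequences into the numeric case indexes the abstract describes is to count how often each key sequence occurs in a case. The sketch assumes contiguous (possibly overlapping) matches, single-character symbols, and an L1 distance for retrieval. It is one plausible variant, not necessarily any of the paper's four index designs, and the example key sequences are hypothetical.

```python
from typing import List, Sequence


def count_occurrences(series: str, key: str) -> int:
    """Contiguous, possibly overlapping occurrences of `key` in `series`."""
    k = len(key)
    return sum(1 for i in range(len(series) - k + 1) if series[i:i + k] == key)


def case_index(series: str, keys: Sequence[str]) -> List[int]:
    """Numeric case index: one occurrence count per key sequence."""
    return [count_occurrences(series, key) for key in keys]


def index_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """L1 distance between two case indexes, used to rank cases for retrieval."""
    return sum(abs(x - y) for x, y in zip(a, b))


keys = ["AB", "BA", "AA"]  # hypothetical key sequences
print(case_index("AABAB", keys))  # [2, 1, 1]
```

However long the original case, its index has one entry per key sequence, which is what makes matching cheap.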

11.
Time series data mining (TSDM) techniques permit exploring large amounts of time series data in search of consistent patterns and/or interesting relationships between variables. TSDM is becoming increasingly important as a knowledge management tool, where it is expected to reveal knowledge structures that can guide decision making under limited certainty. Human decision making in problems related to the analysis of time series databases is usually based on perceptions such as “end of the day”, “high temperature”, “quickly increasing”, “possible”, etc. Although many effective TSDM algorithms have been developed, integrating them with human decision-making procedures is still an open problem. In this paper, we consider the architecture of a perception-based decision-making system for time series database domains that integrates perception-based TSDM, computing with words and perceptions, and expert knowledge. We discuss the new tasks that perception-based TSDM methods must solve to enable their integration into such systems: precisiation of perceptions, shape pattern identification, and pattern retranslation. We show how various methods developed so far in TSDM for manipulating perception-based information can be used to develop a fuzzy perception-based TSDM approach. This approach is grounded in computing with words and perceptions, which permits formalizing human perception-based inference mechanisms. The discussion is illustrated by examples from economics, finance, meteorology, medicine, etc.
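To illustrate what precisiation of a perception such as “quickly increasing” might look like, the sketch maps the mean slope of a series to a membership degree in [0, 1] with a linear ramp. The ramp shape and its breakpoints `a` and `b` are hypothetical choices; in practice they would come from expert knowledge or data.

```python
from typing import Sequence


def ramp_up(v: float, a: float, b: float) -> float:
    """Membership rising linearly from 0 (v <= a) to 1 (v >= b)."""
    if v <= a:
        return 0.0
    if v >= b:
        return 1.0
    return (v - a) / (b - a)


def quickly_increasing(series: Sequence[float], a: float = 0.5, b: float = 2.0) -> float:
    """Degree to which `series` is 'quickly increasing', judged by its mean slope."""
    slopes = [y - x for x, y in zip(series, series[1:])]
    return ramp_up(sum(slopes) / len(slopes), a, b)


print(quickly_increasing([0, 3, 6]))  # 1.0
print(quickly_increasing([3, 2, 1]))  # 0.0
```

A graded answer (for example, 1/3 for a mean slope of 1) is what allows such perceptions to feed a computing-with-words inference step rather than a hard yes/no rule.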

12.
A sieve bootstrap procedure for constructing interpolation intervals for a general class of linear processes is proposed. This sieve bootstrap provides consistent estimators of the conditional distribution of the missing values, given the observed data. A Monte Carlo experiment is used to show the finite sample properties of the sieve bootstrap and finally, the performance of the proposed method is illustrated with a real data example.

13.
Considering the diversity of user features and the uncertainty and variability of quality of service (QoS), this paper exploits continuous monitoring data of cloud services to propose a multi-valued collaborative approach that predicts unknown QoS values for potential users via time series analysis. In this approach, the multi-valued QoS evaluations from consumers, consisting of single-value data and time series data, are transformed into cloud models, and the differences between potential users and other consumers in every period are measured based on these cloud models. To address the deficiencies of existing methods for measuring similarity between cloud models, this paper presents a new vector comparison method that combines orientation similarity and dimension similarity to improve the precision of similarity calculation. The fuzzy analytic hierarchy process method is used to help potential users determine the objective weight of every period, and neighboring users are selected for each potential user according to the comprehensive similarity of their QoS evaluations across multiple periods. By incorporating the multi-valued QoS evaluations with the objective weights of multiple periods, the predicted results can remain consistent with periodic variations of QoS. Finally, experiments on a real-world dataset demonstrate that this approach provides highly accurate collaborative QoS prediction for multi-valued evaluations in the cloud computing paradigm.

14.
Improving the recall of information retrieval systems for similarity search in time series databases is of great practical importance. In the manufacturing domain, these systems are used to query large databases of manufacturing process data that contain terabytes of time series data from millions of parts. This allows domain experts to identify parts that exhibit specific process faults. In practice, the search often amounts to an iterative query–response cycle in which users define new queries (time series patterns) based on the results of previous queries. This is a well-documented phenomenon in information retrieval and not unique to the manufacturing domain. Indexing manufacturing databases to speed up exploratory search is often not feasible, as it may result in an unacceptable reduction in recall. In this paper, we present a novel adaptive search algorithm that refines the query based on relevance feedback provided by the user. Additionally, we propose a mechanism that allows the algorithm to self-adapt to new patterns without requiring any user input. As the search progresses, the algorithm constructs a library of time series patterns that are used to accurately find objects of the target class. Experimental validation of the algorithm on real-world manufacturing data shows that the recall for the retrieval of fault patterns is considerably higher than that of other state-of-the-art adaptive search algorithms. Additionally, its application to publicly available benchmark data sets shows that these results are transferable to other domains.
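The abstract does not describe the refinement rule itself. As a stand-in, the sketch applies the classic Rocchio relevance-feedback update from text retrieval to equal-length time series queries: move the query toward the mean of the series the user marked relevant and away from the mean of the non-relevant ones, then re-rank. The weights `alpha`, `beta`, and `gamma` are conventional defaults, not values from the paper.

```python
from typing import List, Sequence

Series = Sequence[float]


def mean_series(group: Sequence[Series]) -> List[float]:
    """Point-wise mean of equal-length series."""
    return [sum(vals) / len(vals) for vals in zip(*group)]


def rocchio_update(query: Series, relevant: Sequence[Series],
                   nonrelevant: Sequence[Series], alpha: float = 1.0,
                   beta: float = 0.75, gamma: float = 0.15) -> List[float]:
    """Move the query toward relevant results and away from non-relevant ones."""
    new = [alpha * v for v in query]
    if relevant:
        new = [v + beta * r for v, r in zip(new, mean_series(relevant))]
    if nonrelevant:
        new = [v - gamma * n for v, n in zip(new, mean_series(nonrelevant))]
    return new


def rank(query: Series, database: Sequence[Series]) -> List[int]:
    """Database indices sorted by (squared) Euclidean distance to the query."""
    return sorted(range(len(database)),
                  key=lambda i: sum((a - b) ** 2 for a, b in zip(query, database[i])))


q = rocchio_update([0.0, 0.0], relevant=[[2.0, 2.0], [4.0, 4.0]],
                   nonrelevant=[[10.0, 0.0]], beta=0.5, gamma=0.1)
print(q)  # [0.5, 1.5]
```

Each feedback round produces a new query, which matches the iterative query–response cycle described above; the paper's self-adapting pattern library goes beyond this single-query update.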

15.
Temporal data produced by industrial, human, and natural phenomena typically contain deterministic and stochastic influences; the former are ideally modelled using dynamical systems, while the latter are appropriately addressed using statistical tools. Although such influences have been widely studied as individual components, specific tools are required to support their decomposition for proper modeling and analysis. This article presents a comprehensive survey of the main time series decomposition strategies and their relative performance in different application domains. The following strategies are discussed: i) Fourier Transform, ii) Wavelet transforms, iii) Moving Average, iv) Singular Spectrum Analysis, v) Lazy, vi) GHKSS, and vii) other approaches based on the Empirical Mode Decomposition method. To assess these strategies, we employ diverse and complementary performance measures: i) Mean Absolute Error, Mean Squared Error, and Root Mean Squared Error; ii) Minkowski distances; iii) Complexity-Invariant Distance; iv) Pearson correlation; v) Mean Distance from the Diagonal Line; and vi) Mean Distance from Attractors. Each decomposition strategy is best suited to particular scenarios; however, without any prior knowledge of the data, GHKSS proved to be a fair and general baseline, despite its time complexity.
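Of the listed strategies, the moving average is the easiest to sketch: a centered moving average serves as the estimate of the deterministic (trend) component, the residual as the stochastic part, and the first three listed error measures compare a signal with its estimate. The odd window size `w` and the edge handling (shrinking the window) are assumptions; the survey's exact configuration is not stated in the abstract.

```python
import math
from typing import List, Sequence, Tuple


def moving_average(x: Sequence[float], w: int) -> List[float]:
    """Centered moving average with odd window w; the window shrinks at the edges."""
    h = w // 2
    out = []
    for i in range(len(x)):
        window = x[max(0, i - h): i + h + 1]
        out.append(sum(window) / len(window))
    return out


def decompose(x: Sequence[float], w: int) -> Tuple[List[float], List[float]]:
    """Split x into (deterministic estimate, stochastic residual)."""
    trend = moving_average(x, w)
    return trend, [a - t for a, t in zip(x, trend)]


def mae(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def rmse(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(mse(a, b))


trend, residual = decompose([0, 1, 2, 3, 4], 3)
print(trend)     # [0.5, 1.0, 2.0, 3.0, 3.5]
print(residual)  # [-0.5, 0.0, 0.0, 0.0, 0.5]
```

Even on a perfectly linear series the shrinking edge windows bias the trend estimate, one reason no single strategy wins in every scenario.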

16.
P.W., Y.R. Pattern Recognition, 1995, 28(12): 1916-1925
Spatial reasoning and similarity retrieval are two important functions of any image information system. Good spatial knowledge representation for images is necessary to adequately support these two functions. In this paper, we propose a new spatial knowledge representation, called the SK-set based on morphological skeleton theories. Spatial reasoning algorithms which achieve more accurate results by directly analysing skeletons are described. SK-set facilitates browsing and progressive visualization. We also define four new types of similarity measures and propose a similarity retrieval algorithm for performing image retrieval. Moreover, using SK-set as a spatial knowledge representation will reduce the storage space required by an image database significantly.

17.
18.
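A floating search algorithm is used to perform feature selection on time series, yielding low-dimensional feature parameters, and the WSTB method is used to perform similarity search on high-dimensional time series. First, the floating search algorithm reduces the dimensionality of the high-dimensional time series; after the feature parameters are obtained, the samples are linearly segmented, and time series curve bins with corresponding indexes are built. Next, the sample sequences and similarity distances are computed quickly, enabling fast indexing without examining the contents of each subsequence bin one by one. Finally, the generality and effectiveness of the method are verified.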

19.
Towards the evaluation of time series protection methods   Total citations: 1, self-citations: 0, citations by others: 1
The goal of statistical disclosure control (SDC) is to modify statistical data so that they can be published without releasing confidential information that may be linked to specific respondents. The challenge for SDC is to achieve this modification with minimum loss of the detail and accuracy sought by final users. There are many approaches to evaluating the quality of a protection method; however, all of these measures are applicable only to numerical or categorical attributes. In this paper, we present some recent results on time series protection and re-identification. We propose a complete framework to evaluate time series protection methods and present some empirical results showing how the framework works.
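Protection methods are typically judged on two axes: how much information the protection destroys and how easily protected records can be re-identified. A minimal sketch of both, assuming per-point mean squared error as the information-loss measure and nearest-neighbor (distance-based) record linkage as the re-identification attack. These are common choices in SDC, not necessarily the measures in the proposed framework.

```python
import math
from typing import Sequence

Series = Sequence[float]


def euclid(a: Series, b: Series) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def information_loss(original: Sequence[Series], protected: Sequence[Series]) -> float:
    """Average per-point squared error between each series and its protected version."""
    total = sum(sum((x - y) ** 2 for x, y in zip(o, p)) / len(o)
                for o, p in zip(original, protected))
    return total / len(original)


def linkage_risk(original: Sequence[Series], protected: Sequence[Series]) -> float:
    """Fraction of protected series whose nearest original series is the true one."""
    hits = 0
    for i, p in enumerate(protected):
        nearest = min(range(len(original)), key=lambda j: euclid(p, original[j]))
        if nearest == i:
            hits += 1
    return hits / len(protected)


orig = [[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]]
prot = [[1.0, 0.0, 0.0], [10.0, 10.0, 9.0]]
print(information_loss(orig, prot), linkage_risk(orig, prot))  # low loss, full re-identification
```

A good protection method pushes the linkage risk down while keeping the information loss small; the two numbers together capture the trade-off described above.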

20.
We describe a new multi-phase, color-based image retrieval system (FOCUS) that is capable of identifying multi-colored query objects in an image in the presence of significant, interfering backgrounds. The query object may occur in arbitrary sizes, orientations, and locations in the database images. Scale- and rotation-invariant color features have been developed to describe an image, so that the matching process is fast even for complex images. The first phase of processing matches the query object's colors with the color content of an image, computed as the peaks in the image's color histogram. The second phase matches the spatial relationships between color regions in the image with those of the query, using a spatial proximity graph (SPG) structure designed for this purpose. Processing at coarse granularity is preferred over pixel-level processing because it produces simpler graphs, which significantly reduces computation time during matching. The speed of the system and its small storage overhead make it suitable for use in large databases with online user interfaces. Test results with multi-colored query objects from man-made and natural domains show that FOCUS is quite effective in handling interfering backgrounds and large variations in scale. The experimental results on a database of diverse images highlight the capabilities of the system.
