Similar literature
Found 20 similar articles; search took 578 ms
1.
Moving average transform is very useful in finding the trend of time-series data by reducing the effect of noise, and has been used in many areas such as econometrics. Previous subsequence matching methods with moving average transform, however, are problematic in that, since they must build multiple indexes in supporting transform of arbitrary order, they incur index overhead both in storage space and in update maintenance. To solve this problem, we propose a single-index approach to subsequence matching that supports moving average transform of arbitrary order in time-series databases. Using the single-index approach, we can reduce both the storage space and the index maintenance overhead. In explaining the single-index approach, we first introduce the notion of poly-order moving average transform by generalizing the original definition of moving average transform. We then formally prove the correctness of poly-order transform-based subsequence matching. We also propose two subsequence matching methods based on poly-order transform that efficiently support moving average transform of arbitrary order. Experimental results for real stock data show that, compared with the sequential scan, our methods improve average performance significantly, by a factor of 22.6-33.6. Also, compared with cases in which an index is built for every moving average order, our methods reduce storage space and maintenance effort significantly while incurring only marginal performance degradation. Our approach entails the additional advantage of being generalized to support many other transforms in addition to moving average transform. Therefore, we believe that our approach will be widely used in many transform-based subsequence matching methods.
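
As a minimal sketch of the basic operation this abstract builds on, the order-k moving average of a sequence can be computed with a sliding window sum. The function name `moving_average` and its signature are illustrative assumptions; the abstract's poly-order transform and single-index method are not reproduced here.

```python
def moving_average(seq, k):
    """Return the order-k moving average of seq.

    Illustrative helper (not from the paper): the result has
    len(seq) - k + 1 entries, each the mean of k consecutive values.
    """
    if k < 1 or k > len(seq):
        raise ValueError("order k must satisfy 1 <= k <= len(seq)")
    window_sum = sum(seq[:k])          # sum of the first window
    out = [window_sum / k]
    for i in range(k, len(seq)):
        # Slide the window: add the entering value, drop the leaving one.
        window_sum += seq[i] - seq[i - k]
        out.append(window_sum / k)
    return out
```

For example, `moving_average([1, 2, 3, 4, 5], 3)` averages the windows (1, 2, 3), (2, 3, 4), and (3, 4, 5). A larger order k smooths more noise but shortens the result.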

2.
Efficient processing of streaming time-series generated by remote sensors and mobile devices has become an important research area. As in traditional time-series applications, similarity matching on streaming time-series is also an essential research issue. To obtain more accurate similarity search results in many time-series applications, preprocessing is performed on the time-series before they are compared. The preprocessing removes distortions such as offset translation, amplitude scaling, linear trends, and noise inherent in time-series. In this paper, we propose an algorithm for distortion-free predictive streaming time-series matching. Similarity matching on streaming time-series is saliently different from matching on traditional time-series in that it is not feasible to directly apply the traditional algorithms to streaming time-series. Our algorithm is distortion-free in the sense that it performs preprocessing on streaming time-series to remove offset translation and amplitude scaling distortions at the same time. Our algorithm is also predictive, since it performs streaming time-series matching against the predicted most recent subsequences in the near future, and thus improves search performance. To the best of our knowledge, no streaming time-series matching algorithm currently performs preprocessing and predicts future search results simultaneously.
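
One common way to remove offset translation and amplitude scaling together is z-normalization (subtract the mean, divide by the standard deviation). The sketch below assumes that technique for illustration only; the abstract does not say which normalization the authors use, and the name `z_normalize` is hypothetical.

```python
import math

def z_normalize(seq):
    """Rescale seq to zero mean and unit (population) standard deviation.

    Assumed illustration, not the paper's algorithm. A constant
    sequence has no shape to preserve, so it maps to all zeros.
    """
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in seq]
```

Under this normalization, a sequence and any copy of it that has been shifted by a constant and scaled by a positive factor map to the same values, so they compare as identical.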

3.
We propose a new similar sequence matching method that efficiently supports variable-length and variable-tolerance continuous query sequences on time-series data streams. Earlier methods do not support variable lengths or variable tolerances adequately for continuous query sequences if there are too many query sequences registered to handle in main memory. To support variable-length query sequences, we use the window construction mechanism that divides long sequences into smaller windows for indexing and searching the sequences. To support variable-tolerance query sequences, we present a new notion of intervaled sequences whose individual entries are an interval of real numbers rather than a real number itself. We also propose a new similar sequence matching method based on these notions, and then formally prove the correctness of the method. In addition, we show that our method has the prematching characteristic, which finds future candidates of similar sequences in advance. Experimental results show that our method outperforms the naive one by 2.6-102.1 times and the existing methods in the literature by 1.4-9.8 times over the entire ranges of parameters tested when the query selectivities are low (<32%), which are practically useful in large database applications.

4.
MatchSim: a novel similarity measure based on maximum neighborhood matching   (cited by: 1 total; 1 self-citation; 0 by others)
Measuring object similarity in a graph is a fundamental data-mining problem in various application domains, including Web linkage mining, social network analysis, information retrieval, and recommender systems. In this paper, we focus on the neighbor-based approach that is based on the intuition that "similar objects have similar neighbors" and propose a novel similarity measure called MatchSim. Our method recursively defines the similarity between two objects by the average similarity of the maximum-matched similar neighbor pairs between them. We show that MatchSim conforms to the basic intuition of similarity; therefore, it can overcome the counterintuitive contradiction in SimRank. Moreover, MatchSim can be viewed as an extension of the traditional neighbor-counting scheme by taking the similarities between neighbors into account, leading to higher flexibility. We present the MatchSim score computation process and prove its convergence. We also analyze its time and space complexity and suggest two accelerating techniques: (1) proposing a simple pruning strategy and (2) adopting an approximation algorithm for maximum matching computation. Experimental results on real-world datasets show that although our method is less efficient computationally, it outperforms classic methods in terms of accuracy.

5.
6.
Multimedia Tools and Applications - In this paper, we deal with the problem of boundary image matching which finds similar boundary images regardless of partial noise exploiting time-series...

7.
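This paper studies the weak invertibility with delay k and with delay k+1 of quasi-(h,k)-order memory finite automata, together with their weak inverses. Necessary and sufficient conditions are obtained for a quasi-(h,k)-order memory finite automaton to be weakly invertible with delay k and with delay k+1. From these results, the weak inverses with delay k and with delay k+1 of such weakly invertible automata can be constructed fairly simply.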

8.
A time-series database is a set of data sequences, each of which is a list of changing values of an object in a given period of time. Subsequence matching is an operation that searches for such data subsequences whose changing patterns are similar to a query sequence in a time-series database. This paper addresses a performance issue of time-series subsequence matching. First, we quantitatively examine the performance degradation caused by the window size effect, and then show that the performance of subsequence matching with a single index is not satisfactory in real applications. We claim that index interpolation is a fairly effective tool to solve this problem. Index interpolation performs subsequence matching by selecting the most appropriate one from multiple indexes built on windows of distinct sizes. For index interpolation, we need to decide the sizes of windows for multiple indexes to be built. In this paper, we solve the problem of selecting optimal window sizes from the perspective of physical database design. Given a set of pairs 〈length, frequency〉 of query sequences to be performed in a target application and a set of window sizes for building multiple indexes, we devise a formula that estimates the overall cost of all the subsequence matchings performed in a target application. By using this formula, we propose an algorithm that determines the optimal window sizes for maximizing the performance of entire subsequence matchings. We formally prove the optimality as well as the effectiveness of the algorithm. Finally, we show the superiority of our approach by performing extensive experiments with a real-life stock data set and a large volume of synthetic data sets.

9.
In this paper, we show that zoom-endoscopy images can be well classified according to the pit-pattern classification scheme by using texture-analysis methods in different wavelet domains. We base our approach on three different variants of the wavelet transform and propose that the color channels of the RGB and LAB color models are an important source for computing image features with high discriminative power. Color-channel information is incorporated by using either simple feature vector concatenation or cross-cooccurrence matrices in the wavelet domain. Our experimental results based on k-nearest neighbor classification and forward feature selection exemplify the advantages of the different wavelet transforms and show that color-image analysis is superior to grayscale-image analysis regarding our medical image classification problem.

10.
Choosing the best location for starting a business or expanding an existing enterprise is an important issue. A number of location selection problems have been discussed in the literature. They often apply the Reverse Nearest Neighbor as the criterion for finding suitable locations. In this paper, we apply the Average Distance as the criterion and propose the so-called k-most suitable locations (k-MSL) selection problem. Given a positive integer k and three datasets (a set of customers, a set of existing facilities, and a set of potential locations), the k-MSL selection problem outputs k locations from the potential location set such that the average distance between a customer and his nearest facility is minimized. In this paper, we formally define the k-MSL selection problem and show that it is NP-hard. We first propose a greedy algorithm which can quickly find an approximate result for users. Two exact algorithms are then proposed to find the optimal result. Several pruning rules are applied to increase computational efficiency. We evaluate the algorithms’ performance using both synthetic and real datasets. The results show that our algorithms are able to deal with the k-MSL selection problem efficiently.
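
The k-MSL objective and a straightforward greedy heuristic can be sketched as follows. This is an assumed illustration, not the paper's greedy algorithm or its pruning rules: points are taken to be coordinate tuples under Euclidean distance, and the names `avg_nearest_distance` and `greedy_k_msl` are hypothetical.

```python
import math

def avg_nearest_distance(customers, facilities):
    """Average distance from each customer to the nearest facility."""
    total = sum(min(math.dist(c, f) for f in facilities) for c in customers)
    return total / len(customers)

def greedy_k_msl(customers, facilities, candidates, k):
    """Pick up to k candidate locations, one at a time.

    Each round adds the candidate that most reduces the average
    customer-to-nearest-facility distance. The result is approximate:
    the abstract states that the exact problem is NP-hard.
    """
    chosen = []
    current = list(facilities)
    remaining = list(candidates)
    for _ in range(min(k, len(remaining))):
        best = min(
            remaining,
            key=lambda p: avg_nearest_distance(customers, current + [p]),
        )
        chosen.append(best)
        current.append(best)
        remaining.remove(best)
    return chosen
```

Each round re-evaluates the objective for every remaining candidate, so this sketch costs on the order of k × |candidates| × |customers| × |facilities| distance computations; it is meant to show the objective, not to be efficient.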

11.
In this paper, we present an approach for 3D face recognition from frontal range data based on the ridge lines on the surface of the face. We use the principal curvature, kmax, to represent the face image as a 3D binary image called the ridge image. The ridge image shows the locations of the ridge points around the important facial regions on the face (i.e., the eyes, the nose, and the mouth). We utilized the robust Hausdorff distance and the iterative closest points (ICP) for matching the ridge image of a given probe image to the ridge images of the facial images in the gallery. To evaluate the performance of our approach for 3D face recognition, we performed experiments on the GavabDB face database (a small database) and Face Recognition Grand Challenge V2.0 (a large database). The results of the experiments show that the ridge lines have great capability for 3D face recognition. In addition, we found that as long as the size of the database is small, the performance of the ICP-based matching and the robust Hausdorff matching are comparable. However, when the size of the database increases, ICP-based matching outperforms the robust Hausdorff matching technique.
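
For orientation, the classic (non-robust) Hausdorff distance between two point sets can be sketched as below. Note the substitution: the abstract uses a robust Hausdorff variant whose definition is not given here, so this brute-force version is only an assumed baseline, and the function names are hypothetical.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from a point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Classic symmetric Hausdorff distance between point sets a and b."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))
```

Because the classic form takes a maximum, a single outlying point can dominate the result, which is one reason robust variants are preferred for noisy range data.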

12.
Watershed transformation is a common technique for image segmentation. However, its use for automatic medical image segmentation has been limited particularly due to oversegmentation and sensitivity to noise. Employing prior shape knowledge has demonstrated robust improvements to medical image segmentation algorithms. We propose a novel method for enhancing watershed segmentation by utilizing prior shape and appearance knowledge. Our method iteratively aligns a shape histogram with the result of an improved k-means clustering algorithm of the watershed segments. Quantitative validation of magnetic resonance imaging segmentation results supports the robust nature of our method.

13.
Querying polyphonic music from a large data collection is an interesting topic. Recently, researchers have attempted to provide efficient methods for content-based retrieval in polyphonic music databases where queries are polyphonic. However, most of them do not work well for similarity search, which is important to many applications. In this paper, we propose three polyphonic representations with the associated similarity measures and a novel method to retrieve k music works that contain segments most similar to the query. In general, most of the index-based methods for similarity search generate all the possible answers to the query and then perform exact matching on the index for each possible answer. Based on the edit distance, our method generates only a few possible answers by performing the deletion and/or replacement operations on the query. Each possible answer is then used to perform exact matching on a list-based index, which allows the insertion operations to be performed. For each possible answer, its edit distance to the query is regarded as a lower bound of the edit distances between the matched results and the query. Based on the kNN results that match a possible answer, the possible answers that cannot provide better results are skipped. By using this mechanism, we design a method for efficient kNN search in polyphonic music databases. The experimental results show that our method outperforms the previous methods in efficiency. We also evaluate the effectiveness of our method by showing the search results to the musician and nonmusician user groups. The experimental results provide useful guidelines on the design of a polyphonic music database.
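
The edit distance that the method above relies on counts the fewest insertions, deletions, and replacements needed to turn one sequence into another. A minimal sketch of the standard dynamic-programming computation follows; treating a music segment as a plain sequence of tokens is an assumption made for illustration, and the paper's polyphonic representations and index are not reproduced.

```python
def edit_distance(a, b):
    """Unit-cost edit (Levenshtein) distance between sequences a and b.

    Works on strings or lists of hashable tokens (for example,
    hypothetical note numbers). Keeps only two rows of the DP table.
    """
    prev = list(range(len(b) + 1))     # distance from empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]                     # delete all i items of a's prefix
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,           # delete x from a
                curr[j - 1] + 1,       # insert y into a
                prev[j - 1] + cost,    # replace x with y (or match)
            ))
        prev = curr
    return prev[len(b)]
```

For example, deleting the middle token of `[60, 62, 64]` yields `[60, 64]`, so their distance is 1.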

14.
15.
Recently there has been considerable interest in dynamic textures due to the explosive growth of multimedia databases. In addition, dynamic texture appears in a wide range of videos, which makes it very important in applications concerned with modeling physical phenomena. Thus, dynamic textures have emerged as a new field of investigation that extends the static or spatial textures to the spatio-temporal domain. In this paper, we propose a novel approach for dynamic texture segmentation based on automata theory and the k-means algorithm. In this approach, a feature vector is extracted for each pixel by applying deterministic partially self-avoiding walks on three orthogonal planes of the video. Then, these feature vectors are clustered by the well-known k-means algorithm. Although the k-means algorithm has shown interesting results, it only ensures convergence to a local minimum, which affects the final result of segmentation. In order to overcome this drawback, we compare six methods of initialization of the k-means. The experimental results have demonstrated the effectiveness of our proposed approach compared to the state-of-the-art segmentation methods.

16.
Three-dimensional free form shape matching is a fundamental problem in both the machine vision and pattern recognition literature. However, the automatic approach to 3D free form shape matching still remains open. In this paper, we propose using k closest points in the second view for the automatic 3D free form shape matching. For the sake of computational efficiency, the optimised k-D tree is employed for the search of the k closest points. Since occlusion and the appearance and disappearance of points almost always occur, slack variables have to be employed, explicitly modelling outliers in the process of matching. Then the relative quality of each possible point match is estimated using the graduated assignment algorithm, leading the camera motion parameters to be estimated by the quaternion method in the weighted least-squares sense. The experimental results based on both synthetic data and real images without any pre-processing show the effectiveness and efficiency of the proposed algorithm for the automatic matching of overlapping 3D free form shapes with either sparse or dense points.

17.
Similarity search in graph databases has been widely investigated. It is worthwhile to develop a fast algorithm to support similarity search in large-scale graph databases. In this paper, we investigate a k-NN (k-Nearest Neighbor) similarity search problem by locality sensitive hashing (LSH). We propose an innovative fast graph search algorithm named LSH-GSS, which first transforms complex graphs into vectorial representations based on prototypes in the database and later accelerates a query in Euclidean space by employing LSH. Because images can be represented as attributed graphs, we propose an approach to transform attributed graphs into n-dimensional vectors and apply LSH-GSS to execute further image retrieval. Experiments on three real graph datasets and two image datasets show that our methods are highly accurate and efficient.

18.
We propose a new variant of the bit-parallel NFA of Baeza-Yates and Navarro (BPD) for approximate string matching [R. Baeza-Yates, G. Navarro, Faster approximate string matching, Algorithmica 23 (1999) 127-158]. BPD is one of the most practical approximate string matching algorithms under moderate pattern lengths and error levels [G. Myers, A fast bit-vector algorithm for approximate string matching based on dynamic programming, J. ACM 46 (3) (1999) 395-415; G. Navarro, M. Raffinot, Flexible Pattern Matching in Strings: Practical On-line Search Algorithms for Texts and Biological Sequences, Cambridge University Press, Cambridge, UK, 2002]. Given a length-m pattern and an error threshold k, the original BPD requires (m - k)(k + 2) bits of space to represent an NFA with (m - k)(k + 1) states. In this paper we remove redundancy from the original NFA representation. Our variant requires (m - k)(k + 1) bits of space, which is optimal in the sense that exactly one bit per state is used. The space efficiency is achieved by using an alternative, but equally or even more efficient, simulation algorithm for the bit-parallel NFA. We also present experimental results to compare our modified NFA against the original BPD and its main competitors. Our new variant is more efficient than the original BPD, and hence it takes over and extends the role of the original BPD as one of the most practical approximate string matching algorithms under moderate values of k and m.

19.
Isometric mapping (Isomap) is a popular nonlinear dimensionality reduction technique which has shown high potential in visualization and classification. However, it appears sensitive to noise or scarcity of observations. This inadequacy may hinder its application for the classification of microarray data, in which the expression levels of thousands of genes in a few normal and tumor sample tissues are measured. In this paper we propose a double-bounded tree-connected variant of Isomap, aimed at being more robust to noise and outliers when used for classification and also computationally more efficient. It differs from the original Isomap in the way the neighborhood graph is generated: in the first stage we apply a double-bounding rule that confines the search to at most k nearest neighbors contained within an ε-radius hypersphere; the resulting subgraphs are then joined by computing a minimum spanning tree among the connected components. We therefore achieve a connected graph without unnaturally inflating the values of k and ε. The computational experiments show that the new method performs significantly better in terms of accuracy with respect to Isomap, k-edge-connected Isomap and the direct application of support vector machines to data in the input space, consistently across seven microarray datasets considered in our tests.

20.
Despite advances in machine learning technologies, a schema matching result between two database schemas (e.g., those derived from COMA++) is likely to be imprecise. In particular, numerous instances of "possible mappings" between the schemas may be derived from the matching result. In this paper, we study problems related to managing possible mappings between two heterogeneous XML schemas. First, we study how to efficiently generate possible mappings for a given schema matching task. While this problem can be solved by existing algorithms, we show how to improve the performance of the solution by using a divide-and-conquer approach. Second, storing and querying a large set of possible mappings can incur large storage and evaluation overhead. For XML schemas, we observe that their possible mappings often exhibit a high degree of overlap. We hence propose a novel data structure, called the block tree, to capture the commonalities among possible mappings. The block tree is useful for representing the possible mappings in a compact manner and can be efficiently generated. Moreover, it facilitates the evaluation of a probabilistic twig query (PTQ), which returns the non-zero probability that a fragment of an XML document matches a given query. For users who are interested only in answers with k-highest probabilities, we also propose the top-k PTQ and present an efficient solution for it. An extensive evaluation on real-world data sets shows that our approaches significantly improve the efficiency of generating, storing, and querying possible mappings.
