共查询到20条相似文献,搜索用时 62 毫秒
1.
2.
3.
4.
5.
6.
基于斜率表示的时间序列相似性度量方法 总被引:5,自引:0,他引:5
时间序列相似性搜索是数据挖掘领域的一个热点研究方向,相似性距离度量方法是其中的一个重要问题.针对含有大量噪声并存在数据缺失的高维多元时间序列数据,本文提出一种基于斜率表示的时间序列相似性度量方法.该方法是在线性分段的基础上,对两个序列间的斜率差进行加权,因而物理概念更为明确.文中还证明斜率距离完全满足相似性度量的基本准则.实例证明了算法的有效性. 相似文献
7.
时间序列的相似性度量是时间序列数据挖掘研究中的一个重要问题,是进行序列查询、分类、预测的一项基础工作。寻求一种好的度量对提高挖掘任务的效率和准确性有着至关重要的意义。目前从事这方面的研究除了少许理论论述外,几乎都采用一种固定的方法,即提出具体要求并提供实验数据。然而,大多数实验方法不是使用范围有限就是侧重点不同。为了提供一个比较全面的实验验证,用1NN分类算法进行了大量的时间序列交叉验证实验,重新评估了其中的弹性度量,并使用不同应用领域的28个时间序列数据集进行比较,结果表明,该方法具有更高的准确性。 相似文献
8.
基于事件的时间序列相似性度量方法 总被引:2,自引:0,他引:2
为了在时间序列相似性度量过程中更好地体现用户的需求,提高相似性度量的准确度,提出了基于事件的时间序列相似性度量方法(SMBE)。首先将用户的需求定义为事件,将原始时间序列转化为事件序列;然后,构建了基于事件序列的相似性度量模型(SMBE),SMBE定义了不同事件序列中各元素之间的相似性,并构成相应的相似性矩阵,对相似性矩阵进行搜索得到最优路径的值作为序列之间的相似性度量;最后,提出了基于SMBE的聚类方法。实验表明,在参数设置合理的情况下,能获得接近0.90的聚类精度。 相似文献
9.
时间序列的相似性度量是时间序列分析的基础工作之一,是进行相似匹配的关键。针对欧几里德距离描述分段趋势的不足和各种模式距离对应分段之间距离值的离散化问题,提出一种基于形态相似距离的时间序列相似性度量方法,标准数据集上完成的识别和聚类实验表明了该方法的可行性和有效性。 相似文献
10.
时间序列的相似性搜索是时间序列知识发现的重要方面。该文提出了一种新的基于距离度量的时间序列相似性搜索算法。该算法采用分段线性表示,同时使用改进的模式距离来度量序列间的距离。 相似文献
11.
为了减少噪声数据对查询最优序列的影响,避免Euclidean距离对形态的敏感性,以及要求序列等长的缺点,提出了面向噪声数据的时间序列相似性搜索算法.运用SPC方法去除序列中的噪声数据;采用DTW距离作为度量函数,使用规范化方法使序列处于相同的分辨率下;采用LB_ Keogh下界函数对候选序列集合进行筛选.仿真实验结果表明,该算法在阈值较小时,对含有噪声数据序列的匹配能力较强. 相似文献
12.
For more than a decade, researchers have actively explored the area of image/video analysis and retrieval. Yet one fundamental
problem remains largely unsolved: how to measure perceptual similarity between two objects. For this purpose, most researchers
employ a Minkowski-type metric. Unfortunately, the Minkowski metric does not reliably find similarities in objects that are
obviously alike. Through mining a large set of visual data, our team has discovered a perceptual distance function. We call
the discovered function the dynamic partial function (DPF). When we empirically compare DPF to Minkowski-type distance functions in image retrieval and in video shot-transition
detection using our image features, DPF performs significantly better. The effectiveness of DPF can be explained by similarity theories in cognitive psychology. 相似文献
13.
Querying time series data based on similarity 总被引:3,自引:0,他引:3
We study similarity queries for time series data where similarity is defined, in a fairly general way, in terms of a distance function and a set of affine transformations on the Fourier series representation of a sequence. We identify a safe set of transformations supporting a wide variety of comparisons and show that this set is rich enough to formulate operations such as moving average and time scaling. We also show that queries expressed using safe transformations can efficiently be computed without prior knowledge of the transformations. We present a query processing algorithm that uses the underlying multidimensional index built over the data set to efficiently answer similarity queries. Our experiments show that the performance of this algorithm is competitive to that of processing ordinary (exact match) queries using the index, and much faster than sequential scanning. We propose a generalization of this algorithm for simultaneously handling multiple transformations at a time, and give experimental results on the performance of the generalized algorithm 相似文献
14.
Efficient query filtering for streaming time series with applications to semisupervised learning of time series classifiers 总被引:1,自引:1,他引:1
Li Wei Eamonn Keogh Helga Van Herle Agenor Mafra-Neto Russell J. Abbott 《Knowledge and Information Systems》2007,11(3):313-344
In this paper, we define time series query filtering, the problem of monitoring the streaming time series for a set of predefined patterns. This problem is of great practical
importance given the massive volume of streaming time series available through sensors, medical patient records, financial
indices and space telemetry. Since the data may arrive at a high rate and the number of predefined patterns can be relatively
large, it may be impossible for the comparison algorithm to keep up. We propose a novel technique that exploits the commonality
among the predefined patterns to allow monitoring at higher bandwidths, while maintaining a guarantee of no false dismissals.
Our approach is based on the widely used envelope-based lower-bounding technique. As we will demonstrate on extensive experiments
in diverse domains, our approach achieves tremendous improvements in performance in the offline case, and significant improvements
in the fastest possible arrival rate of the data stream that can be processed with guaranteed no false dismissals. As a further
demonstration of the utility of our approach, we demonstrate that it can make semisupervised learning of time series classifiers
tractable.
Li Wei is a Ph.D. candidate in the Department of Computer Science & Engineering at the University of California, Riverside. She
received her B.S. and M.S. degrees from Fudan University, China. Her research interests include data mining and information
retrieval.
Eamonn Keogh is an Assistant Professor of computer science at the University of California, Riverside. His research interests include
data mining, machine learning and information retrieval. Several of his papers have won best paper awards, including papers
at SIGKDD and SIGMOD. Dr. Keogh is the recipient of a 5-year NSF Career Award for “Efficient Discovery of Previously Unknown Patterns and Relationships in Massive Time Series Databases”.
Helga Van Herle is an Assistant Clinical Professor of medicine at the Division of Cardiology of the Geffen School of Medicine at UCLA. She
received her M.D. from UCLA in 1993; completed her residency in internal medicine at the New York Hospital (Cornell University;
1993–1996) and her cardiology fellowship at UCLA (1997–2001). Dr. Van Herle holds an M.Sc. in bioengineering from Columbia
University (1987) and a B.Sc. in chemical engineering from UCLA (1985).
Agenor Mafra-Neto, Ph.D., is the CEO of ISCA Technologies, Inc., in California and the founder of ISCA Technologies, LTDA, in Brazil. His research
interests include the analysis of insect behavior and communication systems, the manipulation of insect behavior, and the
automation of pest monitoring and pest control. Dr. Mafra-Neto is currently coordinating the deployment of area-wide smart
sensor and effector networks to micromanage agricultural and public health pests in the field in an automatic fashion.
Russell J. Abbott is a Professor of computer science at California State University, Los Angeles, and a member of the staff at the Aerospace
Corporation, El Segundo, CA. His primary interests are in the field of complex systems. He is currently organizing a workshop
to bring together people working in the fields of complex systems and systems engineering. 相似文献
15.
We consider the problem of finding similar patterns in a time sequence. Typical applications of this problem involve large databases consisting of long time sequences of different lengths. Current time sequence search techniques work well for queries of a prespecified length, but not for arbitrary length queries. We propose a novel indexing technique that works well for arbitrary length queries. The proposed technique stores index structures at different resolutions for a given data set. We prove that this index structure is superior to existing index structures that use a single resolution. We propose a range query and nearest neighbor query technique on this index structure and prove the optimality of our index structure for these search techniques. The experimental results show that our method is 4 to 20 times faster than the current techniques, including sequential scan, for range queries and 3 times faster than sequential scan and other techniques for nearest neighbor queries. Because of the need to store information at multiple resolution levels, the storage requirement of our method could potentially be large. In the second part, we show how the index information can be compressed with minimal information loss. According to our experimental results, even after compressing the size of the index to one fifth, the total cost of our method is 3 to 15 times less than the current techniques. 相似文献
16.
Vit Niennattrakul Pongsakorn Ruengronghirunya Chotirat Ann Ratanamahatana 《Data mining and knowledge discovery》2010,21(3):509-541
Among many existing distance measures for time series data, Dynamic Time Warping (DTW) distance has been recognized as one of the most accurate and suitable distance measures due to its flexibility in sequence alignment. However, DTW distance calculation is computationally intensive. Especially in very large time series databases, sequential scan through the entire database is definitely impractical, even with random access that exploits some index structures since high dimensionality of time series data incurs extremely high I/O cost. More specifically, a sequential structure consumes high CPU but low I/O costs, while an index structure requires low CPU but high I/O costs. In this work, we therefore propose a novel indexed sequential structure called TWIST (Time Warping in Indexed Sequential sTructure) which benefits from both sequential access and index structure. When a query sequence is issued, TWIST calculates lower bounding distances between a group of candidate sequences and the query sequence, and then identifies the data access order in advance, hence reducing a great number of both sequential and random accesses. Impressively, our indexed sequential structure achieves significant speedup in a querying process. In addition, our method shows superiority over existing rival methods in terms of query processing time, number of page accesses, and storage requirement with no false dismissal guaranteed. 相似文献
17.
Abdullah Mueen Nikan Chavoshi Noor Abu-El-Rub Hossein Hamooni Amanda Minnich Jonathan MacCarthy 《Knowledge and Information Systems》2018,54(1):237-263
Dynamic time warping (DTW) distance has been effectively used in mining time series data in a multitude of domains. However, in its original formulation DTW is extremely inefficient in comparing long sparse time series, containing mostly zeros and some unevenly spaced nonzero observations. Original DTW distance does not take advantage of this sparsity, leading to redundant calculations and a prohibitively large computational cost for long time series. We derive a new time warping similarity measure (AWarp) for sparse time series that works on the run-length encoded representation of sparse time series. The complexity of AWarp is quadratic on the number of observations as opposed to the range of time of the time series. Therefore, AWarp can be several orders of magnitude faster than DTW on sparse time series. AWarp is exact for binary-valued time series and a close approximation of the original DTW distance for any-valued series. We discuss useful variants of AWarp: bounded (both upper and lower), constrained, and multidimensional. We show applications of AWarp to three data mining tasks including clustering, classification, and outlier detection, which are otherwise not feasible using classic DTW, while producing equivalent results. Potential areas of application include bot detection, human activity classification, search trend analysis, seismic analysis, and unusual review pattern mining. 相似文献
18.
Similarity search aims to find all objects similar to a query object. Typically, some base similarity measures for the different properties of the objects are defined, and light-weight similarity indexes for these measures are built. A query plan specifies which similarity indexes to use with which similarity thresholds and how to combine the results. Previous work creates only a single, static query plan to be used by all queries. In contrast, our approach creates a new plan for each query. 相似文献
19.
Yuanyuan Cai Qingchuan Zhang Wei Lu Xiaoping Che 《Journal of Intelligent Information Systems》2018,51(1):23-47
As a valuable tool for text understanding, semantic similarity measurement enables discriminative semantic-based applications in the fields of natural language processing, information retrieval, computational linguistics and artificial intelligence. Most of the existing studies have used structured taxonomies such as WordNet to explore the lexical semantic relationship, however, the improvement of computation accuracy is still a challenge for them. To address this problem, in this paper, we propose a hybrid WordNet-based approach CSSM-ICSP to measuring concept semantic similarity, which leverage the information content(IC) of concepts to weight the shortest path distance between concepts. To improve the performance of IC computation, we also develop a novel model of the intrinsic IC of concepts, where a variety of semantic properties involved in the structure of WordNet are taken into consideration. In addition, we summarize and classify the technical characteristics of previous WordNet-based approaches, as well as evaluate our approach against these approaches on various benchmarks. The experimental results of the proposed approaches are more correlated with human judgment of similarity in term of the correlation coefficient, which indicates that our IC model and similarity detection approach are comparable or even better for semantic similarity measurement as compared to others. 相似文献
20.
由于时间序列的长度很大,并且不确定时间序列在每个采样点的取值具有不确定性,导致时间序列在相似性匹配和聚类挖掘中时间复杂度很高,为了解决该问题,提出了基于趋势的时间序列相似性度量方法和聚类方法.其中基于趋势的相似性度量方法根据时间序列的整体变化趋势,将时间序列映射为短的趋势符号序列,并利用各趋势的一阶连接性指数和塔尼莫特系数完成相似性度量;基于趋势的聚类方法通过定义趋势高度,并对趋势符号序列迭代进行区间划分和趋势判断,并以此构建趋势树,最后将趋势树根节点中趋势符号相同的序列聚集为一类.实验结果表明:a)五种趋势符号的一阶连接性指数可唯一地表示一条时间序列;b)基于趋势的相似性度量方法在多项式时间内可有效完成时间序列的相似性匹配;c)基于趋势的聚类方法将序列的相似性度量和聚类过程集中在一起,聚类效果显著. 相似文献