首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 734 毫秒
1.
We devise a skyline algorithm that can efficiently mitigate the enormous overhead of processing millions of tuples on totally- and partially-ordered domains (henceforth, TODs and PODs). With massive datasets, existing techniques spend a significant amount of time on a dominance comparison because of both a large number of skyline points and the unprogressive method of skyline computing with PODs. (If data has high dimensionality, the situation is undoubtedly aggravated.) The progressiveness property turns out to be the key feature for solving all remaining problems. This article presents a FAST-SKY algorithm that deals successfully with these two obstacles and improves skyline query processing time strikingly, even with high-dimensional data. Progressive skyline evaluation with PODs is guaranteed by new index structures and topological sorting order. A stratification technique is adopted to index data on PODs, and we propose two new index structures: stratified R-trees (SR-trees) for low-dimensional data and stratified MinMax treaps (SM-treaps) for high-dimensional data. A fast dominance comparison is achieved by using a reporting query instead of a dominance query, and a dimensionality reduction technique. Experimental results suggest that in general cases (anti-correlated and uniform distributions) FAST-SKY is orders of magnitude faster than existing algorithms.  相似文献   

2.
高效的时间序列下界技术   总被引:3,自引:0,他引:3       下载免费PDF全文
针对时间序列数据,提出一种新的基于动态时间弯曲的下界技术,该技术首先基于分段聚集近似的线性表示对原始序列进行降维,同时生成查询序列的网格最小边界矩形近似表示,然后利用基于动态时间弯曲距离对两者下界距离度量。实验结果表明,该下界技术与以往相关技术相比,能够产生更大的下界距离,具有更强的紧凑度、裁剪搜索空间能力以及更短的运行时间,有利于时间序列数据挖掘。  相似文献   

3.
The problem of similarity search in large time series databases has attracted much attention recently. It is a non-trivial problem because of the inherent high dimensionality of the data. The most promising solutions involve first performing dimensionality reduction on the data, and then indexing the reduced data with a spatial access method. Three major dimensionality reduction techniques have been proposed: Singular Value Decomposition (SVD), the Discrete Fourier transform (DFT), and more recently the Discrete Wavelet Transform (DWT). In this work we introduce a new dimensionality reduction technique which we call Piecewise Aggregate Approximation (PAA). We theoretically and empirically compare it to the other techniques and demonstrate its superiority. In addition to being competitive with or faster than the other methods, our approach has numerous other advantages. It is simple to understand and to implement, it allows more flexible distance measures, including weighted Euclidean queries, and the index can be built in linear time. Received 16 May 2000 / Revised 18 December 2000 / Accepted in revised form 2 January 2001  相似文献   

4.
In a typical Web recommendation system, objects are often described by many attributes. It also needs to serve many users with a diversified range of preferences. In other words, it must be capable to efficiently support high dimensional preference queries that allow the user to explore the data space effectively without imposing specific preference weightings for each dimension. The skyline query, which can produce a set of objects guaranteed to contain all top ranked objects for any linear attribute preference combination, has been proposed to support this type of recommendation applications. However, it suffers from the problem known as ‘dimensionality curse’ as the size of skyline query result set can grow exponentially with the number of dimensions. Therefore, when the dimensionality is high, a large percentage of objects can become skyline points. This problem makes such a recommendation system less usable for users. In this paper, we propose a stronger type of skyline query, called core skyline query, that adopts a new quality measure called vertical dominance to return only an interesting subset of the traditional skyline points. An efficient query processing method is proposed to find core skyline points using a novel indexing structure called Linked Multiple B’-trees (LMB). Our approach can find such superior skyline points progressively without the need of computing the entire set of skyline points first.  相似文献   

5.
基于小波变换的时间序列相似模式匹配   总被引:21,自引:1,他引:21  
提出了一种新的时序相似模式匹配方法,它采用小波分析的方法实现时间序列数据的降维,采用小波序列表示原序列,将小波序列组织为多维索引结构R-tree存储,在该索引结构基础上,基于一种表示相似性的距离函数,定义了范围查询和最近邻查询算法,实验结果证明这种方法性能优于传统的基于傅立叶变换的相似模式匹配方法。  相似文献   

6.
Hailin Li  Chonghui Guo 《Knowledge》2011,24(4):492-500
Many researchers focus on dimensionality reduction techniques for the efficient data mining in large time series database. Meanwhile, corresponding distance measures are provided for describing the relationships between two different time series in reduced space. In this paper, we propose a novel approach which we call piecewise cloud approximation (PWCA) to reduce the dimensionality of time series. This representation not only allows dimensionality reduction but also gives a new way to measure the similarity between time series well. Cloud, a qualitative and quantitative transformation model, is used to describe the features of subsequences of time series. Furthermore, a new way to measure the similarity between two cloud models is defined by an overlapping area of their own expectation curves. We demonstrate the performance of the proposed representation and similarity measure used in time series mining tasks, including clustering, classification and similarity search. The results of experiments indicate that PWCA is an effective representation for time series mining.  相似文献   

7.
Spatiotemporal objects – that is, objects that evolve over time – appear in many applications. Due to the nature of such applications, storing the evolution of objects through time in order to answer historical queries (queries that refer to past states of the evolution) requires a very large specialized database, what is termed in this article a spatiotemporal archive. Efficient processing of historical queries on spatiotemporal archives requires equally sophisticated indexing schemes. Typical spatiotemporal indexing techniques represent the objects using minimum bounding regions (MBR) extended with a temporal dimension, which are then indexed using traditional multidimensional index structures. However, rough MBR approximations introduce excessive overlap between index nodes, which deteriorates query performance. This article introduces a robust indexing scheme for answering spatiotemporal queries more efficiently. A number of algorithms and heuristics are elaborated that can be used to preprocess a spatiotemporal archive in order to produce finer object approximations, which, in combination with a multiversion index structure, will greatly improve query performance in comparison to the straightforward approaches. The proposed techniques introduce a query efficiency vs. space tradeoff that can help tune a structure according to available resources. Empirical observations for estimating the necessary amount of additional storage space required for improving query performance by a given factor are also provided. Moreover, heuristics for applying the proposed ideas in an online setting are discussed. Finally, a thorough experimental evaluation is conducted to show the merits of the proposed techniques. Edited by B. Seeger A short version of this article appeared as “Efficient indexing of spatiotemporal objects” in the Proceedings of Extending Database Technology 2002 [19]. This work was partially supported by NSF grants IIS-9907477, EIA-9983445, NSF IIS 9984729, NSF ITR 0220148, NSF IIS-0133825, NRDRP, and the U.S. Department of Defense.  相似文献   

8.
Among many existing distance measures for time series data, Dynamic Time Warping (DTW) distance has been recognized as one of the most accurate and suitable distance measures due to its flexibility in sequence alignment. However, DTW distance calculation is computationally intensive. Especially in very large time series databases, sequential scan through the entire database is definitely impractical, even with random access that exploits some index structures since high dimensionality of time series data incurs extremely high I/O cost. More specifically, a sequential structure consumes high CPU but low I/O costs, while an index structure requires low CPU but high I/O costs. In this work, we therefore propose a novel indexed sequential structure called TWIST (Time Warping in Indexed Sequential sTructure) which benefits from both sequential access and index structure. When a query sequence is issued, TWIST calculates lower bounding distances between a group of candidate sequences and the query sequence, and then identifies the data access order in advance, hence reducing a great number of both sequential and random accesses. Impressively, our indexed sequential structure achieves significant speedup in a querying process. In addition, our method shows superiority over existing rival methods in terms of query processing time, number of page accesses, and storage requirement with no false dismissal guaranteed.  相似文献   

9.
基于DCT的时序数据相似性搜索   总被引:2,自引:0,他引:2  
数据的高维度是造成时序数据相似性搜索困难的主要原因。最有效的解决方法是对时序数据进行维归约,然后对压缩后的数据建立空间索引。目前维归约的方法主要是离散傅立叶变换(DFT)和离散小波变换(DWT)。提出了一种新的方法,利用离散余弦变换(DCT)进行维归约,并在此基础上给出了对时序数据进行范围查询和近邻查询的相似性搜索方法。与基于DFT、DWT的搜索方法相比,该方法在理论分析和实验结果上都显示出较高的效率。  相似文献   

10.
Skyline query processing has recently received a lot of attention in database and data-mining communities. To the best of our knowledge, the existing researches mainly focus on considering how to efficiently return the whole skyline set. However, when the cardinality and dimensionality of input objects increase, the number of skylines grows exponentially, and hence this “huge” skyline set is completely useless to users. On the other hand, in most real applications, the objects are usually clustered, and therefore many objects have similar attribute values. Motivated by the above facts, in this paper, we present a novel type of SkyCluster query to capture the skyline diversity and improve the usefulness of skyline result. The SkyCluster query integrates K-means clustering into skyline computation, and returns K “representative” and “diverse” skyline objects to users. To process such query, a straightforward approach is to simply integrate the existing techniques developed for skyline-only and clustering-only together. But this approach is costly since both skyline computation and K-means clustering are all CPU-sensitive. We propose an efficient evaluation approach which is based on the circinal index to seamlessly integrate subspace skyline computation, K-means clustering and representatives selection. Also, we present a novel optimization heuristic to further improve the query performance. Experimental study shows that our approach is both efficient and effective.  相似文献   

11.
基于矢量量化的快速图像检索   总被引:7,自引:0,他引:7  
叶航军  徐光祐 《软件学报》2004,15(5):712-719
传统索引方法对高维数据存在"维数灾难"的困难.而对数据分布的精确描述及对数据空间的有效划分是高维索引机制中的关键问题.提出一种基于矢量量化的索引方法.该方法使用高斯混合模型描述数据的整体分布,并训练优化的矢量量化器划分数据空间.高斯混合模型能更好地描述真实图像库的数据分布;而矢量量化的划分方法可以充分利用维之间的统计相关性,能够对数据向量构造出更加精确的近似表示,从而提高索引结构的过滤效率并减少需要访问的数据向量.在大容量真实图像库上的实验表明,该方法显著减少了支配检索时间的I/O开销,提高了索引性能.  相似文献   

12.
Recently, databases have been used to store multimedia data such as images, maps, video clips, and music clips. In order to search them, they should be represented by various features, which are composed of high-dimensional vectors. As a result, the dimensionality of data is increased considerably, which causes ‘the curse of dimensionality’. The increase of data dimensionality causes poor performance of index structures. To overcome the problem, the research on the dimensionality reduction has been conducted. However, some reduction methods do not guarantee no false dismissal, while others incur high computational cost. This paper proposes dimensionality reduction techniques that guarantee no false dismissal while providing efficiency considerable by approximating distances with a few values. To provide the no false dismissal property, approximated distances should always be smaller than original distances. The Cauchy–Schwarz inequality and two trigonometrical equations are used as well as the dimension partitioning technique is applied to approximate distances in such a way to reduce the difference between the approximated distance and the original distance. As a result, the proposed techniques reduce the candidate set of a query result for efficient query processing.  相似文献   

13.
索引大规模时序数据库是高效时序搜索中的关键问题.提出了一种新颖的索引方案RQI, 它包括3种过滤策略: 即first-k过滤、索引低边界和上边界以及三角不等式修剪.基本的思想为首先运用Haar小波变换计算每个时序的小波系数,利用前面的k个小波系数形成一个最小边界矩阵,以利用点过滤方法;然后将预先计算每个时序的低边界特征和上边界特征存放到索引当中;最后采用三角不等式来修剪不相似的序列并确保没有漏报.同时提出了一种新的低边界距离函数SLBS和聚类算法CSA.通过CSA可保持索引良好的聚类特征以提高点过滤方法的效率,从而引入了一种更好的算法RQIC.在合成数据集和实时数据集的大量对比实验表明,RQIC是有效的且具备较高的查询效率.  相似文献   

14.
在时间序列数据库中,大多数现有的相似性搜索方法都集中在如何提高算法的效率,而对于由不精确数据组成的时间序列如何进行相似性搜索,则研究比较少,不精确数据经常用区间数据来表示;通过识别区间数时间序列中的重要区间数,使得区间数时间序列的维数大幅度降低,该文针对由区间数组成的时间序列,提出了一种基于低分率聚类的索引方法。实验表明,该方法加快了区间数时间序列的查找过程,不会出现漏报现象。  相似文献   

15.
为提高时间序列相似匹配的精度和效率,提出一种基于小波包变换的时间序列相似匹配算法.首先利用小波包可对信号进行精细分析的特点,对时间序列进行维数约简,用变换后的低频系数和部分高频均值系数作为特征向量表示原始序列;然后用多维索引结构R树存储这些特征向量,将欧几里德距离作为相似尺度,在此基础上实现了范围查询和k近邻查询,对电力负荷时间序列数据的仿真实验结果表明了算法的有效性。  相似文献   

16.
Skyline query is of great importance in many applications, such as multi-criteria decision making and business planning. In particular, a skyline point is a data object in the database whose attribute vector is not dominated by that of any other objects. Previous methods to retrieve skyline points usually assume static data objects in the database (i.e. their attribute vectors are fixed), whereas several recent work focus on skyline queries with dynamic attributes. In this paper, we propose a novel variant of skyline queries, namely metric skyline, whose dynamic attributes are defined in the metric space (i.e. not limited to the Euclidean space). We illustrate an efficient and effective pruning mechanism to answer metric skyline queries through a metric index. Most importantly, we formalize the query performance of the metric skyline query in terms of the pruning power, by a cost model, in light of which we construct an optimized metric index aiming to maximize the pruning power of metric skyline queries. Extensive experiments have demonstrated the efficiency and effectiveness of our proposed pruning techniques as well as the constructed index in answering metric skyline queries.  相似文献   

17.

Computer vision techniques enhanced by the advent of deep learning has become a quintessential part of our day-to-day life. The application of such computer vision techniques in image retrieval can be termed as query based image retrieval process. Conventional methods have limitations such as increased dimensionality, reduced accuracy, high time consumption, and dependence on indexing for retrieval. In order to overcome these limitations, this research work aims to develop a new image retrieval system by developing an image preprocessing mechanism via target prediction technique, which isolates object from the background. Further, a Micro-structure based Pattern Extraction (MPE) technique is implemented to extract the patterns from the preprocessed image, where the diagonal patterns are generated for increasing the accuracy of the retrieval process. Consequently, the Convolutional Neural Network (CNN) is utilized to reduce the dimensionality of the features, and the similarity learning approach is utilized to map the selected features with trained features based on the distance metric. The performance of the proposed system is evaluated by using various measures. Thereby, the efficiency of the proposed technique is ascertained by comparing it with the existing techniques.

  相似文献   

18.
Experiencing SAX: a novel symbolic representation of time series   总被引:15,自引:3,他引:15  
Many high level representations of time series have been proposed for data mining, including Fourier transforms, wavelets, eigenwaves, piecewise polynomial models, etc. Many researchers have also considered symbolic representations of time series, noting that such representations would potentiality allow researchers to avail of the wealth of data structures and algorithms from the text processing and bioinformatics communities. While many symbolic representations of time series have been introduced over the past decades, they all suffer from two fatal flaws. First, the dimensionality of the symbolic representation is the same as the original data, and virtually all data mining algorithms scale poorly with dimensionality. Second, although distance measures can be defined on the symbolic approaches, these distance measures have little correlation with distance measures defined on the original time series. In this work we formulate a new symbolic representation of time series. Our representation is unique in that it allows dimensionality/numerosity reduction, and it also allows distance measures to be defined on the symbolic approach that lower bound corresponding distance measures defined on the original series. As we shall demonstrate, this latter feature is particularly exciting because it allows one to run certain data mining algorithms on the efficiently manipulated symbolic representation, while producing identical results to the algorithms that operate on the original data. In particular, we will demonstrate the utility of our representation on various data mining tasks of clustering, classification, query by content, anomaly detection, motif discovery, and visualization.  相似文献   

19.
用基于移动均值的索引实现时间序列相似查询   总被引:2,自引:0,他引:2  
林子雨  杨冬青  王腾蛟 《软件学报》2008,19(9):2349-2361
提出了基于移动均值的索引来解决子序列匹配中的"ε-查询"问题:提出并证明了基于移动均值的缩距定理和缩距比关系定理,后者具有很好的"裁减"能力,可以在相似查询时淘汰大部分不符合条件的候选时间序列,从而达到快速相似查找的目的;引入了由Jagadish等人提出的BATON~*-树,并在此基础上适当修改,建立了MABI索引,极大地加快了相似查询过程;最后,在一个股票交易数据集上进行了实验,证明了MABI索引的良好性能.  相似文献   

20.
数据广播环境下位置相关skyline查询是同时涉及空间位置属性和非空间多维属性的一类新的skyline查询类型,可广泛地应用于地理信息系统、城市规划、智能交通等领域。与传统环境下的位置相关skyline查询相比,数据广播环境下位置skyline查询面临一些新的问题,如广播信道的线性特性、移动设备资源受限性等。针对这些问题,本文提出了基于数据共享的位置相关查询算法,该方法通过共享邻近移动设备缓存的查询结果来改进查询算法的性能。广泛的实验结果显示,在移动设备密度较大的对等网络中,本文提出的算法具有较明显的优势,能显著地提升查询性能。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号