首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Approximate pattern matching algorithms have become an important tool in computer assisted music analysis and information retrieval. The number of different problem formulations has greatly expanded in recent years, not least because of the subjective nature of measuring musical similarity. From an algorithmic perspective, the complexity of each problem depends crucially on the exact definition of the difference between two strings. We present an overview of advances in approximate string matching in this field focusing on new measures of approximation.  相似文献   

2.
Mining of music data is one of the most important problems in multimedia data mining. In this paper, two research issues of mining music data, i.e., online mining of music query streams and change detection of music query streams, are discussed. First, we proposed an efficient online algorithm, FTP-stream (Frequent Temporal Pattern mining of streams), to mine all frequent melody structures over sliding windows of music melody sequence streams. An effective bit-sequence representation is used in the proposed algorithm to reduce the time and memory needed to slide the windows. An effective list structure is developed in the FTP-stream algorithm to overcome the performance bottleneck of 2-candidate generation. Experiments show that the proposed algorithm FTP-stream only needs a half of memory requirement of original melody sequence data, and just scans the music query stream once. After mining frequent melody structures, we developed a simple online algorithm, MQS-change (changes of Music Query Streams), to detect the changes of frequent melody structures in current user-centered music query streams. Two music melody structures (set of chord-sets and string of chord-sets) are maintained and four melody structure changes (positive burst, negative burst, increasing change and decreasing change) are monitored in a new summary data structure, MSC-list (a list of Music Structure Changes). Experiments show that the MQS-change algorithm is an effective online method to detect the changes of music melody structures over continuous music query streams.
Hua-Fu LiEmail:
  相似文献   

3.
针对具有子孙轴(//)和谓词([])结构特征的XPath对具有不同递归深度的XML数据流进行递归查询处理问题,提出了基于下推自动机技术的处理方法,通过将XPath各类置步转化成相对应的处理模块,由算法将各类处理模块组合起来,建立了自上而下的树状查询模型.由于查询过程中将会发生多重匹配,从而会产生大量的匹配模式,该模型通过有效的匹配策略和缓存操作,对匹配模式进行保存及检验,成功地实现XML数据流递归查询.实验结果表明,该算法在性能上要优于传统方法.  相似文献   

4.
Compressed Data Cube for Approximate OLAP Query Processing   总被引:4,自引:0,他引:4       下载免费PDF全文
Approximate query processing has emerged as an approach to dealing with the huge data volume and complex queries in the environment of data warehouse.In this paper,we present a novel method that provides approximate answers to OLAP queries.Our method is based on building a compressed (approximate) data cube by a clustering technique and using this compressed data cube to provide answers to queries directly,so it improves the performance of the queries.We also provide the algorithm of the OLAP queries and the confidence intervals of query results.An extensive experimental study with the OLAP council benchmark shows the effectiveness and scalability of our cluster-based approach compared to sampling.  相似文献   

5.
K.  Wen-Syan  M.   《Data & Knowledge Engineering》2000,35(3):259-298
Since media-based evaluation yields similarity values, results to a multimedia database query, Q(Y1,…,Yn), is defined as an ordered list SQ of n-tuples of the form X1,…,Xn. The query Q itself is composed of a set of fuzzy and crisp predicates, constants, variables, and conjunction, disjunction, and negation operators. Since many multimedia applications require partial matches, SQ includes results which do not satisfy all predicates. Due to the ranking and partial match requirements, traditional query processing techniques do not apply to multimedia databases. In this paper, we first focus on the problem of “given a multimedia query which consists of multiple fuzzy and crisp predicates, providing the user with a meaningful final ranking”. More specifically, we study the problem of merging similarity values in queries with multiple fuzzy predicates. We describe the essential multimedia retrieval semantics, compare these with the known approaches, and propose a semantics which captures the requirements of multimedia retrieval problem. We then build on these results in answering the related problem of “given a multimedia query which consists of multiple fuzzy and crisp predicates, finding an efficient way to process the query.” We develop an algorithm to efficiently process queries with unordered fuzzy predicates (sub-queries). Although this algorithm can work with different fuzzy semantics, it benefits from the statistical properties of the semantics proposed in this paper. We also present experimental results for evaluating the proposed algorithm in terms of quality of results and search space reduction.  相似文献   

6.
针对数据流上连续查询处理的特征,我们从选择率和执行时间的角度出发,考虑内存使用量和输出延迟适应性因素,提出一种适应性的查询处理策略—HoliAdapt。该策略基于查询窗口动态地收集统计信息,利用数学方法不断地优化查询计划,通过核心调度方法,对操作符进行适应性的调度,有效地减少时间延迟和内存使用量,提高系统查询的效率。  相似文献   

7.
Query processing in data grids is a difficult issue due to the heterogeneous, unpredictable and volatile behaviors of the grid resources. Applying join operations on remote relations in data grids is a unique and interesting problem. However, to the best of our knowledge, little is done to date on multi-join query processing in data grids. An approach for processing multi-join queries is proposed in this paper. Firstly, a relation-reduction algorithm for reducing the sizes of operand relations is presented in order to minimize data transmission cost among grid nodes. Then, a method for scheduling computer nodes in data grids is devised to parallel process multi-join queries. Thirdly, an innovative method is developed to efficiently execute join operations in a pipeline fashion. Finally, a complete algorithm for processing multi-join queries is given. Analytical and experimental results show the effectiveness and efficiency of the proposed approach.  相似文献   

8.
There is growing interest in algorithms for processing and querying continuous data streams (i.e., data seen only once in a fixed order) with limited memory resources. In its most general form, a data stream is actually an update stream, i.e., comprising data-item deletions as well as insertions. Such massive update streams arise naturally in several application domains (e.g., monitoring of large IP network installations or processing of retail-chain transactions). Estimating the cardinality of set expressions defined over several (possibly distributed) update streams is perhaps one of the most fundamental query classes of interest; as an example, such a query may ask what is the number of distinct IP source addresses seen in passing packets from both router R 1 and R 2 but not router R 3?. Earlier work only addressed very restricted forms of this problem, focusing solely on the special case of insert-only streams and specific operators (e.g., union). In this paper, we propose the first space-efficient algorithmic solution for estimating the cardinality of full-fledged set expressions over general update streams. Our estimation algorithms are probabilistic in nature and rely on a novel, hash-based synopsis data structure, termed 2-level hash sketch. We demonstrate how our 2-level hash sketch synopses can be used to provide low-error, high-confidence estimates for the cardinality of set expressions (including operators such as set union, intersection, and difference) over continuous update streams, using only space that is significantly sublinear in the sizes of the streaming input (multi-)sets. Furthermore, our estimators never require rescanning or resampling of past stream items, regardless of the number of deletions in the stream. We also present lower bounds for the problem, demonstrating that the space usage of our estimation algorithms is within small factors of the optimal. Finally, we propose an optimized, time-efficient stream synopsis (based on 2-level hash sketches) that provides similar, strong accuracy-space guarantees while requiring only guaranteed logarithmic maintenance time per update, thus making our methods applicable for truly rapid-rate data streams. Our results from an empirical study of our synopsis and estimation techniques verify the effectiveness of our approach.Received: 20 October 2003, Accepted: 16 April 2004, Published online: 14 September 2004Edited by: J. Gehrke and J. Hellerstein.Sumit Ganguly: sganguly@cse.iitk.ac.in Current affiliation: Department of Computer Science and Engineering, Indian Institute of Technology, Kanpur, India  相似文献   

9.
Approximate query processing using wavelets   总被引:7,自引:0,他引:7  
Approximate query processing has emerged as a cost-effective approach for dealing with the huge data volumes and stringent response-time requirements of today's decision support systems (DSS). Most work in this area, however, has so far been limited in its query processing scope, typically focusing on specific forms of aggregate queries. Furthermore, conventional approaches based on sampling or histograms appear to be inherently limited when it comes to approximating the results of complex queries over high-dimensional DSS data sets. In this paper, we propose the use of multi-dimensional wavelets as an effective tool for general-purpose approximate query processing in modern, high-dimensional applications. Our approach is based on building wavelet-coefficient synopses of the data and using these synopses to provide approximate answers to queries. We develop novel query processing algorithms that operate directly on the wavelet-coefficient synopses of relational tables, allowing us to process arbitrarily complex queries entirely in the wavelet-coefficient domain. This guarantees extremely fast response times since our approximate query execution engine can do the bulk of its processing over compact sets of wavelet coefficients, essentially postponing the expansion into relational tuples until the end-result of the query. We also propose a novel wavelet decomposition algorithm that can build these synopses in an I/O-efficient manner. Finally, we conduct an extensive experimental study with synthetic as well as real-life data sets to determine the effectiveness of our wavelet-based approach compared to sampling and histograms. Our results demonstrate that our techniques: (1) provide approximate answers of better quality than either sampling or histograms; (2) offer query execution-time speedups of more than two orders of magnitude; and (3) guarantee extremely fast synopsis construction times that scale linearly with the size of the data. Received: 7 August 2000 / Accepted: 1 April 2001 Published online: 7 June 2001  相似文献   

10.
Query matching on XML streams is challenging work for querying efficiency when the amount of queried stream data is huge and the data can be streamed in continuously. In this paper, the method Syntactic Twig-Query Matching (STQM) is proposed to process queries on an XML stream and return the query results continuously and immediately. STQM matches twig queries on the XML stream in a syntactic manner by using a lexical analyzer and a parser, both of which are built from our lexical-rules and grammar-rules generators according to the user's queries and document schema, respectively. For query matching, the lexical analyzer scans the incoming XML stream and the parser recognizes XML structures for retrieving every twig-query result from the XML stream. Moreover, STQM obtains query results without a post-phase for excluding false positives, which are common in many streaming query methods. Through the experimental results, we found that STQM matches the twig query efficiently and also has good scalability both in the queried data size and the branch degree of the twig query. The proposed method takes less execution time than that of a sequence-based approach, which is widely accepted as a proper solution to the XML stream query.  相似文献   

11.
基于流数据技术的连续查询处理   总被引:1,自引:1,他引:0  
高源  刘佳  刘国华  宋驰 《计算机工程与设计》2004,25(12):2312-2314,2330
网络提供了一个无限的、变化的信息源,使得数据库由传统的静态存储变成了动态的存储,数据变成了流数据,查询变成了连续查询。为了有一个完善的系统来处理大量的数据及查询,提出了一个流查询处理方案,同时满足普通用户和高级用户(有计算机语言基础),并从几个重要的指标来阐述了如何提高查询效率,减少了系统瓶颈,并在原型系统中得到了验证。  相似文献   

12.
Recently, in the area of big data, some popular applications such as web search engines and recommendation systems, face the problem to diversify results during query processing. In this sense, it is both significant and essential to propose methods to deal with big data in order to increase the diversity of the result set. In this paper, we firstly define the diversity of a set and the ability of an element to improve the overall diversity. Based on these definitions, we propose a diversification framework which has good performance in terms of effectiveness and efficiency. Also, this framework has theoretical guarantee on probability of success. Secondly, we design implementation algorithms based on this framework for both numerical and string data. Thirdly, for numerical and string data respectively, we carry out extensive experiments on real data to verify the performance of our proposed framework, and also perform scalability experiments on synthetic data.  相似文献   

13.
可重构数据流SPJ查询处理器的研究   总被引:1,自引:1,他引:0  
数据流的实时处理需要很高的处理速度,一种解决方法是使用协处理器。然而协处理器硬布线是不变的,查询不断变化使其一定时间内综合性能达不到最优。为提高数据流处理速度和资源利用率,采用了可重构的数据流SPJ查询处理器,在具备选择、投影和连接三种查询模块及相应指令集的基础上,根据输入查询的查询树调用相应的模块自适应对FPGA编程,改变自身的硬布线,实现数据流的处理。通过大量实验验证了处理器不仅正确,而且具备高速度和灵活性。  相似文献   

14.
复杂事件处理是一种动态环境下对事件流进行分析的技术。复杂事件处理技术通常基于有限状态自动机实现,匹配过程中会在事件流上产生大量且重叠的部分匹配,有限状态自动机需维护大量的重复匹配状态,导致基于该技术的方法都会出现冗余计算的问题。为了提高复杂事件处理的匹配效率,提出了使用复杂事件实例覆盖技术来实现复杂事件处理的方法。通过设计临时匹配链式分区存储结构以及基于此结构的匹配算法,来利用复杂事件实例覆盖减少冗余计算,从而实现匹配效率的提升。在模拟数据集和真实数据集上进行了实验测试与分析,与两种常用的复杂事件处理技术进行比较。实验表明,提出方法能够在保证匹配正确性的同时有效地减少匹配过程中的冗余计算,提高整体匹配效率。  相似文献   

15.
匹配预处理对XML查询的优化   总被引:1,自引:0,他引:1       下载免费PDF全文
在基于匹配预处理的XML查询算法中,利用现有的三种树匹配模型,按照匹配代价高低得出数据集匹配结果。并在此基础上对现有算法加以改进,引入"匹配预处理"功能,进行一系列的实验。结果表明,当数据规模庞大时,该算法去除树中的无用结点,提高了数据集的查询效率,特别是查全率、查准率以及平均响应时间均令人满意。该算法应用于科技资源数据库的统一检索系统中,实现了资源导航,缩小了查找范围,提高系统的易用度。  相似文献   

16.
根据Web服务的特点,提出一种采用两级匹配、带有本体语义功能,提高Web服务匹配的效率,具有基于领域本体能表达清楚自己需要的信息、精确用户查询和基于模糊本体的逐步精确的信息查询模型,对动态电子商务系统的个性化及界面集成提供一个值得借鉴的解决方案。  相似文献   

17.
We introduce the notion of XML Stream Attribute Grammars (XSAGs). XSAGs are the first scalable query language for XML streams (running strictly in linear time with bounded memory consumption independent of the size of the stream) that allows for actual data transformations rather than just document filtering. XSAGs are also relatively easy to use for humans. Moreover, the XSAG formalism provides a strong intuition for which queries can or cannot be processed scalably on streams. We introduce XSAGs together with the necessary language-theoretic machinery, study their theoretical properties such as expressiveness and complexity, and discuss their implementation.  相似文献   

18.
Distributed Hash Tables (DHTs) are scalable, self‐organizing, and adaptive to underlying topology changes, thus being a promising infrastructure for hosting large‐scale distributed applications. The ever‐wider use of DHT infrastructures has found more and more applications that require support for range queries. Recently, a number of DHT‐based range query schemes have been proposed. However, most of them suffer from high query delay or imbalanced load distribution. To address these problems, in this paper we first present an efficient indexing structure called Balanced Kautz (BK) tree that uniformly maps the m‐dimensional data space onto DHT nodes, and then propose a BK tree‐based range query scheme called ERQ that processes range queries in a parallel fashion and guarantees to return the results in a bounded delay. In a DHT with N nodes, ERQ can answer any range of query in less than rmlog N(2loglog N + 1) hops in a load‐balanced manner, irrespective of the queried range, the whole space size, or the number of queried attributes. The effectiveness of our proposals is demonstrated through experiments. Copyright © 2010 John Wiley & Sons, Ltd.  相似文献   

19.
随着新型数据应用的不断出现,针对流形态数据的数据流管理系统已经成为数据管理领域研究的新热点。针对目前通用数据流管理系统只支持基于操作符流图的查询表达方式这一不足,设计了一种新的持续型数据流查询语言,并在通用数据流处理系统Aurora上进行了实现。为验证新语言的表达能力,该系统使用新语言定义了数据流基准测试Linear Road Benchmark的查询集,在Aurora系统上部署运行。测试结果表明针对Linear Road Benchmark的测试用例,新语言具有较完备的语义和良好的表达能力。  相似文献   

20.
冯永  张洋 《计算机应用》2012,32(6):1688-1691
查询接口模式匹配是Deep Web信息集成中的关键部分,双重相关性挖掘方法(DCM)能有效利用关联挖掘方法解决复杂接口模式匹配问题。针对DCM方法在匹配效率、匹配准确性方面的不足,提出了一种基于匹配度和语义相似度的新模式匹配方法。该方法首先使用矩阵存储属性间的关联关系,然后采用匹配度计算属性间的相关度,最后利用语义相似度计算候选匹配的相似性。通过在美国伊利诺斯大学的BAMM数据集上进行实验,所提方法与DCM及其改进方法比较有更高的匹配效率和准确性,表明该方法能更好地处理接口之间模式匹配问题。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号