期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Duality for Goal-Driven Query Processing in Disjunctive Deductive Databases

Adnan H. Yahya 《Journal of Automated Reasoning》2002,28(1):1-34

Bottom-up query-answering procedures tend to explore a much larger search space than what is strictly needed. Top-down processing methods use the query to perform a more focused search that can result in more efficient query answering. Given a disjunctive deductive database, DB, and a query, Q, we establish a strong connection between model generation and clause derivability in two different representations of DB and Q. This allows us to use a bottom-up procedure for evaluating Q against DB in a top-down fashion. The approach requires no extensive rewriting of the input theory and introduces no new predicates. Rather, it is based on a certain duality principle for interpreting logical connectives. The duality transformation is achieved by reversing the direction of implication arrows in the clauses representing both the theory and the negation of the query. The application of a generic bottom-up procedure to the transformed clause set results in top-down query answering. Under favorable conditions efficiency gains are substantial, as shown by our preliminary testing. We give the logical meaning of the duality transformation and point to the conditions and sources of improved efficiency. We show how the duality approach can be used for refined query answering by specifying the minimal conditions (weakest updates) to DB under which Q becomes derivable. This is shown to be useful for view updates in disjunctive deductive databases as well as for other interesting applications. 相似文献

2.

Continuous query processing over music streams based on approximate matching mechanisms

Hung-Chen Chen Yi-Hung Wu Yu-Chi Soo Arbee L. P. Chen 《Multimedia Systems》2008,14(1):51-70

It is foreseen that more and more music objects in symbolic format and multimedia objects, such as audio, video, or lyrics, integrated with symbolic music representation (SMR) will be published and broadcasted via the Internet. The SMRs of the flowing songs or multimedia objects will form a music stream. Many interesting applications based on music streams, such as interactive music tutorials, distance music education, and similar theme searching, make the research of content-based retrieval over music streams much important. We consider multiple queries with error tolerances over music streams and address the issue of approximate matching in this environment. We propose a novel approach to continuously process multiple queries over the music streams for finding all the music segments that are similar to the queries. Our approach is based on the concept of n-grams, and two mechanisms are designed to reduce the heavy computation of approximate matching. One mechanism uses the clustering of query n-grams to prune the query n-grams that are irrelevant to the incoming data n-gram. The other mechanism records the data n-gram that matches a query n-gram as a partial answer and incrementally merges the partial answers of the same query. We implement a prototype system for experiments in which songs in the MIDI format are continuously broadcasted, and the user can specify musical segments as queries to monitor the music streams. Experiment results show the effectiveness and efficiency of the proposed approach. 相似文献

3.

Semantics and evaluation of top-<Emphasis Type="Italic">k</Emphasis> queries in probabilistic databases

Xi Zhang Jan Chomicki 《Distributed and Parallel Databases》2009,26(1):67-126

We study here fundamental issues involved in top-k query evaluation in probabilistic databases. We consider simple probabilistic databases in which probabilities are associated with individual tuples, and general probabilistic databases in which, additionally, exclusivity relationships between tuples can be represented. In contrast to other recent research in this area, we do not limit ourselves to injective scoring functions. We formulate three intuitive postulates for the semantics of top-k queries in probabilistic databases, and introduce a new semantics, Global-Topk, that satisfies those postulates to a large degree. We also show how to evaluate queries under the Global-Topk semantics. For simple databases we design dynamic-programming based algorithms. For general databases we show polynomial-time reductions to the simple cases, and provide effective heuristics to speed up the computation in practice. For example, we demonstrate that for a fixed k the time complexity of top-k query evaluation is as low as linear, under the assumption that probabilistic databases are simple and scoring functions are injective. Research partially supported by NSF grant IIS-0307434. An earlier version of some of the results in this paper was presented in [1]. 相似文献

4.

Holistically Stream-based Processing Xtwig Queries

Guoren Wang Bo Ning Ge Yu 《World Wide Web》2008,11(4):407-425

Unlike a twig query, an Xtwig query contains some selection predicates with reverse axes which are either ancestor or parent. To evaluate such queries in the stream-based context, some rewriting rules have been proposed to transform the paths with reverse axes into equivalent reverse-axis-free ones. However, the transformation method is expensive due to multiple scanning input streams and the generation of unnecessary intermediate results. To solve these problems, a holistic stream-based algorithm XtwigStack is proposed for Xtwig queries. Experiments show that XtwigStack is much more efficient than the transformation method. 相似文献

5.

Partial Evaluation of Views

Parke Godfrey Jarek Gryz 《Journal of Intelligent Information Systems》2001,16(1):21-39

Many database applications and environments, such as mediation over heterogeneous database sources and data warehousing for decision support, lead to complex queries. Queries are often nested, defined over previously defined views, and may involve unions. There are good reasons why one might want to remove pieces (sub-queries or sub-views) from such queries: some sub-views of a query may be effectively cached from previous queries, or may be materialized views; some may be known to evaluate empty, by reasoning over the integrity constraints; and some may match protected queries, which for security cannot be evaluated for all users.In this paper, we present a new evaluation strategy with respect to queries defined over views, which we call tuple-tagging, that allows for an efficient removal of sub-views from the query. Other approaches to this are to rewrite the query so the sub-views to be removed are effectively gone, then to evaluate the rewritten query. With the tuple tagging evaluation, no rewrite of the original query is necessary.We describe formally a discounted query (a query with sub-views marked that are to be considered as removed), present the tuple tagging algorithm for evaluating discounted queries, provide an analysis of the algorithm's performance, and present some experimental results. These results strongly support the tuple-tagging algorithm both as an efficient means to effectively remove sub-views from a view query during evaluation, and as a viable optimization strategy for certain applications. The experiments also suggest that rewrite techniques for this may perform worse than the evaluation of the original query, and much worse than the tuple tagging approach. 相似文献

6.

Indexing and querying XML using extended Dewey labeling scheme 总被引：1，自引：0，他引：1

Jiaheng LuAuthor Vitae Xiaofeng MengAuthor VitaeTok Wang LingAuthor Vitae 《Data & Knowledge Engineering》2011,70(1):35-59

Finding all the occurrences of a tree pattern in an XML database is a core operation for efficient evaluation of XML queries. The Dewey labeling scheme is commonly used to label an XML document to facilitate XML query processing by recording information on the path of an element. In order to improve the efficiency of XML tree pattern matching, we introduce a novel labeling scheme, called extended Dewey, which effectively extends the existing Dewey labeling scheme to combine the types and identifiers of elements in a label, and to avoid the scan of labels for internal query nodes to accelerate query processing (in I/O cost). Based on extended Dewey, we propose a series of holistic XML tree pattern matching algorithms. We first present TJFast to answer an XML twig pattern query. To efficiently answer a generalized XML tree pattern, we then propose GTJFast, an optimization that exploits the non-output nodes. In addition, we propose TJFastTL and GTJFastTL based on the tag + level data partition scheme to further reduce I/O costs by level pruning. Finally, we report our comprehensive experimental results to show that our set of XML tree pattern matching algorithms are superior to existing approaches in terms of the number of elements scanned, the size of intermediate results and query performance. 相似文献

7.

Incremental sequence-based frequent query pattern mining from XML queries

Guoliang Li Jianhua Feng Jianyong Wang Lizhu Zhou 《Data mining and knowledge discovery》2009,18(3):472-516

Existing algorithms of mining frequent XML query patterns (XQPs) employ a candidate generate-and-test strategy. They involve expensive candidate enumeration and costly tree-containment checking. Further, most of existing methods compute the frequencies of candidate query patterns from scratch periodically by checking the entire transaction database, which consists of XQPs transferred from user query logs. However, it is not straightforward to maintain such discovered frequent patterns in real XML databases as there may be frequent updates that may not only invalidate some existing frequent query patterns but also generate some new frequent query patterns. Therefore, a drawback of existing methods is that they are rather inefficient for the evolution of transaction databases. To address above-mentioned problems, this paper proposes an efficient algorithm ESPRIT to mine frequent XQPs without costly tree-containment checking. ESPRIT transforms XML queries into sequences using a one-to-one mapping technique and mines the frequent sequences to generate frequent XQPs. We propose two efficient incremental algorithms, ESPRIT-i and ESPRIT-i ⁺, to incrementally mine frequent XQPs. We devise several novel optimization techniques of query rewriting, cache lookup, and cache replacement to improve the answerability and the hit rate of caching. We have implemented our algorithms and conducted a set of experimental studies on various datasets. The experimental results demonstrate that our algorithms achieve high efficiency and scalability and outperform state-of-the-art methods significantly. 相似文献

8.

An efficient processing of a chain join with the minimum communication cost in distributed database systems

Xuemin Lin Maria E. Orlowska 《Distributed and Parallel Databases》1995,3(1):69-83

This paper investigates the optimization problem when executing a join in a distributed database environment. The minimization of the communication cost for sending data through links has been adopted as an optimization criterion. We explore in this paper the approach of judiciously using join operations as reducers in distributed query processing. In general, this problem is computationally intractable. A restriction of the execution of a join in a pre-defined combinatorial order leads to a possible solution in polynomial time. An algorithm for a chain query computation has been proposed in [21]. The time complexity of the algorithm isO(m ² n ²+m ³ n), wheren is the number of sites in the network, andm is the number of relations (fragments) involved in the join. In this paper, we firstly present a proof of the intuitively well understood fact—that the eigenorder of a chain join will be the best pre-defined combinatorial order to implement the algorithm in [21]. Secondly, we show a sufficient and necessary condition for a chain query with the eigenordering to be a simple query. For the process of the class of simple queries, we show a significant reduction of the time complexity fromO(m ² n ²+m ³ n) toO(mn+m ²). It is encouraging that, in practice, the most frequent queries belong to the category of simple queries. Editor: Peter Apers 相似文献

9.

CubiST++: Evaluating Ad-Hoc CUBE Queries Using Statistics Trees

Joachim Hammer Lixin Fu 《Distributed and Parallel Databases》2003,14(3):221-254

We report on a new, efficient encoding for the data cube, which results in a drastic speed-up of OLAP queries that aggregate along any combination of dimensions over numerical and categorical attributes. We are focusing on a class of queries called cube queries, which return aggregated values rather than sets of tuples. Our approach, termed CubiST⁺⁺ (Cubing with Statistics Trees Plus Families), represents a drastic departure from existing relational (ROLAP) and multi-dimensional (MOLAP) approaches in that it does not use the view lattice to compute and materialize new views from existing views in some heuristic fashion. Instead, CubiST⁺⁺ encodes all possible aggregate views in the leaves of a new data structure called statistics tree (ST) during a one-time scan of the detailed data. In order to optimize the queries involving constraints on hierarchy levels of the underlying dimensions, we select andmaterialize a family of candidate trees, which represent superviews over the different hierarchical levels of the dimensions. Given a query, our query evaluation algorithm selects the smallest tree in the family, which can provide the answer. Extensive evaluations of our prototype implementation have demonstrated its superior run-time performance and scalability when compared with existing MOLAP and ROLAP systems. 相似文献

10.

Algorithms for Nearest Neighbor Search on Moving Object Trajectories 总被引：2，自引：1，他引：1

Elias Frentzos Kostas Gratsias Nikos Pelekis Yannis Theodoridis 《GeoInformatica》2007,11(2):159-193

Nearest Neighbor (NN) search has been in the core of spatial and spatiotemporal database research during the last decade. The literature on NN query processing algorithms so far deals with either stationary or moving query points over static datasets or future (predicted) locations over a set of continuously moving points. With the increasing number of Mobile Location Services (MLS), the need for effective k-NN query processing over historical trajectory data has become the vehicle for data analysis, thus improving existing or even proposing new services. In this paper, we investigate mechanisms to perform NN search on R-tree-like structures storing historical information about moving object trajectories. The proposed (depth-first and best-first) algorithms vary with respect to the type of the query object (stationary or moving point) as well as the type of the query result (historical continuous or not), thus resulting in four types of NN queries. We also propose novel metrics to support our search ordering and pruning strategies. Using the implementation of the proposed algorithms on two members of the R-tree family for trajectory data (namely, the TB-tree and the 3D-R-tree), we demonstrate their scalability and efficiency through an extensive experimental study using large synthetic and real datasets.

Yannis Theodoridis (Corresponding author)Email: URL: http://dke.cti.gr http://isl.cs.unipi.gr/db

相似文献

11.

Approximate query processing using wavelets 总被引：7，自引：0，他引：7

Kaushik Chakrabarti Minos Garofalakis Rajeev Rastogi Kyuseok Shim 《The VLDB Journal The International Journal on Very Large Data Bases》2001,10(2-3):199-223

Approximate query processing has emerged as a cost-effective approach for dealing with the huge data volumes and stringent response-time requirements of today's decision support systems (DSS). Most work in this area, however, has so far been limited in its query processing scope, typically focusing on specific forms of aggregate queries. Furthermore, conventional approaches based on sampling or histograms appear to be inherently limited when it comes to approximating the results of complex queries over high-dimensional DSS data sets. In this paper, we propose the use of multi-dimensional wavelets as an effective tool for general-purpose approximate query processing in modern, high-dimensional applications. Our approach is based on building wavelet-coefficient synopses of the data and using these synopses to provide approximate answers to queries. We develop novel query processing algorithms that operate directly on the wavelet-coefficient synopses of relational tables, allowing us to process arbitrarily complex queries entirely in the wavelet-coefficient domain. This guarantees extremely fast response times since our approximate query execution engine can do the bulk of its processing over compact sets of wavelet coefficients, essentially postponing the expansion into relational tuples until the end-result of the query. We also propose a novel wavelet decomposition algorithm that can build these synopses in an I/O-efficient manner. Finally, we conduct an extensive experimental study with synthetic as well as real-life data sets to determine the effectiveness of our wavelet-based approach compared to sampling and histograms. Our results demonstrate that our techniques: (1) provide approximate answers of better quality than either sampling or histograms; (2) offer query execution-time speedups of more than two orders of magnitude; and (3) guarantee extremely fast synopsis construction times that scale linearly with the size of the data. Received: 7 August 2000 / Accepted: 1 April 2001 Published online: 7 June 2001 相似文献

12.

Efficient temporal counting with bounded error

Yufei Tao Xiaokui Xiao 《The VLDB Journal The International Journal on Very Large Data Bases》2008,17(5):1271-1292

This paper studies aggregate search in transaction time databases. Specifically, each object in such a database can be modeled as a horizontal segment, whose y-projection is its search key, and its x-projection represents the period when the key was valid in history. Given a query timestamp q _t and a key range , a count-query retrieves the number of objects that are alive at q _t, and their keys fall in . We provide a method that accurately answers such queries, with error less than , where N _alive(q _t) is the number of objects alive at time q _t, and ɛ is any constant in (0, 1]. Denoting the disk page size as B, and n = N / B, our technique requires O(n) space, processes any query in O(log_B n) time, and supports each update in O(log_B n) amortized I/Os. As demonstrated by extensive experiments, the proposed solutions guarantee query results with extremely high precision (median relative error below 5%), while consuming only a fraction of the space occupied by the existing approaches that promise precise results. 相似文献

13.

XFlat: Query-friendly encrypted XML view publishing

Jun Gao Tengjiao Wang 《Information Sciences》2008,178(3):774-787

The security of published XML data receives exceptional attention due to its sensitive nature in many applications. This paper proposes an XML view publishing method called XFlat. Compared with other methods, XFlat focuses on query performance over the published XML view while simultaneously protecting the sensitive data via encryption techniques. XFlat decomposes an XML tree into a set of sub-trees, in each of which multiple users have the same accessibility to all nodes, and may encrypt and store each sub-tree in a flat, sequential manner. This storage strategy can avoid the nested encryption cost in view construction and the nested decryption cost in query evaluation. In addition, we discuss how to generate a user-specific schema and how to minimize the total space cost of the published XML view when considering the overhead of the relationships among the sub-trees. We also propose an XML schema index to enhance query performance over the final XML view. The experimental results demonstrate the effectiveness and efficiency of the proposed XFlat method. 相似文献

14.

k-Nearest Neighbor Query Processing Algorithms for a Query Region in Road Networks

下载免费PDF全文

Hyeong-Il Kim Jae-Woo Chang 《计算机科学技术学报》2013,28(4):585-596

Recent development of wireless communication technologies and the popularity of smart phones are making location-based services (LBS) popular. However, requesting queries to LBS servers with users’ exact locations may threat the privacy of users. Therefore, there have been many researches on generating a cloaked query region for user privacy protection. Consequently, an effcient query processing algorithm for a query region is required. So, in this paper, we propose k-nearest neighbor query (k-NN) processing algorithms for a query region in road networks. To effciently retrieve k-NN points of interest (POIs), we make use of the Island index. We also propose a method that generates an adaptive Island index to improve the query processing performance and storage usage. Finally, we show by our performance analysis that our k-NN query processing algorithms outperform the existing k-Range Nearest Neighbor (kRNN) algorithm in terms of network expansion cost and query processing time. 相似文献

15.

Histograms based on the minimum description length principle

Hai Wang Kenneth C. Sevcik 《The VLDB Journal The International Journal on Very Large Data Bases》2008,17(3):419-442

相似文献

16.

基于图压缩的k可达查询处理

李鸣鹏高宏邹兆年《软件学报》2014,25(4):797-812

研究了基于图压缩的k可达查询处理,提出了一种支持k可达查询的图压缩算法k-RPC及无需解压缩的查询处理算法,k-RPC算法在所有基于等价类的支持k-reach查询的图压缩算法中是最优的.由于k-RPC算法是基于严格的等价关系,因此进一步又提出了线性时间的近似图压缩算法k-GRPC.k-GRPC算法允许从原始图中删除部分边,然后使用k-RPC获得更好的压缩比.提出了线性时间的无需解压缩的查询处理算法.真实数据上的实验结果表明,对于稀疏的原始图,两种压缩算法的压缩比分别可以达到45%,对于稠密的原始图,两种压缩算法的压缩比分别可以达到75%和67%;与在原始图上直接进行查询处理相比,两种基于压缩图的查询处理算法效率更好,在稀疏图上的查询效率可以提高2.5倍. 相似文献

17.

An algebraic transformation framework for multidatabase queries

Ee-Peng Lim Jaideep Srivastava San-Yih Hwang 《Distributed and Parallel Databases》1995,3(3):273-307

Existence of semantic conflicts between component databases severely impacts query processing in a multidatabase system. In this paper, we describe two types of semantic conflicts that have to be dealt with in the integration of databases modeling information about related sets of real-world entities. These are the entityidentification problem and theattribute value conflict problem. While thetwo-way outerjoin operation has been commonly used for resolving entity identification problem between two component relations, outerjoins using regular equality comparisons between component relation keys is shown to produce counter-intuitive entity identification result. We remedy this by defining a newkey-equality comparator in place of regular equality comparator, for outerjoins. For the attribute value conflict problem, we define aGeneralized Attribute Derivation (GAD) operation which allows user-defined attribute derivation functions to be used to compute new attributes from the component relations' attributes. By adding two-way outerjoin andGAD to the set of relational operations, the traditional algebraic transformation framework for relational queries is no longer adequate for multidatabase query processing and optimization. As a result, we introduceconstrained query tree as the multidatabase query representation. We show that some knowledge about query predicates and attribute derivation functions can be used to simplify queries. Such knowledge is modeled as an outerjoin graph attached to every outerjoin operation in the query tree. Based on this, we further extend the traditional algebraic transformation framework to include two-way outerjoins andGAD operations. Our framework demonstrates that properties of selection/join predicates and attribute derivation functions can be used to provide interesting transformation alternatives. This framework also serves as a formal ground for developing optimization strategies for multidatabase queries. Recommended by: Clement Yu 相似文献

18.

Query Size Estimation for Joins Using Systematic Sampling

A.H.H. Ngu B. Harangsri J. Shepherd 《Distributed and Parallel Databases》2004,15(3):237-275

We propose a new approach to the estimation of query result sizes for join queries. The technique, which we have called systematic sampling—SYSSMP, is a novel variant of the sampling-based approach. A key novelty of the systematic sampling is that it exploits the sortedness of data; the result of this is that the sample relation obtained well represents the underlying frequency distribution of the join attribute in the original relation.We first develop a theoretical foundation for systematic sampling which suggests that the method gives a more representative sample than the traditional simple random sampling. Subsequent experimental analysis on a range of synthetic relations confirms that the quality of sample relations yielded by systematic sampling is higher than those produced by the traditional simple random sampling.To ensure that sample relations produced by systematic sampling indeed assist in computing more accurate query result sizes, we compare systematic sampling with the most efficient simple random sampling called t_cross using a variety of relation configurations. The results obtained validate that systematic sampling uses the same amount of sampling but still provides more accurate query result sizes than t_cross. Furthermore, the extra sampling cost incurred by the use of systematic sampling pays off in a cheaper query execution cost at run-time. 相似文献

19.

能效查询处理与优化可行性初探

吴岩杨良怀潘建《计算机系统应用》2014,23(4):178-182

随着数据管理需求的不断增长,降低与控制数据中心的能耗成为一个挑战性问题. DBMS 是数据中心核心软件,能效查询处理与优化是其中一个重要议题. 本文提出了新型的能耗代价评估模型,通过评估查询计划的时间和能耗代价,考察了不同优化目标在不同硬件条件下对查询处理的影响. 实验表明,传统硬件下面向性能的优化与面向能耗的优化结果是一致的;在新硬件条件下,两者结果则不同,可以改进数据库系统能效. 相似文献

20.

Rule-based spatiotemporal query processing for video databases

Mehmet?Emin?D?nderler ?zgür?Ulusoy Email author Ugur?Güdükbay 《The VLDB Journal The International Journal on Very Large Data Bases》2004,13(1):86-103

In our earlier work, we proposed an architecture for a Web-based video database management system (VDBMS) providing an integrated support for spatiotemporal and semantic queries. In this paper, we focus on the task of spatiotemporal query processing and also propose an SQL-like video query language that has the capability to handle a broad range of spatiotemporal queries. The language is rule-based in that it allows users to express spatial conditions in terms of Prolog-type predicates. Spatiotemporal query processing is carried out in three main stages: query recognition, query decomposition, and query execution.Received: 11 October 2001, Accepted: 3 October 2003, Published online: 12 December 2003Edited by: A. Buchmann Correspondence to: Özgür UlusoyThis work is supported by the Scientific and Research Council of Turkey (TÜBITAK) under Project Code 199E025. This work was done while the first author was at Bilkent University. 相似文献