期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Optimizing multiple dimensional queries simultaneously in multidimensional databases 总被引：1，自引：0，他引：1

Weifa Liang Maria E. Orlowska Jeffrey X. Yu 《The VLDB Journal The International Journal on Very Large Data Bases》2000,8(3-4):319-338

Some significant progress related to multidimensional data analysis has been achieved in the past few years, including the design of fast algorithms for computing datacubes, selecting some precomputed group-bys to materialize, and designing efficient storage structures for multidimensional data. However, little work has been carried out on multidimensional query optimization issues. Particularly the response time (or evaluation cost) for answering several related dimensional queries simultaneously is crucial to the OLAP applications. Recently, Zhao et al. first exploited this problem by presenting three heuristic algorithms. In this paper we first consider in detail two cases of the problem in which all the queries are either hash-based star joins or index-based star joins only. In the case of the hash-based star join, we devise a polynomial approximation algorithm which delivers a plan whose evaluation cost is $ O(n^{\epsilon }$) times the optimal, where n is the number of queries and is a fixed constant with . We also present an exponential algorithm which delivers a plan with the optimal evaluation cost. In the case of the index-based star join, we present a heuristic algorithm which delivers a plan whose evaluation cost is n times the optimal, and an exponential algorithm which delivers a plan with the optimal evaluation cost. We then consider a general case in which both hash-based star-join and index-based star-join queries are included. For this case, we give a possible improvement on the work of Zhao et al., based on an analysis of their solutions. We also develop another heuristic and an exact algorithm for the problem. We finally conduct a performance study by implementing our algorithms. The experimental results demonstrate that the solutions delivered for the restricted cases are always within two times of the optimal, which confirms our theoretical upper bounds. Actually these experiments produce much better results than our theoretical estimates. To the best of our knowledge, this is the only development of polynomial algorithms for the first two cases which are able to deliver plans with deterministic performance guarantees in terms of the qualities of the plans generated. The previous approaches including that of [ZDNS98] may generate a feasible plan for the problem in these two cases, but they do not provide any performance guarantee, i.e., the plans generated by their algorithms can be arbitrarily far from the optimal one. Received: July 21, 1998 / Accepted: August 26, 1999 相似文献

2.

Functional-join processing

R. Braumandl J. Claussen A. Kemper D. Kossmann 《The VLDB Journal The International Journal on Very Large Data Bases》2000,8(3-4):156-177

Inter-object references are one of the key concepts of object-relational and object-oriented database systems. In this work, we investigate alternative techniques to implement inter-object references and make the best use of them in query processing, i.e., in evaluating functional joins. We will give a comprehensive overview and performance evaluation of all known techniques for simple (single-valued) as well as multi-valued functional joins. Furthermore, we will describe special order-preserving\/ functional-join techniques that are particularly attractive for decision support queries that require ordered results. While most of the presentation of this paper is focused on object-relational and object-oriented database systems, some of the results can also be applied to plain relational databases because index nested-loop joins\/ along key/foreign-key relationships, as they are frequently found in relational databases, are just one particular way to execute a functional join. Received February 28, 1999 / Accepted September 27, 1999 相似文献

3.

Applying segmented right-deep trees to pipelining multiple hashjoins

Ming-Syan Chen Mingling Lo Yu P.S. Young H.C. 《Knowledge and Data Engineering, IEEE Transactions on》1995,7(4):656-668

The pipelined execution of multijoin queries in a multiprocessor-based database system is explored in this paper. Using hash-based joins, multiple joins can be pipelined so that the early results from a join, before the whole join is completed, are sent to the next join for processing. The execution of a query is usually denoted by a query execution tree. To improve the execution of pipelined hash joins, an innovative approach to query execution tree selection is proposed to exploit segmented right-deep trees, which are bushy trees of right-deep subtrees. We first derive an analytical model for the execution of a pipeline segment, and then, in the light of the model, we develop heuristic schemes to determine the query execution plan based on a segmented right-deep tree so that the query can be efficiently executed. As shown by our simulation, the proposed approach, without incurring additional overhead on plan execution, possesses more flexibility in query plan generation, and can lead to query plans of better performance than those achievable by the previous schemes using right-deep trees 相似文献

4.

On applying hash filters to improving the execution of multi-join queries 总被引：1，自引：0，他引：1

Ming-Syan Chen Hui-I Hsiao Philip S. Yu 《The VLDB Journal The International Journal on Very Large Data Bases》1997,6(2):121-131

In this paper, we explore an approach of interleaving a bushy execution tree with hash filters to improve the execution of multi-join queries. Similar to semi-joins in distributed query processing, hash filters can be applied to eliminate non-matching tuples from joining relations before the execution of a join, thus reducing the join cost. Note that hash filters built in different execution stages of a bushy tree can have different costs and effects. The effect of hash filters is evaluat ed first. Then, an efficient scheme to determine an effective sequence of hash filters for a bushy execution tree is developed, where hash filters are built and applied based on the join sequence specified in the bushy tree so that not only is the reduction effect optimized but also the cost associated is minimized. Various schemes using hash filters are implemented and evaluated via simulation. It is experimentally shown that the application of hash filters is in general a very powerful means to improve th e execution of multi-join queries, and the improvement becomes more prominent as the number of relations in a query increases. Edited by G. Gardarin. Received October 1994 / Accepted December 1995 相似文献

5.

Seeking the truth about ad hoc join costs

Laura M. Haas Michael J. Carey Miron Livny Amit Shukla 《The VLDB Journal The International Journal on Very Large Data Bases》1997,6(3):241-256

In this paper, we re-examine the results of prior work on methods for computing ad hoc joins. We develop a detailed cost model for predicting join algorithm performance, and we use the model to develop cost formulas for the major ad hoc join methods found in the relational database literature. We show that various pieces of “common wisdom” about join algorithm performance fail to hold up when analyzed carefully, and we use our detailed cost model to derive op timal buffer allocation schemes for each of the join methods examined here. We show that optimizing their buffer allocations can lead to large performance improvements, e.g., as much as a 400% improvement in some cases. We also validate our cost model's predictions by measuring an actual implementation of each join algorithm considered. The results of this work should be directly useful to implementors of relational query optimizers and query processing systems. Edited by M. Adiba. Received May 1993 / Accepted April 1996 相似文献

6.

Approximate query processing using wavelets 总被引：7，自引：0，他引：7

Kaushik Chakrabarti Minos Garofalakis Rajeev Rastogi Kyuseok Shim 《The VLDB Journal The International Journal on Very Large Data Bases》2001,10(2-3):199-223

Approximate query processing has emerged as a cost-effective approach for dealing with the huge data volumes and stringent response-time requirements of today's decision support systems (DSS). Most work in this area, however, has so far been limited in its query processing scope, typically focusing on specific forms of aggregate queries. Furthermore, conventional approaches based on sampling or histograms appear to be inherently limited when it comes to approximating the results of complex queries over high-dimensional DSS data sets. In this paper, we propose the use of multi-dimensional wavelets as an effective tool for general-purpose approximate query processing in modern, high-dimensional applications. Our approach is based on building wavelet-coefficient synopses of the data and using these synopses to provide approximate answers to queries. We develop novel query processing algorithms that operate directly on the wavelet-coefficient synopses of relational tables, allowing us to process arbitrarily complex queries entirely in the wavelet-coefficient domain. This guarantees extremely fast response times since our approximate query execution engine can do the bulk of its processing over compact sets of wavelet coefficients, essentially postponing the expansion into relational tuples until the end-result of the query. We also propose a novel wavelet decomposition algorithm that can build these synopses in an I/O-efficient manner. Finally, we conduct an extensive experimental study with synthetic as well as real-life data sets to determine the effectiveness of our wavelet-based approach compared to sampling and histograms. Our results demonstrate that our techniques: (1) provide approximate answers of better quality than either sampling or histograms; (2) offer query execution-time speedups of more than two orders of magnitude; and (3) guarantee extremely fast synopsis construction times that scale linearly with the size of the data. Received: 7 August 2000 / Accepted: 1 April 2001 Published online: 7 June 2001 相似文献

7.

Supporting top-<Emphasis Type="Italic">k</Emphasis><Emphasis Type="Italic">join</Emphasis> queries in relational databases

Ihab?F.?Ilyas Email author Walid?G.?Aref Ahmed?K.?Elmagarmid 《The VLDB Journal The International Journal on Very Large Data Bases》2004,13(3):207-221

Ranking queries, also known as top-k queries, produce results that are ordered on some computed score. Typically, these queries involve joins, where users are usually interested only in the top-k join results. Top-k queries are dominant in many emerging applications, e.g., multimedia retrieval by content, Web databases, data mining, middlewares, and most information retrieval applications. Current relational query processors do not handle ranking queries efficiently, especially when joins are involved. In this paper, we address supporting top-k join queries in relational query processors. We introduce a new rank-join algorithm that makes use of the individual orders of its inputs to produce join results ordered on a user-specified scoring function. The idea is to rank the join results progressively during the join operation. We introduce two physical query operators based on variants of ripple join that implement the rank-join algorithm. The operators are nonblocking and can be integrated into pipelined execution plans. We also propose an efficient heuristic designed to optimize a top-k join query by choosing the best join order. We address several practical issues and optimization heuristics to integrate the new join operators in practical query processors. We implement the new operators inside a prototype database engine based on PREDATOR. The experimental evaluation of our approach compares recent algorithms for joining ranked inputs and shows superior performance.Received: 23 December 2003, Accepted: 31 March 2004, Published online: 12 August 2004Edited by: S. AbiteboulExtended version of the paper published in the Proceedings of the 29th International Conference on Very Large Databases, VLDB 2003, Berlin, Germany, pp 754-765 相似文献

8.

Cost-based holistic twig joins

《Information Systems》2015

An evaluation of XML queries such as XQuery or XPath expressions represents a challenging task due to its complexity. Many algorithms have been introduced to cope with this problem. Some of them, called binary joins, evaluate separated parts of a query and subsequently merge intermediate results, while the others, called holistic twig joins, evaluate a query as a whole. Moreover, these algorithms also differ in what index data structure they use to handle XML data. There exist cost-based approaches utilizing binary joins and various index data structures; however, they share a limitation. The limitation is that they cannot perform a join between query nodes not having a direct XPath relationship. Such a join can be advantageous especially if their joint selectivity is high. Since holistic joins work with all query nodes they overcome this limitation. In this article, we introduce such a holistic twig join called CostTwigJoin. To the best of our knowledge, CostTwigJoin is the first holistic join capable of combining various index data structures during an evaluation of an XML query. Usage of the holistic join has yet another advantage for cost-based approaches: an optimizer does not have to resolve the order of binary joins; therefore, the search space is reduced. In this article, we perform thorough experiments on hundreds of queries to evaluate our approach and demonstrate its advantages. 相似文献

9.

A configurable type hierarchy index for OODB

Thomas A. Mueck Martin L. Polaschek 《The VLDB Journal The International Journal on Very Large Data Bases》1997,6(4):312-332

With respect to the specific requirements of advanced OODB applications, index data structures for type hierarchies in OODBMS have to provide efficient support for multiattribute queries and have to allow index optimization for a particular query profile. We describe the multikey type index and an efficient implementation of this indexing scheme. It meets both requirements: in addition to its multiattribute query capabilities it is designed as a mediator between two standard design alternatives, key-grouping and type-grouping. A prerequisite for the multikey type index is a linearization algorithm which maps type hierarchies to linearly ordered attribute domains in such a way that each subhierarchy is represented by an interval of this domain. The algorithm extends previous results with respect to multiple inheritance. The subsequent evaluation of our proposal focuses on storage space overhead as well as on the number of disk I/O operations needed for query execution. The analytical results for the multikey type index are compared to previously published figures for well-known single-key search structures. The comparison clearly shows the superiority of the multikey type index for a large class of query profiles. Edited by E. Bertino. Received October 7, 1996 / Accepted March 28, 1997 相似文献

10.

RDF partitioning for scalable SPARQL query processing

Xiaoyan WANG Tao YANG Jinchuan CHEN Long HE Xiaoyong DU 《Frontiers of Computer Science》2015,9(6):919

The volume of RDF data increases dramatically within recent years, while cloud computing platforms like Hadoop are supposed to be a good choice for processing queries over huge data sets for their wonderful scalability. Previous work on evaluating SPARQL queries with Hadoop mainly focus on reducing the number of joins through careful split of HDFS files and algorithms for generating Map/Reduce jobs. However, the way of partitioning RDF data could also affect system performance. Specifically, a good partitioning solution would greatly reduce or even totally avoid cross-node joins, and significantly cut down the cost in query evaluation. Based on HadoopDB, this work processes SPARQL queries in a hybrid architecture, where Map/Reduce takes charge of the computing tasks, and RDF query engines like RDF-3X store the data and execute join operations. According to the analysis of query workloads, this work proposes a novel algorithm for automatically partitioning RDF data and an approximate solution to physically place the partitions in order to reduce data redundancy. It also discusses how to make a good trade-off between query evaluation efficiency and data redundancy. All of these proposed approaches have been evaluated by extensive experiments over large RDF data sets. 相似文献