共查询到10条相似文献,搜索用时 171 毫秒
1.
Optimizing multiple dimensional queries simultaneously in multidimensional databases 总被引:1,自引:0,他引:1
Weifa Liang Maria E. Orlowska Jeffrey X. Yu 《The VLDB Journal The International Journal on Very Large Data Bases》2000,8(3-4):319-338
Some significant progress related to multidimensional data analysis has been achieved in the past few years, including the
design of fast algorithms for computing datacubes, selecting some precomputed group-bys to materialize, and designing efficient
storage structures for multidimensional data. However, little work has been carried out on multidimensional query optimization
issues. Particularly the response time (or evaluation cost) for answering several related dimensional queries simultaneously
is crucial to the OLAP applications. Recently, Zhao et al. first exploited this problem by presenting three heuristic algorithms.
In this paper we first consider in detail two cases of the problem in which all the queries are either hash-based star joins
or index-based star joins only. In the case of the hash-based star join, we devise a polynomial approximation algorithm which
delivers a plan whose evaluation cost is $ O(n^{\epsilon }$) times the optimal, where n is the number of queries and is a fixed constant with . We also present an exponential algorithm which delivers a plan with the optimal evaluation cost. In the case of the index-based
star join, we present a heuristic algorithm which delivers a plan whose evaluation cost is n times the optimal, and an exponential algorithm which delivers a plan with the optimal evaluation cost. We then consider
a general case in which both hash-based star-join and index-based star-join queries are included. For this case, we give a
possible improvement on the work of Zhao et al., based on an analysis of their solutions. We also develop another heuristic
and an exact algorithm for the problem. We finally conduct a performance study by implementing our algorithms. The experimental
results demonstrate that the solutions delivered for the restricted cases are always within two times of the optimal, which
confirms our theoretical upper bounds. Actually these experiments produce much better results than our theoretical estimates.
To the best of our knowledge, this is the only development of polynomial algorithms for the first two cases which are able
to deliver plans with deterministic performance guarantees in terms of the qualities of the plans generated. The previous
approaches including that of [ZDNS98] may generate a feasible plan for the problem in these two cases, but they do not provide
any performance guarantee, i.e., the plans generated by their algorithms can be arbitrarily far from the optimal one.
Received: July 21, 1998 / Accepted: August 26, 1999 相似文献
2.
R. Braumandl J. Claussen A. Kemper D. Kossmann 《The VLDB Journal The International Journal on Very Large Data Bases》2000,8(3-4):156-177
Inter-object references are one of the key concepts of object-relational and object-oriented database systems. In this work,
we investigate alternative techniques to implement inter-object references and make the best use of them in query processing,
i.e., in evaluating functional joins. We will give a comprehensive overview and performance evaluation of all known techniques
for simple (single-valued) as well as multi-valued functional joins. Furthermore, we will describe special order-preserving\/ functional-join techniques that are particularly attractive for decision support queries that require ordered results. While
most of the presentation of this paper is focused on object-relational and object-oriented database systems, some of the results
can also be applied to plain relational databases because index nested-loop joins\/ along key/foreign-key relationships, as they are frequently found in relational databases, are just one particular way to
execute a functional join.
Received February 28, 1999 / Accepted September 27, 1999 相似文献
3.
Ming-Syan Chen Mingling Lo Yu P.S. Young H.C. 《Knowledge and Data Engineering, IEEE Transactions on》1995,7(4):656-668
The pipelined execution of multijoin queries in a multiprocessor-based database system is explored in this paper. Using hash-based joins, multiple joins can be pipelined so that the early results from a join, before the whole join is completed, are sent to the next join for processing. The execution of a query is usually denoted by a query execution tree. To improve the execution of pipelined hash joins, an innovative approach to query execution tree selection is proposed to exploit segmented right-deep trees, which are bushy trees of right-deep subtrees. We first derive an analytical model for the execution of a pipeline segment, and then, in the light of the model, we develop heuristic schemes to determine the query execution plan based on a segmented right-deep tree so that the query can be efficiently executed. As shown by our simulation, the proposed approach, without incurring additional overhead on plan execution, possesses more flexibility in query plan generation, and can lead to query plans of better performance than those achievable by the previous schemes using right-deep trees 相似文献
4.
Ming-Syan Chen Hui-I Hsiao Philip S. Yu 《The VLDB Journal The International Journal on Very Large Data Bases》1997,6(2):121-131
In this paper, we explore an approach of interleaving a bushy execution tree with hash filters to improve the execution of
multi-join queries. Similar to semi-joins in distributed query processing, hash filters can be applied to eliminate non-matching
tuples from joining relations before the execution of a join, thus reducing the join cost. Note that hash filters built in
different execution stages of a bushy tree can have different costs and effects. The effect of hash filters is evaluat
ed first. Then, an efficient scheme to determine an effective sequence of hash filters for a bushy execution tree is developed,
where hash filters are built and applied based on the join sequence specified in the bushy tree so that not only is the reduction
effect optimized but also the cost associated is minimized. Various schemes using hash filters are implemented and evaluated
via simulation. It is experimentally shown that the application of hash filters is in general a very powerful means to improve
th
e execution of multi-join queries, and the improvement becomes more prominent as the number of relations in a query increases.
Edited by G. Gardarin. Received October 1994 / Accepted December 1995 相似文献
5.
Laura M. Haas Michael J. Carey Miron Livny Amit Shukla 《The VLDB Journal The International Journal on Very Large Data Bases》1997,6(3):241-256
In this paper, we re-examine the results of prior work on methods for computing ad hoc joins. We develop a detailed cost model for predicting join algorithm performance, and we use the model to develop cost formulas
for the major ad hoc join methods found in the relational database literature. We show that various pieces of “common wisdom” about join algorithm
performance fail to hold up when analyzed carefully, and we use our detailed cost model to derive op
timal buffer allocation schemes for each of the join methods examined here. We show that optimizing their buffer allocations
can lead to large performance improvements, e.g., as much as a 400% improvement in some cases. We also validate our cost model's
predictions by measuring an actual implementation of each join algorithm considered. The results of this work should be directly
useful to implementors of relational query optimizers and query processing systems.
Edited by M. Adiba. Received May 1993 / Accepted April 1996 相似文献
6.
Approximate query processing using wavelets 总被引:7,自引:0,他引:7
Kaushik Chakrabarti Minos Garofalakis Rajeev Rastogi Kyuseok Shim 《The VLDB Journal The International Journal on Very Large Data Bases》2001,10(2-3):199-223
Approximate query processing has emerged as a cost-effective approach for dealing with the huge data volumes and stringent
response-time requirements of today's decision support systems (DSS). Most work in this area, however, has so far been limited
in its query processing scope, typically focusing on specific forms of aggregate queries. Furthermore, conventional approaches
based on sampling or histograms appear to be inherently limited when it comes to approximating the results of complex queries
over high-dimensional DSS data sets. In this paper, we propose the use of multi-dimensional wavelets as an effective tool
for general-purpose approximate query processing in modern, high-dimensional applications. Our approach is based on building
wavelet-coefficient synopses of the data and using these synopses to provide approximate answers to queries. We develop novel query processing algorithms
that operate directly on the wavelet-coefficient synopses of relational tables, allowing us to process arbitrarily complex
queries entirely in the wavelet-coefficient domain. This guarantees extremely fast response times since our approximate query execution engine
can do the bulk of its processing over compact sets of wavelet coefficients, essentially postponing the expansion into relational
tuples until the end-result of the query. We also propose a novel wavelet decomposition algorithm that can build these synopses
in an I/O-efficient manner. Finally, we conduct an extensive experimental study with synthetic as well as real-life data sets
to determine the effectiveness of our wavelet-based approach compared to sampling and histograms. Our results demonstrate
that our techniques: (1) provide approximate answers of better quality than either sampling or histograms; (2) offer query
execution-time speedups of more than two orders of magnitude; and (3) guarantee extremely fast synopsis construction times
that scale linearly with the size of the data.
Received: 7 August 2000 / Accepted: 1 April 2001 Published online: 7 June 2001 相似文献
7.
Ranking queries, also known as top-k queries, produce results that are ordered on some computed score. Typically, these queries involve joins, where users are usually interested only in the top-k join results. Top-k queries are dominant in many emerging applications, e.g., multimedia retrieval by content, Web databases, data mining, middlewares, and most information retrieval applications. Current relational query processors do not handle ranking queries efficiently, especially when joins are involved. In this paper, we address supporting top-k join queries in relational query processors. We introduce a new rank-join algorithm that makes use of the individual orders of its inputs to produce join results ordered on a user-specified scoring function. The idea is to rank the join results progressively during the join operation. We introduce two physical query operators based on variants of ripple join that implement the rank-join algorithm. The operators are nonblocking and can be integrated into pipelined execution plans. We also propose an efficient heuristic designed to optimize a top-k join query by choosing the best join order. We address several practical issues and optimization heuristics to integrate the new join operators in practical query processors. We implement the new operators inside a prototype database engine based on PREDATOR. The experimental evaluation of our approach compares recent algorithms for joining ranked inputs and shows superior performance.Received: 23 December 2003, Accepted: 31 March 2004, Published online: 12 August 2004Edited by: S. AbiteboulExtended version of the paper published in the Proceedings of the 29th International Conference on Very Large Databases, VLDB 2003, Berlin, Germany, pp 754-765 相似文献
8.
An evaluation of XML queries such as XQuery or XPath expressions represents a challenging task due to its complexity. Many algorithms have been introduced to cope with this problem. Some of them, called binary joins, evaluate separated parts of a query and subsequently merge intermediate results, while the others, called holistic twig joins, evaluate a query as a whole. Moreover, these algorithms also differ in what index data structure they use to handle XML data. There exist cost-based approaches utilizing binary joins and various index data structures; however, they share a limitation. The limitation is that they cannot perform a join between query nodes not having a direct XPath relationship. Such a join can be advantageous especially if their joint selectivity is high. Since holistic joins work with all query nodes they overcome this limitation. In this article, we introduce such a holistic twig join called CostTwigJoin. To the best of our knowledge, CostTwigJoin is the first holistic join capable of combining various index data structures during an evaluation of an XML query. Usage of the holistic join has yet another advantage for cost-based approaches: an optimizer does not have to resolve the order of binary joins; therefore, the search space is reduced. In this article, we perform thorough experiments on hundreds of queries to evaluate our approach and demonstrate its advantages. 相似文献
9.
Thomas A. Mueck Martin L. Polaschek 《The VLDB Journal The International Journal on Very Large Data Bases》1997,6(4):312-332
With respect to the specific requirements of advanced OODB applications, index data structures for type hierarchies in OODBMS
have to provide efficient support for multiattribute queries and have to allow index optimization for a particular query profile.
We describe the multikey type index and an efficient implementation of this indexing scheme. It meets both requirements: in addition to its multiattribute query
capabilities it is designed as a mediator between two standard design alternatives, key-grouping and type-grouping. A prerequisite
for the multikey type index is a linearization algorithm which maps type hierarchies to linearly ordered attribute domains
in such a way that each subhierarchy is represented by an interval of this domain. The algorithm extends previous results
with respect to multiple inheritance. The subsequent evaluation of our proposal focuses on storage space overhead as well
as on the number of disk I/O operations needed for query execution. The analytical results for the multikey type index are
compared to previously published figures for well-known single-key search structures. The comparison clearly shows the superiority
of the multikey type index for a large class of query profiles.
Edited by E. Bertino. Received October 7, 1996 / Accepted March 28, 1997 相似文献
10.
Xiaoyan WANG Tao YANG Jinchuan CHEN Long HE Xiaoyong DU 《Frontiers of Computer Science》2015,9(6):919
The volume of RDF data increases dramatically within recent years, while cloud computing platforms like Hadoop are supposed to be a good choice for processing queries over huge data sets for their wonderful scalability. Previous work on evaluating SPARQL queries with Hadoop mainly focus on reducing the number of joins through careful split of HDFS files and algorithms for generating Map/Reduce jobs. However, the way of partitioning RDF data could also affect system performance. Specifically, a good partitioning solution would greatly reduce or even totally avoid cross-node joins, and significantly cut down the cost in query evaluation. Based on HadoopDB, this work processes SPARQL queries in a hybrid architecture, where Map/Reduce takes charge of the computing tasks, and RDF query engines like RDF-3X store the data and execute join operations. According to the analysis of query workloads, this work proposes a novel algorithm for automatically partitioning RDF data and an approximate solution to physically place the partitions in order to reduce data redundancy. It also discusses how to make a good trade-off between query evaluation efficiency and data redundancy. All of these proposed approaches have been evaluated by extensive experiments over large RDF data sets. 相似文献