20 similar documents found; search time: 0 ms
1.
We propose a new algorithm, called Stripe-join, for performing a join given a join index. Stripe-join is inspired by an algorithm called ‘Jive-join’ developed by Li and Ross. Stripe-join makes a single sequential pass through each input relation, in addition to one pass through the join index and two passes through a set of temporary files that contain tuple identifiers but no input tuples. Stripe-join performs this efficiently even when the input relations are much larger than main memory, as long as the number of blocks in main memory is of the order of the square root of the number of blocks in the participating relations. Stripe-join is particularly efficient for self-joins. To our knowledge, Stripe-join is the first algorithm that, given a join index and a relation significantly larger than main memory, can perform a self-join with just a single pass over the input relation and without storing input tuples in intermediate files. Almost all the I/O is sequential, thus minimizing the impact of seek and rotational latency. The algorithm is resistant to data skew. It can also join multiple relations while still making only a single pass over each input relation. Using a detailed cost model, Stripe-join is analyzed and compared with competing algorithms. For large input relations, Stripe-join performs significantly better than Valduriez's algorithm and hash join algorithms. We demonstrate circumstances under which Stripe-join performs significantly better than Jive-join. Unlike Jive-join, Stripe-join makes no assumptions about the order of the join index.
2.
Heuristic and randomized optimization for the join ordering problem
Michael Steinbrunn Guido Moerkotte Alfons Kemper 《The VLDB Journal The International Journal on Very Large Data Bases》1997,6(3):191-208
Recent developments in database technology, such as deductive database systems, have given rise to the demand for new, cost-effective
optimization techniques for join expressions. In this paper many different algorithms that compute approximate solutions for
optimizing join orders are studied since traditional dynamic programming techniques are not appropriate for complex problems.
Two possible solution spaces, the space of left-deep and bushy processing trees, are evaluated from a statistical point of
view. The result is that the common limitation to left-deep processing trees is only advisable for certain join graph types.
Basically, optimizers from three classes are analysed: heuristic, randomized and genetic algorithms. Each one is extensively
scrutinized with respect to its working principle and its fitness for the desired application. It turns out that randomized
and genetic algorithms are well suited for optimizing join expressions. They generate solutions of high quality within a reasonable
running time. The benefits of heuristic optimizers, namely the short running time, are often outweighed by merely moderate
optimization performance.
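The randomized optimizers evaluated in such studies include strategies like iterative improvement: repeatedly apply random moves to a join order and keep only moves that lower the estimated cost. A minimal sketch of that idea for left-deep join orders, with invented relation cardinalities and join selectivities (not taken from the paper):

```python
import random

# Hypothetical statistics for a five-relation join (all values invented):
# base-relation cardinalities and pairwise join selectivities.
CARD = {"A": 1000, "B": 200, "C": 5000, "D": 300, "E": 800}
SEL = {frozenset(p): s for p, s in [
    (("A", "B"), 0.01), (("B", "C"), 0.005), (("C", "D"), 0.02),
    (("D", "E"), 0.001), (("A", "E"), 0.05),
]}

def cost(order):
    """Cost of a left-deep plan: sum of intermediate-result cardinalities."""
    size, total, joined = CARD[order[0]], 0.0, {order[0]}
    for rel in order[1:]:
        sel = 1.0
        for j in joined:  # selectivities of join edges to already-joined relations
            sel *= SEL.get(frozenset((j, rel)), 1.0)
        size *= CARD[rel] * sel
        total += size
        joined.add(rel)
    return total

def iterative_improvement(rels, restarts=20, steps=200, seed=0):
    """Random restarts plus greedy swap moves over left-deep join orders."""
    rng = random.Random(seed)
    best_order, best_cost = None, float("inf")
    for _ in range(restarts):
        order = rels[:]
        rng.shuffle(order)
        cur = cost(order)
        for _ in range(steps):
            i, j = rng.sample(range(len(order)), 2)
            order[i], order[j] = order[j], order[i]      # random swap move
            new = cost(order)
            if new < cur:
                cur = new                                # keep the improving move
            else:
                order[i], order[j] = order[j], order[i]  # revert
        if cur < best_cost:
            best_order, best_cost = order[:], cur
    return best_order, best_cost

order, c = iterative_improvement(list(CARD))
```

Genetic optimizers differ mainly in that they recombine whole orders from a population instead of perturbing a single one.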
3.
Approximate query processing using wavelets
Kaushik Chakrabarti Minos Garofalakis Rajeev Rastogi Kyuseok Shim 《The VLDB Journal The International Journal on Very Large Data Bases》2001,10(2-3):199-223
Approximate query processing has emerged as a cost-effective approach for dealing with the huge data volumes and stringent
response-time requirements of today's decision support systems (DSS). Most work in this area, however, has so far been limited
in its query processing scope, typically focusing on specific forms of aggregate queries. Furthermore, conventional approaches
based on sampling or histograms appear to be inherently limited when it comes to approximating the results of complex queries
over high-dimensional DSS data sets. In this paper, we propose the use of multi-dimensional wavelets as an effective tool
for general-purpose approximate query processing in modern, high-dimensional applications. Our approach is based on building
wavelet-coefficient synopses of the data and using these synopses to provide approximate answers to queries. We develop novel query processing algorithms
that operate directly on the wavelet-coefficient synopses of relational tables, allowing us to process arbitrarily complex
queries entirely in the wavelet-coefficient domain. This guarantees extremely fast response times since our approximate query execution engine
can do the bulk of its processing over compact sets of wavelet coefficients, essentially postponing the expansion into relational
tuples until the end-result of the query. We also propose a novel wavelet decomposition algorithm that can build these synopses
in an I/O-efficient manner. Finally, we conduct an extensive experimental study with synthetic as well as real-life data sets
to determine the effectiveness of our wavelet-based approach compared to sampling and histograms. Our results demonstrate
that our techniques: (1) provide approximate answers of better quality than either sampling or histograms; (2) offer query
execution-time speedups of more than two orders of magnitude; and (3) guarantee extremely fast synopsis construction times
that scale linearly with the size of the data.
Received: 7 August 2000 / Accepted: 1 April 2001 Published online: 7 June 2001
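The core idea of a wavelet-coefficient synopsis can be illustrated with a one-dimensional Haar transform: decompose the data, retain only the k largest coefficients, and reconstruct an approximate answer from the sparse set. A simplified sketch (the paper's multi-dimensional, I/O-efficient decomposition and its coefficient-ranking details are not reproduced here):

```python
def haar_decompose(data):
    """Haar transform of a length-2^k vector, returned as
    [overall average, coarsest detail, ..., finest details]."""
    levels, cur = [], [float(x) for x in data]
    while len(cur) > 1:
        avgs = [(cur[i] + cur[i + 1]) / 2 for i in range(0, len(cur), 2)]
        dets = [(cur[i] - cur[i + 1]) / 2 for i in range(0, len(cur), 2)]
        levels.append(dets)          # finest level first
        cur = avgs
    coeffs = cur                     # [overall average]
    for dets in reversed(levels):    # append coarsest -> finest
        coeffs += dets
    return coeffs

def haar_reconstruct(coeffs):
    cur, pos = [coeffs[0]], 1
    while pos < len(coeffs):
        dets = coeffs[pos:pos + len(cur)]
        cur = [v for a, d in zip(cur, dets) for v in (a + d, a - d)]
        pos += len(dets)
    return cur

def wavelet_synopsis(data, k):
    """Keep the k largest-magnitude coefficients; zero out the rest."""
    coeffs = haar_decompose(data)
    keep = set(sorted(range(len(coeffs)),
                      key=lambda i: abs(coeffs[i]), reverse=True)[:k])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]

data = [2, 2, 0, 2, 3, 5, 4, 4]
approx = haar_reconstruct(wavelet_synopsis(data, 3))
```

Query operators then work on the sparse coefficient list directly, expanding into tuples only when the final result is produced.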
4.
Spatial database operations are typically performed in two steps. In the filtering step, indexes and the minimum bounding rectangles (MBRs) of the objects are used to quickly determine a set of candidate objects. In the refinement step, the actual geometries of the objects are retrieved and compared to the query geometry or each other. Because of the complexity of the computational geometry algorithms involved, the CPU cost of the refinement step is usually the dominant cost of the operation for complex geometries such as polygons. Although many run-time and pre-processing-based heuristics have been proposed to alleviate this problem, the CPU cost still remains the bottleneck. In this paper, we propose a novel approach to address this problem using the efficient rendering and searching capabilities of modern graphics hardware. This approach does not require expensive pre-processing of the data or changes to existing storage and index structures, and is applicable to both intersection and distance predicates. We evaluate this approach by comparing the performance with leading software solutions. The results show that by combining hardware and software methods, the overall computational cost can be reduced substantially for both spatial selections and joins. We integrated this hardware/software co-processing technique into a popular database to evaluate its performance in the presence of indexes, pre-processing and other proprietary optimizations. Extensive experimentation with real-world data sets shows that the hardware-accelerated technique not only outperforms the run-time software solutions but also performs as well if not better than pre-processing-assisted techniques.
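The two-step pattern itself is easy to state in code: a cheap MBR-overlap filter produces candidates, and an exact (CPU-heavy) geometric predicate refines them. A minimal CPU-only sketch of that split for polygon boundary intersection (orientation tests only; collinear touching and full containment are ignored for brevity, and nothing here reflects the paper's graphics-hardware technique):

```python
def mbr(poly):
    xs, ys = [p[0] for p in poly], [p[1] for p in poly]
    return min(xs), min(ys), max(xs), max(ys)

def mbrs_overlap(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def orient(a, b, c):
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def segments_cross(p, q, r, s):
    # Proper crossing test via orientation signs (degenerate cases omitted).
    return (orient(p, q, r) * orient(p, q, s) < 0 and
            orient(r, s, p) * orient(r, s, q) < 0)

def edges(poly):
    return list(zip(poly, poly[1:] + poly[:1]))

def boundaries_intersect(P, Q):            # refinement step: exact, O(n*m)
    return any(segments_cross(p1, p2, q1, q2)
               for p1, p2 in edges(P) for q1, q2 in edges(Q))

def spatial_select(polys, query):
    qm = mbr(query)
    candidates = [P for P in polys if mbrs_overlap(mbr(P), qm)]  # filter step
    return [P for P in candidates if boundaries_intersect(P, query)]
```

The refinement loop is exactly the quadratic edge-pair work whose CPU cost motivates offloading to the GPU.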
5.
Fast template matching using bounded partial correlation
This paper describes a novel, fast template-matching technique, referred to as bounded partial correlation (BPC), based on
the normalised cross-correlation (NCC) function. The technique consists of checking, at each search position, a suitable elimination
condition relying on the evaluation of an upper bound for the NCC function. The check allows for rapidly skipping the positions
that cannot provide a better degree of match with respect to the current best-matching one. The upper-bounding function incorporates
partial information from the actual cross-correlation function and can be calculated very efficiently using a recursive scheme.
We also show a simple improvement to the basic BPC formulation that provides additional computational benefits and renders
the technique more robust with respect to the choice of parameters.
Received: 2 November 2000 / Accepted: 25 July 2001
Correspondence to: L. Di Stefano
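The elimination idea can be sketched directly: compute the correlation numerator over the first rows of the template, bound the remaining rows' contribution with the Cauchy-Schwarz inequality, and skip the position if even that optimistic bound cannot beat the current best match. The sketch below uses non-mean-corrected NCC and a fixed row split for simplicity; the paper's recursive schemes for computing the bound efficiently are not reproduced:

```python
import math

def bpc_match(image, tmpl, split):
    """Best-match position of `tmpl` in `image` (2-D lists) under NCC,
    skipping positions eliminated by a partial-correlation upper bound."""
    h, w = len(tmpl), len(tmpl[0])
    H, W = len(image), len(image[0])
    t_norm = math.sqrt(sum(v * v for row in tmpl for v in row))
    t_tail = sum(v * v for row in tmpl[split:] for v in row)  # tail energy
    best, best_pos = -1.0, None
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            win_norm = math.sqrt(sum(image[y + r][x + c] ** 2
                                     for r in range(h) for c in range(w)))
            if win_norm == 0:
                continue
            # Partial numerator over the first `split` template rows.
            part = sum(image[y + r][x + c] * tmpl[r][c]
                       for r in range(split) for c in range(w))
            win_tail = sum(image[y + r][x + c] ** 2
                           for r in range(split, h) for c in range(w))
            # Cauchy-Schwarz bound on the remaining rows' contribution.
            bound = (part + math.sqrt(win_tail * t_tail)) / (t_norm * win_norm)
            if bound <= best:
                continue  # elimination: this position cannot beat `best`
            score = (part + sum(image[y + r][x + c] * tmpl[r][c]
                                for r in range(split, h) for c in range(w))
                     ) / (t_norm * win_norm)
            if score > best:
                best, best_pos = score, (y, x)
    return best_pos, best
```

Because the bound always overestimates the true score, skipped positions can never hide the global best match.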
6.
Fast image retrieval using color-spatial information
Beng Chin Ooi Kian-Lee Tan Tat Seng Chua Wynne Hsu 《The VLDB Journal The International Journal on Very Large Data Bases》1998,7(2):115-128
In this paper, we present an image retrieval system that employs both the color and spatial information of images to facilitate
the retrieval process. The basic unit used in our technique is a single-colored cluster, which bounds a homogeneous region of that color in an image. Two clusters from two images are similar if they are of the
same color and overlap in the image space. The number of clusters that can be extracted from an image can be very large, and
it affects the accuracy of retrieval. We study the effect of the number of clusters on retrieval effectiveness to determine
an appropriate value for “optimal” performance. To facilitate efficient retrieval, we also propose a multi-tier indexing
mechanism called the Sequenced Multi-Attribute Tree (SMAT). We implemented a two-tier SMAT, where the first layer is used to prune away clusters that are of different colors,
while the second layer discriminates clusters of different spatial locality. We conducted an experimental study on an image
database consisting of 12,000 images. Our results show the effectiveness of the proposed color-spatial approach, and the efficiency
of the proposed indexing mechanism.
Received August 1, 1997 / Accepted December 9, 1997
7.
Jocelyn Sérot Dominique Ginhac Roland Chapuis Jean-Pierre Dérutin 《Machine Vision and Applications》2001,12(6):271-290
We present a design methodology for real-time vision applications aiming at significantly reducing the design-implement-validate
cycle time on dedicated parallel platforms. This methodology is based upon the concept of algorithmic skeletons, i.e., higher
order program constructs encapsulating recurring forms of parallel computations and hiding their low-level implementation
details. Parallel programs are built by simply selecting and composing instances of skeletons chosen in a predefined basis.
A complete parallel programming environment was built to support the presented methodology. It comprises a library of vision-specific
skeletons and a chain of tools capable of turning an architecture-independent skeletal specification of an application into
an optimized, deadlock-free distributed executive for a wide range of parallel platforms. This skeleton basis was defined
after a careful analysis of a large corpus of existing parallel vision applications. The source program is a purely functional
specification of the algorithm in which the structure of a parallel application is expressed only as combination of a limited
number of skeletons. This specification is compiled down to a parametric process graph, which is subsequently mapped onto
the actual physical topology using a third-party CAD software. It can also be executed on any sequential platform to check
the correctness of the parallel algorithm. The applicability of the proposed methodology and associated tools has been demonstrated
by parallelizing several realistic real-time vision applications both on a multi-processor platform and a network of workstations.
It is here illustrated with a complete road-tracking algorithm based upon white-line detection. This experiment showed a dramatic
reduction in development times (hence the term fast prototyping), while keeping performances on par with those obtained with
the handcrafted parallel version.
Received: 22 July 1999 / Accepted: 9 November 2000
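The skeleton idea, composing higher-order parallel constructs while keeping a sequential semantics for validation, can be hinted at with ordinary higher-order functions. A toy sketch (sequential emulation only; the paper's tool chain, process-graph mapping, and vision-specific skeleton basis are far richer):

```python
# Two toy "skeletons": a data-parallel farm and pipeline composition.
# Sequential emulation mirrors the paper's point that a skeletal
# specification can be run on a sequential platform to check correctness.

def farm(worker):
    """Data-parallel skeleton: apply `worker` independently to each item
    (a parallel implementation would distribute items over processors)."""
    return lambda items: [worker(x) for x in items]

def pipe(*stages):
    """Pipeline skeleton: feed the output of each stage to the next."""
    def run(items):
        for stage in stages:
            items = stage(items)
        return items
    return run

# A hypothetical two-stage pipeline: threshold pixels, then invert the mask.
threshold = farm(lambda px: 1 if px > 128 else 0)
invert = farm(lambda b: 1 - b)
app = pipe(threshold, invert)
```

The application is expressed purely as a combination of skeletons; swapping the sequential `farm` for a distributed one would not change the specification.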
8.
Peter A. Boncz Martin L. Kersten 《The VLDB Journal The International Journal on Very Large Data Bases》1999,8(2):101-119
In query-intensive database application areas, like decision support and data mining, systems that use vertical fragmentation
have a significant performance advantage. In order to support relational or object oriented applications on top of such a
fragmented data model, a flexible yet powerful intermediate language is needed. This problem has been successfully tackled
in Monet, a modern extensible database kernel developed by our group. We focus on the design choices made in the Monet interpreter
language (MIL), its algebraic query language, and outline how its concept of tactical optimization enhances and simplifies
the optimization of complex queries. Finally, we summarize the experience gained in Monet by creating a highly efficient implementation
of MIL.
Received November 10, 1998 / Accepted March 22, 1999
9.
Laura M. Haas Michael J. Carey Miron Livny Amit Shukla 《The VLDB Journal The International Journal on Very Large Data Bases》1997,6(3):241-256
In this paper, we re-examine the results of prior work on methods for computing ad hoc joins. We develop a detailed cost model for predicting join algorithm performance, and we use the model to develop cost formulas
for the major ad hoc join methods found in the relational database literature. We show that various pieces of “common wisdom” about join algorithm
performance fail to hold up when analyzed carefully, and we use our detailed cost model to derive optimal buffer allocation schemes for each of the join methods examined here. We show that optimizing their buffer allocations
can lead to large performance improvements, e.g., as much as a 400% improvement in some cases. We also validate our cost model's
predictions by measuring an actual implementation of each join algorithm considered. The results of this work should be directly
useful to implementors of relational query optimizers and query processing systems.
Edited by M. Adiba. Received May 1993 / Accepted April 1996
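The flavor of such cost formulas can be shown with the standard textbook I/O estimates (generic formulas, not the paper's detailed model): block nested-loops cost depends on how many buffer pages hold a chunk of the outer relation, while Grace hash join reads and writes each relation roughly three times.

```python
import math

def bnl_io_cost(outer_blocks, inner_blocks, buffer_pages):
    """Block nested-loops join: read the outer once; scan the inner once
    per outer chunk of (buffer_pages - 2) blocks (one page reserved for
    the inner scan, one for output)."""
    chunk = buffer_pages - 2
    return outer_blocks + math.ceil(outer_blocks / chunk) * inner_blocks

def grace_hash_io_cost(r_blocks, s_blocks):
    """Grace hash join, assuming enough buffers to partition in one pass:
    partition both inputs (read + write), then probe (read)."""
    return 3 * (r_blocks + s_blocks)

# With 1,000 outer blocks, 500 inner blocks, and 102 buffer pages,
# the outer is processed in 10 chunks of 100 blocks each.
bnl = bnl_io_cost(1000, 500, 102)   # 1000 + 10 * 500 = 6000 block I/Os
hj = grace_hash_io_cost(1000, 500)  # 3 * 1500 = 4500 block I/Os
```

The paper's point is that how the remaining buffer pages are split between inputs and outputs materially changes such formulas, which simple estimates like these gloss over.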
10.
In this paper, we present a novel approach for multimedia data indexing and retrieval that is machine independent and highly
flexible for sharing multimedia data across applications. Traditional multimedia data indexing and retrieval problems have
been attacked using the central data server as the main focus, and most of the indexing and query-processing for retrieval
are highly application dependent. This precludes the use of created indices and query processing mechanisms for multimedia
data which, in general, have a wide variety of uses across applications. The approach proposed in this paper addresses three
issues: 1. multimedia data indexing; 2. inference or query processing; and 3. combining indices and inference or query mechanism
with the data to facilitate machine independence in retrieval and query processing. We emphasize the third issue, as typically
multimedia data are huge in size and require intra-data indexing. We describe how the proposed approach addresses various
problems faced by the application developers in indexing and retrieval of multimedia data. Finally, we present two applications
developed based on the proposed approach: video indexing; and video content authorization for presentation.
11.
Steve Jones Shona McInnes Mark S. Staveley 《International Journal on Digital Libraries》1999,2(2-3):207-223
Online information repositories commonly provide keyword search facilities through textual query languages based on Boolean
logic. However, there is evidence to suggest that the syntactic demands of such languages can lead to user errors and adversely
affect the time that it takes users to form queries. Users also face difficulties because of the conflict in semantics between
AND and OR when used in Boolean logic and English language. Analysis of usage logs for the New Zealand Digital Library (NZDL)
shows that few Boolean queries contain more than three terms, that use of the intersection operator dominates, and that query refinement
is common. We suggest that graphical query languages, in particular Venn-like diagrams, can alleviate the problems that users
experience when forming Boolean expressions with textual languages. A study of the utility of Venn diagrams for query specification
indicates that with little or no training users can interpret and form Venn-like diagrams in a consistent manner which accurately
correspond to Boolean expressions. We describe VQuery, a Venn-diagram based user interface to the New Zealand Digital Library
(NZDL). In a study which compared VQuery with a standard textual Boolean interface, users took significantly longer to form
queries and produced more erroneous queries when using VQuery. We discuss the implications of these results and suggest directions
for future work.
Received: 15 December 1997 / Revised: June 1999
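The AND/OR semantics the study targets reduce to set intersection and union over the documents matching each term, which is exactly what the shaded regions of a Venn-like diagram denote. A tiny sketch with an invented index:

```python
# Hypothetical inverted index: term -> set of matching document ids.
index = {
    "kiwi":         {"d1", "d2", "d5"},
    "bird":         {"d2", "d3", "d5"},
    "conservation": {"d4", "d5"},
}

def docs(term):
    return index.get(term, set())

# "kiwi AND bird" is the Venn intersection region -- documents containing
# both terms -- not the English "kiwis and birds", which users often
# read as a union.
both = docs("kiwi") & docs("bird")
either = docs("kiwi") | docs("bird")
refined = (docs("kiwi") & docs("bird")) - docs("conservation")
```

A Venn-based interface lets users shade these regions directly instead of spelling out the Boolean expression in text.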
12.
Randal E. Bryant Yirng-An Chen 《International Journal on Software Tools for Technology Transfer (STTT)》2001,3(2):137-155
Binary moment diagrams (BMDs) provide a canonical representation for linear functions similar to the way binary decision diagrams
(BDDs) represent Boolean functions. Within the class of linear functions, we can embed arbitrary functions from Boolean variables
to real, rational, or integer values. BMDs can thus model the functionality of data path circuits operating over word-level
data. Many important functions, including integer multiplication, that cannot be represented efficiently at the bit level
with BDDs, have simple representations at the word level with BMDs. Furthermore, BMDs can represent Boolean functions as a
special case. We propose a hierarchical approach to verifying arithmetic circuits, where component modules are first shown
to implement their word-level specifications. The overall circuit functionality is then verified by composing the component
functions and comparing the result to the word-level circuit specification. Multipliers with word sizes of up to 256 bits
have been verified by this technique.
Published online: 15 May 2001
13.
MiniCon: A scalable algorithm for answering queries using views
Rachel Pottinger Alon Halevy 《The VLDB Journal The International Journal on Very Large Data Bases》2001,10(2-3):182-198
The problem of answering queries using views is to find efficient methods of answering a query using a set of previously
materialized views over the database, rather than accessing the database relations. The problem has received significant attention
because of its relevance to a wide variety of data management problems, such as data integration, query optimization, and
the maintenance of physical data independence. To date, the performance of proposed algorithms has received very little attention,
and in particular, their scale up in the presence of a large number of views is unknown. We first analyze two previous algorithms,
the bucket algorithm and the inverse-rules algorithm, and show their deficiencies. We then describe MiniCon, a novel algorithm for
finding the maximally-contained rewriting of a conjunctive query using a set of conjunctive views. We present the first experimental
study of algorithms for answering queries using views. The study shows that the MiniCon scales up well and significantly outperforms
the previous algorithms. We describe an extension of the MiniCon to handle comparison predicates, and show its performance
experimentally. Finally, we describe how the MiniCon can be extended to the context of query optimization.
Received: 15 October 2000 / Accepted: 15 April 2001 Published online: 28 June 2001
14.
Answering queries using views: A survey
Alon Y. Halevy 《The VLDB Journal The International Journal on Very Large Data Bases》2001,10(4):270-294
The problem of answering queries using views is to find efficient methods of answering a query using a set of previously
defined materialized views over the database, rather than accessing the database relations. The problem has recently received
significant attention because of its relevance to a wide variety of data management problems. In query optimization, finding
a rewriting of a query using a set of materialized views can yield a more efficient query execution plan. To support the separation
of the logical and physical views of data, a storage schema can be described using views over the logical schema. As a result,
finding a query execution plan that accesses the storage amounts to solving the problem of answering queries using views.
Finally, the problem arises in data integration systems, where data sources can be described as precomputed views over a mediated
schema. This article surveys the state of the art on the problem of answering queries using views, and synthesizes the disparate
works into a coherent framework. We describe the different applications of the problem, the algorithms proposed to solve it
and the relevant theoretical results.
Received: 1 August 1999 / Accepted: 23 March 2001 Published online: 6 September 2001
15.
J. Claussen A. Kemper D. Kossmann C. Wiesner 《The VLDB Journal The International Journal on Very Large Data Bases》2000,9(3):190-213
Decision support queries typically involve several joins, a grouping with aggregation, and/or sorting of the result tuples.
We propose two new classes of query evaluation algorithms that can be used to speed up the execution of such queries. The
algorithms are based on (1) early sorting and (2) early partitioning, or a combination of both. The idea is to push the sorting and/or the partitioning to the leaves, i.e., the base relations,
of the query evaluation plans (QEPs) and thereby avoid sorting or partitioning large intermediate results generated by the
joins. Both early sorting and early partitioning are used in combination with hash-based algorithms for evaluating the join(s)
and the grouping. To enable early sorting, the sort order generated at an early stage of the QEP is retained through an arbitrary
number of so-called order-preserving hash joins. To make early partitioning applicable to a large class of decision support queries, we generalize the so-called hash teams
proposed by Graefe et al. [GBC98]. Hash teams make it possible to perform several hash-based operations (join and grouping) on the same
attribute in one pass without repartitioning intermediate results. Our generalization consists of indirectly partitioning
the input data. Indirect partitioning means partitioning the input data on an attribute that is not directly needed for the
next hash-based operation, and it involves the construction of bitmaps to approximate the partitioning for the attribute that
is needed in the next hash-based operation. Our performance experiments show that such QEPs based on early sorting, early partitioning, or both in combination perform significantly better than conventional strategies for many common classes of decision support
queries.
Received April 4, 2000 / Accepted June 23, 2000
16.
Jean-Robert Gruser Louiqa Raschid Vladimir Zadorozhny Tao Zhan 《The VLDB Journal The International Journal on Very Large Data Bases》2000,9(1):18-37
The rapid growth of the Internet and support for interoperability protocols has increased the number of Web accessible sources,
WebSources. Current wrapper mediator architectures need to be extended with a wrapper cost model (WCM) for WebSources that
can estimate the response time (delays) to access sources as well as other relevant statistics. In this paper, we present
a Web prediction tool (WebPT), a tool that is based on learning using query feedback from WebSources. The WebPT uses dimensions
time of day, day, and quantity of data, to learn response times from a particular WebSource, and to predict the expected response
time (delay) for some query. Experiment data was collected from several sources, and those dimensions that were significant
in estimating the response time were determined. We then trained the WebPT on the collected data, to use the three dimensions
mentioned above, and to predict the response time, as well as a confidence in the prediction. We describe the WebPT learning
algorithms, and report on the WebPT learning for WebSources. Our research shows that we can improve the quality of learning
by tuning the WebPT features, e.g., training the WebPT using a logarithm of the input training data; including significant
dimensions in the WebPT; or changing the ordering of dimensions. A comparison of the WebPT with more traditional neural network
(NN) learning has been performed, and we briefly report on the comparison. We then demonstrate how the WebPT prediction of
delay may be used by a scrambling enabled optimizer. A scrambling algorithm identifies some critical points of delay, where
it makes a decision to scramble (modify) a plan, to attempt to hide the expected delay by computing some other part of the
plan that is unaffected by the delay. We explore the space of real delay at a WebSource, versus the WebPT prediction of this
delay, with respect to critical points of delay in specific plans. We identify those cases where WebPT overestimation or underestimation
of the real delay results in a penalty in the scrambling enabled optimizer, and those cases where there is no penalty. Using
the experimental data and WebPT learning, we test how good the WebPT is in minimizing these penalties.
Received June 22, 1999 / Accepted December 24, 1999
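A much-simplified version of such query-feedback learning: bucket observations by the dimensions the WebPT uses (time of day, day, and a log-scale data-quantity bucket) and predict from per-bucket statistics, with the spread serving as a crude confidence measure. All structure here is invented for illustration and does not reflect the WebPT's actual learning algorithms:

```python
import math
import statistics
from collections import defaultdict

class ResponseTimePredictor:
    """Toy query-feedback learner: mean response time per
    (hour, day, log10-size-bucket), with stdev as a confidence proxy."""

    def __init__(self):
        self.samples = defaultdict(list)

    def _key(self, hour, day, nbytes):
        # Quantity of data enters on a logarithmic scale, echoing the
        # paper's observation that log-transformed inputs learn better.
        return (hour, day, int(math.log10(max(nbytes, 1))))

    def observe(self, hour, day, nbytes, seconds):
        self.samples[self._key(hour, day, nbytes)].append(seconds)

    def predict(self, hour, day, nbytes):
        obs = self.samples.get(self._key(hour, day, nbytes))
        if not obs:  # no feedback for this bucket yet: global fallback
            everything = [s for v in self.samples.values() for s in v]
            return (statistics.mean(everything), None) if everything else (None, None)
        spread = statistics.stdev(obs) if len(obs) > 1 else 0.0
        return statistics.mean(obs), spread

wpt = ResponseTimePredictor()
for t in (1.8, 2.0, 2.2):
    wpt.observe(9, "Mon", 1_000, t)
mean, spread = wpt.predict(9, "Mon", 5_000)  # same log10 bucket as 1,000
```

A scrambling-enabled optimizer would compare such a predicted delay against the critical points of a plan before deciding whether to reorder it.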
17.
Aya Soffer Hanan Samet 《The VLDB Journal The International Journal on Very Large Data Bases》1998,7(4):253-274
Symbolic images are composed of a finite set of symbols that have a semantic meaning. Examples of symbolic images include
maps (where the semantic meaning of the symbols is given in the legend), engineering drawings, and floor plans. Two approaches
for supporting queries on symbolic-image databases that are based on image content are studied. The classification approach
preprocesses all symbolic images and attaches a semantic classification and an associated certainty factor to each object
that it finds in the image. The abstraction approach describes each object in the symbolic image by using a vector consisting
of the values of some of its features (e.g., shape, genus, etc.). The approaches differ in the way in which responses to queries
are computed. In the classification approach, images are retrieved on the basis of whether or not they contain objects that
have the same classification as the objects in the query. On the other hand, in the abstraction approach, retrieval is on
the basis of similarity of feature vector values of these objects. Methods of integrating these two approaches into a relational
multimedia database management system so that symbolic images can be stored and retrieved based on their content are described.
Schema definitions and indices that support query specifications involving spatial as well as contextual constraints are presented.
Spatial constraints may be based on both locational information (e.g., distance) and relational information (e.g., north of).
Different strategies for image retrieval for a number of typical queries using these approaches are described. Estimated costs
are derived for these strategies. Results are reported of a comparative study of the two approaches in terms of image insertion
time, storage space, retrieval accuracy, and retrieval time.
Received June 12, 1998 / Accepted October 13, 1998
18.
R. Braumandl M. Keidl A. Kemper D. Kossmann A. Kreutz S. Seltzsam K. Stocker 《The VLDB Journal The International Journal on Very Large Data Bases》2001,10(1):48-71
We present the design of ObjectGlobe, a distributed and open query processor for Internet data sources. Today, data is published
on the Internet via Web servers which have, if at all, very localized query processing capabilities. The goal of the ObjectGlobe
project is to establish an open marketplace in which data and query processing capabilities can be distributed and used by any kind of Internet application. Furthermore, ObjectGlobe integrates cycle providers (i.e., machines) which carry out query processing operators. The overall picture is to make it possible to execute a query
with – in principle – unrelated query operators, cycle providers, and data sources. Such an infrastructure can serve as enabling
technology for scalable e-commerce applications, e.g., B2B and B2C market places, to be able to integrate data and data processing
operations of a large number of participants. One of the main challenges in the design of such an open system is to ensure
privacy and security. We discuss the ObjectGlobe security requirements, show how basic components such as the optimizer and
runtime system need to be extended, and present the results of performance experiments that assess the additional cost for
secure distributed query processing. Another challenge is quality of service management so that users can constrain the costs
and running times of their queries.
Received: 30 October 2000 / Accepted: 14 March 2001 Published online: 7 June 2001
19.
Serge Abiteboul Sophie Cluet Tova Milo 《The VLDB Journal The International Journal on Very Large Data Bases》1998,7(2):96-114
Structured data stored in files can benefit from standard database technology. In particular, we show here how such data
can be queried and updated using declarative database languages. We introduce the notion of structuring schema, which consists of a grammar annotated with database programs. Based on a structuring schema, a file can be viewed as a database
structure, queried and updated as such. For queries, we show that almost standard database optimization techniques can be used to answer queries without having to construct
the entire database. For updates, we study in depth the propagation to the file of an update specified on the database view of this file. The problem is not
feasible in general, and we present a number of negative results. The positive results consist of techniques that propagate
updates efficiently under some reasonable locality conditions on the structuring schemas.
Received November 1, 1995 / Accepted June 20, 1997
20.
In the framework of self-stabilizing systems, the convergence proof is generally done by exhibiting a measure that strictly
decreases until a legitimate configuration is reached. The discovery of such a measure is very specific and requires a deep
understanding of the studied transition system. In contrast we propose here a simple method for proving convergence, which
regards self-stabilizing systems as string rewrite systems, and adapts a procedure initially designed by Dershowitz for proving
termination of string rewrite systems. In order to make the method terminate more often, we also propose an adapted procedure
that manipulates “schemes”, i.e. regular sets of words, and incorporates a process of scheme generalization. The interest
of the method is illustrated on several nontrivial examples.
Received: January 2000 / Accepted: November 2000