Synopsis structures and approximate query answering have become increasingly important in DSS/OLAP applications with stringent response time requirements. Range queries are an important class of problems in this domain; they have a wide variety of applications and have been studied in the context of histograms. However, wavelets have been shown to be quite useful in several scenarios, and their multi-resolution structure makes them especially appealing for hierarchical domains. Furthermore, because the coefficients of the Haar wavelet basis can be computed in linear time, the Haar basis has become one of the most important and widely used synopsis structures. Very recently, optimal algorithms were proposed for the wavelet synopsis construction problem for equality/point queries. In this paper we investigate the problem of optimum Haar wavelet synopsis construction for range queries with workloads. We provide optimum algorithms as well as approximation heuristics and demonstrate the effectiveness of these algorithms with our extensive experimental evaluation using synthetic and real-life data sets. Research was supported in part by the Alfred P. Sloan Research Fellowship and NSF awards CCF-0430376, CCF-0644119. Research was supported by the Ministry of Information and Communication, Korea, under the College Information Technology Research Center Support Program, grant number IITA-2006-C1090-0603-0031.
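To make the linear-time Haar decomposition mentioned in this abstract concrete, here is a minimal Python sketch. It is not the optimal construction the paper describes: the `top_b_synopsis` helper is a naive keep-the-largest-coefficients heuristic, and all function names are chosen here for illustration. It assumes the input length is a power of two and uses unnormalized averages and differences.

```python
def haar_transform(values):
    """Unnormalized Haar decomposition; total work is linear in len(values)."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    averages = list(values)
    details = []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        # Coarser-level detail coefficients are placed before finer ones.
        details = [(a - b) / 2 for a, b in pairs] + details
        averages = [(a + b) / 2 for a, b in pairs]
    return averages + details


def haar_inverse(coeffs):
    """Rebuild the data vector from the coefficient layout used above."""
    values = list(coeffs[:1])
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(values)]
        pos += len(values)
        expanded = []
        for avg, d in zip(values, details):
            expanded.extend((avg + d, avg - d))
        values = expanded
    return values


def top_b_synopsis(coeffs, b):
    """Naive heuristic (an assumption, not the paper's algorithm):
    keep the b coefficients of largest magnitude and zero the rest."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(order[:b])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]


def range_sum(coeffs, lo, hi):
    """Approximate answer to SUM over positions lo..hi (inclusive)."""
    return sum(haar_inverse(coeffs)[lo:hi + 1])


data = [2, 2, 0, 2, 3, 5, 4, 4]
coeffs = haar_transform(data)
exact = range_sum(coeffs, 2, 5)                        # all coefficients kept
approx = range_sum(top_b_synopsis(coeffs, 3), 2, 5)    # 3-coefficient synopsis
```

A workload-aware method would choose which coefficients to retain so as to minimize error over the expected range queries, rather than ranking by magnitude as this sketch does.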

Approximate query mapping: Accounting for translation closeness (total citations: 2; self-citations: 0; citations by others: 2)
In this paper we present a mechanism for approximately translating Boolean query constraints across heterogeneous information sources. Achieving the best translation is challenging because sources support different constraints for formulating queries, and often these constraints cannot be precisely translated. For instance, a query [score>8] might be “perfectly” translated as [rating>0.8] at some site, but can only be approximated as [grade=A] at another. Unlike other work, our general framework adopts a customizable “closeness” metric for the translation that combines both precision and recall. Our results show that for query translation we need to handle interdependencies among both query conjuncts as well as disjuncts. As the basis, we identify the essential requirements of a rule system for users to encode the mappings for atomic semantic units. Our algorithm then translates complex queries by rewriting them in terms of the semantic units. We show that, under practical assumptions, our algorithm generates the best approximate translations with respect to the closeness metric of choice. We also present a case study to show how our technique may be applied in practice. Received: 15 October 2000 / Accepted: 15 April 2001 Published online: 28 June 2001  相似文献   
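As a rough illustration of a closeness metric that combines precision and recall, the Python sketch below scores a translated constraint against the original over a sample of records. The F-measure used here is only one possible combination and is an assumption of this sketch, as are the `closeness` function, the sample records, and the score-to-grade data; the abstract says only that the metric is customizable.

```python
def closeness(original, translated, records, beta=1.0):
    """Return (precision, recall, F_beta) of `translated` relative to `original`.

    Both arguments are predicates over a record. The F-measure is an
    illustrative choice, not the metric defined in the paper.
    """
    wanted = {i for i, r in enumerate(records) if original(r)}
    returned = {i for i, r in enumerate(records) if translated(r)}
    hits = len(wanted & returned)
    precision = hits / len(returned) if returned else 1.0
    recall = hits / len(wanted) if wanted else 1.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical sample: each record carries both a numeric score and a letter grade.
sample = [
    {"score": 10, "grade": "A"},
    {"score": 9, "grade": "A"},
    {"score": 8.5, "grade": "B"},
    {"score": 8, "grade": "A"},
    {"score": 7, "grade": "B"},
    {"score": 5, "grade": "C"},
]

precision, recall, f1 = closeness(
    lambda r: r["score"] > 8,      # original constraint [score>8]
    lambda r: r["grade"] == "A",   # approximate translation [grade=A]
    sample,
)
```

Raising `beta` weights recall more heavily, which is one way a user could tune the trade-off between missing answers and returning extra ones.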

We present the design of ObjectGlobe, a distributed and open query processor for Internet data sources. Today, data is published on the Internet via Web servers which have, if at all, very localized query processing capabilities. The goal of the ObjectGlobe project is to establish an open marketplace in which data and query processing capabilities can be distributed and used by any kind of Internet application. Furthermore, ObjectGlobe integrates cycle providers (i.e., machines) which carry out query processing operators. The overall picture is to make it possible to execute a query with – in principle – unrelated query operators, cycle providers, and data sources. Such an infrastructure can serve as enabling technology for scalable e-commerce applications, e.g., B2B and B2C market places, to be able to integrate data and data processing operations of a large number of participants. One of the main challenges in the design of such an open system is to ensure privacy and security. We discuss the ObjectGlobe security requirements, show how basic components such as the optimizer and runtime system need to be extended, and present the results of performance experiments that assess the additional cost for secure distributed query processing. Another challenge is quality of service management so that users can constrain the costs and running times of their queries. Received: 30 October 2000 / Accepted: 14 March 2001 Published online: 7 June 2001  相似文献   

Adaptive query processing generally involves a feedback loop comprising monitoring, assessment and response. So far, individual proposals have tended to group together an approach to monitoring, a means of assessment, and a form of response. However, there are many benefits in decoupling these three phases, and in constructing generic frameworks for each of them. To this end, this paper discusses monitoring of query plan execution as a topic in its own right, and advocates an approach based on self-monitoring algebraic operators. This approach is shown to be generic and independent of any specific adaptation mechanism, easily implementable and portable, sufficiently comprehensive, appropriate for heterogeneous distributed environments, and more importantly, capable of driving on-the-fly adaptations of query plan execution. An experimental evaluation of the overheads and of the quality of the results obtained by monitoring is also presented.

Decision support queries typically involve several joins, a grouping with aggregation, and/or sorting of the result tuples. We propose two new classes of query evaluation algorithms that can be used to speed up the execution of such queries. The algorithms are based on (1) early sorting and (2) early partitioning– or a combination of both. The idea is to push the sorting and/or the partitioning to the leaves, i.e., the base relations, of the query evaluation plans (QEPs) and thereby avoid sorting or partitioning large intermediate results generated by the joins. Both early sorting and early partitioning are used in combination with hash-based algorithms for evaluating the join(s) and the grouping. To enable early sorting, the sort order generated at an early stage of the QEP is retained through an arbitrary number of so-called order-preserving hash joins. To make early partitioning applicable to a large class of decision support queries, we generalize the so-called hash teams proposed by Graefe et al. [GBC98]. Hash teams allow to perform several hash-based operations (join and grouping) on the same attribute in one pass without repartitioning intermediate results. Our generalization consists of indirectly partitioning the input data. Indirect partitioning means partitioning the input data on an attribute that is not directly needed for the next hash-based operation, and it involves the construction of bitmaps to approximate the partitioning for the attribute that is needed in the next hash-based operation. Our performance experiments show that such QEPs based on early sorting, early partitioning, or both in combination perform significantly better than conventional strategies for many common classes of decision support queries. Received April 4, 2000 / Accepted June 23, 2000  相似文献   

Online information repositories commonly provide keyword search facilities through textual query languages based on Boolean logic. However, there is evidence to suggest that the syntactic demands of such languages can lead to user errors and adversely affect the time that it takes users to form queries. Users also face difficulties because of the conflict in semantics between AND and OR when used in Boolean logic and in the English language. Analysis of usage logs for the New Zealand Digital Library (NZDL) shows that few Boolean queries contain more than three terms, that use of the intersection operator dominates, and that query refinement is common. We suggest that graphical query languages, in particular Venn-like diagrams, can alleviate the problems that users experience when forming Boolean expressions with textual languages. A study of the utility of Venn diagrams for query specification indicates that with little or no training users can interpret and form Venn-like diagrams in a consistent manner that accurately corresponds to Boolean expressions. We describe VQuery, a Venn-diagram based user interface to the New Zealand Digital Library (NZDL). In a study which compared VQuery with a standard textual Boolean interface, users took significantly longer to form queries and produced more erroneous queries when using VQuery. We discuss the implications of these results and suggest directions for future work. Received: 15 December 1997 / Revised: June 1999

We describe how video data can be organized and structured so as to facilitate efficient querying. We develop a formal model for video data and show how spatial data structures, suitably modified, provide an elegant way of storing such data. We develop algorithms to process various kinds of video queries and show that, in most cases, the complexity of these algorithms is linear. A prototype system, called the Advanced Video Information System (AVIS), based on these concepts, has been designed at the University of Maryland.

In many decision-making scenarios, decision makers require rapid feedback to their queries, which typically involve aggregates. The traditional blocking execution model can no longer meet the demands of these users. One promising approach in the literature, called online aggregation, evaluates an aggregation query progressively as follows: as soon as certain data have been evaluated, approximate answers are produced with their respective running confidence intervals; as more data are examined, the answers and their corresponding running confidence intervals are refined. In this paper, we extend this approach to handle nested queries with aggregates (i.e., at least one inner query block is an aggregate query) by providing users with (approximate) answers progressively as the inner aggregation query blocks are evaluated. We address the new issues posed by nested queries. In particular, the answer space begins with a superset of the final answers and is refined as the aggregates from the inner query blocks are refined. For the intermediary answers to be meaningful, they have to be interpreted with the aggregates from the inner queries. We also propose a multi-threaded model for evaluating such queries: each query block is assigned to a thread, and the threads can be evaluated concurrently and independently. The time slice across the threads is nondeterministic in the sense that the user controls the relative rate at which these subqueries are being evaluated. For enumerative nested queries, we propose a priority-based evaluation strategy to present answers that are certainly in the final answer space first, before presenting those whose validity may be affected as the inner query aggregates are refined. We implemented a prototype system using Java and evaluated our system. Results for nested queries with a single level and with multiple levels of nesting are reported. Our results show the effectiveness of the proposed mechanisms in providing progressive feedback that reduces the initial waiting time of users significantly without sacrificing the quality of the answers. Received April 25, 2000 / Accepted June 27, 2000
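The progressive refinement described in this abstract can be sketched for the simplest case, a flat AVG query. The Python generator below is an illustration under stated assumptions, not the paper's nested-query method: it assumes tuples arrive in random order, uses a normal-approximation interval with a fixed z value, and the name `running_average` is hypothetical.

```python
import math


def running_average(stream, z=1.96):
    """Yield (rows_seen, estimate, half_width) after every tuple.

    Uses Welford's one-pass update for mean and variance. The interval
    estimate +/- half_width relies on a normal approximation and on the
    tuples arriving in random order (both are assumptions of this sketch).
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n > 1:
            half_width = z * math.sqrt(m2 / (n - 1) / n)
        else:
            half_width = float("inf")  # no variance estimate from one tuple
        yield n, mean, half_width


estimates = list(running_average([1, 2, 3, 4]))
```

A client would display each yielded estimate as it arrives and stop consuming the generator once the half-width is small enough for the decision at hand.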

Query processing over object views of relational data (total citations: 2; self-citations: 0; citations by others: 2)
This paper presents an approach to object view management for relational databases. Such a view mechanism makes it possible for users to transparently work with data in a relational database as if it were stored in an object-oriented (OO) database. A query against the object view is translated to one or several queries against the relational database. The results of these queries are then processed to form an answer to the initial query. The approach is not restricted to a ‘pure’ object view mechanism for the relational data, since the object view can also store its own data and methods. Therefore it must be possible to process queries that combine local data residing in the object view with data retrieved from the relational database. We discuss the key issues when object views of relational databases are developed, namely: how to map relational structures to subtype/supertype hierarchies in the view, how to represent relational database access in OO query plans, how to provide the concept of object identity in the view, how to handle the fact that the extension of types in the view depends on the state of the relational database, and how to process and optimize queries against the object view. The results are based on experiences from a running prototype implementation. Edited by: M.T. Özsu. Received April 12, 1995 / Accepted April 22, 1996

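In many fields, such as data warehousing and online analytical processing (OLAP), the processing of aggregate data is a very important core problem. This paper analyzes the characteristics of queries over aggregate data, introduces an approximate query computation model based on aggregate data, and, for the synopsis construction problem of this model, presents a class of Haar wavelet methods for building approximate query synopses, enabling fast query computation over aggregate data and aggregate views.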

Traditionally, distributed query optimization techniques generate static query plans at compile time. However, the optimality of these plans depends on many parameters (such as the selectivities of operations, the transmission speeds and workloads of servers) that are not only difficult to estimate but also often unpredictable and fluctuating at runtime. As the query processor cannot dynamically adjust the plans at runtime, the system performance is often less than satisfactory. In this paper, we introduce a new highly adaptive distributed query processing architecture. Our architecture can quickly detect fluctuations in selectivities of operations, as well as transmission speeds and workloads of servers, and accordingly change the operation order of a distributed query plan during execution. We have implemented a prototype based on the Telegraph system [Telegraph project. Available from >]. Our experimental study shows that our mechanism can adapt itself to the changes in the environment and hence approach an optimal plan during execution.

Fast joins using join indices (total citations: 1; self-citations: 0; citations by others: 1)
Two new algorithms, “Jive join” and “Slam join,” are proposed for computing the join of two relations using a join index. The algorithms are duals: Jive join range-partitions input relation tuple ids and then processes each partition, while Slam join forms ordered runs of input relation tuple ids and then merges the results. Both algorithms make a single sequential pass through each input relation, in addition to one pass through the join index and two passes through a temporary file, whose size is half that of the join index. Both algorithms require only that the number of blocks in main memory is of the order of the square root of the number of blocks in the smaller relation. By storing intermediate and final join results in a vertically partitioned fashion, our algorithms need to manipulate less data in memory at a given time than other algorithms. The algorithms are resistant to data skew and adaptive to memory fluctuations. Selection conditions can be incorporated into the algorithms. Using a detailed cost model, the algorithms are analyzed and compared with competing algorithms. For large input relations, our algorithms perform significantly better than Valduriez's algorithm, the TID join algorithm, and hash join algorithms. An experimental study is also conducted to validate the analytical results and to demonstrate the performance characteristics of each algorithm in practice. Received July 21, 1997 / Accepted June 8, 1998  相似文献   

Abstract. The rapid growth of the Internet and support for interoperability protocols has increased the number of Web accessible sources, WebSources. Current wrapper mediator architectures need to be extended with a wrapper cost model (WCM) for WebSources that can estimate the response time (delays) to access sources as well as other relevant statistics. In this paper, we present a Web prediction tool (WebPT), a tool that is based on learning using query feedback from WebSources. The WebPT uses dimensions time of day, day, and quantity of data, to learn response times from a particular WebSource, and to predict the expected response time (delay) for some query. Experiment data was collected from several sources, and those dimensions that were significant in estimating the response time were determined. We then trained the WebPT on the collected data, to use the three dimensions mentioned above, and to predict the response time, as well as a confidence in the prediction. We describe the WebPT learning algorithms, and report on the WebPT learning for WebSources. Our research shows that we can improve the quality of learning by tuning the WebPT features, e.g., training the WebPT using a logarithm of the input training data; including significant dimensions in the WebPT; or changing the ordering of dimensions. A comparison of the WebPT with more traditional neural network (NN) learning has been performed, and we briefly report on the comparison. We then demonstrate how the WebPT prediction of delay may be used by a scrambling enabled optimizer. A scrambling algorithm identifies some critical points of delay, where it makes a decision to scramble (modify) a plan, to attempt to hide the expected delay by computing some other part of the plan that is unaffected by the delay. We explore the space of real delay at a WebSource, versus the WebPT prediction of this delay, with respect to critical points of delay in specific plans. 
We identify those cases where WebPT overestimation or underestimation of the real delay results in a penalty in the scrambling enabled optimizer, and those cases where there is no penalty. Using the experimental data and WebPT learning, we test how good the WebPT is in minimizing these penalties. Received June 22, 1999 / Accepted December 24, 1999  相似文献   

In this paper we describe the design and implementation of OPT++, a tool for extensible database query optimization that uses an object-oriented design to simplify the task of implementing, extending, and modifying an optimizer. Building an optimizer using OPT++ makes it easy to extend the query algebra (to add new query algebra operators and physical implementation algorithms to the system), easy to change the search space, and also to change the search strategy. Furthermore, OPT++ comes equipped with a number of search strategies that are available for use by an optimizer-implementor. OPT++ considerably simplifies both, the task of implementing an optimizer for a new database system, and the task of evaluating alternative optimization techniques and strategies to decide what techniques are best suited for that database system. We present the results of a series of performance studies. These results validate our design and show that, in spite of its flexibility, OPT++ can be used to build efficient optimizers. Received October 1996 / Accepted January 1998  相似文献   

Algebraic query optimisation for database programming languages (total citations: 1; self-citations: 0; citations by others: 1)
A major challenge still facing the designers and implementors of database programming languages (DBPLs) is that of query optimisation. We investigate algebraic query optimisation techniques for DBPLs in the context of a purely declarative functional language that supports sets as first-class objects. Since the language is computationally complete issues such as non-termination of expressions and construction of infinite data structures can be investigated, whilst its declarative nature allows the issue of side effects to be avoided and a richer set of equivalences to be developed. The language has a well-defined semantics which permits us to reason formally about the properties of expressions, such as their equivalence with other expressions and their termination. The support of a set bulk data type enables much prior work on the optimisation of relational languages to be utilised. In the paper we first give the syntax of our archetypal DBPL and briefly discuss its semantics. We then define a small but powerful algebra of operators over the set data type, provide some key equivalences for expressions in these operators, and list transformation principles for optimising expressions. Along the way, we identify some caveats to well-known equivalences for non-deductive database languages. We next extend our language with two higher level constructs commonly found in functional DBPLs: set comprehensions and functions with known inverses. Some key equivalences for these constructs are provided, as are transformation principles for expressions in them. Finally, we investigate extending our equivalences for the set operators to the analogous operators over bags. Although developed and formally proved in the context of a functional language, our findings are directly applicable to other DBPLs of similar expressiveness. Edited by Matthias Jarke, Jorge Bocca, Carlo Zaniolo. Received September 15, 1994 / Accepted September 1, 1995  相似文献   

Inter-object references are one of the key concepts of object-relational and object-oriented database systems. In this work, we investigate alternative techniques to implement inter-object references and make the best use of them in query processing, i.e., in evaluating functional joins. We will give a comprehensive overview and performance evaluation of all known techniques for simple (single-valued) as well as multi-valued functional joins. Furthermore, we will describe special order-preserving functional-join techniques that are particularly attractive for decision support queries that require ordered results. While most of the presentation of this paper is focused on object-relational and object-oriented database systems, some of the results can also be applied to plain relational databases because index nested-loop joins along key/foreign-key relationships, as they are frequently found in relational databases, are just one particular way to execute a functional join. Received February 28, 1999 / Accepted September 27, 1999

How to process a skyline query efficiently has received considerable attention in recent years. A skyline query identifies a set of non-dominated data records in a multidimensional dataset. Whereas most previous studies have resolved this problem in a centralized environment, this work considers it in a distributed sensor network environment. An algorithm, known as Skyline Sensor Algorithm (SkySensor), is presented to efficiently retrieve skyline results from a sensor network. A cluster-based architecture is designed in SkySensor to collect all sensor readings. A pruning method is then proposed to progressively sift out the skyline results from the sensor network. SkySensor avoids the need of collecting data from all sensors in the network, which is an extremely expensive action, when searching for the skyline results. The performance study indicates that SkySensor is highly efficient, and significantly outperforms previous methods in processing skyline queries.
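For readers unfamiliar with the term, the notion of a non-dominated record can be sketched in a few lines of Python. This is a plain centralized scan, not the SkySensor algorithm, and it assumes that smaller values are preferred in every dimension; the names `dominates` and `skyline` are chosen here for illustration.

```python
def dominates(p, q):
    """True if p is at least as good as q everywhere and strictly better somewhere
    (smaller is assumed to be better in every dimension)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Return the non-dominated points using a simple one-pass window scan."""
    window = []
    for p in points:
        if any(dominates(q, p) for q in window):
            continue  # p is dominated, so it cannot be in the skyline
        # p survives: evict any window entries that p dominates
        window = [q for q in window if not dominates(p, q)]
        window.append(p)
    return window


readings = [(4, 4), (1, 9), (3, 3), (9, 1), (5, 5), (5, 2)]
result = skyline(readings)
```

In a sensor-network setting the same dominance test could be applied locally so that dominated readings are never transmitted, which is the general idea behind pruning.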

A multimedia application involves information that may be in the form of video, images, audio, text and graphics, and that needs to be stored, retrieved and manipulated in large databases. In this paper, we propose an object-oriented database schema that supports multimedia documents and their temporal, spatial and logical structures. We present a document example and show how the schema can address all the structures described. We also present a multimedia query specification language that can be used to describe a multimedia content portion to be retrieved from the database. The language provides means by which the user can specify the information on the media as well as the temporal and spatial relationships among these media.

There is growing interest in algorithms for processing and querying continuous data streams (i.e., data seen only once in a fixed order) with limited memory resources. In its most general form, a data stream is actually an update stream, i.e., comprising data-item deletions as well as insertions. Such massive update streams arise naturally in several application domains (e.g., monitoring of large IP network installations or processing of retail-chain transactions). Estimating the cardinality of set expressions defined over several (possibly distributed) update streams is perhaps one of the most fundamental query classes of interest; as an example, such a query may ask "what is the number of distinct IP source addresses seen in passing packets from both router R1 and R2 but not router R3?". Earlier work only addressed very restricted forms of this problem, focusing solely on the special case of insert-only streams and specific operators (e.g., union). In this paper, we propose the first space-efficient algorithmic solution for estimating the cardinality of full-fledged set expressions over general update streams. Our estimation algorithms are probabilistic in nature and rely on a novel, hash-based synopsis data structure, termed 2-level hash sketch. We demonstrate how our 2-level hash sketch synopses can be used to provide low-error, high-confidence estimates for the cardinality of set expressions (including operators such as set union, intersection, and difference) over continuous update streams, using only space that is significantly sublinear in the sizes of the streaming input (multi-)sets. Furthermore, our estimators never require rescanning or resampling of past stream items, regardless of the number of deletions in the stream. We also present lower bounds for the problem, demonstrating that the space usage of our estimation algorithms is within small factors of the optimal. Finally, we propose an optimized, time-efficient stream synopsis (based on 2-level hash sketches) that provides similar, strong accuracy-space guarantees while requiring only guaranteed logarithmic maintenance time per update, thus making our methods applicable for truly rapid-rate data streams. Our results from an empirical study of our synopsis and estimation techniques verify the effectiveness of our approach. Received: 20 October 2003, Accepted: 16 April 2004, Published online: 14 September 2004. Edited by: J. Gehrke and J. Hellerstein. Sumit Ganguly: sganguly@cse.iitk.ac.in. Current affiliation: Department of Computer Science and Engineering, Indian Institute of Technology, Kanpur, India

Abstract. This paper presents structural recursion as the basis of the syntax and semantics of query languages for semistructured data and XML. We describe a simple and powerful query language based on pattern matching and show that it can be expressed using structural recursion, which is introduced as a top-down, recursive function, similar to the way XSL is defined on XML trees. On cyclic data, structural recursion can be defined in two equivalent ways: as a recursive function which evaluates the data top-down and remembers all its calls to avoid infinite loops, or as a bulk evaluation which processes the entire data in parallel using only traditional relational algebra operators. The latter makes it possible for optimization techniques in relational queries to be applied to structural recursion. We show that the composition of two structural recursion queries can be expressed as a single such query, and this is used as the basis of an optimization method for mediator systems. Several other formal properties are established: structural recursion can be expressed in first-order logic extended with transitive closure; its data complexity is PTIME; and over relational data it is a conservative extension of the relational calculus. The underlying data model is based on value equality, formally defined with bisimulation. Structural recursion is shown to be invariant with respect to value equality. Received: July 9, 1999 / Accepted: December 24, 1999  相似文献   
