Similar literature
1.
There are now millions of PowerPoint documents available within corporate intranets and/or over the Internet. In this paper, we develop a formal model of PowerPoint databases. We propose a relational-style algebra called pptA (PowerPoint Algebra) to query PowerPoint databases. The algebra contains some new operators (such as the APPLY operator, which changes properties of objects, slides and presentations) as well as interesting twists on relational operators (e.g. join and Cartesian product allow the different entities being joined to share attributes whose values may be merged). We prove a set of equivalence results within this algebra. We have implemented a version of pptA; the paper provides a cost model and experimental results on the conditions under which these equivalences are useful.

2.
A relational ranking query uses a scoring function to limit the results of a conventional query to a small number of the most relevant answers. The increasing popularity of this query paradigm has led to the introduction of specialized rank join operators that integrate the selection of top tuples with join processing. These operators access just “enough” of the input in order to generate just “enough” output and can offer significant speed-ups for query evaluation. The number of input tuples that an operator accesses is called the input depth of the operator, and this is the driving cost factor in rank join processing. This introduces the important problem of depth estimation, which is crucial for the costing of rank join operators during query compilation and thus for their integration in optimized physical plans. We introduce an estimation methodology, termed deep, for approximating the input depths of rank join operators in a physical execution plan. At the core of deep lies a general, principled framework that formalizes depth computation in terms of the joint distribution of scores in the base tables. This framework results in a systematic estimation methodology that takes the characteristics of the data directly into account and thus enables more accurate estimates. We develop novel estimation algorithms that provide an efficient realization of the formal deep framework, and describe their integration on top of the statistics module of an existing query optimizer. We validate the performance of deep with an extensive experimental study on data sets of varying characteristics. The results verify the effectiveness of deep as an estimation method and demonstrate its advantages over previously proposed techniques.

3.
Identifying similarities in large datasets is an essential operation in several applications such as bioinformatics, pattern recognition, and data integration. To make a relational database management system similarity-aware, the core relational operators have to be extended. While similarity-awareness has been introduced in database engines for relational operators such as joins and group-by, little has been achieved for relational set operators, namely Intersection, Difference, and Union. In this paper, we propose to extend the semantics of relational set operators to take into account the similarity of values. We develop efficient query processing algorithms for evaluating them, and implement these operators inside an open-source database system, namely PostgreSQL. By extending several queries from the TPC-H benchmark to include predicates that involve similarity-based set operators, we perform extensive experiments that demonstrate up to three orders of magnitude speedup in performance over equivalent queries that only employ regular operators.

4.
Allowing for flexible queries enables database users to express preferences inside elementary conditions and priorities between conditions. Division is one of the algebraic operators defined in order to query regular databases. This operation selects the A-elements which are connected with (at least) a given subset of B-elements, e.g., the stores which ordered all the items supplied by a given manufacturer. It is mainly used in the framework of the relational model of data, although it makes sense in object-oriented databases as well. In the relational context, division is a non-primitive operation which may be expressed in terms of other operations, namely projection, Cartesian product and set difference. When fuzzy predicates appear, this operator needs to be extended to fuzzy relations, and this requires the replacement of the usual implication by a fuzzy one. This paper proposes two types of meaning for the extended division and investigates the issue of the primitivity of the extended operation (i.e., whether the division of fuzzy relations is expressible in terms of other operations). The final objective is to decide whether this operator is necessary for the purpose of flexible querying and to help the design of a query language supporting flexible queries, including those that convey a division of fuzzy relations.
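
The non-primitivity claim in the abstract above can be illustrated with a minimal Python sketch of ordinary (non-fuzzy) division, built only from projection, Cartesian product and set difference. The function name `divide` and the sample data (`orders`, `supplied`) are assumptions made for illustration; the fuzzy extension studied in the paper is not modeled here.

```python
def divide(r, s):
    """Relational division r(A, B) / s(B) on crisp (non-fuzzy) sets.

    r is a set of (a, b) pairs and s is a set of b values.
    Returns every a that is paired in r with all b values in s.
    """
    a_values = {a for a, _ in r}                       # projection of r on A
    all_pairs = {(a, b) for a in a_values for b in s}  # Cartesian product with s
    missing = all_pairs - r                            # required pairs absent from r
    disqualified = {a for a, _ in missing}             # A-values lacking some B
    return a_values - disqualified                     # set difference


# Hypothetical data: which store ordered which item.
orders = {
    ("store1", "pen"), ("store1", "ink"),
    ("store2", "pen"),
    ("store3", "pen"), ("store3", "ink"), ("store3", "cap"),
}
supplied = {"pen", "ink"}  # items supplied by one manufacturer

print(sorted(divide(orders, supplied)))
```

In this sketch an empty divisor keeps every A-value, which is the usual convention for crisp division.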

5.
The application of the object-oriented (O-O) paradigm in the database management field has gained much attention in recent years. Several experimental and commercial O-O database management systems have become available. However, the existing O-O DBMSs still lack a solid mathematical foundation for the manipulation of O-O databases, the optimization of queries, and the design and selection of storage structures for supporting O-O database manipulations. This paper presents an association algebra (A-algebra) to serve as a mathematical foundation for processing O-O databases, which is analogous to the relational algebra used for processing relational databases. In this algebra, objects and their associations in an O-O database are uniformly represented by association patterns which are manipulated by a number of operators to produce other association patterns. Unlike the relational algebra, in which set operations operate on relations with union-compatible structures, the A-algebra operators can operate on association patterns of homogeneous and heterogeneous structures. Unlike traditional record-based relational processing, the A-algebra allows very complex patterns of object associations to be directly manipulated. The pattern-based query formulation and the A-algebra operators are described. Some mathematical properties of the algebraic operators are presented together with their application in query decomposition and optimization. The completeness of the A-algebra is also defined and proven. The A-algebra has been used as the basis for the design and implementation of an object-oriented query language, OQL, which is the query language used in a prototype Knowledge Base Management System, OSAM*.KBMS.

6.
A query processing strategy which is based on pipelining and data-flow techniques is presented. Timing equations are developed for calculating the performance of four join algorithms: nested block, hash, sort-merge, and pipelined sort-merge. They are used to execute the join operation in a query in distributed fashion and in pipelined fashion. Based on these equations and similar sets of equations developed for other relational algebraic operations, the performance of query execution was evaluated using the different join algorithms. The effects of varying the values of processing time, I/O time, communication time, buffer size, and join selectivity on the performance of the pipelined join algorithms are investigated. The results are compared to the results obtained by employing the same algorithms for executing queries using the distributed processing approach, which does not exploit the vertical concurrency of the pipelining approach. These results establish the benefits of pipelining.
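
Two of the join algorithms named in the abstract above, hash join and sort-merge join, can be sketched in a few lines of Python. This is a textbook in-memory illustration, not the paper's pipelined or distributed implementation; the function names and the `(key, payload)` tuple layout are assumptions.

```python
from collections import defaultdict


def hash_join(r, s):
    """Equi-join on the key field: build a hash table on r, then probe with s."""
    table = defaultdict(list)
    for key, payload in r:
        table[key].append(payload)
    return [(key, rp, sp) for key, sp in s for rp in table.get(key, [])]


def sort_merge_join(r, s):
    """Equi-join on the key field: sort both inputs, then merge matching runs."""
    r = sorted(r, key=lambda t: t[0])
    s = sorted(s, key=lambda t: t[0])
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][0] < s[j][0]:
            i += 1
        elif r[i][0] > s[j][0]:
            j += 1
        else:
            key = r[i][0]
            i_end, j_end = i, j
            while i_end < len(r) and r[i_end][0] == key:
                i_end += 1  # end of the run of equal keys in r
            while j_end < len(s) and s[j_end][0] == key:
                j_end += 1  # end of the run of equal keys in s
            out.extend((key, rp, sp) for _, rp in r[i:i_end] for _, sp in s[j:j_end])
            i, j = i_end, j_end
    return out


# Hypothetical inputs with duplicate join keys on both sides.
r_rows = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
s_rows = [(2, "x"), (3, "y"), (2, "z"), (4, "w")]
```

Both functions return the same set of joined tuples; only the order of the output and the access pattern on the inputs differ.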

7.
Ranking queries, also known as top-k queries, produce results that are ordered on some computed score. Typically, these queries involve joins, where users are usually interested only in the top-k join results. Top-k queries are dominant in many emerging applications, e.g., multimedia retrieval by content, Web databases, data mining, middlewares, and most information retrieval applications. Current relational query processors do not handle ranking queries efficiently, especially when joins are involved. In this paper, we address supporting top-k join queries in relational query processors. We introduce a new rank-join algorithm that makes use of the individual orders of its inputs to produce join results ordered on a user-specified scoring function. The idea is to rank the join results progressively during the join operation. We introduce two physical query operators based on variants of ripple join that implement the rank-join algorithm. The operators are nonblocking and can be integrated into pipelined execution plans. We also propose an efficient heuristic designed to optimize a top-k join query by choosing the best join order. We address several practical issues and optimization heuristics to integrate the new join operators in practical query processors. We implement the new operators inside a prototype database engine based on PREDATOR. The experimental evaluation of our approach compares recent algorithms for joining ranked inputs and shows superior performance.
Received: 23 December 2003; Accepted: 31 March 2004; Published online: 12 August 2004. Edited by: S. Abiteboul. Extended version of the paper published in the Proceedings of the 29th International Conference on Very Large Databases, VLDB 2003, Berlin, Germany, pp 754-765.
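
The idea of ranking join results progressively can be sketched as a small threshold-based rank join in Python. This is a simplified in-memory illustration under stated assumptions, not the operators implemented in the paper: the scoring function is fixed to the sum of the two input scores, the inputs are read alternately, and the names `rank_join`, `left` and `right` are hypothetical.

```python
import heapq
from collections import defaultdict


def rank_join(left, right, k):
    """Return up to k equi-join results with the highest summed score.

    left and right are lists of (join_key, score), each sorted by score
    in descending order. Results are (total_score, join_key), best first.
    """
    if not left or not right:
        return []
    inputs = (left, right)
    pos = [0, 0]                                   # tuples read so far per input
    seen = (defaultdict(list), defaultdict(list))  # join_key -> scores already read
    heap = []                                      # candidates, max-heap via negation
    out = []

    def threshold():
        # Upper bound on the score of any join result not generated yet.
        bounds = []
        for i in (0, 1):
            j = 1 - i
            if pos[i] == len(inputs[i]):
                continue                  # input i has no unseen tuples
            if pos[i] == 0 or pos[j] == 0:
                return float("inf")       # no bound until both sides were read
            last_i = inputs[i][pos[i] - 1][1]
            top_j = inputs[j][0][1]
            bounds.append(last_i + top_j)
        return max(bounds, default=float("-inf"))

    turn = 0
    while len(out) < k:
        if pos[turn] == len(inputs[turn]):
            turn = 1 - turn
            if pos[turn] == len(inputs[turn]):
                break                     # both inputs exhausted
        key, score = inputs[turn][pos[turn]]
        pos[turn] += 1
        seen[turn][key].append(score)
        for other in seen[1 - turn].get(key, []):
            heapq.heappush(heap, (-(score + other), key))
        bound = threshold()
        # A candidate is safe to report once no unseen result can beat it.
        while heap and len(out) < k and -heap[0][0] >= bound:
            neg, result_key = heapq.heappop(heap)
            out.append((-neg, result_key))
        turn = 1 - turn
    while heap and len(out) < k:          # flush once the inputs are exhausted
        neg, result_key = heapq.heappop(heap)
        out.append((-neg, result_key))
    return out


# Hypothetical ranked inputs.
left = [("a", 9), ("b", 7), ("c", 3), ("d", 1)]
right = [("c", 8), ("a", 6), ("d", 5), ("b", 2)]
top2 = rank_join(left, right, 2)
```

The sketch reports a result as soon as its score reaches the threshold, so the best answer can be returned before either input has been read completely; results with equal scores may appear in either order.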

8.
Existence of semantic conflicts between component databases severely impacts query processing in a multidatabase system. In this paper, we describe two types of semantic conflicts that have to be dealt with in the integration of databases modeling information about related sets of real-world entities. These are the entity identification problem and the attribute value conflict problem. While the two-way outerjoin operation has been commonly used for resolving the entity identification problem between two component relations, outerjoins using regular equality comparisons between component relation keys are shown to produce counter-intuitive entity identification results. We remedy this by defining a new key-equality comparator, in place of the regular equality comparator, for outerjoins. For the attribute value conflict problem, we define a Generalized Attribute Derivation (GAD) operation which allows user-defined attribute derivation functions to be used to compute new attributes from the component relations' attributes. With two-way outerjoin and GAD added to the set of relational operations, the traditional algebraic transformation framework for relational queries is no longer adequate for multidatabase query processing and optimization. As a result, we introduce the constrained query tree as the multidatabase query representation. We show that some knowledge about query predicates and attribute derivation functions can be used to simplify queries. Such knowledge is modeled as an outerjoin graph attached to every outerjoin operation in the query tree. Based on this, we further extend the traditional algebraic transformation framework to include two-way outerjoins and GAD operations. Our framework demonstrates that properties of selection/join predicates and attribute derivation functions can be used to provide interesting transformation alternatives. This framework also serves as a formal ground for developing optimization strategies for multidatabase queries.
Recommended by: Clement Yu

9.
Information Sciences, 2007, 177(12): 2493-2521
The query optimization phase in query processing plays a crucial role in choosing the most efficient strategy for executing a query. In this paper, we study an optimization technique for SQL-Nested queries using Hints. Hints are additional comments that are inserted into an SQL statement for the purpose of instructing the optimizer to perform the specified operations. We utilize various Hints including Optimizer Hints, Table join and anti-join Hints, and Access method Hints. We analyse the performance of various nested queries using the TRACE and TKPROF utilities which provide query execution statistics and execution plans.

10.
Emerging database application domains demand not only high functionality, but also high performance. To satisfy these two requirements, the Volcano query execution engine combines the efficient use of parallelism on a wide variety of computer architectures with an extensible set of query processing operators that can be nested into arbitrarily complex query evaluation plans. Volcano's novel exchange operator permits designing, developing, debugging, and tuning data manipulation operators in single-process environments but executing them in various forms of parallelism. The exchange operator shields the data manipulation operators from all parallelism issues. The design and implementation of the generalized exchange operator are examined. The authors justify their decision to support hierarchical architectures and argue that the exchange operator offers a significant advantage for development and maintenance of database query processing software. They discuss the integration of bit vector filtering into the exchange operator paradigm with only minor modifications.

11.
A novel indexing structure, the join index hierarchy, is proposed to handle the “gotos on disk” problem in object-oriented query processing. The method constructs a hierarchy of join indices and transforms a sequence of pointer-chasing operations into a simple search in an appropriate join index file, and thus accelerates navigation in object-oriented databases. The method extends the join index structure studied in relational and spatial databases, supports both forward and backward navigation among objects and classes, and localizes update propagations in the hierarchy. Our performance study shows that a partial join index hierarchy outperforms several other indexing mechanisms in object-oriented query processing.

12.
Keyword search in relational databases
This paper surveys research on enabling keyword search in relational databases. We present fundamental characteristics and discuss research dimensions, including data representation, ranking, efficient processing, query representation, and result presentation. Various approaches for developing the search system are described and compared within a common framework. We discuss the evolution of new research strategies to resolve the issues associated with probabilistic models, efficient top-k query processing, and schema analysis in relational databases.

13.
Semijoin is a relational operator used in many relational query processing algorithms. Semijoins can be used to “reduce” the database by delimiting portions of the database that contain data relevant to a given query. For some queries, there exist sequences of semijoins that delimit the exact portions of the database needed to answer the query. Such sequences are called full reducers.

This paper considers a class of queries called natural inequality queries (NI queries), and characterizes a subclass for which full reducers exist. We also present an efficient algorithm that decides whether an NI query lies within this subclass, and constructs a full reducer for the query. The NI queries are a subset of the aggregate-free, conjunctive queries of QUEL, and permit join clauses to include <, ≤, =, ≥, >.
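
The reducing effect of a semijoin, as described in the abstract above, can be shown with a short Python sketch. It only illustrates a single semijoin step with an arbitrary join predicate (equality or inequality); it does not implement the paper's full-reducer construction, and the function name `semijoin` and the sample relations are assumptions.

```python
def semijoin(r, s, predicate):
    """Keep the tuples of r that join with at least one tuple of s."""
    return [t for t in r if any(predicate(t, u) for u in s)]


# Hypothetical relations, represented as lists of dictionaries.
employees = [
    {"name": "ann", "dept": 1, "salary": 40},
    {"name": "bob", "dept": 2, "salary": 70},
    {"name": "cy", "dept": 3, "salary": 20},
]
departments = [
    {"id": 1, "budget": 50},
    {"id": 2, "budget": 10},
]

# Equality semijoin: employees whose department exists.
in_known_dept = semijoin(employees, departments, lambda e, d: e["dept"] == d["id"])

# Inequality semijoin: employees whose salary is below some department budget.
affordable = semijoin(employees, departments, lambda e, d: e["salary"] < d["budget"])

print([e["name"] for e in in_known_dept])
print([e["name"] for e in affordable])
```

The result contains only attributes of the first relation, which is why a semijoin can shrink a relation before a more expensive join is evaluated.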


14.
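To solve the query translation problem in ontology-based data integration systems, a relational-algebra representation of SPARQL queries and a translation method are proposed. A relational algebra for RDF graph patterns is introduced, five basic relational operations are defined, and the relational-algebra representation of SPARQL queries is given. A SPARQL-to-SQL query translation method is proposed that converts ontology-based SPARQL queries into SQL queries that can be executed directly on relational databases, thereby achieving the integration of relational databases. A system implementation shows that the method translates between the two query languages effectively.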

15.
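To meet the application requirements of statistical and scientific databases, this paper defines operations on such databases, based on the semantic data model MICSUM2 and taking C-relations, atomic statistical tables and composite statistical tables as operands. These operations constitute an algebra over the sets of C-relations, atomic statistical tables and statistical tables, called the MS algebra for short. The MS algebra extends the relational algebra in two respects: first, its operations have richer semantics and a wider range of operands; second, it includes many new algebraic operations that support statistical analysis queries. The MS algebra is the theoretical foundation for constructing user-friendly query languages for statistical and scientific databases.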

16.
The quality of data in relational databases is often uncertain, and the relationship between the quality of the underlying base tables and the set of potential query results, a type of information product (IP), that could be produced from them has not been fully investigated. This paper provides a basis for the systematic analysis of the quality of such IPs. This research uses the relational algebra framework to develop estimates for the quality of query results based on the quality estimates of samples taken from the base tables. Our procedure requires an initial sample from the base tables; these samples are then used for all possible IPs. Each specific query governs the quality assessment of the relevant samples. By using the same sample repeatedly, our approach is relatively cost effective. We introduce the reference-table procedure, which can be used for quality estimation in general. In addition, for each of the basic algebraic operators, we discuss simpler procedures that may be applicable. Special attention is devoted to the join operation. We examine various relevant statistical issues, including how to deal with the impact on quality of missing rows in base tables. Finally, we address several implementation issues related to sampling.

17.
Algebraic query optimisation for database programming languages
A major challenge still facing the designers and implementors of database programming languages (DBPLs) is that of query optimisation. We investigate algebraic query optimisation techniques for DBPLs in the context of a purely declarative functional language that supports sets as first-class objects. Since the language is computationally complete, issues such as non-termination of expressions and construction of infinite data structures can be investigated, whilst its declarative nature allows the issue of side effects to be avoided and a richer set of equivalences to be developed. The language has a well-defined semantics which permits us to reason formally about the properties of expressions, such as their equivalence with other expressions and their termination. The support of a set bulk data type enables much prior work on the optimisation of relational languages to be utilised. In the paper we first give the syntax of our archetypal DBPL and briefly discuss its semantics. We then define a small but powerful algebra of operators over the set data type, provide some key equivalences for expressions in these operators, and list transformation principles for optimising expressions. Along the way, we identify some caveats to well-known equivalences for non-deductive database languages. We next extend our language with two higher level constructs commonly found in functional DBPLs: set comprehensions and functions with known inverses. Some key equivalences for these constructs are provided, as are transformation principles for expressions in them. Finally, we investigate extending our equivalences for the set operators to the analogous operators over bags. Although developed and formally proved in the context of a functional language, our findings are directly applicable to other DBPLs of similar expressiveness.
Edited by Matthias Jarke, Jorge Bocca, Carlo Zaniolo. Received September 15, 1994 / Accepted September 1, 1995

18.
Graphs are widely used for modeling complicated data such as social networks, bibliographical networks and knowledge bases. The growing sizes of graph databases motivate the crucial need for developing powerful and scalable graph-based query engines. We propose a SPARQL-like language, G-SPARQL, for querying attributed graphs. The language enables the expression of different types of graph queries that are of great interest for databases modeled as large graphs, such as pattern matching, reachability and shortest path queries. Each query can combine both structural predicates and value-based predicates (on the attributes of the graph nodes/edges). We describe an algebraic compilation mechanism for our proposed query language which is extended from the relational algebra and based on the basic construct of building SPARQL queries, the Triple Pattern. We describe an efficient hybrid Memory/Disk representation of large attributed graphs where only the topology of the graph is maintained in memory while the data of the graph are stored in a relational database. The execution engine of our proposed query language splits parts of the query plan to be pushed inside the relational database (using SQL) while the execution of other parts of the query plan is processed using memory-based algorithms, as necessary. Experimental results on real and synthetic datasets demonstrate the efficiency and the scalability of our approach and show that our approach outperforms native graph databases by several factors.

19.
This paper presents algebraic identities and algebraic query optimization for a parametric model for temporal databases. The parametric model has several features not present in the classical model. In this model, a key is explicitly designated with a relation, and an operator is available to change the key. The algebra for the parametric model is three-sorted; it includes 1) relational expressions that evaluate to relations, 2) domain expressions that evaluate to time domains, and 3) Boolean expressions that evaluate to TRUE or FALSE. The identities in the parametric model are classified as weak identities and strong identities. Weak identities in this model are largely counterparts of the identities in classical relational databases. Rather than establishing weak identities from scratch, a meta inference mechanism, introduced in the paper, allows weak identities to be induced from their respective classical counterparts. On the other hand, the strong identities will be established from scratch. An algorithm is presented for algebraic optimization to transform a query to an equivalent query that will execute more efficiently.

20.
We consider adaptive index utilization as a fine-grained problem in autonomic databases in which an existing index is dynamically determined to be used or not in query processing. As a special case, we study this problem for structural joins, the core operator in XML query processing, in the main memory. We find that index utilization is beneficial for structural joins only under certain join selectivity and distribution of matching elements. Therefore, we propose adaptive algorithms to decide whether to use an index probe or a data scan for each step of matching during the processing of a structural join operator. Our adaptive algorithms are based on the history, the look-ahead information, or both. We have developed a cost model to facilitate this adaptation and have conducted experiments with both synthetic and real-world data sets. Our results show that adaptively utilizing indexes in a structural join improves the performance by taking advantage of both sequential scans and index probes.
