An important feature of database technology of the nineties is the use of parallelism for speeding up the execution of complex queries. This technology is being tested in several experimental database architectures and a few commercial systems for conventional select-project-join queries. In particular, hash-based fragmentation is used to distribute data to disks under the control of different processors in order to perform selections and joins in parallel. With the development of new query languages, and in particular with the definition of transitive closure queries and of more general logic programming queries, the new dimension of recursion has been added to query processing. Recursive queries are complex; at the same time, their regular structure is particularly suited for parallel execution, and parallelism may give a high efficiency gain. We survey the approaches to parallel execution of recursive queries that have been presented in the recent literature. We observe that research on parallel execution of recursive queries is separated into two distinct subareas, one focused on the transitive closure of Relational Algebra expressions, the other one focused on optimization of more general Datalog queries. Though the subareas seem radically different because of the approach and formalism used, they have many common features. This is not surprising, because most typical Datalog queries can be solved by means of the transitive closure of simple algebraic expressions. We first analyze the relationship between the transitive closure of expressions in Relational Algebra and Datalog programs. We then review sequential methods for evaluating transitive closure, distinguishing iterative and direct methods. We address the parallelization of these methods, by discussing various forms of parallelization. Data fragmentation plays an important role in obtaining parallel execution; we describe hash-based and semantic fragmentation. 
Finally, we consider Datalog queries and present general methods for parallel rule execution; we recognize the similarities between these methods and the methods reviewed previously when the former are applied to linear Datalog queries. We also provide a quantitative analysis that shows the impact of the initial data distribution on the performance of the methods. Recommended by: Patrick Valduriez
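The iterative (semi-naive) evaluation of transitive closure and the hash-based fragmentation discussed above can be illustrated with a minimal Python sketch. It assumes an in-memory edge set and simulates the processors sequentially. The names `hash_fragment` and `parallel_semi_naive_tc` are hypothetical, and the code is not any specific method from the survey.

```python
from collections import defaultdict


def hash_fragment(edges, num_workers):
    """Assign each edge (x, y) to a worker by hashing its join attribute x."""
    index = [defaultdict(set) for _ in range(num_workers)]
    for x, y in edges:
        index[hash(x) % num_workers][x].add(y)
    return index


def parallel_semi_naive_tc(edges, num_workers=4):
    """Semi-naive transitive closure: T := R; delta := R;
    repeat delta := (delta join R) - T; T := T union delta."""
    index = hash_fragment(edges, num_workers)
    closure = set(edges)
    delta = set(edges)
    while delta:
        new = set()
        # Each delta tuple (a, b) is routed to the worker that owns key b,
        # which joins it only with its local fragment of R.
        for a, b in delta:
            for c in index[hash(b) % num_workers].get(b, ()):
                if (a, c) not in closure:
                    new.add((a, c))
        closure |= new
        delta = new
    return closure
```

In a real parallel system each worker would run its part of the loop concurrently and ship new tuples to the owner of their join key; the sequential loop only shows which fragment each join touches.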

The performance of several algorithms suitable for processing an important class of recursive queries, the instantiated transitive closure (TC) queries, is studied and compared. These algorithms are the wavefront, δ-wavefront, and a generic algorithm called super-TC. During the evaluation of a TC query, the first two algorithms may read a given disk page more than once, whereas super-TC reads each disk page at most once. A comprehensive performance evaluation of these three algorithms using rigorous analytical and simulation models is presented. The study reveals that the relative performance of the algorithms depends strongly on the parameters that characterize the processed TC query and the relation it references. One of the super-TC variants is shown to outperform all of the other algorithms presented.
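A minimal in-memory sketch of the frontier-based evaluation of an instantiated TC query tc(source, Y), assuming (as the name suggests, though the abstract does not define it) that the δ variant expands only the nodes discovered in the previous iteration. It ignores disk pages entirely, so it says nothing about the page-read behaviour compared in the study; `delta_wavefront` is a hypothetical name.

```python
from collections import defaultdict


def delta_wavefront(edges, source):
    """Nodes reachable from `source` by a path of length >= 1.
    Each iteration expands only the nodes first reached in the previous one."""
    succ = defaultdict(set)
    for x, y in edges:
        succ[x].add(y)
    reached = set()
    frontier = {source}
    while frontier:
        next_frontier = set()
        for node in frontier:
            for nbr in succ[node]:
                if nbr not in reached:
                    next_frontier.add(nbr)
        reached |= next_frontier
        frontier = next_frontier
    return reached
```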

Summary. Forming the transitive closure of a binary relation (or directed graph) is an important part of many algorithms. When the relation is represented by a bit matrix, the transitive closure can be efficiently computed in parallel in a systolic array. Here we propose two novel ways of computing the transitive closure of an arbitrarily large graph on a systolic array of fixed size. The first method is a simple partitioning of a well-known systolic algorithm for computing the transitive closure. The second is a block-structured algorithm. This algorithm is suitable for execution on a systolic array that can multiply fixed-size bit matrices and compute the transitive closure of graphs with a fixed number of nodes. The algorithm is, however, not limited to systolic array implementations; it works on any parallel architecture that can perform these bit matrix operations efficiently. The shortest path problem for directed graphs with weighted edges can also be solved in the same manner as the transitive closure computation described above. Björn Lisper was born in 1956 in Solna, Sweden. He received the M. Eng. Physics degree in 1980 and the Ph.D. degree in Computer Science in 1987, both from the Royal Institute of Technology in Stockholm. Currently he shares his time between the Royal Institute of Technology and the Swedish Institute of Computer Science. His research interests are mainly in the area of formal methods for deriving efficient parallel implementations of algorithms, including synthesis of fixed hardware structures for specific algorithms and compilation techniques for tightly coupled parallel systems. Dr. Lisper is a member of the European Association for Theoretical Computer Science.
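As a point of reference for the bit-matrix formulation, here is a sequential sketch of Warshall's algorithm with each matrix row packed into a Python integer, so that one row update is a single bitwise OR. This is not the paper's partitioned or block-structured systolic algorithm; `warshall_bits` is a hypothetical name.

```python
def warshall_bits(rows):
    """Transitive closure of an n-node graph given as a list of int bitmasks:
    bit j of rows[i] is set iff there is an edge i -> j."""
    n = len(rows)
    closure = list(rows)
    for k in range(n):
        bit_k = 1 << k
        for i in range(n):
            # If i reaches k, then i also reaches everything k reaches.
            if closure[i] & bit_k:
                closure[i] |= closure[k]
    return closure
```

Replacing OR/AND with min/+ over edge weights turns the same triple loop into the Floyd-Warshall shortest-path algorithm, which is the kind of correspondence the abstract points to for the shortest path problem.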

A data structure is used to store the materialized generalized transitive closure so that generalized transitive closure queries, deletions, and insertions of tuples can be performed efficiently in centralized and parallel environments. Some techniques to manage materialized transitive closure are presented and generalized to more general recursions. The proposed algorithms and the associated data structures are conceptually simple and easy to implement. In a multiprocessor environment, the time complexities of insertion and deletion in the authors' schemes are reduced, and only two rounds of communication are needed.
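To show what maintaining a materialized closure under insertion involves, here is a deliberately naive Python sketch that stores one reachability set per node. It is not the authors' data structure, handles insertions only (deletions are harder and omitted), and the class name `MaterializedTC` is hypothetical.

```python
from collections import defaultdict


class MaterializedTC:
    """Keeps, for every node, the set of nodes it reaches (its closure row)."""

    def __init__(self):
        self.reach = defaultdict(set)

    def insert(self, a, b):
        """Add edge a -> b. Every node that already reaches a (plus a itself)
        now also reaches b and everything b reached before the insertion."""
        gained = {b} | self.reach[b]
        sources = [x for x, r in self.reach.items() if a in r]
        sources.append(a)
        for x in sources:
            self.reach[x] |= gained

    def query(self, x, y):
        """Answer 'does x reach y?' with a single lookup."""
        return y in self.reach.get(x, set())
```

The update is correct because any new path uses the inserted edge, and a shortest such path uses it exactly once, so the pre-insertion rows on either side of the edge suffice.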

The development of efficient algorithms to process the different forms of transitive-closure (TC) queries within the context of large database systems has recently attracted a large volume of research effort. In this paper, we present two new algorithms suitable for processing one of these forms, the so-called strong partially instantiated transitive closure, in which one of the query's arguments is instantiated to a set of constants and whose processing yields a set of tuples that draw their values from both the instantiated and the uninstantiated arguments. These algorithms avoid the redundant computations and high storage cost found in a number of similar algorithms. Using simulation, this paper compares the performance of the new algorithms with those found in the literature and clearly shows that the new algorithms are superior.
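To make the query form concrete, here is a naive baseline in Python that answers tc(X, Y) with X bound to a set of constants and returns (source, target) pairs. It is not either of the paper's algorithms: it runs an independent search per source, so sources with shared descendants repeat work, which is the kind of redundancy the abstract says the new algorithms avoid. `strong_partial_tc` is a hypothetical name.

```python
from collections import defaultdict, deque


def strong_partial_tc(edges, sources):
    """Return every pair (s, t) such that t is reachable from s in `sources`
    by a path of length >= 1."""
    succ = defaultdict(set)
    for x, y in edges:
        succ[x].add(y)
    result = set()
    for s in sources:
        seen = set()
        queue = deque(succ[s])
        while queue:
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            queue.extend(succ[node] - seen)
        result |= {(s, t) for t in seen}
    return result
```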

We propose an approach with a feasible space requirement to maintain the transitive closure of a class of hypergraphs called OR-graphs. OR-graphs are equivalent to disjunctive deductive databases in which disjunctions are limited to one attribute in each OR-table. It has been shown that query processing in disjunctive deductive databases reaches CoNP complexity even for very simple examples, but few attempts have been made, as is done in this paper, to identify classes of disjunctive databases and queries for which efficient algorithms exist. Polynomial-time algorithms are presented to compute the transitive closure of OR-graphs and to handle dynamic insertions and deletions. Using the algorithms for insertions and deletions, we provide a simple but efficient technique to solve the failure set problem in reliability models, which is equivalent to finding the closure of an arbitrary non-empty set of simple nodes. We also show that a minimal extension to OR-graphs makes computing the transitive closure CoNP-complete. Research supported in part by NSF under IRI-9210220 and IRI-9111988, Omron Corporation and Omron Management Center of America.

Analyzing graphs is a fundamental problem in big data analytics, for which DBMS technology does not seem competitive. On the other hand, SQL recursive queries are a fundamental mechanism for analyzing graphs in a DBMS, and their processing and optimization are significantly harder than those of traditional SPJ queries. Columnar DBMSs are a new, faster class of database system, with significantly different storage and query processing mechanisms compared to row DBMSs, which are still the dominant technology. With that motivation in mind, we study the optimization of recursive queries on a columnar DBMS, focusing on two fundamental and complementary graph problems: transitive closure and adjacency matrix multiplication. From a query processing perspective we consider the three fundamental relational operators: selection, projection and join (SPJ), where projection subsumes SQL group-by aggregation. We present comprehensive experiments comparing recursive query processing on columnar, row and array DBMSs to analyze large graphs with different shapes and densities. We study the relative impact of query optimizations and compare the raw speed of DBMSs in evaluating recursive queries on graphs. Results confirm that classical query optimizations keep working well in a columnar DBMS, but their relative impact is different. Most importantly, a columnar DBMS with tuned query optimization is uniformly faster than row and array systems at analyzing large graphs, regardless of their shape, density and connectivity. On the other hand, there is no clear winner between row and array DBMSs.
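The two graph problems above can be written as plain SQL. The sketch below runs them through Python's built-in `sqlite3` module; SQLite is a row store, not a columnar DBMS, so it only illustrates the query shapes, not the performance results. The table names `E`, `M`, `R` and the function names are hypothetical.

```python
import sqlite3


def sql_transitive_closure(edges):
    """Transitive closure of edge table E(i, j) via a recursive CTE.
    UNION (not UNION ALL) discards duplicate rows, so cycles terminate."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE E (i INTEGER, j INTEGER)")
    conn.executemany("INSERT INTO E VALUES (?, ?)", edges)
    rows = conn.execute(
        """
        WITH RECURSIVE R(i, j) AS (
            SELECT i, j FROM E
            UNION
            SELECT R.i, E.j FROM R JOIN E ON R.j = E.i
        )
        SELECT i, j FROM R ORDER BY i, j
        """
    ).fetchall()
    conn.close()
    return rows


def sql_matrix_square(entries):
    """Square a sparse matrix stored as M(i, j, v): a join followed by
    group-by aggregation, i.e. projection subsuming SUM."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE M (i INTEGER, j INTEGER, v INTEGER)")
    conn.executemany("INSERT INTO M VALUES (?, ?, ?)", entries)
    rows = conn.execute(
        """
        SELECT A.i, B.j, SUM(A.v * B.v)
        FROM M AS A JOIN M AS B ON A.j = B.i
        GROUP BY A.i, B.j
        ORDER BY A.i, B.j
        """
    ).fetchall()
    conn.close()
    return rows
```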

I/O parallelism is considered to be a promising approach to achieving high performance in parallel data warehousing systems, where huge amounts of data and complex analytical queries have to be processed. This paper proposes a parallel secondary data cube storage structure (PHC for short) to efficiently support the processing of range sum queries and dynamic updates on data cubes using parallel computing systems. Based on PHC, two parallel algorithms for processing range sum queries and updates are also proposed. Both algorithms have the same time complexity, O(log^d n / P). The analytical and experimental results show that PHC and the parallel algorithms have high performance and achieve optimum speedup.
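The PHC structure itself is not described in enough detail here to reproduce. As a hedged stand-in, the sketch below uses a well-known sequential structure with the same style of bound, a 2-D Fenwick (binary indexed) tree, which answers range sum queries and point updates on a d = 2 cube in O(log^2 n) time. It does not model the division of work across P processors; the class name `Fenwick2D` is hypothetical.

```python
class Fenwick2D:
    """Range-sum queries and point updates on an n x m grid of measures,
    each in O(log n * log m) time."""

    def __init__(self, n, m):
        self.n, self.m = n, m
        self.tree = [[0] * (m + 1) for _ in range(n + 1)]

    def update(self, i, j, delta):
        """Add `delta` to cell (i, j), 0-indexed."""
        x = i + 1
        while x <= self.n:
            y = j + 1
            while y <= self.m:
                self.tree[x][y] += delta
                y += y & -y
            x += x & -x

    def prefix(self, i, j):
        """Sum of all cells with row < i and column < j."""
        total = 0
        x = i
        while x > 0:
            y = j
            while y > 0:
                total += self.tree[x][y]
                y -= y & -y
            x -= x & -x
        return total

    def range_sum(self, r0, c0, r1, c1):
        """Sum over rows r0..r1 and columns c0..c1, inclusive."""
        return (self.prefix(r1 + 1, c1 + 1) - self.prefix(r0, c1 + 1)
                - self.prefix(r1 + 1, c0) + self.prefix(r0, c0))
```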

Over the past few years, much attention has been paid to deductive databases. They offer a logic-based interface, and allow formulation of complex recursive queries. However, they do not offer appropriate update facilities, and do not support existing applications. To overcome these problems an SQL-like interface is required besides a logic-based interface.

In the PRISMA project we have developed a tightly-coupled distributed database, on a multiprocessor machine, with two user interfaces: SQL and PRISMAlog. Query optimization is localized in one component: the relational query optimizer. Therefore, we have defined an eXtended Relational Algebra that allows recursive query formulation and can also be used for expressing executable schedules, and we have developed algebraic optimization strategies for recursive queries. In this paper we describe an optimization strategy that rewrites regular (in the context of formal grammars) mutually recursive queries into standard Relational Algebra and transitive closure operations. We also describe how to push selections into the resulting transitive closure operations.

The reason we focus on algebraic optimization is that, in our opinion, the new generation of advanced database systems will be built starting from existing state-of-the-art relational technology, instead of building a completely new class of systems.
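Pushing a selection into a transitive closure can be illustrated with a small Python sketch in which sets of pairs stand in for relations. It shows the equivalence σ_{X=c}(TC(R)) = closure seeded only with R's tuples having X = c and extended on the right. This is not PRISMA's eXtended Relational Algebra or its rewrite rules, and both function names are hypothetical.

```python
def full_closure(edges):
    """TC(R) computed in full, semi-naively."""
    closure = set(edges)
    delta = set(edges)
    while delta:
        delta = {(a, d) for (a, b) in delta for (c, d) in edges if b == c} - closure
        closure |= delta
    return closure


def select_then_close(edges, const):
    """sigma_{X=const}(TC(R)) with the selection pushed into the seed:
    start from R's tuples with X = const and extend them on the right only."""
    result = {(a, b) for (a, b) in edges if a == const}
    delta = set(result)
    while delta:
        delta = {(a, d) for (a, b) in delta for (c, d) in edges if b == c} - result
        result |= delta
    return result
```

A selection on the second attribute would instead seed with tuples having Y = const and extend them on the left; either way, tuples unrelated to the constant are never generated.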

One of the most important reasoning tasks on queries is checking containment, i.e., verifying whether one query necessarily yields a subset of the result of another one. Query containment is crucial in several contexts, such as query optimization, query reformulation, knowledge-base verification, information integration, integrity checking, and cooperative answering. Containment is undecidable in general for Datalog, the fundamental language for expressing recursive queries. On the other hand, it is known that containment between monadic Datalog queries, and between Datalog queries and unions of conjunctive queries, is decidable. It is also known that containment between unions of conjunctive two-way regular path queries, which are used in the context of semistructured data models and contain a limited form of recursion in the form of transitive closure, is decidable. In this paper, we combine the automata-theoretic techniques underlying these two decidability results to show that containment of Datalog in unions of conjunctive two-way regular path queries is decidable in 2EXPTIME. By sharpening a known lower bound result for containment of Datalog in unions of conjunctive queries, we also show a matching lower bound.

The main requirements for spatial query processing via mobile terminals include rapid and accurate searching and low energy consumption. Most location-based services (LBSs) are provided using an on-demand method, which is suitable for lightly loaded systems where contention for wireless channels and server processing is not severe. However, as the number of users of LBSs increases, performance deteriorates rapidly since the servers' capability to process queries is limited. Furthermore, the response time of a query may increase significantly when many users' queries reach a server at the same time. This is because the server has to check the locations of users and potential objects for the final result and then individually send answers to clients via a point-to-point channel. In this situation, an inefficient spatial index structure and search algorithm may incur extremely large access latency. To address this problem, we propose the Hierarchical Grid Index (HGI), which provides a lightweight sequential location-based index structure for efficient LBSs. We minimize the index size through the use of hierarchical location-based identifications. In addition, we support efficient query processing in broadcasting environments through sequential data transfer and search based on the object locations. We also propose Top-Down Search and Reduction-Counter Search algorithms for efficient searching and query processing. HGI has a simple structure because it eliminates replication pointers, and it is therefore suitable for broadcasting environments with one-dimensional characteristics, enabling rapid and accurate spatial search by reducing redundant data. Our performance evaluation shows that the proposed index and algorithms are accurate and fast and support efficient spatial query processing.
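The abstract does not give HGI's exact encoding. The Python sketch below therefore assumes a quadtree-style hierarchical grid identifier with one quadrant digit per level, coarsest first. It shows why such identifiers suit a one-dimensional broadcast: when objects are sorted by id, every coarser cell maps to one contiguous run that a top-down search can locate. All names are hypothetical.

```python
import bisect


def cell_id(x, y, levels):
    """Hierarchical grid id of point (x, y) in the unit square [0, 1)^2:
    one quadrant digit (0-3) per level, coarsest level first."""
    digits = []
    x0, y0, size = 0.0, 0.0, 1.0
    for _ in range(levels):
        size /= 2
        qx = int(x >= x0 + size)
        qy = int(y >= y0 + size)
        digits.append(str(2 * qy + qx))
        x0 += qx * size
        y0 += qy * size
    return "".join(digits)


def build_broadcast_order(objects, levels):
    """Sort (name, x, y) objects by cell id, giving a 1-D broadcast sequence."""
    return sorted((cell_id(x, y, levels), name) for name, x, y in objects)


def objects_in_cell(order, prefix):
    """All objects inside the coarser cell `prefix`. Their ids share the
    prefix, so they form one contiguous run; '4' sorts after every digit 0-3."""
    lo = bisect.bisect_left(order, (prefix,))
    hi = bisect.bisect_left(order, (prefix + "4",))
    return [name for _, name in order[lo:hi]]
```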

In this paper, we present a new method for fuzzy query processing for document retrieval based on extended fuzzy concept networks. In an extended fuzzy concept network, there are four kinds of fuzzy relationships between concepts, i.e., fuzzy positive association, fuzzy negative association, fuzzy generalization, and fuzzy specialization. An extended fuzzy concept network can be modeled by a relation matrix and a relevance matrix, where the elements of the relation matrix represent the fuzzy relationships between concepts, and the elements of the relevance matrix indicate the degrees of relevance between concepts. The implicit fuzzy relationships between concepts can be inferred from the transitive closure of the relation matrix, and the implicit degrees of relevance between concepts can be inferred from the transitive closure of the relevance matrix. The proposed method allows users to perform positive, negative, generalization, and specialization queries, and thus to pose fuzzy queries in a more flexible and intelligent manner.
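The abstract does not say which composition defines the fuzzy transitive closure. The Python sketch below assumes the common max-min composition, repeatedly taking R := max(R, R∘R) elementwise until nothing changes, and applies it to a relevance-style matrix of degrees in [0, 1]. The function names are hypothetical.

```python
def max_min_compose(a, b):
    """(a o b)[i][j] = max over k of min(a[i][k], b[k][j])."""
    n = len(a)
    return [[max(min(a[i][k], b[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def fuzzy_transitive_closure(r):
    """Max-min transitive closure: R := max(R, R o R) until a fixpoint.
    It terminates because every entry is drawn from the finite set of
    values already present in the input matrix."""
    closure = [row[:] for row in r]
    while True:
        composed = max_min_compose(closure, closure)
        updated = [[max(x, y) for x, y in zip(row_c, row_p)]
                   for row_c, row_p in zip(closure, composed)]
        if updated == closure:
            return closure
        closure = updated
```

For example, if concept 0 is related to concept 1 with degree 0.8 and concept 1 to concept 2 with degree 0.5, the closure infers an implicit degree of min(0.8, 0.5) = 0.5 between concepts 0 and 2.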

There is a perceived need within the database community to extend traditional relational database systems to accommodate applications that are deductive in nature. One major problem involved in such an extension is the efficient processing of recursive queries. To this end, parallel processing is expected to play an important role. While substantial work has been done in devising strategies for processing recursive queries in parallel, it is perhaps surprising that little has been reported on the implementation and run-time performance of these strategies. In this paper we report our experience of implementing a pipelined evaluation strategy on transputers. A wide range of queries, database structures and architectural configurations are used as benchmarks in this study. Performance is studied in terms of both speed-up factors and communication costs. The experimental results show the potential of processing recursive queries in parallel, and provide insight into the usefulness of transputers for such applications.

The integration of logic rules and relational databases has recently emerged as an important technique for developing knowledge management systems. An important class of logic rules used by these systems is the so-called transitive closure rules, whose processing requires computing the transitive closure of the database relations referenced by these rules. This article presents a new algorithm suitable for computing the transitive closure of very large database relations. The algorithm proceeds in two phases. In the first phase, a general graph is condensed into an acyclic one, and at the same time a special sparse matrix is formed from the acyclic graph. The second phase is the main one, in which page I/O operations are minimized by removing most of the redundant operations that appear in previous algorithms. Using simulation, this article also examines the performance of this algorithm and compares it with previous algorithms.
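The two-phase structure (condense cycles, then close the acyclic graph) can be sketched in Python as follows. Phase 1 uses Kosaraju's algorithm, one standard way to find strongly connected components, chosen here as an assumption; the article's special sparse matrix and its page I/O optimizations are not modelled. All function names are hypothetical.

```python
from collections import defaultdict

_DONE = object()  # sentinel: successor iterator exhausted


def condense(nodes, edges):
    """Phase 1: map every node to a representative of its strongly connected
    component (Kosaraju) and build the acyclic component graph.
    Returns (comp, dag_edges, components_in_topological_order)."""
    succ, pred = defaultdict(list), defaultdict(list)
    for x, y in edges:
        succ[x].append(y)
        pred[y].append(x)
    order, seen = [], set()
    for start in nodes:  # first pass: iterative DFS recording finish order
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(succ[start]))]
        while stack:
            node, it = stack[-1]
            nxt = next(it, _DONE)
            if nxt is _DONE:
                stack.pop()
                order.append(node)
            elif nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, iter(succ[nxt])))
    comp = {}
    for start in reversed(order):  # second pass: DFS on the reversed graph
        if start in comp:
            continue
        comp[start] = start
        stack = [start]
        while stack:
            for p in pred[stack.pop()]:
                if p not in comp:
                    comp[p] = start
                    stack.append(p)
    dag = {(comp[x], comp[y]) for x, y in edges if comp[x] != comp[y]}
    topo = [c for c in reversed(order) if comp[c] == c]
    return comp, dag, topo


def component_closure(dag, topo):
    """Phase 2: closure of the acyclic graph, finishing successors first."""
    succ = defaultdict(set)
    for a, b in dag:
        succ[a].add(b)
    reach = {}
    for c in reversed(topo):
        reach[c] = set()
        for d in succ[c]:
            reach[c] |= {d} | reach[d]
    return reach


def reaches(x, y, comp, reach, edges):
    """Does x reach y by a path of length >= 1?"""
    cx, cy = comp[x], comp[y]
    if cx != cy:
        return cy in reach[cx]
    if x != y:
        return True  # distinct nodes in one strongly connected component
    # x reaches itself only via a self-loop or a cycle through its component.
    return (x, x) in edges or any(c == cx and n != x for n, c in comp.items())
```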


Functional completeness of a relational language is the ability to express linear recursive queries. We present such a language that also has the property of relational completeness (relational algebra is, in fact, embedded in it). The transitive closure of a binary relation is carried out by means of the projection function and its inverse.

In this paper we introduce a general framework for casting fully dynamic transitive closure into the problem of reevaluating polynomials over matrices. With this technique, we improve the best known bounds for fully dynamic transitive closure. In particular, we devise a deterministic algorithm for general directed graphs that achieves O(n^2) amortized time for updates, while preserving unit worst-case cost for queries. In the case of deletions only, our algorithm performs updates faster, in O(n) amortized time. We observe that fully dynamic transitive closure algorithms with O(1) query time maintain the transitive closure of the input graph explicitly, in order to answer each query with exactly one lookup (on its adjacency matrix). Since an update may change as many as Ω(n^2) entries of this matrix, no better bounds are possible for this class of algorithms. This work has been partially supported by the Sixth Framework Programme of the EU under contract number 507613 (Network of Excellence "EuroNGI: Designing and Engineering of the Next Generation Internet"), and number 001907 ("DELIS: Dynamically Evolving, Large Scale Information Systems"), and by the Italian Ministry of University and Research (Project "ALGO-NEXT: Algorithms for the Next Generation Internet and Web: Methodologies, Design and Experiments"). Portions of this paper have been presented at the 41st Annual Symp. on Foundations of Computer Science, 2000.
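The view of transitive closure as a polynomial over Boolean matrices can be shown statically: A+ = A · (I + A)^(n-1), with the power computed by repeated squaring in about log2(n) Boolean products. The Python sketch below recomputes everything from scratch and is not the paper's dynamic algorithm; it only shows the matrix being maintained, where each query is one lookup. Function names are hypothetical.

```python
def bool_mat_mult(a, b):
    """Boolean matrix product: (a * b)[i][j] = OR over k of (a[i][k] AND b[k][j])."""
    n = len(a)
    return [[any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def closure_by_squaring(adj):
    """Transitive closure A+ = A * (I + A)^(n-1) for an n x n Boolean matrix."""
    n = len(adj)
    m = [[bool(adj[i][j]) or i == j for j in range(n)] for i in range(n)]  # I + A
    steps = 1  # m currently covers all paths of length <= steps
    while steps < n - 1:
        m = bool_mat_mult(m, m)
        steps *= 2
    # Prepending one edge gives all paths of length 1..n, enough for n nodes.
    return bool_mat_mult(adj, m)
```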

One of the key components of a multiuser multimedia-on-demand system is the data server. Digitization of traditionally analog data such as video and audio, and the feasibility of obtaining network bandwidths above the gigabit-per-second range, are two important advances that have made possible the realization, in the near future, of interactive distributed multimedia systems. Secondary-to-main memory I/O technology has not kept pace with advances in networking, main memory, and CPU processing power. Consequently, the performance of the server has a direct bearing on the overall performance of such a system. In this paper, we present a high-performance solution to the I/O retrieval problem in a distributed multimedia system. We develop a model for the architecture of a server for such a system. Parallelism of data retrieval is achieved by striping the data across multiple disks. We present the algorithms for server operation when servicing a constant number of streams, as well as the admission control policy for accepting requests for new streams. The performance of any server ultimately depends on the data access patterns. Two modifications of the basic retrieval algorithm are presented to exploit data access patterns in order to improve system throughput and response time. Finally, we present preliminary performance results of these algorithms on the IBM SP1 and Intel Paragon parallel computers.
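Striping can be illustrated with a short Python sketch, assuming simple round-robin placement (block b on disk b mod D); the paper's actual layout and admission control policy are not specified here. Grouping a stream's blocks into rounds of D consecutive blocks means each disk serves at most one block per round, so all reads in a round can proceed in parallel. Function names are hypothetical.

```python
def stripe(num_blocks, num_disks):
    """Round-robin striping: block b of an object is stored on disk b mod D."""
    return {b: b % num_disks for b in range(num_blocks)}


def retrieval_rounds(num_blocks, num_disks):
    """Group a stream's blocks into rounds of D consecutive blocks; under
    round-robin striping these land on D distinct disks, so each round's
    reads can be issued in parallel."""
    return [list(range(start, min(start + num_disks, num_blocks)))
            for start in range(0, num_blocks, num_disks)]
```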

Abstract. This paper presents structural recursion as the basis of the syntax and semantics of query languages for semistructured data and XML. We describe a simple and powerful query language based on pattern matching and show that it can be expressed using structural recursion, which is introduced as a top-down, recursive function, similar to the way XSL is defined on XML trees. On cyclic data, structural recursion can be defined in two equivalent ways: as a recursive function which evaluates the data top-down and remembers all its calls to avoid infinite loops, or as a bulk evaluation which processes the entire data in parallel using only traditional relational algebra operators. The latter makes it possible for optimization techniques in relational queries to be applied to structural recursion. We show that the composition of two structural recursion queries can be expressed as a single such query, and this is used as the basis of an optimization method for mediator systems. Several other formal properties are established: structural recursion can be expressed in first-order logic extended with transitive closure; its data complexity is PTIME; and over relational data it is a conservative extension of the relational calculus. The underlying data model is based on value equality, formally defined with bisimulation. Structural recursion is shown to be invariant with respect to value equality. Received: July 9, 1999 / Accepted: December 24, 1999
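The top-down evaluation that "remembers all its calls" can be sketched in Python on an edge-labelled graph stored as a dict {node: [(label, child), ...]}. The example recursion simply renames every label, a common illustrative use of structural recursion. The representation and the name `relabel` are assumptions, and the bulk relational-algebra evaluation is not shown.

```python
def relabel(graph, root, rename):
    """Structural recursion over an edge-labelled graph: copy the part
    reachable from `root`, applying `rename` to every edge label.
    The `done` set remembers which recursive calls were already made,
    so cycles in the data do not cause infinite recursion."""
    out = {}
    done = set()

    def visit(node):
        if node in done:  # call already made: reuse its result node
            return
        done.add(node)
        out[node] = []
        for label, child in graph.get(node, []):
            out[node].append((rename(label), child))
            visit(child)

    visit(root)
    return out
```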

In this paper we consider parallel processing of a graph represented by a database relation, and we achieve two objectives. First, we propose a methodology for analyzing the speedup of a parallel processing strategy, with the purpose of selecting at runtime one of several candidate strategies depending on the hardware architecture and the input graph. Second, we study the single-source reachability problem, namely the problem of computing the set of nodes reachable from a given node in a directed graph. We propose several parallel strategies for solving this problem, and we analyze their performance using our new methodology. The analysis is confirmed experimentally in a UNIX-Ethernet environment. We also extend the results to the transitive closure problem. A preliminary shortened version of this paper has appeared in PDIS. See Ref. 1. This author's work was supported in part by NSF Grant 90-03341. This author's work was supported in part by the Natural Sciences and Engineering Research Council of Canada. This author's work was supported in part by NSF Grant 90-03341.
