Similar literature
 Found 20 similar documents; search took 31 ms
1.
Volcano - an extensible and parallel query evaluation system   Total citations: 2 (self: 0, others: 2)
To investigate the interactions of extensibility and parallelism in database query processing, we have developed a new dataflow query execution system called Volcano. The Volcano effort provides a rich environment for research and education in database systems design, heuristics for query optimization, parallel query execution, and resource allocation. Volcano uses a standard interface between algebra operators, allowing easy addition of new operators and operator implementations. Operations on individual items, e.g., predicates, are imported into the query processing operators using support functions. The semantics of support functions is not prescribed; any data type, including complex objects, and any operation can be realized. Thus, Volcano is extensible with new operators, algorithms, data types, and type-specific methods. Volcano includes two novel meta-operators. The choose-plan meta-operator supports dynamic query evaluation plans that allow delaying selected optimization decisions until run-time, e.g., for embedded queries with free variables. The exchange meta-operator supports intra-operator parallelism on partitioned datasets and both vertical and horizontal inter-operator parallelism, translating between demand-driven dataflow within processes and data-driven dataflow between processes. All operators, with the exception of the exchange operator, have been designed and implemented in a single-process environment, and parallelized using the exchange operator. Even operators not yet designed can be parallelized using this new operator if they use and provide the iterator interface. Thus, the issues of data manipulation and parallelism have become orthogonal, making Volcano the first implemented query execution engine that effectively combines extensibility and parallelism.
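
The iterator interface and support functions mentioned in this abstract can be illustrated with a minimal single-process sketch. It assumes the open/next/close protocol commonly associated with Volcano-style engines; the class names `Scan` and `Select`, the helper `run`, and the sample rows are hypothetical and are not taken from the Volcano code base.

```python
from typing import Callable, Iterable, Iterator, Optional


class Scan:
    """Leaf operator: yields items from an in-memory collection."""

    def __init__(self, rows: Iterable[dict]):
        self._rows = rows
        self._it: Optional[Iterator[dict]] = None

    def open(self) -> None:
        self._it = iter(self._rows)

    def next(self) -> Optional[dict]:
        # Return the next item, or None when the input is exhausted.
        return next(self._it, None)

    def close(self) -> None:
        self._it = None


class Select:
    """Filter operator; the predicate is a support function supplied by the caller."""

    def __init__(self, child, predicate: Callable[[dict], bool]):
        self._child = child
        self._predicate = predicate

    def open(self) -> None:
        self._child.open()

    def next(self) -> Optional[dict]:
        # Pull from the child until an item satisfies the predicate.
        while (row := self._child.next()) is not None:
            if self._predicate(row):
                return row
        return None

    def close(self) -> None:
        self._child.close()


def run(plan) -> list:
    """Drive a plan to completion with demand-driven pulls from the root."""
    plan.open()
    out = []
    while (row := plan.next()) is not None:
        out.append(row)
    plan.close()
    return out


# Hypothetical sample data; the predicate is passed in rather than hard-coded.
rows = [{"id": 1, "qty": 5}, {"id": 2, "qty": 12}, {"id": 3, "qty": 20}]
result = run(Select(Scan(rows), lambda r: r["qty"] > 10))
```

Because every operator exposes the same three methods, a new operator can be placed anywhere in a plan without changing its neighbours. An exchange-style operator could in principle sit behind the same interface, although this sketch does not attempt to model parallelism.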

2.
Over the past two decades, a considerable amount of research has used hardware, firmware, and novel architectures to achieve the efficiency needed in implementing database management functions. However, most past efforts have been directed toward developing database computers that support a relatively primitive data model, namely, the relational model. This paper presents the design and evaluation of an Object Flow Computer (OFC). OFC is designed to efficiently support the processing of object-oriented databases. OFC employs a vertically fragmented data storage structure and a two-phase parallel query processing strategy. A set of primitive operators is defined for OFC. Depending on the performance requirement, these operators can be implemented in software running on general-purpose processors or as functions in special-purpose coprocessors. A high-level database request can be decomposed into these primitive operators and executed in parallel. OFC combines a number of known database processing techniques such as query decomposition, pipelining mode of data processing, and data flow control strategy. The performance evaluation of the proposed two-phase query processing strategy and a comparison with the conventional query processing strategy are also presented.

3.
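This paper analyzes a distributed query processing model based on a structured overlay network. The model supports distributed storage of large volumes of data streams and parallel processing both across and within continuous queries. It largely eliminates resource constraints (mainly memory), improves query performance and quality of service, and scales well.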

4.
The newly developed object-oriented database management systems provide rich facilities for the modeling and processing of structural as well as behavioral properties of complex application objects. However, due to their inherent generality, new functionalities to be added to these systems as they continue to evolve, and high performance demand in many application domains, efficient parallel algorithms and architectures would be needed to meet the performance requirement for processing large OODBs. In our previous work, we have shown that processing OODBs can be viewed as the manipulation of patterns of object associations. In this paper, we present several parallel, multiwavefront algorithms based on two approaches, i.e., identification and elimination approaches, to verify association patterns specified in queries. Both approaches allow more processors to operate concurrently on a query than the traditional tree-structured query processing approach, thus introducing a higher degree of parallelism in query processing. We present a graph model to transform the query processing problem into a graph problem. Based on the graph model, proofs of correctness of both approaches for tree-structured queries are given, and a combined approach for solving cyclic queries is also provided. We present a new data structure to represent associations between objects, parallel algorithms based on these approaches, and some evaluation results obtained from an actual implementation of these algorithms on an nCUBE 2 parallel computer.

5.
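Mature RDF Stream Processing (RSP) systems currently lack parallel processing because of their centralized design, so none of them achieves high throughput and low latency when querying large volumes of incoming RDF stream data. To improve query performance, this paper studies the RSP query process and the Flink stream computing architecture, designs four logical operators (data source, filter, multi-way partitioned join, and projection), and designs a Multi-Stream Join (MSJ) algorithm that generates parallel logical query plans in the form of directed acyclic graphs. The logical operators and logical query plans are then implemented on top of the big-data stream processing platform Apache Flink. Experiments use the real dataset SRBench and the synthetic dataset LUBMs. The results show that, compared with the most mature systems, C-SPARQL and CQELS, throughput increases by up to 10 times on a single machine and by up to 28 times on a five-machine cluster, while latency reaches the millisecond level. In terms of query performance, the approach raises throughput and lowers latency when processing large volumes of RDF stream data.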

6.
Similarity Joins are extensively used in multiple application domains and are recognized among the most useful data processing and analysis operations. They retrieve all data pairs whose distances are smaller than a predefined threshold ε. While several standalone implementations have been proposed, very little work has addressed the implementation of Similarity Joins as physical database operators. In this paper, we focus on the study, design, implementation, and optimization of a Similarity Join database operator for metric spaces. We present DBSimJoin, a physical database operator that integrates techniques to: enable a non-blocking behavior, prioritize the early generation of results, and fully support the database iterator interface. The proposed operator can be used with multiple distance functions and data types. We describe the changes in each query engine module to implement DBSimJoin and provide details of our implementation in PostgreSQL. We also study ways in which DBSimJoin can be combined with other similarity and non-similarity operators to answer more complex queries, and how DBSimJoin can be used in query transformation rules to improve query performance. The extensive performance evaluation shows that DBSimJoin significantly outperforms alternative approaches and scales very well when important parameters like ε, data size, and number of dimensions increase.
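
To make the definition in this abstract concrete, the following is a naive nested-loop sketch of a similarity join: it returns every pair whose distance is smaller than ε. It is not DBSimJoin and uses none of its non-blocking or partitioning techniques; the function names, the default Euclidean distance, and the sample points are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance; any metric could be substituted."""
    return math.dist(a, b)


def similarity_join(
    left: Sequence[Point],
    right: Sequence[Point],
    eps: float,
    dist: Callable[[Point, Point], float] = euclidean,
) -> List[Tuple[Point, Point]]:
    """Return all (a, b) pairs with dist(a, b) < eps, by exhaustive comparison."""
    pairs = []
    for a in left:
        for b in right:
            if dist(a, b) < eps:
                pairs.append((a, b))
    return pairs


# Hypothetical sample inputs.
left = [(0.0, 0.0), (5.0, 5.0)]
right = [(0.0, 1.0), (5.0, 9.0), (4.0, 5.0)]
pairs = similarity_join(left, right, eps=1.5)
```

The exhaustive comparison costs one distance evaluation per pair of inputs, which is the baseline that dedicated similarity join operators aim to improve on.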

7.
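Cluster technology is increasingly used in large database application systems. Shared-nothing clusters are easy to implement and scale well. However, few database clustering tools exist at present, and those that do are usually tied to a specific database. To address this problem, this paper proposes a flexible and effective method for building database clusters, and studies and implements StarTP, a parallel database middleware. The basic ideas of StarTP are: hide the details of the back-end databases and present a single virtual database to applications; accelerate data loading through pipelined parallelism; and use data partitioning and replication to localize queries and thereby achieve parallel querying. StarTP supports large database clusters and provides fault tolerance and load balancing. Experimental results demonstrate the effectiveness and scalability of StarTP.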

8.
Database query processing can benefit significantly from parallelism. Parallel database algorithms combine substantial CPU and I/O activity, memory requirements, and massive data exchange between processes, all of which must be considered to obtain optimal performance. Since parallel external sorting is a very typical example, we have focused on sorting to tune Volcano, a new query processing system. The purpose of the Volcano project is to provide efficient, extensible tools for query and request processing in novel application domains, particularly in object-oriented and scientific database systems, and for experimental database performance research. It includes all query processing algorithms conventionally used in relational database systems as well as several new ones, and can execute all of them in parallel. In this article, we present Volcano's parallel external sorting algorithm and a sequence of enhancements to improve its performance. We obtained very good absolute performance, 84 seconds for 100 MB of data, as well as near-linear speedup with sixteen CPUs and disks. Furthermore, these results were achieved on a shared-memory machine despite the common belief that parallel query processing is best implemented on distributed-memory systems. We detail our tuning measures and report on their effectiveness.

9.
The application of the object-oriented (O-O) paradigm in the database management field has gained much attention in recent years. Several experimental and commercial O-O database management systems have become available. However, the existing O-O DBMSs still lack a solid mathematical foundation for the manipulation of O-O databases, the optimization of queries, and the design and selection of storage structures for supporting O-O database manipulations. This paper presents an association algebra (A-algebra) to serve as a mathematical foundation for processing O-O databases, which is analogous to the relational algebra used for processing relational databases. In this algebra, objects and their associations in an O-O database are uniformly represented by association patterns which are manipulated by a number of operators to produce other association patterns. Unlike the relational algebra, in which set operations operate on relations with union-compatible structures, the A-algebra operators can operate on association patterns of homogeneous and heterogeneous structures. Unlike traditional record-based relational processing, the A-algebra allows very complex patterns of object associations to be directly manipulated. The pattern-based query formulation and the A-algebra operators are described. Some mathematical properties of the algebraic operators are presented together with their application in query decomposition and optimization. The completeness of the A-algebra is also defined and proven. The A-algebra has been used as the basis for the design and implementation of an object-oriented query language, OQL, which is the query language used in a prototype Knowledge Base Management System OSAM*.KBMS.

10.
Aggregation of imprecise and uncertain information in databases   Total citations: 4 (self: 0, others: 4)
Information stored in a database is often subject to uncertainty and imprecision. Probability theory provides a well-known and well-understood way of representing uncertainty and may thus be used to provide a mechanism for storing uncertain information in a database. We consider the problem of aggregation using an imprecise probability data model that allows us to represent imprecision by partial probabilities and uncertainty using probability distributions. Most work to date has concentrated on providing functionality for extending the relational algebra with a view to executing traditional queries on uncertain or imprecise data. However, for imprecise and uncertain data, we often require aggregation operators that provide information on patterns in the data. Thus, while traditional query processing is tuple-driven, processing of uncertain data is often attribute-driven where we use aggregation operators to discover attribute properties. The aggregation operator that we define uses the Kullback-Leibler information divergence between the aggregated probability distribution and the individual tuple values to provide a probability distribution for the domain values of an attribute or group of attributes. The provision of such aggregation operators is a central requirement in furnishing a database with the capability to perform the operations necessary for knowledge discovery in databases.
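
As a rough illustration of aggregation guided by the Kullback-Leibler divergence, the sketch below handles only the simplest case, in which every tuple carries a fully specified probability distribution over the attribute domain. In that case the distribution q minimizing the summed divergence Σ KL(p_i ‖ q) is the arithmetic mean of the tuple distributions. This is an assumption-laden simplification: it does not handle the partial probabilities described in the abstract and should not be read as the operator the paper defines. All names and sample values are hypothetical.

```python
import math
from typing import Dict, List

Dist = Dict[str, float]


def kl_divergence(p: Dist, q: Dist) -> float:
    """KL(p || q) in nats; terms with p(x) = 0 contribute nothing."""
    return sum(pv * math.log(pv / q[x]) for x, pv in p.items() if pv > 0)


def aggregate(tuples: List[Dist]) -> Dist:
    """Arithmetic mean of the tuple distributions over the shared domain."""
    domain = sorted({x for t in tuples for x in t})
    n = len(tuples)
    return {x: sum(t.get(x, 0.0) for t in tuples) / n for x in domain}


def total_divergence(tuples: List[Dist], q: Dist) -> float:
    """Summed divergence from each tuple distribution to a candidate q."""
    return sum(kl_divergence(t, q) for t in tuples)


# Hypothetical attribute with domain {low, high}; one distribution per tuple.
tuples = [
    {"low": 1.0, "high": 0.0},
    {"low": 0.5, "high": 0.5},
    {"low": 0.0, "high": 1.0},
]
agg = aggregate(tuples)
best = total_divergence(tuples, agg)
worse = total_divergence(tuples, {"low": 0.6, "high": 0.4})
```

Comparing `best` with `worse` shows that moving away from the mean increases the summed divergence, which is the sense in which the mean is the divergence-minimizing aggregate in this simplified setting.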

11.
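Zhang Yu, Zhang Yansong, Chen Hong, Wang Shan. Journal of Software (《软件学报》), 2016, 27(5): 1246-1265
General-purpose GPUs have become an emerging high-performance computing platform because of their powerful parallel computing capability, and in recent years they have become a research focus in academia in the area of high-performance database implementation techniques. However, current research on GPU databases follows the ROLAP (relational OLAP) multidimensional analysis model. It concentrates on algorithm implementation and performance optimization of relational operators on the GPU platform, centering on GPU parallel algorithms for hash joins. A GPU has thousands of parallel computing units but few logic control units. Compared with a CPU, it has stronger parallel computing capability but weaker logic control and complex memory management capability. In-memory database query processing algorithms that require complex data structures and complex memory management mechanisms are therefore not suited to direct porting to the GPU platform. This paper proposes semi-MOLAP, a hybrid OLAP multidimensional analysis model oriented toward the vector computing characteristics of GPUs. It combines the direct array access and computation characteristics of the MOLAP (multidimensional OLAP) model with the storage efficiency of the ROLAP model, implementing a GPU semi-MOLAP multidimensional analysis model based entirely on array structures. This simplifies GPU data management, reduces the complexity of the GPU semi-MOLAP algorithms, and improves their code execution rate. In addition, based on the computing characteristics of GPUs and CPUs, the semi-MOLAP operators are split for cooperative computing across the CPU and GPU platforms, which improves CPU and GPU utilization and overall OLAP query performance.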

12.
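Querying is the most common operation in a database. Because data in a distributed database is distributed and redundant, query optimization is one of the core problems of distributed database research. To improve the query efficiency of distributed databases, this paper analyzes and discusses common execution strategies and query optimization algorithms based on direct joins. For the case of multi-table joins with multiple join attributes in distributed database applications, it proposes an improved direct-join query optimization strategy. The improved algorithm increases the parallelism of query execution, shortens query processing time, and improves query efficiency.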

13.
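Because data streams are unbounded, most queries in data stream systems are windowed queries. For windowed queries, existing methods usually let operators maintain the windows directly, but the types and arrangement of the operators can make the windows hard to maintain and introduce considerable redundancy. This paper therefore proposes a hierarchical window maintenance strategy for query processing. Windows are divided into stream windows and operator windows; stream windows take precedence and control the maintenance of operator windows, so that the windows in a query remain consistent. This solves the window maintenance problem and conforms to the semantics of stream query languages. Data in the windows at each level is shared to address memory consumption.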

14.
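Targeting the characteristics of data streams, this paper proposes a rate-based preemptive batch processing method. A query plan is a sequence of operators. The paper partitions a query plan into operation units and assigns each unit a priority that changes dynamically with system factors. Operation units are scheduled dynamically according to these changing priorities using preemptive scheduling, which improves the efficiency of continuous queries. Experiments show that the method not only improves overall system performance but also reduces the average waiting time of tuples and greatly increases the tuple output rate.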

15.
Dynamic programming (DP) is a popular technique which is used to solve combinatorial search and optimization problems. This paper focuses on one type of DP, which is called nonserial polyadic dynamic programming (NPDP). Owing to the nonuniform data dependencies of NPDP, it is difficult to exploit either parallelism or locality. Worse still, the emerging multi/many-core architectures with small on-chip memory make these issues more challenging. In this paper, we address the challenges of exploiting the fine grain parallelism and locality of NPDP on multicore architectures. We describe a latency-tolerant model and a percolation technique for programming on multicore architectures. On an algorithmic level, both parallelism and locality do benefit from a specific data dependence transformation of NPDP. Next, we propose a parallel pipelining algorithm by decomposing computation operators and percolating data through a memory hierarchy to create just-in-time locality. In order to predict the execution time, we formulate an analytical performance model of the parallel algorithm. The parallel pipelining algorithm achieves not only high scalability on the 160-core IBM Cyclops64, but portable performance as well, across the 8-core Sun Niagara and quad-cores Intel Clovertown.
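
For readers unfamiliar with the NPDP class, matrix-chain ordering is a commonly cited example: each cell depends on two subproblems at once (polyadic) drawn from non-adjacent earlier stages (nonserial). The sequential sketch below is an illustration chosen here, not the paper's algorithm, and it does not model the percolation or pipelining techniques the abstract describes. The function name and the sample dimensions are hypothetical.

```python
from typing import List


def matrix_chain_cost(dims: List[int]) -> int:
    """Minimum scalar multiplications to multiply a chain of matrices.

    Matrix i has shape dims[i] x dims[i + 1].
    """
    n = len(dims) - 1  # number of matrices in the chain
    cost = [[0] * n for _ in range(n)]
    # Sweep diagonal by diagonal: 'span' is the distance j - i.
    for span in range(1, n):
        # Cells on one diagonal read only earlier diagonals, so they are
        # mutually independent and could be computed in parallel.
        for i in range(n - span):
            j = i + span
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)
            )
    return cost[0][n - 1]


# Hypothetical chain: A is 10x30, B is 30x5, C is 5x60.
best = matrix_chain_cost([10, 30, 5, 60])
```

Each cell on a longer diagonal reads more earlier cells, scattered across a row and a column of the table, which gives a feel for the nonuniform data dependencies the abstract refers to.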

16.
17.
Information Systems, 2005, 30(3): 167-204
Algebraic optimisation is both theoretically and practically important for query processing in complex value databases. In this paper, we consider this issue and investigate some algebraic properties concerning the nested relational operators. The join operation is one of the most time-consuming operations in nested relational query processing. We introduce a new join operator, called P-join, which combines the advantages of Roth's extended natural join and Colby's recursive join for efficient data access. We also investigate some algebraic properties concerning the P-join operator and extended relational operators, which can be used for query optimisation in nested relational databases. We then examine the role of the restructuring operators nest and unnest in their interactions with the extended relational operators proposed by Roth et al. Under certain functional and mutual data dependencies, the six nested relational equations will hold. Finally, we outline the steps of a heuristic optimisation algorithm that utilises algebraic transformation rules developed in this paper and previous related work to transform an initial query to an optimised one that is more efficient to execute.

18.
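Research on data partitioning strategies for distributed query processing over fast local area networks   Total citations: 3 (self: 0, others: 3)
In distributed database systems, query response time has long been a topic of active interest. Given the inherent parallelism of distributed database queries, data partitioning can be used to increase the degree of parallel query processing and improve response time. This paper proposes a data partitioning strategy for distributed query processing in a multidatabase environment over a fast local area network, with the aim of improving query response time. Simulation experiments verify the soundness of the algorithm.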

19.
Wireless sensor networks (WSN) are composed of several sensors having limited memory, processing power, communication bandwidth, and energy, which cooperate in performing a given task. The use of the database paradigm has emerged in the last few years as a viable solution to manage data in such a context. In this paper we present the MaD‐WiSe system, a distributed query processing framework that moves the processing of the query into the network. MaD‐WiSe reconsiders various aspects related to database system design and it reinterprets them according to the WSN constraints and requirements. In particular it considers the aspects related to the definition of a query language to formalize the queries, a stream model to manage data acquired by the sensors, a query algebra to define the operators that actually perform the query, and energy efficiency and query optimization strategies for saving energy. Copyright © 2010 John Wiley & Sons, Ltd.

20.
Real-time database management systems have become a hot topic in the research and development community of late (Fort94a, Grah93, WCPP93). In addition, there has been a movement in the standards community to examine and develop extensions to existing and proposed query languages to support real-time (Fish94, Fort94, Fort94a, FS94, Gord94). This paper examines the state of research into real-time database management systems in the areas of database structuring, transaction structuring, transaction processing, concurrency control, recovery and real-time transaction scheduling. We then extend the findings and trends of this work into the high level specification of data definition language, data manipulation language and data control language extensions for the standard SQL2 and emerging SQL3 database query languages.
