Similar literature
 20 similar articles found (search time: 31 ms)
1.
With the growth of information technology and computer networks, there is a vital need for optimal design of distributed databases with the aim of performance improvement in terms of minimizing the round-trip response time and query transmission and processing costs. To address this issue, new fragmentation, data allocation, and replication techniques are required. In this paper, we propose enhanced vertical fragmentation, allocation, and replication schemes to improve the performance of distributed database systems. The proposed fragmentation scheme clusters highly-bonded attributes (i.e., normally accessed together) into a single fragment in order to minimize the query processing cost. The allocation scheme is proposed to find an optimized allocation to minimize the round-trip response time. The replication scheme partially replicates the fragments to increase the local execution of queries in a way that minimizes the cost of transmitting replicas to the sites. Experimental results show that, on average, the proposed schemes reduce the round-trip response time of queries by 23% and query processing cost by 15%, as compared to the related work.
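The idea of clustering attributes that are normally accessed together can be illustrated with a small sketch. It is not the scheme proposed in the paper, whose details the abstract does not give. It assumes the common attribute-affinity measure (the summed frequency of queries that use both attributes) and a simple threshold grouping. The function names, the workload, and the threshold value are all hypothetical.

```python
from itertools import combinations

def attribute_affinity(queries):
    """Return {(a, b): weight}, where weight sums the frequencies of all
    queries that access both attributes a and b (stored with a < b)."""
    affinity = {}
    for attrs, freq in queries:
        for a, b in combinations(sorted(set(attrs)), 2):
            affinity[(a, b)] = affinity.get((a, b), 0) + freq
    return affinity

def cluster_attributes(attributes, affinity, threshold):
    """Group attributes whose pairwise affinity reaches the threshold,
    using union-find so that bonded pairs end up in one fragment."""
    parent = {a: a for a in attributes}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for (a, b), weight in affinity.items():
        if weight >= threshold:
            parent[find(a)] = find(b)

    fragments = {}
    for a in attributes:
        fragments.setdefault(find(a), []).append(a)
    return sorted(sorted(f) for f in fragments.values())

# Hypothetical workload: (attributes accessed, executions per day).
workload = [
    (["name", "email"], 50),
    (["name", "email", "salary"], 5),
    (["salary", "dept"], 40),
]
aff = attribute_affinity(workload)
fragments = cluster_attributes(["name", "email", "salary", "dept"], aff, threshold=30)
print(fragments)
```

With this workload, `name` and `email` fall into one fragment and `salary` and `dept` into another, because only those two pairs reach the assumed threshold.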

2.
NoSQL databases are known for high scalability, high availability, and high fault tolerance, and are therefore used in many applications. The data partitioning strategy and the fragment allocation strategy directly affect the performance of NoSQL database systems. Large, global databases are partitioned horizontally, vertically, or by a combination of both. In general, the system scatters related fragments as widely as possible to increase the degree of parallelism of operations. In some applications, however, operations are usually not very complicated, and a single operation may access more than one fragment. At the same time, the fragments that an operation has to access may interact with each other. General allocation strategies therefore increase the system's communication cost when operations execute across sites. To improve the performance of those applications and enable NoSQL database systems to work efficiently, their fragments have to be allocated in a way that reduces the communication cost, i.e., that minimizes the total volume of data transmitted when operations execute across sites. A hypergraph-based fragment clustering strategy is proposed that places fragments accessed together by most operations in the same cluster. The method uses a weighted hypergraph to represent the fragment access patterns of operations, and a hypergraph partitioning algorithm to cluster the fragments. This reduces the number of sites that an operation has to span, and hence the communication cost across sites. Experimental results confirm that the proposed technique contributes effectively to solving the fragment re-allocation problem in a specific application environment of a NoSQL database system.

3.
The performance of distributed database systems (DDBs) can be enhanced by speeding up the computation of the data allocation, which leads to faster allocation decisions, smaller data redundancy, and shorter processing time. This paper deals with an integrated method for grouping the distributed sites into clusters and customizing the allocation of database fragments to the clusters and their sites. We design a high-speed clustering and allocation method that determines which fragments are allocated to which cluster and site so as to maintain data availability and constant system reliability. We evaluate the performance achieved by this method and demonstrate its efficiency by means of tabular and graphical representations. We tested our method over different network sites and found that it reduces the data transferred between sites during execution, minimizes the communication cost needed for processing applications, and handles the database queries and meets their future needs.

4.
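Queries are the most frequently used database operations. Because data in a distributed database is distributed and redundant, query optimization is one of the core problems in distributed database research. To improve query efficiency in distributed databases, this paper analyzes and discusses common execution strategies and query optimization algorithms based on direct joins and, for the case where multi-table joins in distributed database applications involve multiple join attributes, proposes an improved direct-join query optimization strategy. The improved algorithm increases the parallelism of query execution, shortens query processing time, and improves query efficiency.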

5.
Distributing data collections by fragmenting them is an effective way of improving the scalability of a database system. While the distribution of relational data is well understood, the unique characteristics of the XML data and query model present challenges that require different distribution techniques. In this paper, we show how XML data can be fragmented horizontally and vertically. Based on this, we propose solutions to two of the problems encountered in distributed query processing and optimization on XML data, namely localization and pruning. Localization takes a fragmentation-unaware query plan and converts it to a distributed query plan that can be executed at the sites that hold XML data fragments in a distributed system. We then show how the resulting distributed query plan can be pruned so that only those sites are accessed that can contribute to the query result. We demonstrate that our techniques can be integrated into a real-life XML database system and that they significantly improve the performance of distributed query execution.

6.
Interest in multimedia database management systems has grown rapidly due to the need to store huge volumes of multimedia data in computer systems. An important building block of a multimedia database system is the query processor, and a query optimizer embedded in the query processor is needed to answer user queries efficiently. The query optimization problem has been widely studied for conventional database systems; however, it is a new research area for multimedia database systems. Due to the differences in query processing strategies, query optimization techniques used in multimedia database systems are different from those used in traditional databases. In this paper, a query optimization strategy is proposed for processing spatio-temporal queries in video database systems. The proposed strategy includes reordering algorithms to be applied to the query execution tree. The performance results obtained by testing the reordering algorithms on different query sets are also presented.

7.
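Performance models of database systems are an important foundational technology for database system management and are widely used in tasks such as query scheduling, resource allocation, and performance tuning. Current performance models fall into two main types: analytical and statistical. Analytical models require in-depth study of the query execution process of the database system; they adapt well to dynamic queries and need no costly sampling experiments, but modeling is complex when queries execute in parallel, and different database systems require different theoretical models. Statistical models do not require analysis of the query execution process; instead, they collect query execution parameters and train a mathematical model. Statistical modeling is simple, describes query interactions well, and predicts well, but sampling is very costly and adaptability to dynamic queries is poor. This paper surveys the main literature on database system performance modeling, focusing on the principal modeling methods, and discusses the strengths and weaknesses of the two types of models, the difficulties of modeling, and strategies for addressing them. On this basis, it gives an outlook on research in database system performance modeling as a reference for related work in the field.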

8.
The Semantic Web’s promise of web-wide data integration requires the inclusion of legacy relational databases, i.e. the execution of SPARQL queries on RDF representation of the legacy relational data. We explore a hypothesis: existing commercial relational databases already subsume the algorithms and optimizations needed to support effective SPARQL execution on existing relationally stored data. The experiment is embodied in a system, Ultrawrap, that encodes a logical representation of the database as an RDF graph using SQL views and a simple syntactic translation of SPARQL queries to SQL queries on those views. Thus, in the course of executing a SPARQL query, the SQL optimizer uses the SQL views that represent a mapping of relational data to RDF, and optimizes its execution. In contrast, related research is predicated on incorporating optimizing transforms as part of the SPARQL to SQL translation, and/or executing some of the queries outside the underlying SQL environment. Ultrawrap is evaluated using two existing benchmark suites that derive their RDF data from relational data through a Relational Database to RDF (RDB2RDF) Direct Mapping, and the evaluation is repeated for each of the three major relational database management systems. Empirical analysis reveals two existing relational query optimizations that, if applied to the SQL produced from a simple syntactic translation of SPARQL queries (with bound predicate arguments) to SQL, consistently yield query execution time that is comparable to that of SQL queries written directly for the relational representation of the data. The analysis further reveals that the two optimizations are not uniquely required to achieve a successful wrapper system. The evidence suggests effective wrappers will be those that are designed to complement the optimizer of the target database.

9.
This paper addresses the processing of a query in distributed database systems using a sequence of semijoins. The objective is to minimize the intersite data traffic incurred by a distributed query. A method is developed which accurately and efficiently estimates the size of an intermediate result of a query. This method provides the basis of the query optimization algorithm. Since the distributed query optimization problem is known to be intractable, a heuristic algorithm is developed to determine a low-cost sequence of semijoins. A cost comparison with an existing algorithm is provided. The complexity of the main features of the algorithm is analytically derived. The scheduling time for sequences of semijoins is measured for example queries using the PASCAL program which implements the algorithm.
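A single semijoin step, the building block of the sequences this abstract refers to, can be sketched as follows. This only illustrates why a semijoin can reduce intersite traffic. It is not the paper's size-estimation method or heuristic. The tables, the `semijoin_reduce` helper, and the field-count cost proxy are assumptions made for the example.

```python
def semijoin_reduce(r_rows, s_rows, key):
    """Simulate R semijoin S: ship S's distinct join values to R's site,
    then keep only the R rows that have a join partner in S."""
    s_keys = {row[key] for row in s_rows}  # projection of S shipped to R's site
    reduced_r = [row for row in r_rows if row[key] in s_keys]
    return s_keys, reduced_r

# Hypothetical data: orders live at site A, customers at site B,
# and the join result is needed at site B.
orders = [
    {"cust": 1, "item": "a"},
    {"cust": 2, "item": "b"},
    {"cust": 3, "item": "c"},
    {"cust": 3, "item": "d"},
    {"cust": 5, "item": "e"},
]
customers = [{"cust": 3, "city": "Oslo"}, {"cust": 4, "city": "Rome"}]

shipped_keys, reduced_orders = semijoin_reduce(orders, customers, "cust")

# Cost proxy (an assumption): number of field values sent over the network.
fields_per_order = 2
traffic_with_semijoin = len(shipped_keys) + fields_per_order * len(reduced_orders)
traffic_without_semijoin = fields_per_order * len(orders)
print(traffic_with_semijoin, traffic_without_semijoin)
```

Here the semijoin pays off because few orders match. When most rows of R have a partner in S, shipping the keys first only adds traffic, which is why choosing a low-cost sequence depends on estimating intermediate result sizes.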

10.
Distributed database systems provide a new data processing and storage technology for decentralized organizations of today. Query optimization, the process of generating an optimal execution plan for the posed query, is more challenging in such systems due to the huge search space of alternative plans incurred by distribution. As finding an optimal execution plan is computationally intractable, using stochastic-based algorithms has drawn the attention of most researchers. In this paper, for the first time, a multi-colony ant algorithm is proposed for optimizing join queries in a distributed environment where relations can be replicated but not fragmented. In the proposed algorithm, four types of ants collaborate to create an execution plan. Hence, there are four ant colonies in each iteration. Each type of ant makes an important decision to find the optimal plan. In order to evaluate the quality of the generated plan, two cost models are used: one based on the total time and the other on the response time. The proposed algorithm is compared with two previous genetic-based algorithms on chain, tree and cyclic queries. The experimental results show that the proposed algorithm saves up to about 80% of optimization time with no significant difference in the quality of generated plans compared with the best existing genetic-based algorithm.

11.
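Most of the more mature existing distributed database systems do not support horizontal partitioning, even though horizontal partitioning can improve the time and space efficiency of many database operations. This paper describes distributed query processing with horizontal partitioning in O~2D~2B, covering mainly how to obtain the set of distribution addresses of the tables referenced by a query, how to compute data transmission costs, and how to select the execution site. It focuses on how a table's distribution information is derived by analyzing the condition expression in the query's where clause. Unlike ordinary horizontal partitioning, O~2D~2B supports partitioning on multiple attributes, and partitions may overlap or contain one another. O~2D~2B is therefore more widely applicable.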

12.
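A Parallel Database Query Optimization Method Based on Multiple Weighted Trees   Cited by: 1 (self-citations: 0, citations by others: 1)
Li Jianzhong. Chinese Journal of Computers (计算机学报), 1998, 21(5): 401-412
This paper proposes a query optimization method based on multiple weighted trees, comprising a multiple-weighted-tree model of parallel query plans, a complexity model for parallel query plans, and a query optimization algorithm.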

13.
The traditional approach of evaluating query execution strategies using approximate cost models may be inadequate for particular environments. For instance, if the environment does not satisfy the assumptions made by the cost model, the cost estimates can be so distorted that expensive strategies will be chosen. We propose a new approach for choosing execution strategies based on the actual cost history of query execution under various strategies, rather than on assumption-loaded estimates of these costs. Adaptive selection automatically changes the strategies selected, tracking cost variations caused by changes in the database state and query load. Furthermore, it does not require any assumptions about internal database structures, data characteristics, or distribution of queries. Queries are divided into query classes, where all queries in a class share the same execution strategies. A learning automaton is then used for each class to infer over time which are the current best strategies, based on actual query execution costs. We show the results of running the adaptive selector using real query loads for an existing database.
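The per-class learning automaton described above can be sketched roughly as below. The abstract does not say which update scheme the authors use. This example assumes the textbook linear reward-inaction rule, a hypothetical mapping from observed cost to a reward in [0, 1], and simulated noisy costs in place of real query executions.

```python
import random

def choose(probs, rng):
    """Sample a strategy index according to the current probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def reward_update(probs, chosen, reward, rate=0.1):
    """Linear reward-inaction step: move probability mass toward the chosen
    strategy in proportion to reward (0..1); a reward of 0 changes nothing."""
    step = rate * reward
    return [p + step * (1 - p) if i == chosen else p * (1 - step)
            for i, p in enumerate(probs)]

def run(costs, rounds=2000, seed=0):
    """Simulate one query class whose strategies have the given mean costs."""
    rng = random.Random(seed)
    probs = [1 / len(costs)] * len(costs)
    worst = max(costs)
    for _ in range(rounds):
        i = choose(probs, rng)
        observed = costs[i] * rng.uniform(0.8, 1.2)      # noisy measured cost
        reward = max(0.0, 1 - observed / (1.2 * worst))  # cheaper gives larger reward
        probs = reward_update(probs, i, reward)
    return probs

# Hypothetical mean costs (positive numbers) of three strategies for one class.
final_probs = run([10.0, 4.0, 7.0])
print(final_probs)
```

Because the update uses only observed costs, the automaton needs no cost model. A production selector would also need a way to keep exploring when costs drift, which this sketch does not address.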

14.
Cloud computing enables a conventional relational database system's hardware to be adjusted dynamically according to query workload, performance and deadline constraints. One can rent a large amount of resources for a short duration in order to run complex queries efficiently on large-scale data with virtual machine clusters. Complex queries usually contain common subexpressions, either in a single query or among multiple queries that are submitted as a batch. The common subexpressions scan the same relations, compute the same tasks (join, sort, etc.), and/or ship the same data among virtual computers. The total time spent for the queries can be reduced by executing these common tasks only once. In this study, we build and use efficient sets of query execution plans to reduce the total execution time. Because this is an NP-hard problem, a set of robust heuristic algorithms (Branch-and-Bound, Genetic, Hill Climbing, and Hybrid Genetic-Hill Climbing) is proposed to find (near-) optimal query execution plans and maximize the benefits. The optimization time of each algorithm for identifying the query execution plans and the quality of these plans are analyzed by extensive experiments.

15.
We investigate techniques for efficiently executing multiquery workloads from data and computation-intensive applications in parallel and/or distributed computing environments. In this context, we describe a database optimization framework that supports data and computation reuse, query scheduling, and active semantic caching to speed up the evaluation of multiquery workloads. Its most striking feature is the ability to optimize the execution of queries in the presence of application-specific constructs by employing a customizable data and computation reuse model. Furthermore, we discuss how the proposed optimization model is flexible enough to work efficiently irrespective of the parallel/distributed environment underneath. In order to evaluate the proposed optimization techniques, we present experimental evidence using real data analysis applications. For this purpose, a common implementation for the queries under study was provided according to the database optimization framework and deployed on top of three distinct experimental configurations: a shared memory multiprocessor, a cluster of workstations, and a distributed computational Grid-like environment.

16.
Horizontal partitioning is a logical database design technique which facilitates efficient execution of queries by reducing the irrelevant objects accessed. Given a set of most frequently executed queries on a class, horizontal partitioning generates horizontal class fragments (each of which is a subset of object instances of the class) that meet the queries' requirements. There are two types of horizontal class partitioning, namely, primary and derived. Primary horizontal partitioning of a class is performed using predicates of queries accessing the class. Derived horizontal partitioning of a class is the partitioning of a class based on the horizontal partitioning of another class. We present algorithms for both primary and derived horizontal partitioning, discuss some issues in derived horizontal partitioning, and present their solutions. There are two important aspects of supporting database operations on a partitioned database, namely, fragment localization for queries and object migration for updates. Fragment localization deals with identifying the horizontal fragments that contribute to the result of the query, and object migration deals with migrating objects from one class fragment to another due to updates. We provide novel solutions to these two problems, and finally we show the utility of horizontal partitioning for query processing.

17.
In a distributed relational database system, the processing of a query involves data transmission among different sites via a computer network. In a distributed database, multiple copies of each relation can be allocated to different, physically distributed sites. In this paper we discuss the query preoptimization problem for join-queries. In general, there are many possible ways to use the copies of a data item in a distributed relational database when evaluating a join-query. We consider the problem of copy preselection for each relation in a join sequence of a join-query. We show how to express the preselection problem for a given query and data allocation to the network in terms of an integer linear programming problem, namely, a minimum cover problem. It can be treated as a heuristic for the first phase of join-query optimization, and as such as an input to the final stage of optimization, the execution strategy generation for a join-query. In this paper we assume that a distributed system provides fully transparent data management, i.e., data allocation to the network and data replication which is revealed to a user. We illustrate the proposed mathematical programming problem through a nontrivial example. Recommended by: R. Elamsri
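The minimum cover view of copy preselection can be made concrete with a small sketch: pick the fewest sites whose stored copies include every relation in the join. The paper formulates this as an integer linear program. The code below swaps in plain exhaustive search, which is exact but exponential and only suitable for tiny instances. The site names, relation names, and allocation are hypothetical.

```python
from itertools import combinations

def minimum_site_cover(relations, copies_at):
    """Return a smallest set of sites whose stored copies include every
    relation in the query, or None if some relation has no copy anywhere."""
    needed = set(relations)
    sites = sorted(copies_at)
    for size in range(1, len(sites) + 1):          # try smaller covers first
        for combo in combinations(sites, size):
            covered = set().union(*(copies_at[s] for s in combo))
            if needed <= covered:
                return set(combo)
    return None

# Hypothetical allocation: which relation copies each site stores.
copies_at = {
    "S1": {"R1", "R2"},
    "S2": {"R2", "R3"},
    "S3": {"R3", "R4"},
    "S4": {"R1"},
}
cover = minimum_site_cover(["R1", "R2", "R3", "R4"], copies_at)
print(cover)
```

A cover of this kind fixes which sites take part in the join, leaving the choice of execution strategy to a later optimization stage.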

18.
It is desirable to design partitioning methods that minimize the I/O time incurred during query execution in spatial databases. This paper explores optimal partitioning for two-dimensional data for a class of queries and develops multi-disk allocation techniques that maximize the degree of I/O parallelism obtained in each case. We show that hexagonal partitioning has optimal I/O performance for circular queries among all partitioning methods that use convex non-overlapping regions. An analysis and extension of this result to all possible partitioning techniques is also given. For rectangular queries, we show that hexagonal partitioning has overall better I/O performance for a general class of range queries, except for rectilinear queries, in which case rectangular grid partitioning is superior. By using current algorithms for rectangular grid partitioning, parallel storage and retrieval algorithms for hexagonal partitioning can be constructed. Some of these results carry over to circular partitioning of the data, which is an example of a non-convex region.

19.
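Research on Data Partitioning Strategies for Distributed Query Processing over Fast LANs   Cited by: 3 (self-citations: 0, citations by others: 3)
In distributed database systems, the response time of query processing has long been a topic of active interest. Because distributed database queries are inherently parallel, data partitioning can be used to increase the degree of parallel query processing and improve response time. This paper proposes a data partitioning strategy for distributed query processing in a multidatabase environment over a fast local area network, with the aim of improving query response time. Simulation experiments verify the soundness of the algorithm.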

20.
An adaptive probe-based optimization technique is developed and demonstrated in the context of an Internet-based distributed database environment. Database systems distributed across servers that communicate via the Internet are increasingly common, and a query at a given site might require data from remote sites. Optimizing the response time of such queries is a challenging task due to the unpredictability of server performance and network traffic at the time of data shipment; this may result in the selection of an expensive query plan by a static query optimizer. We constructed an experimental setup consisting of two servers running the same database management system connected via the Internet. Concentrating on join queries, we demonstrate how a static query optimizer might choose an expensive plan by mistake. This is due to the lack of a priori knowledge of the run-time environment, inaccurate statistical assumptions in size estimation, and neglecting the cost of remote method invocation. These shortcomings are addressed collectively by proposing a probing mechanism. An implementation of our run-time optimization technique for join queries was constructed in the Java language and incorporated into an experimental setup. The results demonstrate the superiority of our probe-based optimization over a static optimization. Received 6 February 1999 / Revised 15 February 2000 / Accepted 10 May 2000
