1.

Adaptive Algorithms for Join Processing in Distributed Database Systems

Peter Scheuermann Eugene Inseok Chong 《Distributed and Parallel Databases》1997,5(3):233-269

Distributed query processing algorithms usually perform data reduction by using a semijoin program, but the problem with these approaches is that they still require an explicit join of the reduced relations in the final phase. We introduce an efficient algorithm for join processing in distributed database systems that makes use of bipartite graphs in order to reduce data communication costs and local processing costs. The bipartite graphs represent the tuples that can be joined in two relations, also taking into account the reduction state of the relations. This algorithm fully reduces the relations at each site. We then present an adaptive algorithm for response time optimization that takes into account the system configuration, i.e., the additional resources available and the data characteristics, in order to select the best strategy for response time minimization. We also report on the results of a set of experiments which show that our algorithms outperform a number of the recently proposed methods for total processing time and response time minimization.
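The semijoin-based baseline that this abstract contrasts itself with can be sketched in a few lines. The sketch below is only the classic "reduce with semijoins, then join explicitly" pattern, not the bipartite-graph algorithm of the paper; the relation contents and the names `semijoin` and `hash_join` are hypothetical and chosen for illustration.

```python
def semijoin(r_rows, s_rows, key):
    """Keep only the rows of R that have a join partner in S on `key`.

    In a distributed setting only the projection of S on `key` would be
    shipped to R's site, which is what makes the reduction cheap.
    """
    s_keys = {row[key] for row in s_rows}
    return [row for row in r_rows if row[key] in s_keys]


def hash_join(r_rows, s_rows, key):
    """Explicit join of the (already reduced) relations."""
    index = {}
    for s in s_rows:
        index.setdefault(s[key], []).append(s)
    return [{**r, **s} for r in r_rows for s in index.get(r[key], [])]


# Hypothetical relations stored at two different sites.
orders = [{"cust": 1, "item": "a"}, {"cust": 2, "item": "b"}, {"cust": 4, "item": "c"}]
customers = [{"cust": 1, "city": "X"}, {"cust": 3, "city": "Y"}, {"cust": 4, "city": "Z"}]

reduced_orders = semijoin(orders, customers, "cust")             # drops cust 2
reduced_customers = semijoin(customers, reduced_orders, "cust")  # drops cust 3
result = hash_join(reduced_orders, reduced_customers, "cust")    # final explicit join
```

Even after both relations are fully reduced, the last line still performs an explicit join, which is the step the abstract identifies as the remaining cost.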

2.

A Twig Query Algorithm for XML Data Based on a Hierarchical Stack

孙丹凤涂利明《计算机时代》2011,(6):34-36

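Twig-pattern queries are currently a research hotspot in XML document querying. After analyzing the algorithms commonly used for twig query processing over XML data, this paper proposes the EDiezt-P encoding, which is highly flexible and makes the structural relationship between a pair of nodes easy to determine. Based on the EDiezt-P encoding and a hierarchical stack structure, it then proposes a bottom-up twig query algorithm. Experiments show that the algorithm reduces query processing time to some extent and improves query efficiency.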

3.

A Skyline Query Processing Algorithm Based on MapReduce

崔文相肖迎元郝刚王洪亚邓华锋《计算机科学》2016,43(6):35-38, 64

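The Skyline query is a typical multi-objective optimization query with wide applications in multi-objective optimization, data mining, and other fields. Most existing Skyline query processing algorithms assume that the dataset is stored on a single database server, and they are usually designed as serial algorithms for a single server. With the rapid growth of data volumes, especially in the big data setting, traditional single-machine serial Skyline algorithms fall far short of user needs. Based on the popular distributed parallel programming framework MapReduce, this paper studies parallel Skyline query algorithms suited to large datasets. Considering the factors that affect MapReduce computation, it improves the existing angle-based partitioning strategy and proposes the Balanced Angular partitioning strategy. To reduce the computation in the Reduce phase, it also proposes filtering data in advance on the Map side. Experimental results show that the proposed Skyline query algorithm significantly improves system performance.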
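The idea of filtering on the Map side before a final merge can be illustrated with a small single-process sketch. It assumes that smaller values are better in every dimension and uses plain Python lists as stand-ins for partitions; it does not implement the Balanced Angular partitioning strategy or run on MapReduce, and all function names are hypothetical.

```python
def dominates(p, q):
    """p dominates q if p is no worse in every dimension and strictly better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Naive quadratic skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def parallel_skyline(partitions):
    # "Map" side: each partition discards its locally dominated points.
    local_skylines = [skyline(part) for part in partitions]
    # "Reduce" side: merge the surviving candidates and filter once more.
    candidates = [p for part in local_skylines for p in part]
    return skyline(candidates)


# Hypothetical two-dimensional data split into two partitions.
partitions = [[(1, 9), (3, 3), (4, 4)], [(2, 8), (9, 1), (5, 5)]]
result = parallel_skyline(partitions)
```

A point dominated inside its own partition cannot belong to the global skyline, so dropping it early is safe and shrinks the input of the merge step.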

4.

Efficient Processing of Distributed Twig Queries Based on Node Distribution

Xin Bi Xiang-Guo Zhao Guo-Ren Wang 《计算机科学技术学报》2017,32(1):78-92

Massive XML data are increasingly generated for the representation, storage and exchange of web information. Twig query processing over massive XML data has become a research focus. However, most traditional algorithms cannot be directly implemented in a distributed manner. Some of the existing distributed algorithms generate a lot of useless intermediate results and execute many join operations of partial results in most cases; others require a priori knowledge of the query pattern before XML partitioning, storage and query processing, which is impractical in the cases of large-scale data or frequent incoming new queries. To improve efficiency and scalability, in this paper, we propose a 3-phase distributed algorithm DisT3 based on a node distribution mechanism to avoid unnecessary intermediate results. Furthermore, we propose a lightweight local index ReP with an enhanced XML partitioning approach using an arbitrary partitioning strategy, and based on ReP we propose an improved 2-phase distributed algorithm DisT2ReP to further reduce the communication cost. After the performance guarantees are analyzed, extensive experiments are conducted to verify the efficiency and scalability of our proposed algorithms in distributed twig query applications.

5.

An Improved Query Optimization Algorithm for Distributed Databases

陈钟叶雪梅青宪刘红《计算机应用》2008,28(Z2)

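This paper analyzes traditional join query optimization algorithms for distributed databases and, using data partitioning and a parallel execution strategy, proposes a query optimization algorithm based on partitioning by multiple join attributes. Experiments show that the algorithm reduces query response time and is of practical value for querying massive data and for processing complex queries in distributed databases.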

6.

Evolutionary Algorithms for Allocating Data in Distributed Database Systems (cited by: 2; self-citations: 0; citations by others: 2)

Ishfaq Ahmad Kamalakar Karlapalem Yu-Kwong Kwok Siu-Kai So 《Distributed and Parallel Databases》2002,11(1):5-32

A major cost in executing queries in a distributed database system is the data transfer cost incurred in transferring relations (fragments) accessed by a query from different sites to the site where the query is initiated. The objective of a data allocation algorithm is to determine an assignment of fragments at different sites so as to minimize the total data transfer cost incurred in executing a set of queries. This is equivalent to minimizing the average query execution time, which is of primary importance in a wide class of distributed conventional as well as multimedia database systems. The data allocation problem, however, is NP-complete, and thus requires fast heuristics to generate efficient solutions. Furthermore, the optimal allocation of database objects highly depends on the query execution strategy employed by a distributed database system, and the given query execution strategy usually assumes an allocation of the fragments. We develop a site-independent fragment dependency graph representation to model the dependencies among the fragments accessed by a query, and use it to formulate and tackle data allocation problems for distributed database systems based on query-site and move-small query execution strategies. We have designed and evaluated evolutionary algorithms for data allocation for distributed database systems.
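The objective described above can be made concrete with a toy cost function. The sketch below assumes a simplified query-site model in which every fragment a query needs is shipped in full to the site that issued it, with no replication and no storage limits. It searches all allocations exhaustively, which is exactly what stops scaling and motivates heuristics; it is not the evolutionary algorithm of the paper, and all names and numbers are hypothetical.

```python
from itertools import product


def transfer_cost(allocation, queries, sizes):
    """Total data shipped: frequency * fragment size for every remote fragment."""
    cost = 0
    for site, fragments, frequency in queries:
        for fragment in fragments:
            if allocation[fragment] != site:
                cost += frequency * sizes[fragment]
    return cost


def best_allocation(sites, sizes, queries):
    """Exhaustive search over len(sites) ** len(sizes) allocations."""
    fragments = sorted(sizes)
    best = None
    for assignment in product(sites, repeat=len(fragments)):
        allocation = dict(zip(fragments, assignment))
        cost = transfer_cost(allocation, queries, sizes)
        if best is None or cost < best[0]:
            best = (cost, allocation)
    return best


# Hypothetical workload: (issuing site, fragments accessed, frequency).
sizes = {"F1": 10, "F2": 40}
queries = [("A", ["F1", "F2"], 5), ("B", ["F2"], 20)]
cost, allocation = best_allocation(["A", "B"], sizes, queries)
```

With two fragments and two sites there are only four candidate allocations; the search space grows exponentially with the number of fragments.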

7.

Improving network systems performance by clustering distributed database sites

Ismail Hababeh 《The Journal of supercomputing》2012,59(1):249-267

Clustering network sites is a vital issue in parallel and distributed database systems (DDBS). Grouping distributed database network sites into clusters is considered an efficient way to minimize the communication time required for query processing. However, clustering network sites is still an open research problem since its optimal solution is NP-complete. The main contribution in this field is to find a near optimal solution that groups distributed database network sites into disjoint clusters in order to minimize the communication time required for data allocation. Grouping a large number of network sites into a small number of clusters effectively reduces the transaction response time, results in better data distribution, and improves the distributed database system performance. We present a novel algorithm for clustering distributed database network sites based on the communication time, as database query processing is time dependent. Extensive experimental tests and simulations are conducted on this clustering algorithm. The experimental and simulation results show that a better network distribution is achieved with significant network servers load balance and network delay, a minor communication time between network sites is realized, and a higher distributed database system performance is recognized.

8.

Preprocessing predicates and queries

Xian-He Sun Nabil Kamel 《Information Systems》1992,17(6):465-475

Fragmentation has been used to distribute the contents of a database across the sites of a distributed database system. During run time, the system must determine which fragments can be used to answer each query. This process requires solving the predicate implication problem. In order to speed processing, it is desirable to do as much preprocessing as possible on the prestored fragments, without knowledge of the run-time query. In this paper, performing preprocessing on database fragments to speed later run-time implication checking is investigated. The investigation is based on a new concept, separation among predicates. When two predicates are properly separated, their union cannot be implied by any other conjunctive predicate unless one of them is implied by the conjunctive predicate. A polynomial time algorithm for checking the pair-wise separation among a collection of fragment predicates is introduced and its complexity is theoretically analyzed. The separation checking algorithm is accompanied by a query processing algorithm which makes use of the result of the separation properties of the fragments to speed real time query processing. The two algorithms presented are scalable according to available preprocessing time in the sense that the preprocessing algorithm can be run for shorter periods to produce partial preprocessing that can still be used by the query processing algorithm.
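A minimal illustration of the run-time implication check follows. It assumes a restricted predicate form, a conjunction of closed numeric ranges stored as a dictionary from attribute name to `(low, high)`, and it does not implement the separation concept or the preprocessing algorithm of the paper. The fragment names and the `implies` helper are hypothetical.

```python
import math


def implies(query, fragment):
    """True if every tuple satisfying `query` also satisfies `fragment`.

    Both arguments are conjunctions of closed ranges, {attribute: (low, high)}.
    An attribute missing from a predicate is unconstrained. The query
    predicate is assumed to be satisfiable (low <= high for every range).
    """
    for attribute, (f_low, f_high) in fragment.items():
        q_low, q_high = query.get(attribute, (-math.inf, math.inf))
        if q_low < f_low or q_high > f_high:
            return False
    return True


# Hypothetical horizontal fragments of an employee relation.
fragments = {
    "young": {"age": (0, 30)},
    "senior": {"age": (31, 120)},
}
query = {"age": (18, 25), "salary": (1000, 2000)}

usable = [name for name, predicate in fragments.items() if implies(query, predicate)]
```

When the query predicate implies a fragment's predicate, that fragment alone contains every tuple the query can return, so the other fragments need not be contacted.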

9.

An evaluation of relational join algorithms in a pipelined query processing environment

Mikkilineni K.P. Su S.Y.W. 《IEEE transactions on pattern analysis and machine intelligence》1988,14(6):838-848

A query processing strategy which is based on pipelining and data-flow techniques is presented. Timing equations are developed for calculating the performance of four join algorithms: nested block, hash, sort-merge, and pipelined sort-merge. They are used to execute the join operation in a query in distributed fashion and in pipelined fashion. Based on these equations and similar sets of equations developed for other relational algebraic operations, the performance of query execution was evaluated using the different join algorithms. The effects of varying the values of processing time, I/O time, communication time, buffer size, and join selectivity on the performance of the pipelined join algorithms are investigated. The results are compared to the results obtained by employing the same algorithms for executing queries using the distributed processing approach which does not exploit the vertical concurrency of the pipelining approach. These results establish the benefits of pipelining.
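Of the four join algorithms named above, sort-merge is the easiest to show in a form that hands results downstream as soon as they are found. The sketch below is a generic in-memory sort-merge join written as a Python generator, so a consumer can start working before the join finishes; it is an illustration of the general technique, not the pipelined sort-merge variant or the timing model evaluated in the paper, and the sample relations are hypothetical.

```python
def sort_merge_join(r_rows, s_rows, key):
    """Yield matching (r, s) pairs one at a time, in key order."""
    r_sorted = sorted(r_rows, key=key)
    s_sorted = sorted(s_rows, key=key)
    i = j = 0
    while i < len(r_sorted) and j < len(s_sorted):
        r_key, s_key = key(r_sorted[i]), key(s_sorted[j])
        if r_key < s_key:
            i += 1
        elif r_key > s_key:
            j += 1
        else:
            # Find the run of S rows sharing this key value.
            j_end = j
            while j_end < len(s_sorted) and key(s_sorted[j_end]) == r_key:
                j_end += 1
            # Pair every R row in the matching run with every S row in the run.
            while i < len(r_sorted) and key(r_sorted[i]) == r_key:
                for s in s_sorted[j:j_end]:
                    yield (r_sorted[i], s)
                i += 1
            j = j_end


# Hypothetical relations; the join key is the first field of each tuple.
r = [(2, "b"), (1, "a"), (3, "d"), (2, "c")]
s = [(4, "z"), (2, "x"), (2, "y")]
first_field = lambda row: row[0]

pipeline = sort_merge_join(r, s, first_field)
first_pair = next(pipeline)   # a downstream operator can consume this immediately
remaining = list(pipeline)
```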

10.

An intelligent query processing for distributed ontologies

Jihyun Lee Jun-Ki Min 《Journal of Systems and Software》2010,83(1):85-95

In this paper, we propose an intelligent distributed query processing method considering the characteristics of a distributed ontology environment. We suggest more general models of the distributed ontology query and the semantic mapping among distributed ontologies compared with the previous works. Our approach rewrites a distributed ontology query into multiple distributed ontology queries using the semantic mapping, and we can obtain the integrated answer through the execution of these queries. Furthermore, we propose a distributed ontology query processing algorithm with several query optimization techniques: pruning rules to remove unnecessary queries, a cost model considering site load balancing and caching, and a heuristic strategy for scheduling plans to be executed at a local site. Finally, experimental results show that our optimization techniques are effective in reducing the response time.

11.

Multidatabase Query Optimization

Cem Evrendilek Asuman Dogac Sena Nural Fatma Ozcan 《Distributed and Parallel Databases》1997,5(1):77-114

A multidatabase system (MDBS) allows the users to simultaneously access heterogeneous and autonomous databases using an integrated schema and a single global query language. The query optimization problem in MDBSs is quite different from the query optimization problem in distributed homogeneous databases due to schema heterogeneity and autonomy of local database systems. In this work, we consider the optimization of query distribution in case of data replication and the optimization of intersite joins, that is, the join of the results returned by the local sites in response to the global subqueries. The algorithms presented for the optimization of intersite joins try to maximize the parallelism in execution and take the federated nature of the problem into account. It has also been shown through a comparative performance study that the proposed intersite join optimization algorithms are efficient. The approach presented can easily be generalized to any operation required for intersite query processing. The query optimization scheme presented in this paper is being implemented within the scope of a multidatabase system which is based on OMG's object management architecture.

12.

Research on Query Optimization Strategies for Distributed Databases

聂林娣《数字社区&智能家居》2006,(17)

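Because data in a distributed database system is distributed and redundant, distributed query processing takes on many new aspects and additional complexity, which makes the optimization of distributed query processing especially important. This paper briefly introduces the goals and strategies of distributed query optimization and, for query optimization in distributed database systems, describes three typical algorithms: the INGRES algorithm, the System R* algorithm, and the SDD-1 algorithm. It compares, optimizes, and summarizes them, and finally improves the SDD-1 algorithm.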

13.

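Research on a Data Partitioning Strategy for Distributed Query Processing over Fast Local Area Networks (cited by: 3; self-citations: 0; citations by others: 3)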

赵葆华王于同《计算机工程与应用》2000,36(4):133-136,139

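In distributed database systems, query response time has long been a hot topic. Given the inherent parallelism of distributed database queries, data partitioning can be used to increase the degree of parallel query processing and improve response time. This paper proposes a data partitioning strategy for distributed query processing over a fast local area network in a multidatabase environment, with the aim of improving query response time, and verifies the soundness of the algorithm through simulation experiments.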
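How data partitioning exposes parallelism in a join can be sketched as follows. The example assumes simple hash partitioning on a single join attribute and runs the per-partition joins in a sequential loop for clarity, although each iteration is independent and could run on a different site. It is a generic partitioned hash join, not the partitioning strategy proposed in the paper, and all names and data are hypothetical.

```python
def partition(rows, key, n):
    """Split rows into n buckets by hashing the join attribute."""
    buckets = [[] for _ in range(n)]
    for row in rows:
        buckets[hash(row[key]) % n].append(row)
    return buckets


def local_join(r_rows, s_rows, key):
    """Hash join executed entirely within one partition (one site)."""
    index = {}
    for s in s_rows:
        index.setdefault(s[key], []).append(s)
    return [{**r, **s} for r in r_rows for s in index.get(r[key], [])]


def partitioned_join(r_rows, s_rows, key, n=3):
    r_parts = partition(r_rows, key, n)
    s_parts = partition(s_rows, key, n)
    result = []
    # Rows with equal keys land in the same bucket index, so bucket i of R
    # only ever needs bucket i of S. Each iteration is an independent task.
    for r_part, s_part in zip(r_parts, s_parts):
        result.extend(local_join(r_part, s_part, key))
    return result


# Hypothetical relations.
employees = [
    {"dept": 1, "name": "Ann"},
    {"dept": 2, "name": "Bob"},
    {"dept": 2, "name": "Cy"},
    {"dept": 5, "name": "Dee"},
]
departments = [
    {"dept": 1, "loc": "Oslo"},
    {"dept": 2, "loc": "Rome"},
    {"dept": 3, "loc": "Lima"},
]
joined = partitioned_join(employees, departments, "dept")
```

The response time of such a plan is bounded by the slowest partition rather than by the whole input, which is why balanced partitions matter.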

14.

A Query Optimization Algorithm Based on Multiple Join Attribute Partitioning

褚龙现申远《计算机与现代化》2012,(5):10-13

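Querying is the most frequently used operation in a database. Because data in a distributed database is distributed and redundant, query optimization has become one of the core problems in distributed database research. To improve query efficiency in distributed databases, this paper analyzes and discusses common execution strategies and query optimization algorithms based on direct joins and, for the case in which multi-table joins in distributed database applications involve multiple join attributes, proposes an improved direct-join query optimization strategy. The improved algorithm increases the parallelism of query execution, shortens query processing time, and improves query efficiency.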

15.

Parallel Reverse Skyline Queries over Uncertain Data Streams

张建荣毛宇光《计算机与现代化》2015,(1):46

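As an important variant of the Skyline query, the reverse Skyline query over uncertain data streams has become a research hotspot. Existing single-machine algorithms cannot cope with conditions such as high-speed data streams, high data dimensionality, and large sliding windows, so this paper proposes a parallel query processing algorithm, PRSUDS. The algorithm uses an angle-based partitioning strategy to distribute processing tasks to the parallel nodes, gives a proof of the correctness of this distribution strategy, and then designs and implements a parallel processing framework for the algorithm. Experimental results show that PRSUDS delivers better overall performance than the single-machine algorithm and better meets the real-time requirements of data stream queries.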

16.

Design of Distributed Databases on Local Computer Systems with a Multiaccess Network

《IEEE transactions on pattern analysis and machine intelligence》1985,(7):606-619

Concurrency control, distribution design, and query processing are some of the important issues in the design of distributed databases. In this paper, we have studied these issues with respect to a relational database on a local computer system connected by a multiaccess broadcast bus. A broadcast bus allows information to be distributed efficiently, and hence simplifies the solutions to some of these issues. A transaction model that integrates the control strategies in concurrency control and query processing is proposed. In concurrency control, the lock, unlock, and update of data are achieved by a few broadcasts. A dynamic strategy is used in query processing, as less data are transferred when compared to a static strategy. The status information needed in dynamic query processing can be conveniently obtained by broadcasting. Lastly, some NP-hard file placement problems are found to be solvable in polynomial time when updates are broadcast.

17.

A graph theoretical approach to determine a join reducer sequence in distributed query processing

Ming-Syan Chen Yu P.S. 《Knowledge and Data Engineering, IEEE Transactions on》1994,6(1):152-165

Semijoin has traditionally been relied upon to reduce the cost of data transmission for distributed query processing. However, judiciously applying join operations as reducers can lead to further reduction in the amount of data transmission required. In view of this fact, we explore the approach of using join operations as reducers in distributed query processing. We first show that the problem of determining a sequence of join operations for a query can be transformed to that of finding a specific type of set of cuts to the corresponding query graph, where a cut to a graph is a partition of nodes in that graph. Then, in light of this concept, we prove that the problem of determining the optimal sequence of join operations for a given query graph is of exponential complexity, thus justifying the necessity of applying heuristic approaches to solve this problem. By mapping the problem of determining a sequence of join reducers into the one of finding a set of cuts, we develop (for tree and general query graphs, respectively) efficient heuristic algorithms to determine a join reducer sequence for distributed query processing. The algorithms developed are based on the concept of divide and conquer and are of polynomial time complexity. Simulation is performed to evaluate these algorithms.

18.

Optimising the distributed execution of join queries in polynomial time

《Computers & Mathematics with Applications》1999,37(3):105-126

It is proposed that an optimal strategy for executing a join query in a distributed database system may be computed in a time which is bounded by a polynomial function of the number of relations and the size parameters of the network. The solution so unveiled considers both the transmission costs and the processing costs incurred in delivering the required result to the user that issued the query. The query specifies that several relational tables are to be coalesced and presented to the appropriate user. Undertaking this task demands the utilisation of limited system resources, so that a strategy for fulfilling the request that imposes minimal cost to the system should be devised. Both the processor sites, and the communications links that interconnect them, are utilised; an optimal strategy is one that minimises a weighted sum of processing and data transmission costs. An integer linear programming model of this problem was originally proposed in [1]; however, no suggestion was given as to how this model might be efficiently solved. By extending the earlier analysis, the recursive nature of the join computation is revealed. Further investigations then produce a modified relationship amenable to algorithmic solution; the resultant procedure has polynomial time and space requirements.

19.

A new vertical fragmentation algorithm based on ant collective behavior in distributed database systems

Mehdi Goli Seyed Mohammad Taghi Rouhani Rankoohi 《Knowledge and Information Systems》2012,30(2):435-455

Considering the existing massive volumes of data processed nowadays and the distributed nature of many organizations, there is no doubt how vital the need is for distributed database systems. In such systems, the response time to a transaction or a query is highly affected by the distribution design of the database system, particularly its methods for fragmentation, replication, and allocation of data. According to the relevant literature, of the two approaches to fragmentation, namely horizontal and vertical fragmentation, the latter requires the use of heuristic methods because it is NP-hard. Currently, there are a number of different methods of providing vertical fragmentation, which normally introduce a relatively high computational complexity or do not yield optimal results, particularly for large-scale problems. In this paper, because of their distributed and scalable nature, we apply swarm intelligence algorithms to present an algorithm for finding a solution to the vertical fragmentation problem, which is optimal in most cases. In our proposed algorithm, we try to fragment the relations in such a way as not only to make transaction processing at each site as localized as possible, but also to reduce the costs of operations. Moreover, we report on the experimental results of comparing our algorithm with several other similar algorithms to show that ours outperforms the other algorithms and is able to generate a better solution in terms of the optimality of results and computational complexity.

20.

A Survey of Knowledge Graph Partitioning Algorithms (cited by: 6; self-citations: 0; citations by others: 6)

王鑫陈蔚雪杨雅君张小旺冯志勇《计算机学报》2021,44(1):235-260

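Knowledge graphs are an important cornerstone of artificial intelligence and have attracted wide attention because they contain rich graph structure and attribute information. A knowledge graph can describe real-world entities and the relationships among them with precise semantics, where vertices represent entities and edges represent the relationships between entities. Knowledge graph partitioning is the first task in the distributed processing of large-scale knowledge graphs and provides basic support for their distributed storage, querying, reasoning, and mining. With the continued growth of knowledge graph data scale and of distributed processing demands, ...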