首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Semijoin has traditionally been relied upon to reduce the cost of data transmission for distributed query processing. However, judiciously applying join operations as reducers can lead to further reduction in the amount of data transmission required. In view of this fact, we explore the approach of using join operations as reducers in distributed query processing. We first show that the problem of determining a sequence of join operations for a query can be transformed to that of finding a specific type of set of cuts to the corresponding query graph, where a cut to a graph is a partition of nodes in that graph. Then, in light of this concept, we prove that the problem of determining the optimal sequence of join operations for a given query graph is of exponential complexity, thus justifying the necessity of applying heuristic approaches to solve this problem. By mapping the problem of determining a sequence of join reducers into the one of finding a set of cuts, we develop (for tree and general query graphs, respectively) efficient heuristic algorithms to determine a join reducer sequence for distributed query processing. The algorithms developed are based on the concept of divide and conquer and are of polynomial time complexity. Simulation is performed to evaluate these algorithms  相似文献   

2.
Foreign functions have been considered in the advanced database systems to support complex applications. We consider optimizing queries with foreign functions in a distributed environment. In traditional distributed query processing, selection operations are locally processed before joins as much as possible so that the size of relations being transmitted and joined can be reduced. However, if selection predicates involve foreign functions, the cost of evaluating selections cannot be ignored. As a result, the execution order of selections and joins becomes significant, and the trade-off for reducing the costs of data transmission, join processing, and selection predicate evaluation needs to be carefully considered in query optimization. A response time model is developed for estimating the cost of distributed query processing involving foreign functions. We explore the property of the problem and find an optimal algorithm with polynomial complexity for a special case of it. However, finding the optimal execution plan for the general case is NP-hard. We propose an efficient heuristic algorithm for solving the problem and the simulation result shows its good quality. The research result can also be applied to the advanced database systems and the multidatabase systems where the conversion function defined for the need of schema integration can be considered a type of foreign functions  相似文献   

3.
In a distributed spatial database system, a user may issue a query that relates two spatial relations not stored at the same site. Because of the sheer volume and complexity of spatial data, spatial joins between two spatial relations at different sites are expensive in terms of computational and transmission costs. In this paper, we address the problems of processing spatial joins in a distributed environment. We propose a semijoin-like operator, called the spatial semijoin, to prune away objects that do not contribute to the join result. This operator also reduces both the transmission and local processing costs for a later join operation. However, the cost of the elimination process must be taken into account, and we consider approaches to minimize these overheads. We also study and compare two families of distributed join algorithms that are based on the spatial semijoin operator. The first is based on multi-dimensional approximations obtained from an index such as the R-tree, and the second is based on single-dimensional approximations obtained from object mapping. We have conducted experiments on real data sets and report the results in this paper  相似文献   

4.
列的连接策略优化是列存储数据查询中的重要问题。现有的列存储系统中,列的连接存在策略单一,缺少优化处理,无法满足复杂查询等缺陷。针对这些问题,提出一种连接策略选择方法。该方法首先定义简单规则过滤代价过大的查询计划,生成候选查询计划树。进而根据动态Huffman树原理提出动态优化树算法,对候选查询计划树中的查询执行顺序进行改进。根据列存储数据的特点,候选计划中每个连接节点的执行策略被归纳为两种:串行连接和并行连接。在此基础上构建代价估计模型,集中针对这两种连接策略进行代价估计和策略选择,从而以较小的时间复杂度获得优化的查询执行策略。  相似文献   

5.
Query rewriting for SWIFT (First) answers   总被引:2,自引:0,他引:2  
Traditionally, the answer to a database query is construed as the set of all tuples that meet the criteria stated. Strict adherence to this notion in query evaluation is, however, increasingly unsatisfactory because decision makers are more prone to adopting an exploratory strategy for information search which we call “getting some answers quickly, and perhaps more later”. From a decision-maker's perspective, such a strategy is optimal for coping with information overload and makes economic sense (when used in conjunction with a micropayment mechanism). These new requirements present new opportunities for database query optimization. In this paper, we propose a progressive query processing strategy that exploits this behavior to conserve system resources and to minimize query response time and user waiting time. This is accomplished by the heuristic decomposition of user queries into subqueries that can be evaluated on demand. To illustrate the practicality of the proposed methods, we describe the architecture of a prototype system that provides a nonintrusive implementation of our approach. Finally, we present experimental results obtained from an empirical study conducted using an Oracle server that demonstrate the benefits of the progressive query processing strategy  相似文献   

6.
利用区间约束优化包含多个用户函数的查询   总被引:1,自引:0,他引:1  
杨波  洪晓光  王海洋 《软件学报》2001,12(9):1393-1398
如何高效地处理说明性查询语言中嵌入的用户自定义函数,是查询优化的一个重要内容.以往的研究成果不能处理一条语句中的多个用户函数,并且难以实现.提出了分3个阶段进行优化的方案,能够对用户定义的多个函数进行处理.首先,把用户定义的函数以区间约束的形式等价地表述出来;然后,通过对区间约束进行分层筛选,去掉冗余;最后,选择最佳的执行策略.该方案易于实现,效率较高,特别是在用户定义的函数本身隐含多个表的连接条件时,更能取得明显的优化效果.  相似文献   

7.
在移动计算环境下,基于准确的操作代价估算结果来选择合适的连接查询处理模式,可以减少数据的传输量和移动设备的能量消耗。探讨了该环境下移动设备能量消耗的一个新的非对称特征,提出了一种操作代价估算方法,并从数据传输量和能量消耗两个方面对连接查询处理模式进行了代价估算和性能比较,提出了4个实用准则,以指导连接查询处理模式的选择。试验结果充分论证了估算方法和准则的正确性,且比现有同类估算模型和结论具有更加广泛的应用范围。  相似文献   

8.
Distributed query processing algorithms usually perform data reduction by using a semijoin program, but the problem with these approaches is that they still require an explicit join of the reduced relations in the final phase. We introduce an efficient algorithm for join processing in distributed database systems that makes use of bipartite graphs in order to reduce data communication costs and local processing costs. The bipartite graphs represent the tuples that can be joined in two relations taking also into account the reduction state of the relations. This algorithm fully reduces the relations at each site. We then present an adaptive algorithm for response time optimization that takes into account the system configuration, i.e., the additional resources available and the data characteristics, in order to select the best strategy for response time minimization. We also report on the results of a set of experiments which show that our algorithms outperform a number of the recently proposed methods for total processing time and response time minimization.  相似文献   

9.
针对云环境下空间数据连接查询处理问题,提出了一种基于Spark的多路空间连接查询处理算法BSMWSJ.该算法采用网格划分方法将整个数据空间划分成大小相同的网格单元,并将各类数据集中的空间对象,根据其空间位置划分到相应的网格单元中,不同网格单元中的空间数据对象进行并行连接查询处理.在多路空间连接查询处理过程中,采用边界过滤的方法来过滤无用数据,即通过计算前面连接操作候选结果的MBR来过滤后续连接数据集,从而过滤掉无用的连接对象,减少连接对象的多余投影与复制,并采用重复避免策略来减少重复结果的输出,从而进一步减少后续连接计算的代价.合成数据集和真实数据集上的大量实验结果表明:提出的多路空间连接查询处理算法在性能上明显优于现有的多路连接查询处理算法.  相似文献   

10.
数据库数据量日益增多,造成了用户在使用数据库系统查询时费时费力,传统的查询优化方式已无法满足如今的数据查询要求,提高数据库系统优化的效率也成为计算机研究工作的热点。提出基于半连接算法的分布式查询处理技术对数据库系统进行查询优化,提出半连接操作的查询优化算法(SDD-1),并采用实验分析的方法进行验证,计算查询算法的代价。结果表明,基于半连接的研究策略的分布式数据库查询优化可以显著降低传输代价,使查询总效率得到有效提高。  相似文献   

11.
空间连接查询是最耗时,最重要的空间查询、空间多路连接是涉及多个空间关系的连接查询,顺序空间连接查询的效率还是不能令人满意,研究利用并行机制提高空间连接查询效率成为有吸引力的方向,并行空间连接处理由三个阶段组成;任务创建,任务分配和任务并行执行,本文提出一种新的平面扫描方法用于多路并行处理的任务创建过程,随机提出基于花费估计的动态任务分配策略,给出了花费模型,并将其推到处理多路并行连接查询处理以实现负荷平衡。  相似文献   

12.
On-line analytical processing (OLAP) refers to the technologies that allow users to efficiently retrieve data from the data warehouse for decision-support purposes. Data warehouses tend to be extremely large, it is quite possible for a data warehouse to be hundreds of gigabytes to terabytes in size (Chauduri and Dayal, 1997). Queries tend to be complex and ad hoc, often requiring computationally expensive operations such as joins and aggregation. Given this, we are interested in developing strategies for improving query processing in data warehouses by exploring the applicability of parallel processing techniques. In particular, we exploit the natural partitionability of a star schema and render it even more efficient by applying DataIndexes-a storage structure that serves both as an index as well as data and lends itself naturally to vertical partitioning of the data. DataIndexes are derived from the various special purpose access mechanisms currently supported in commercial OLAP products. Specifically, we propose a declustering strategy which incorporates both task and data partitioning and present the Parallel Star Join (PSJ) Algorithm, which provides a means to perform a star join in parallel using efficient operations involving only rowsets and projection columns. We compare the performance of the PSJ Algorithm with two parallel query processing strategies. The first is a parallel join strategy utilizing the Bitmap Join Index (BJI), arguably the state-of-the-art OLAP join structure in use today. For the second strategy we choose a well-known parallel join algorithm, namely the pipelined hash algorithm. To assist in the performance comparison, we first develop a cost model of the disk access and transmission costs for all three approaches.  相似文献   

13.
Heterogeneities exist in a multidatabase environment. For example, a real world entity may be differently represented in relations of different databases. In particular, keys of these relations may be incompatible. In this paper, we consider processing entity join queries when data transmission cost dominates. An entity join operation ‘integrates’ tuples representing the same entities from different relations in which inconsistent data may exist. A natural way to process the entity join is to transmit both relations to a site, resolve the possible conflicts between corresponding attributes and process the join, which is very costly. In this paper, an approach is proposed to correctly transform a global query into local subqueries to preprocess entity join queries in multiple sites with an attempt to lower the cost of data transmission. Besides, an extension of the traditional semijoin, named extended semijoin, is proposed to further reduce the cost of data transmission for entity join query processing.  相似文献   

14.
The pipelined execution of multijoin queries in a multiprocessor-based database system is explored in this paper. Using hash-based joins, multiple joins can be pipelined so that the early results from a join, before the whole join is completed, are sent to the next join for processing. The execution of a query is usually denoted by a query execution tree. To improve the execution of pipelined hash joins, an innovative approach to query execution tree selection is proposed to exploit segmented right-deep trees, which are bushy trees of right-deep subtrees. We first derive an analytical model for the execution of a pipeline segment, and then, in the light of the model, we develop heuristic schemes to determine the query execution plan based on a segmented right-deep tree so that the query can be efficiently executed. As shown by our simulation, the proposed approach, without incurring additional overhead on plan execution, possesses more flexibility in query plan generation, and can lead to query plans of better performance than those achievable by the previous schemes using right-deep trees  相似文献   

15.
Privacy is a major concern when users query public online data services. The privacy of millions of people has been jeopardized in numerous user data leakage incidents in many popular online applications. To address the critical problem of personal data leakage through queries, we enable private querying on public data services so that the contents of user queries and any user data are hidden and therefore not revealed to the online service providers. We propose two protocols for private processing of database queries, namely BHE and HHE. The two protocols provide strong query privacy by using Paillier’s homomorphic encryption, and support common database queries such as range and join queries by relying on the bucketization of public data. In contrast to traditional Private Information Retrieval proposals, BHE and HHE only incur one round of client server communication for processing a single query. BHE is a basic private query processing protocol that provides complete query privacy but still incurs expensive computation and communication costs. Built upon BHE, HHE is a hybrid protocol that applies ciphertext computation and communication on a subset of the data, such that this subset not only covers the actual requested data but also resembles some frequent query patterns of common users, thus achieving practical query performance while ensuring adequate privacy levels. By using frequent query patterns and data specific privacy protection, HHE is not vulnerable to the traditional attacks on k-Anonymity that exploit data similarity and skewness. Moreover, HHE consistently protects user query privacy for a sequence of queries in a single query session.  相似文献   

16.
面向用户需求的非结构化P2P资源定位泛洪策略   总被引:1,自引:0,他引:1  
何明  张玉洁  孟祥武 《软件学报》2015,26(3):640-662
在非结构化P2P网络中,如何对用户所需资源进行快速、准确定位是当前研究的热点问题,也是P2P应用领域面临的核心问题之一.相关的非结构化P2P资源定位算法在查准率、查全率和查询成本上难以同时被优化,这会造成严重的网络带宽负担以及巨大的索引维护开销.为此,提出一种面向用户需求的非结构化P2P资源定位策略(user requirements resource location strategy,简称U2RLS).该策略的创新点是:在原有非结构化P2P网络资源定位泛洪算法的基础上,融入用户需求、用户偏好、用户兴趣度等因素,首先进行用户资源子网划分;采用带有用户需求信息的泛洪和查询索引机制,对用户所需资源进行精确定位.该策略有效避免了因海量信息引起的网络风暴、信息重叠和资源搜索偏覆盖等问题,从而解决了查询节点盲目使用中继节点的现象.实验结果表明:面向用户需求的非结构化P2P资源定位策略U2RLS以其高搜索成功率、有限网络资源消耗和短查询时间响应等优势,能够显著地提高用户资源定位效率.  相似文献   

17.
Relaxation and approximation techniques have been proposed as approaches for improving the quality of query results, in terms of completeness and accuracy, in environments where the user may not be able to specify the query in a complete and exact way, since data are quite heterogeneous or she may not know all the characteristics of data at hand. This problem, mainly addressed for relational and XML data, is nowadays quite relevant also for geo-spatial data, due to their increasing usage in highly critical decisional processes. Among geo-spatial queries, those based on spatial and more precisely topological relations are currently used in an increasing number of applications. As far as we know, no approach has been proposed so far for relaxing queries based on topological predicates when they return an empty or insufficient answer, in order to improve result quality and user satisfaction. In this paper, we consider this problem and we present a general relaxation strategy for, possibly multi-domain, topological selection and join queries. Two specific semantics are also provided: the first applies the minimum amount of relaxation in order to get an acceptable answer; the second relaxes the given query of a certain fixed amount, depending on the considered topological predicate. Index-based processing algorithms, for efficiently executing relaxed queries based on the proposed semantics, are also presented and a specific topological similarity function, to be used for relaxation purposes, is proposed. Experimental results show that the overhead given by query relaxation is acceptable.  相似文献   

18.
针对传统Top-k连接查询算法在处理海量数据时的时效问题,提出一种基于MapReduce框架的负载均衡的并行Top-k连接查询算法(P-TKJ)。使用直方图形式来存储数据,有助于提高CPU的利用率。同时融入了提前终止策略和磁盘数据的选择性访问,以便提高对HDFS数据访问的性能。另外,提出了一种基于最长处理时间优先(LPT)算法的负载均衡策略来均衡Reduce任务,以此设计出高效的并行Top-k连接算法。一个集群实验结果表明,该方法能够有效缩短算法的执行时间。  相似文献   

19.
In a distributed relational database system, the processing of a query involves data transmission among different sites via a computer network. In a distributed database multiple copies of each relation can be allocated to different, physically distributed sites. In this paper we discuss the query preoptimization problem for join-queries. In general, there is a large number of possibilities to use the copies of the data item in a distributed relational database when evaluating a join-query. We consider the problem of a copy preselection for each relation in a join sequence of a join-query. We show how to express the preselection problem for a given query and data allocation to the network in terms of an integer linear programming problem, namely, a minimum cover problem. It can be treated as a heuristic for the first phase of a join-query optimization, and as such as an input to the final stage of optimization, the execution strategy generation for a join-query. In this paper we assumed that a distributed system provides fully transparent data management, i.e., data allocation to the network and data replication which is revealed to a user. We illustrate the proposed mathematical programming problem through a nontrivial example. Recommended by: R. Elamsri  相似文献   

20.
基于Greenplum数据库的查询优化   总被引:1,自引:0,他引:1  
邹承明  谢义  吴佩 《计算机应用》2018,38(2):478-482
针对分布式数据库查询效率随着数据规模的增大而降低的问题,以Greenplum分布式数据库为研究对象,从优化查询路径的角度提出一个基于代价的最优查询计划生成方法。首先,该方法设计一种有效的代价模型来估算查询代价;然后,采用并行最大最小蚁群算法来搜索具有最小查询代价的连接顺序,即最优连接顺序;最后,根据Greenplum数据库对查询计划中不同操作的默认最优选择得到最优查询计划。采用该方法在自主生成的数据集与事务处理性能理事会测试基准(TPC-H)的标准数据集上进行了多组实验。实验结果表明,所提出的优化方法能有效地搜索出最优解,获得最优的查询计划,从而提升Greenplum数据库的查询效率。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号