期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Optimizing queries with foreign functions in a distributedenvironment

Tsai P.S.M. Chen A.L.P. 《Knowledge and Data Engineering, IEEE Transactions on》2002,14(4):809-824

Foreign functions have been considered in the advanced database systems to support complex applications. We consider optimizing queries with foreign functions in a distributed environment. In traditional distributed query processing, selection operations are locally processed before joins as much as possible so that the size of relations being transmitted and joined can be reduced. However, if selection predicates involve foreign functions, the cost of evaluating selections cannot be ignored. As a result, the execution order of selections and joins becomes significant, and the trade-off for reducing the costs of data transmission, join processing, and selection predicate evaluation needs to be carefully considered in query optimization. A response time model is developed for estimating the cost of distributed query processing involving foreign functions. We explore the property of the problem and find an optimal algorithm with polynomial complexity for a special case of it. However, finding the optimal execution plan for the general case is NP-hard. We propose an efficient heuristic algorithm for solving the problem and the simulation result shows its good quality. The research result can also be applied to the advanced database systems and the multidatabase systems where the conversion function defined for the need of schema integration can be considered a type of foreign functions 相似文献

2.

Semantic query optimization for query plans of heterogeneousmultidatabase systems

Chun-Nan Hsu Knoblock C.A. 《Knowledge and Data Engineering, IEEE Transactions on》2000,12(6):959-978

New applications of information systems need to integrate a large number of heterogeneous databases over computer networks. Answering a query in these applications usually involves selecting relevant information sources and generating a query plan to combine the data automatically. As significant progress has been made in source selection and plan generation, the critical issue has been shifting to query optimization. This paper presents a semantic query optimization (SQO) approach to optimizing query plans of heterogeneous multidatabase systems. This approach provides global optimization for query plans as well as local optimization for subqueries that retrieve data from individual database sources. An important feature of our local optimization algorithm is that we prove necessary and sufficient conditions to eliminate an unnecessary join in a conjunctive query of arbitrary join topology. This feature allows our optimizer to utilize more expressive relational rules to provide a wider range of possible optimizations than previous work in SQO. The local optimization algorithm also features a new data structure called AND-OR implication graphs to facilitate the search for optimal queries. These features allow the global optimization to effectively use semantic knowledge to reduce the data transmission cost. We have implemented this approach in the PESTO (Plan Enhancement by SemanTic Optimization) query plan optimizer as a part of the SIMS information mediator. Experimental results demonstrate that PESTO can provide significant savings in query execution cost over query plan execution without optimization 相似文献

3.

Optimizing large join queries using a graph-based approach 总被引：4，自引：0，他引：4

Chiang Lee Chi-Sheng Shih Yaw-Huei Chen 《Knowledge and Data Engineering, IEEE Transactions on》2001,13(2):298-315

Although many query tree optimization strategies have been proposed in the literature, there still is a lack of a formal and complete representation of all possible permutations of query operations (i.e., execution plans) in a uniform manner. A graph-theoretic approach presented in the paper provides a sound mathematical basis for representing a query and searching for an execution plan. In this graph model, a node represents an operation and a directed edge between two nodes indicates the older of executing these two operations in an execution plan. Each node is associated with a weight and so is an edge. The weight is an expression containing optimization required parameters, such as relation size, tuple size, join selectivity factors. All possible execution plans are representable in this graph and each spanning tree of the graph becomes an execution plan. It is a general model which can be used in the optimizer of a DBMS for internal query representation. On the basis of this model, we devise an algorithm that finds a near optimal execution plan using only polynomial time. The algorithm is compared with a few other popular optimization methods. Experiments show that the proposed algorithm is superior to the others under most circumstances 相似文献

4.

Optimizing top-k selection queries over multimedia repositories 总被引：2，自引：0，他引：2

Chaudhuri S. Gravano L. Marian A. 《Knowledge and Data Engineering, IEEE Transactions on》2004,16(8):992-1009

Repositories of multimedia objects having multiple types of attributes (e.g., image, text) are becoming increasingly common. A query on these attributes will typically, request not just a set of objects, as in the traditional relational query model (filtering), but also a grade of match associated with each object, which indicates how well the object matches the selection condition (ranking). Furthermore, unlike in the relational model, users may just want the k top-ranked objects for their selection queries for a relatively small k. In addition to the differences in the query model, another peculiarity of multimedia repositories is that they may allow access to the attributes of each object only through indexes. We investigate how to optimize the processing of top-k selection queries over multimedia repositories. The access characteristics of the repositories and the above query model lead to novel issues in query optimization. In particular, the choice of the indexes used to search the repository strongly influences the cost of processing the filtering condition. We define an execution space that is search-minimal, i.e., the set of indexes searched is minimal. Although the general problem of picking an optimal plan in the search-minimal execution space is NP-hard, we present an efficient algorithm that solves the problem optimally with respect to our cost model and execution space when the predicates in the query are independent. We also show that the problem of optimizing top-k selection queries can be viewed, in many cases, as that of evaluating more traditional selection conditions. Thus, both problems can be viewed together as an extended filtering problem to which techniques of query processing and optimization may be adapted. 相似文献

5.

基于多重加权树的并行数据库查询优化方法 总被引：1，自引：0，他引：1

李建中《计算机学报》1998,21(5):401-412

本文提出了一种基于多重加权树的查询优化方法，包括多重加权树并行查询计划模型、并行查询计划的复杂性模型和查询优化处工法。相似文献

6.

结合哈希过滤的一种改进多连接查询优化算法

王果徐仁佐《计算机工程》2004,30(7):57-59

多连接查询优化是提高数据库性能的关键问题之一。Chiang Lee提出了一种启发式多连接查询优化算法MVP，分析发现该算法并没有考虑减小执行计划的计算代价。该文结合哈希过滤的特点提出一种改进的多连接查询优化算法，与MVP算法相比该算法降低了执行计划的计算代从，从而使查询响应时间更短。相似文献

7.

列存储数据查询中的连接策略选择方法

下载免费PDF全文

李静孙莉王梅《计算机科学与探索》2010,4(9):850-858

列的连接策略优化是列存储数据查询中的重要问题。现有的列存储系统中,列的连接存在策略单一,缺少优化处理,无法满足复杂查询等缺陷。针对这些问题,提出一种连接策略选择方法。该方法首先定义简单规则过滤代价过大的查询计划,生成候选查询计划树。进而根据动态Huffman树原理提出动态优化树算法,对候选查询计划树中的查询执行顺序进行改进。根据列存储数据的特点,候选计划中每个连接节点的执行策略被归纳为两种:串行连接和并行连接。在此基础上构建代价估计模型,集中针对这两种连接策略进行代价估计和策略选择,从而以较小的时间复杂度获得优化的查询执行策略。相似文献

8.

A two-phase framework for quality-aware Web service selection 总被引：1，自引：0，他引：1

Qi Yu Manjeet Rege Athman Bouguettaya Brahim Medjahed Mourad Ouzzani 《Service Oriented Computing and Applications》2010,4(2):63-79

Service-oriented computing is gaining momentum as the next technological tool to leverage the huge investments in Web application development. The expected large number of Web services poses a set of new challenges for efficiently accessing these services. We propose an integrated service query framework that facilitates users in accessing their desired services. The framework incorporates a service query model and a two-phase optimization strategy. The query model defines service communities that are used to organize the large and heterogeneous service space. The service communities allow users to use declarative queries to retrieve their desired services without worrying about the underlying technical details. The two-phase optimization strategy automatically generates feasible service execution plans and selects the plan with the best user-desired quality. In particular, we present an evolutionary algorithm that is able to “co-evolve” multiple feasible execution plans simultaneously and allows them to compete with each other to generate the best plan. We conduct a set of experiments to assess the performance of the proposed algorithms. 相似文献

9.

A graph-theoretic model for optimizing queries involving methods

Chiang Lee Chi-Sheng Shih Yaw-Huei Chen 《The VLDB Journal The International Journal on Very Large Data Bases》2001,9(4):327-343

Traditional algorithms for optimizing the execution order of joins are no more valid when selections and projections involve methods and become very expensive operations. Selections and projections could be even more costly than joins such that they are pulled above joins, rather than pushed down in a query tree. In this paper, we take a fundamental look at how to approach query optimization from a top-down design perspective, rather than trying to force one model to fit into another. We present a graph model which is designed to characterize execution plans. Each edge and each vertex of the graph is assigned a weight to model execution plans. We also design algorithms that use these weights to optimize the execution order of operations. A cost model of these algorithms is developed. Experiments are conducted on the basis of this cost model. The results show that our algorithms are superior to similar work proposed in the literature. Received 20 April 1999 / Accepted 9 August 2000 Published online 20 April 2001 相似文献

10.

Extensible Parallel Query Processing for Exploratory Geoscientific Data Mining 总被引：1，自引：0，他引：1

Eddie C. Shek Richard R. Muntz Edmond Mesrobian 《Data mining and knowledge discovery》2001,5(4):277-304

Exploratory data mining and analysis requires a computing environment which provides facilities for the user-friendly expression and rapid execution of scientific queries. In this paper, we address research issues in the parallelization of scientific queries containing complex user-defined operations. In a parallel query execution environment, parallelizing a query execution plan involves determining how input data streams to evaluators implementing logical operations can be divided to be processed by clones of the same evaluator in parallel. We introduced the concept of relevance window that characterizes data lineage and data partitioning opportunities available for an user-defined evaluator. In addition, we developed a query parallelization framework by extending relational parallel query optimization algorithms to allow the parallelization characteristics of user-defined evaluators to guide the process of query parallelization in an extensible query processing environment. We demonstrated the utility of our system by performing experiments mining cyclonic activity, blocking events, and the upward wave-energy propagation features from several observational and model simulation datasets. 相似文献

11.

Shared-nothing并行数据库系统查询优化技术 总被引：15，自引：0，他引：15

文继荣陈红王珊《计算机学报》2000,23(1):28-38

查询优化是并行数据库系统的核心技术。该文介绍作者自行研制的一个Ｓｈａｒｅｄ－ｎｏｔｈｉｎｇ并行数据库系统ＰＢＡＳＥ／２中独特的两阶段优化策略。为了缩减并行相称优化庞大的搜索空间,ＰＢＡＳＥ／２将并行查询优化划分为顺序优化和并行化两个在阶段。在顺序优化阶段对并行化后的通信代价进行预先估算,将通信开销加入顺序优化的代价模型,同时对动态规划搜索算法进行了修正和扩展,保证了顺序优化阶段得到的最小代价计划在相似文献

12.

混合MapReduce环境下大数据划分的查询优化

李伏朱青《计算机科学与探索》2012,(10):877-887

在MapReduce与数据库的混合架构中,数据划分是影响查询性能的重要因素。对于开销最大的连接和聚集操作,采用混合MapReduce的方式实现,需要大规模数据的跨结点传输,网络传输和I/O开销巨大。为了减少传输的数据量,并提高连接操作的查询效率,提出了划分建议器模型。实现了MapReduce和数据库混合架构上的划分建议器,并计算划分代价,生成最优的数据划分方案,提高了系统效率。为了减少查询时间,依据划分建议器模型,提出了基于代价优先的生成策略和空间搜索算法,减少了划分建议器生成最优方案的时间。通过实验验证了划分建议器的有效性,使系统的整体查询代价最小,显著提高了系统性能。相似文献

13.

基于粒子群算法的数据库查询优化 总被引：1，自引：0，他引：1

林桂亚《计算机应用研究》2012,29(3):947-949

研究粒子群算法在数据库查询优化中的应用问题。为了解决大型数据库信息检索困难、查询效率低的问题,提出了一种基于粒子群算法优化数据库查询技术方案。算法提出了一种数据库查询执行计划代价模型,主要包括了查询多链接次序以及副本的选择问题,准确定义了数据库查询执行代价,采用提出的粒子群算法来优化并求解该执行代价问题,从而使得分组数目更少、数据定位更精确。实例验证结果表明,通过属性表现和违规行为任何教师都可以被准确定位,减少了分组,为数据库查询提供了优化。相似文献

14.

An adaptable distributed query processing architecture

Yongluan Zhou Beng Chin Ooi Kian-Lee Tan Wee Hyong Tok 《Data & Knowledge Engineering》2005,53(3):1-309

Traditionally, distributed query optimization techniques generate static query plans at compile time. However, the optimality of these plans depends on many parameters (such as the selectivities of operations, the transmission speeds and workloads of servers) that are not only difficult to estimate but are also often unpredictable and fluctuant at runtime. As the query processor cannot dynamically adjust the plans at runtime, the system performance is often less than satisfactory. In this paper, we introduce a new highly adaptive distributed query processing architecture. Our architecture can quickly detect fluctuations in selectivities of operations, as well as transmission speeds and workloads of servers, and accordingly change the operation order of a distributed query plan during execution. We have implemented a prototype based on the Telegraph system [Telegragraph project. Available from >]. Our experimental study shows that our mechanism can adapt itself to the changes in the environment and hence approach to an optimal plan during execution. 相似文献

15.

基于Greenplum数据库的查询优化 总被引：1，自引：0，他引：1

邹承明谢义吴佩《计算机应用》2018,38(2):478-482

针对分布式数据库查询效率随着数据规模的增大而降低的问题,以Greenplum分布式数据库为研究对象,从优化查询路径的角度提出一个基于代价的最优查询计划生成方法。首先,该方法设计一种有效的代价模型来估算查询代价;然后,采用并行最大最小蚁群算法来搜索具有最小查询代价的连接顺序,即最优连接顺序;最后,根据Greenplum数据库对查询计划中不同操作的默认最优选择得到最优查询计划。采用该方法在自主生成的数据集与事务处理性能理事会测试基准（TPC-H）的标准数据集上进行了多组实验。实验结果表明,所提出的优化方法能有效地搜索出最优解,获得最优的查询计划,从而提升Greenplum数据库的查询效率。相似文献

16.

Query optimization via contention space partitioning and cost error controlling for dynamic multidatabase systems

Qiang Zhu Jaidev Haridas Wen-Chi Hou 《Distributed and Parallel Databases》2008,23(2):151-188

A multidatabase system (MDBS) integrates information from multiple autonomous local databases. Performing global query optimization to achieve efficient query processing in such a system is challenging due to local autonomy of the data sources. Dynamic factors in the environment make the problem even more difficult. In this paper, we present two techniques, i.e., contention space partitioning and cost error controlling, to perform global query optimization in a dynamic MDBS. Both techniques generate an execution plan with multiple versions for a query in a dynamic MDBS, utilizing the multistate cost models built for the dynamic environment via our previous multistate query sampling method. The first technique partitions the contention space of a dynamic multidatabase environment into a given number of subspaces and chooses a good query execution plan version for each subspace, while the second technique selects a set of execution plan versions by using a given error tolerance to control query execution costs. Experiments demonstrate that the proposed techniques are quite promising for performing global query optimization in a dynamic MDBS. Compared with related work on dynamic query optimization, our approach has an advantage of avoiding the high overhead for modifying or re-generating an execution plan for a query based on dynamic runtime information. Research was supported by the US National Science Foundation under Grant # IIS-9811980 and The University of Michigan. 相似文献

17.

Query-aware location anonymization for road networks

Chi-Yin Chow Mohamed F. Mokbel Jie Bao Xuan Liu 《GeoInformatica》2011,15(3):571-607

Recently, several techniques have been proposed to protect the user location privacy for location-based services in the Euclidean space. Applying these techniques directly to the road network environment would lead to privacy leakage and inefficient query processing. In this paper, we propose a new location anonymization algorithm that is designed specifically for the road network environment. Our algorithm relies on the commonly used concept of spatial cloaking, where a user location is cloaked into a set of connected road segments of a minimum total length L{\cal L} including at least K{\cal K} users. Our algorithm is “query-aware” as it takes into account the query execution cost at a database server and the query quality, i.e., the number of objects returned to users by the database server, during the location anonymization process. In particular, we develop a new cost function that balances between the query execution cost and the query quality. Then, we introduce two versions of our algorithm, namely, pure greedy and randomized greedy, that aim to minimize the developed cost function and satisfy the user specified privacy requirements. To accommodate intervals with a high workload, we introduce a shared execution paradigm that boosts the scalability of our location anonymization algorithm and the database server to support large numbers of queries received in a short time period. Extensive experimental results show that our algorithms are more efficient and scalable than the state-of-the-art technique, in terms of both query execution cost and query quality. The results also show that our algorithms have very strong resilience to two privacy attacks, namely, the replay attack and the center-of-cloaked-area attack. 相似文献

18.

A multi-colony ant algorithm for optimizing join queries in distributed database systems

Ladan Golshanara Seyed Mohammad Taghi Rouhani Rankoohi Hamed Shah-Hosseini 《Knowledge and Information Systems》2014,39(1):175-206

Distributed database systems provide a new data processing and storage technology for decentralized organizations of today. Query optimization, the process to generate an optimal execution plan for the posed query, is more challenging in such systems due to the huge search space of alternative plans incurred by distribution. As finding an optimal execution plan is computationally intractable, using stochastic-based algorithms has drawn the attention of most researchers. In this paper, for the first time, a multi-colony ant algorithm is proposed for optimizing join queries in a distributed environment where relations can be replicated but not fragmented. In the proposed algorithm, four types of ants collaborate to create an execution plan. Hence, there are four ant colonies in each iteration. Each type of ant makes an important decision to find the optimal plan. In order to evaluate the quality of the generated plan, two cost models are used—one based on the total time and the other on the response time. The proposed algorithm is compared with two previous genetic-based algorithms on chain, tree and cyclic queries. The experimental results show that the proposed algorithm saves up to about 80 % of optimization time with no significant difference in the quality of generated plans compared with the best existing genetic-based algorithm. 相似文献

19.

基于适应性的网格数据库中子查询的节点优化调度

胡乃静赵亮胡金华《计算机科学》2007,34(9):95-98

网格数据库是数据库技术和网格技术相结合后新的研究领域,网格的动态变化特性对数据库查询优化技术提出了适应性的要求。本文提出了基于Petri网描述的子查询计划模型TNSN,通过扩展子查询及其节点的数据关联关系的描述,建立了子查询进行适应性优化调度的查询计划模型;进一步提出了考虑变化的参数在内的耗费估算模型,并在TNSN和耗费模型的基础上提出了适应性优化算法,保证了查询处理过程中可以根据网格参数的变化情况对查询进行适应性调整,最后给出了实验验证。相似文献

20.

Optimizing Recursive Information Gathering Plans in EMERAC

Subbarao Kambhampati Eric Lambrecht Ullas Nambiar Zaiqing Nie Gnanaprakasam Senthil 《Journal of Intelligent Information Systems》2004,22(2):119-153

In this paper we describe two optimization techniques that are specially tailored for information gathering. The first is a greedy minimization algorithm that minimizes an information gathering plan by removing redundant and overlapping information sources without loss of completeness. We then discuss a set of heuristics that guide the greedy minimization algorithm so as to remove costlier information sources first. In contrast to previous work, our approach can handle recursive query plans that arise commonly in the presence of constrained sources. Second, we present a method for ordering the access to sources to reduce the execution cost. This problem differs significantly from the traditional database query optimization problem as sources on the Internet have a variety of access limitations and the execution cost in information gathering is affected both by network traffic and by the connection setup costs. Furthermore, because of the autonomous and decentralized nature of the Web, very little cost statistics about the sources may be available. In this paper, we propose a heuristic algorithm for ordering source calls that takes these constraints into account. Specifically, our algorithm takes both access costs and traffic costs into account, and is able to operate with very coarse statistics about sources (i.e., without depending on full source statistics). Finally, we will discuss implementation and empirical evaluation of these methods in Emerac, our prototype information gathering system. 相似文献