Similar literature
 Found 20 similar documents; search took 15 ms
1.
Conjunctive queries (CQs) are at the core of query languages encountered in many logic-based research fields such as AI and database systems. The majority of existing work assumes set semantics, but real applications often require the manipulation of duplicate tuples. One of the major problems that arises in query optimization, data integration, query reformulation and many other research topics is testing for containment of such queries. In this work, we investigate the complexity of the query containment problem for CQs under bag semantics (i.e. duplicate tuples are allowed in both the database and the results of queries) and under bag-set semantics (i.e. duplicates are allowed in the result of the queries but not in the database). We derive complexity results for these problems for five major subclasses of CQs, and we also find necessary conditions for CQ query containment. The general case of these problems remains open.

2.
Semijoin has traditionally been relied upon to reduce the cost of data transmission for distributed query processing. However, judiciously applying join operations as reducers can lead to further reduction in the amount of data transmission required. In view of this fact, we explore the approach of using join operations as reducers in distributed query processing. We first show that the problem of determining a sequence of join operations for a query can be transformed to that of finding a specific type of set of cuts to the corresponding query graph, where a cut to a graph is a partition of nodes in that graph. Then, in light of this concept, we prove that the problem of determining the optimal sequence of join operations for a given query graph is of exponential complexity, thus justifying the necessity of applying heuristic approaches to solve this problem. By mapping the problem of determining a sequence of join reducers into that of finding a set of cuts, we develop (for tree and general query graphs, respectively) efficient heuristic algorithms to determine a join reducer sequence for distributed query processing. The algorithms developed are based on the concept of divide and conquer and are of polynomial time complexity. Simulation is performed to evaluate these algorithms.
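
As a rough illustration of the traditional semijoin reducer that this abstract starts from (not the join-reducer heuristics the paper develops), the sketch below ships only the join column of one relation to the other site, filters locally, and compares a crude transmission cost. The relations `orders` and `customers`, the helper names, and the cost measure (number of attribute values shipped) are all hypothetical assumptions made for this example.

```python
from typing import Dict, List

Row = Dict[str, object]


def semijoin(r: List[Row], s: List[Row], attr: str) -> List[Row]:
    """Return the rows of r whose `attr` value also appears in s."""
    keys = {row[attr] for row in s}
    return [row for row in r if row[attr] in keys]


def shipped_values(rows: List[Row]) -> int:
    """Crude transmission cost: number of attribute values sent."""
    return sum(len(row) for row in rows)


# Hypothetical relations stored at two different sites.
orders = [
    {"cust": 1, "item": "disk"},
    {"cust": 2, "item": "cpu"},
    {"cust": 5, "item": "ram"},
    {"cust": 7, "item": "fan"},
]
customers = [{"cust": 1, "city": "Oslo"}, {"cust": 2, "city": "Rome"}]

# Ship only the join column of customers to the orders site,
# reduce orders there, then ship the reduced orders back.
join_column = [{"cust": row["cust"]} for row in customers]
reduced_orders = semijoin(orders, join_column, "cust")

cost_with_reducer = shipped_values(join_column) + shipped_values(reduced_orders)
cost_without_reducer = shipped_values(orders)
```

In this toy instance the reducer pays off because half of `orders` has no matching customer; when almost every row matches, shipping the join column is pure overhead, which is one reason the choice of reducers needs a cost model.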

3.
Foreign functions have been considered in advanced database systems to support complex applications. We consider optimizing queries with foreign functions in a distributed environment. In traditional distributed query processing, selection operations are locally processed before joins as much as possible so that the size of relations being transmitted and joined can be reduced. However, if selection predicates involve foreign functions, the cost of evaluating selections cannot be ignored. As a result, the execution order of selections and joins becomes significant, and the trade-off for reducing the costs of data transmission, join processing, and selection predicate evaluation needs to be carefully considered in query optimization. A response time model is developed for estimating the cost of distributed query processing involving foreign functions. We explore the property of the problem and find an optimal algorithm with polynomial complexity for a special case of it. However, finding the optimal execution plan for the general case is NP-hard. We propose an efficient heuristic algorithm for solving the problem, and the simulation result shows its good quality. The research result can also be applied to advanced database systems and multidatabase systems, where the conversion function defined for the need of schema integration can be considered a type of foreign function.

4.
Optimizing large join queries that consist of many joins has been recognized as NP-hard. Most of the previous work focuses on a uniprocessor environment. In a multiprocessor, the location of each join adds another dimension to the complexity of the problem. In this paper, we examine the feasibility of exploiting the inherent parallelism in optimizing large join queries on a hypercube multiprocessor. This includes using the multiprocessor not only to answer the large join query but also to optimize it. We propose an algorithm to estimate the cost of a parallel large join plan. Three heuristics are provided for generating an initial solution, which is further optimized by an iterative local-improvement method. The entire process of parallel query optimization and execution is simulated on an Intel iPSC/2 hypercube machine. Our experimental results show that the performance of each heuristic depends on the characteristics of the query.

5.
This paper investigates nondeterministic bounded query classes in relation to the complexity of NP-hard approximation problems and the Boolean Hierarchy. Nondeterministic bounded query classes turn out to be rather suitable for describing the complexity of NP-hard approximation problems. The results in this paper take advantage of this machine-based model to prove that in many cases, NP-approximation problems have the upward collapse property. That is, a reduction between NP-approximation problems of apparently different complexity at a lower level results in a similar reduction at a higher level. For example, if M C reduces to (log n)-approximating M C using many–one reductions, then the Traveling Salesman Problem (TSP) is equivalent to M C under many–one reductions. Several upward collapse theorems are presented in this paper. The proofs of these theorems rely heavily on the machinery provided by the nondeterministic bounded query classes. In fact, these results depend on a surprising connection between the Boolean hierarchy and nondeterministic bounded query classes.

6.
The complexity, approximation and algorithmic issues of several clustering problems are studied. These non-traditional clustering problems arise from recent studies in microarray data analysis. We prove the following results. (1) Two variants of the Order-Preserving Submatrix problem are NP-hard. There are polynomial-time algorithms for the Order-Preserving Submatrix problem when the condition or gene sets are fixed. (2) Three variants of the Smooth Clustering problem are NP-hard. The Smooth Subset problem is approximable with ratio 0.5, but it is not approximable with ratio 0.5 + δ for any δ > 0 unless NP = P. (3) The inferring plaid model problem is NP-hard.

7.
This work deals with a class of problems under interval data uncertainty, namely interval robust-hard problems, composed of interval data min-max regret generalizations of classical NP-hard combinatorial problems modeled as 0-1 integer linear programming problems. These problems are more challenging than other interval data min-max regret problems, as solely computing the cost of any feasible solution requires solving an instance of an NP-hard problem. The state-of-the-art exact algorithms in the literature are based on the generation of a possibly exponential number of cuts. As each cut separation involves the resolution of an NP-hard classical optimization problem, the size of the instances that can be solved efficiently is relatively small. To mitigate this issue, we present a modeling technique for interval robust-hard problems in the context of a heuristic framework. The heuristic obtains feasible solutions by exploring dual information of a linearly relaxed model associated with the classical optimization problem counterpart. Computational experiments for interval data min-max regret versions of the restricted shortest path problem and the set covering problem show that our heuristic is able to find optimal or near-optimal solutions and also improves the primal bounds obtained by a state-of-the-art exact algorithm and a 2-approximation procedure for interval data min-max regret problems.

8.
Histograms and Wavelet synopses provide useful tools in query optimization and approximate query answering. Traditional histogram construction algorithms, e.g., V-Optimal, use error measures which are the sums of a suitable function, e.g., square, of the error at each point. Although the best-known algorithms for solving these problems run in quadratic time, a sequence of results has given us a linear time approximation scheme for these algorithms. In recent years, there have been many emerging applications where we are interested in measuring the maximum (absolute or relative) error at a point. We show that this problem is fundamentally different from the other traditional non-$\ell_\infty$ error measures and provide an optimal algorithm that runs in linear time for a small number of buckets. We also present results which work for arbitrary weighted maximum error measures.
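
To make the maximum-error measure concrete, here is a minimal sketch, assuming unweighted absolute error and a non-empty input. It is a simple greedy pass for a given error bound, not the linear-time algorithm the abstract refers to: a bucket's best representative under maximum absolute error is the midpoint of its minimum and maximum, so a bucket is extended while half of its value range stays within the bound. The function names and the sample data are hypothetical.

```python
from typing import List, Sequence, Tuple

Bucket = Tuple[int, int, float]  # (start index, end index inclusive, representative)


def buckets_within_error(data: Sequence[float], eps: float) -> List[Bucket]:
    """Greedily split non-empty data into contiguous buckets whose
    maximum absolute error is at most eps."""
    buckets: List[Bucket] = []
    start = 0
    lo = hi = data[0]
    for i in range(1, len(data)):
        new_lo, new_hi = min(lo, data[i]), max(hi, data[i])
        if (new_hi - new_lo) / 2 > eps:
            # Adding data[i] would break the bound: close the current bucket.
            buckets.append((start, i - 1, (lo + hi) / 2))
            start, lo, hi = i, data[i], data[i]
        else:
            lo, hi = new_lo, new_hi
    buckets.append((start, len(data) - 1, (lo + hi) / 2))
    return buckets


def max_abs_error(data: Sequence[float], buckets: List[Bucket]) -> float:
    """Largest pointwise gap between a value and its bucket representative."""
    return max(abs(data[i] - rep) for s, e, rep in buckets for i in range(s, e + 1))


values = [1, 2, 3, 10, 11, 30]  # hypothetical frequency vector
histogram = buckets_within_error(values, eps=1.0)
```

To target a fixed number of buckets instead of a fixed error, one option is to search over candidate error bounds and rerun this pass for each; that is an assumption about how one might use the sketch, not a description of the paper's method.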

9.
The general problem of minimizing the maximal regret in combinatorial optimization problems with interval data is considered. In many cases, the minmax regret versions of the classical, polynomially solvable, combinatorial optimization problems become NP-hard and no approximation algorithms for them have been known. Our main result is a polynomial time approximation algorithm with a performance ratio of 2 for this class of problems.
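
The abstract does not say how the ratio of 2 is obtained. One approach known to give a factor of 2 for minmax regret problems with interval costs is to solve the classical problem under the midpoint scenario, and the sketch below assumes that approach. It uses a hypothetical toy problem (pick the `p` cheapest of `n` items) and relies on the standard observation that the worst scenario for a fixed solution puts its own items at their upper bounds and all other items at their lower bounds. All names and the sample intervals are illustrative assumptions.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound) of an item's cost


def cheapest(costs: Sequence[float], p: int) -> List[int]:
    """Indices of the p cheapest items: the classical, easy problem."""
    order = sorted(range(len(costs)), key=lambda i: costs[i])
    return sorted(order[:p])


def max_regret(chosen: Sequence[int], intervals: Sequence[Interval], p: int) -> float:
    """Worst-case regret of `chosen`: chosen items cost their upper
    bounds, every other item costs its lower bound."""
    picked = set(chosen)
    worst = [hi if i in picked else lo for i, (lo, hi) in enumerate(intervals)]
    best = cheapest(worst, p)
    return sum(worst[i] for i in chosen) - sum(worst[i] for i in best)


def midpoint_solution(intervals: Sequence[Interval], p: int) -> List[int]:
    """Solve the classical problem with every cost set to its interval midpoint."""
    return cheapest([(lo + hi) / 2 for lo, hi in intervals], p)


def brute_force_min_regret(intervals: Sequence[Interval], p: int) -> float:
    """Exact minmax regret by enumeration; only viable for tiny instances."""
    return min(
        max_regret(c, intervals, p)
        for c in combinations(range(len(intervals)), p)
    )


intervals = [(1, 9), (3, 5), (4, 4), (2, 10)]  # hypothetical interval costs
solution = midpoint_solution(intervals, p=2)
regret = max_regret(solution, intervals, p=2)
optimum = brute_force_min_regret(intervals, p=2)
```

On this small instance the midpoint solution happens to be optimal; in general the guarantee is only that its maximal regret is at most twice the optimum.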

10.
Fundamentally, a semantic grid database is about bringing globally distributed databases together in order to coordinate resource sharing and problem solving in which information is given well-defined meaning, and DartGrid II is the implemented database grid system whose goal is to provide a semantic solution for integrating database resources on the Web. Although many algorithms have been proposed for optimizing query processing in order to minimize the costs and/or response time associated with obtaining the answer to a query in a distributed database system, the database grid query optimization problem is fundamentally different from traditional distributed query optimization. These differences are shown to be the consequences of the autonomy and heterogeneity of database nodes in a database grid. Therefore, query optimization in a database grid poses more challenges than in a traditional distributed database. Following this observation, the design of a query optimizer in DartGrid II is presented, and a heuristic, dynamic and parallel query optimization approach to processing queries in a database grid is proposed. A set of semantic tools supporting relational database integration and semantic-based information browsing has also been implemented to realize the above vision.

11.
We investigate the complexity of learning for the well-studied model in which the learning algorithm may ask membership and equivalence queries. While complexity theoretic techniques have previously been used to prove hardness results in various learning models, these techniques typically are not strong enough to use when a learning algorithm may make membership queries. We develop a general technique for proving hardness results for learning with membership and equivalence queries (and for more general query models). We apply the technique to show that, assuming , no polynomial-time membership and (proper) equivalence query algorithms exist for exactly learning read-thrice DNF formulas, unions of halfspaces over the Boolean domain, or some other related classes. Our hardness results are representation dependent, and do not preclude the existence of representation independent algorithms. The general technique introduces the representation problem for a class F of representations (e.g., formulas), which is naturally associated with the learning problem for F. This problem is related to the structural question of how to characterize functions representable by formulas in F, and is a generalization of standard complexity problems such as Satisfiability. While in general the representation problem is in , we present a theorem demonstrating that for "reasonable" classes F, the existence of a polynomial-time membership and equivalence query algorithm for exactly learning F implies that the representation problem for F is in fact in co-NP. The theorem is applied to prove hardness results such as the ones mentioned above, by showing that the representation problem for specific classes of formulas is NP-hard. Received: December 6, 1994

12.
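Logic-rule-based query processing optimization for semantic caching   Total citations: 3, self-citations: 0, citations by others: 3
郝小卫  章陶  李磊 《计算机学报》 (Chinese Journal of Computers) 2005,28(7):1096-1103
Semantic caching has very broad application prospects in mobile computing environments. Query processing is a key problem in semantic caching, but existing query processing algorithms are severely limited both in time and space efficiency and in the complexity of the trimming results, which to some extent restricts the practicality of semantic caching. To overcome these shortcomings, the authors first present and prove logic rules for optimizing query trimming; based on these rules, they give a trimming algorithm for the remainder query; finally, they give an optimized query processing algorithm that only needs to trim the remainder query. Algorithm analysis proves the effectiveness of the optimization mechanism theoretically, and performance comparisons in simulation experiments also show that the optimized method clearly outperforms the unoptimized one in improving the time and space efficiency of query trimming and in reducing the complexity of the remainder query.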

13.
Optimal power flow (OPF) is the central optimization problem in electric power grids. Although solved routinely in the course of power grid operations, it is known to be strongly NP-hard in general, and weakly NP-hard over tree networks. In this paper, we formulate the optimal power flow problem over tree networks as an inference problem over a tree-structured graphical model where the nodal variables are low-dimensional vectors. We adapt the standard dynamic programming algorithm for inference over a tree-structured graphical model to the OPF problem. Combining this with an interval discretization of the nodal variables, we develop an approximation algorithm for the OPF problem. Further, we use techniques from constraint programming (CP) to perform interval computations and adaptive bound propagation to obtain practically efficient algorithms. Compared to previous algorithms that solve OPF with optimality guarantees using convex relaxations, our approach is able to work for arbitrary tree-structured distribution networks and handle mixed-integer optimization problems. Further, it can be implemented in a distributed message-passing fashion that is scalable and is suitable for “smart grid” applications like control of distributed energy resources. Numerical evaluations on several benchmark networks show that practical OPF problems can be solved effectively using this approach.

14.

Machine learning algorithms typically rely on optimization subroutines and are well known to provide very effective outcomes for many types of problems. Here, we flip the reliance and ask the reverse question: can machine learning algorithms lead to more effective outcomes for optimization problems? Our goal is to train machine learning methods to automatically improve the performance of optimization and signal processing algorithms. As a proof of concept, we use our approach to improve two popular data processing subroutines in data science: stochastic gradient descent and greedy methods in compressed sensing. We provide experimental results that demonstrate the answer is “yes”, machine learning algorithms do lead to more effective outcomes for optimization problems, and show the future potential for this research direction. In addition to our experimental work, we prove relevant Probably Approximately Correct (PAC) learning theorems for our problems of interest. More precisely, we show that there exists a learning algorithm that, with high probability, will select the algorithm that optimizes the average performance on an input set of problem instances with a given distribution.


15.
In order to formulate mathematical conjectures likely to be true, a number of base cases must be determined. However, many combinatorial problems are NP-hard and the computational complexity makes this research approach difficult using a standard brute force approach on a typical computer. One sample problem explored is that of finding a minimum identifying code. To work around the computational issues, a variety of methods are explored and consist of a parallel computing approach using MATLAB, an adiabatic quantum optimization approach using a D-Wave quantum annealing processor, and lastly using satisfiability modulo theory (SMT) and corresponding SMT solvers. Each of these methods requires the problem to be formulated in a unique manner. In this paper, we address the challenges of computing solutions to this NP-hard problem with respect to each of these methods.

16.
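With the rapid pace of development, people's pursuit of smart living keeps rising, and spatial queries are receiving more and more attention. Moving spatial keyword queries, a major type of continuous spatial query, have been studied extensively. A recent top-tier conference publication proposed a new query type called the moving collective spatial keyword query (MCSKQ). This type of query continuously reports a set of objects that collectively cover the query keywords as the query moves. The returned objects must also be close to the query object and close to each other. Computing the exact result set is an NP-hard problem. To reduce the cost of query processing, this paper proposes an algorithm based on the safe region technique that maintains the exact result set while the query object moves. On this basis, the paper proposes a new optimization strategy, built on the idea of MCSKQ, to reduce the cost of query processing.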

17.
There are many optimization problems having the following common property: given a total task consisting of many subtasks, the problem asks to find a solution to complete only part of these subtasks. Examples include the k-Forest problem and the k-Multicut problem, etc. These problems are called partial optimization problems, which are often NP-hard. In this paper, we systematically study the LP-rounding plus greed approach, a method to design approximation algorithms for partial optimization problems. The approach is simple, powerful and versatile. We show how to use this approach to design approximation algorithms for the k-Forest problem, the k-Multicut problem, the k-Generalized connectivity problem, etc.

18.
This work presents sequential and parallel evolutionary algorithms (EAs) applied to the scheduling problem in heterogeneous computing environments, an NP-hard problem with capital relevance in distributed computing. These methods have been specifically designed to provide accurate and efficient solutions by using simple operators that allow them to be later extended for solving realistic problem instances arising in distributed heterogeneous computing (HC) and grid systems. The EAs were implemented using MALLBA, a general-purpose library for combinatorial optimization. Efficient numerical results are reported in the experimental analysis performed on well-known problem instances. The comparative study of scheduling methods shows that the parallel versions of the implemented evolutionary algorithms are able to achieve high problem solving efficacy, outperforming traditional scheduling heuristics and also improving over previous results already reported in the related literature.

19.
As a new service-oriented smart manufacturing paradigm, cloud manufacturing (CMfg) aims at the full sharing and circulation of manufacturing capabilities towards socialization, in which composite CMfg service optimal selection (CCSOS) involves selecting appropriate services to be combined as a composite complex service to fulfill a customer need or a business requirement. Such composition is one of the most difficult combinatorial optimization problems, with NP-hard complexity. For such an NP-hard CCSOS problem, this study proposes a new approach, called the multi-population parallel self-adaptive differential artificial bee colony (MPsaDABC) algorithm. The proposed algorithm adopts multiple parallel subpopulations, each of which evolves according to different mutation strategies borrowed from differential evolution (DE) to generate perturbed food sources for foraging bees, and the control parameters of each mutation strategy are adapted independently. Moreover, the size of each subpopulation is dynamically adjusted based on the information derived from the search process. Experiments on CCSOS problems of different scales are conducted to validate the effectiveness of the proposed algorithm, and the experimental results show that the proposed algorithm has superior performance over other hybrid and single population algorithms, especially for complex CCSOS problems.

20.
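Because data in a distributed database system is distributed and redundant, distributed query processing takes on many new aspects and additional complexity, so optimizing distributed query processing is especially important. This paper briefly introduces the goals and strategies of distributed query optimization, describes three typical algorithms for query optimization in distributed database systems (the INGRES algorithm, the System R* algorithm, and the SDD-1 algorithm), compares, optimizes and summarizes them, and finally proposes an improvement to the SDD-1 algorithm.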
