1.
This paper investigates the optimization problem of executing a join in a distributed database environment. Minimizing the communication cost of sending data through links is adopted as the optimization criterion. We explore the approach of judiciously using join operations as reducers in distributed query processing. In general, this problem is computationally intractable. Restricting the execution of a join to a pre-defined combinatorial order leads to a polynomial-time solution. An algorithm for chain query computation has been proposed in [21]; its time complexity is O(m^2 n^2 + m^3 n), where n is the number of sites in the network and m is the number of relations (fragments) involved in the join. In this paper, we first present a proof of the intuitively well-understood fact that the eigenorder of a chain join is the best pre-defined combinatorial order for implementing the algorithm in [21]. Second, we give a sufficient and necessary condition for a chain query with the eigenordering to be a simple query. For processing the class of simple queries, we show a significant reduction of the time complexity from O(m^2 n^2 + m^3 n) to O(mn + m^2). It is encouraging that, in practice, the most frequent queries belong to the category of simple queries. Editor: Peter Apers

2.
Object-oriented databases (OODBs) provide an effective means for capturing complex data and semantic relationships underlying many real-world database applications. Because users' interactions with databases have increased significantly in today's era of client–server computing, it is important to examine users' ability to interact with such databases. We investigated a number of factors that potentially affect performance in writing queries on an OODB. First, we evaluated the utility of graphical and textual schemas associated with emerging OODBs from the perspective of database querying. Second, we examined the use of two different strategies (navigation and join) that could be used in writing OODB queries. Third, we examined a number of factors that potentially contribute to the complexity of an OODB query. Our exploratory study examined the performance of 20 graduate students in an experiment in which each participant wrote queries for two problems, one using a graphical OODB schema and the other a textual OODB schema. The participants had no prior exposure to the object-oriented data model. We found that there was no difference in query writing performance (either accuracy or time) using the graphical and textual schemas. Examination of query strategy revealed that a significant number of participants used a join strategy, rather than the navigation strategy that matches the database structure. Use of the join strategy resulted in significantly less accurate and slower query writing than did the navigation strategy. From the viewpoint of complexity, the number of objects referenced in a query, the number of starting points in the from clause, and the presence of special operators influenced both the accuracy and the time of query writing.

3.
Many database applications and environments, such as mediation over heterogeneous database sources and data warehousing for decision support, lead to complex queries. Queries are often nested, defined over previously defined views, and may involve unions. There are good reasons why one might want to remove pieces (sub-queries or sub-views) from such queries: some sub-views of a query may be effectively cached from previous queries, or may be materialized views; some may be known to evaluate empty, by reasoning over the integrity constraints; and some may match protected queries, which for security cannot be evaluated for all users. In this paper, we present a new evaluation strategy with respect to queries defined over views, which we call tuple-tagging, that allows for an efficient removal of sub-views from the query. Other approaches to this are to rewrite the query so the sub-views to be removed are effectively gone, then to evaluate the rewritten query. With the tuple-tagging evaluation, no rewrite of the original query is necessary. We describe formally a discounted query (a query with sub-views marked that are to be considered as removed), present the tuple-tagging algorithm for evaluating discounted queries, provide an analysis of the algorithm's performance, and present some experimental results. These results strongly support the tuple-tagging algorithm both as an efficient means to effectively remove sub-views from a view query during evaluation, and as a viable optimization strategy for certain applications. The experiments also suggest that rewrite techniques for this may perform worse than the evaluation of the original query, and much worse than the tuple-tagging approach.

4.
Answering heterogeneous database queries with degrees of uncertainty
In heterogeneous database systems, partial values have been used to resolve some schema integration problems. Performing operations on partial values may produce maybe tuples in the query result which cannot be compared, so users have no way to distinguish which maybe tuple is the most likely answer. In this paper, the concept of partial values is generalized to probabilistic partial values. We propose an approach to resolve the schema integration problems using probabilistic partial values and develop a full set of extended relational operators for manipulating relations containing probabilistic partial values. With this approach, the uncertain answer tuples of a query are associated with degrees of uncertainty (represented by probabilities). This provides users with a way to compare maybe tuples and a better understanding of the query results. In addition, extended selection and join are generalized to threshold-parameterized selection and join, respectively, which can be used to filter out maybe tuples with low probabilities, i.e., those whose probabilities fall below the given threshold.
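As a small illustration of the threshold-selection idea described above, the sketch below (not from the paper; all names are hypothetical) pairs each answer tuple with its probability and keeps only those that satisfy a predicate and meet the probability threshold:

```python
# Illustrative sketch: filtering "maybe" tuples whose associated probability
# falls below a threshold, in the spirit of the generalized selection above.

def threshold_select(tuples, predicate, threshold):
    """Keep tuples that satisfy the predicate and whose degree of
    uncertainty (a probability) is at least the given threshold."""
    return [(t, p) for (t, p) in tuples if predicate(t) and p >= threshold]

# Answer tuples paired with the probability that they are true answers.
answers = [(("Smith", 50000), 1.0),   # certain answer
           (("Jones", 48000), 0.7),   # maybe tuple, fairly likely
           (("Brown", 51000), 0.2)]   # maybe tuple, unlikely

likely = threshold_select(answers, lambda t: t[1] > 45000, 0.5)
print(likely)  # the low-probability maybe tuple is filtered out
```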

5.
The relation between an operational interleaving semantics for TCSP based on a transition system and a compositional true-concurrency semantics based on event structures is studied. In particular, we extend the consistency result of Goltz and Loogen [15] for TCSP processes without recursion to the general case. Thus we obtain, for every TCSP process P, that its operational meaning O(P) and the interleaving behaviour derived from the event structure associated with P are bisimilar.

6.
Exploratory data mining and analysis requires a computing environment which provides facilities for the user-friendly expression and rapid execution of scientific queries. In this paper, we address research issues in the parallelization of scientific queries containing complex user-defined operations. In a parallel query execution environment, parallelizing a query execution plan involves determining how input data streams to evaluators implementing logical operations can be divided to be processed by clones of the same evaluator in parallel. We introduced the concept of a relevance window that characterizes the data lineage and data partitioning opportunities available for a user-defined evaluator. In addition, we developed a query parallelization framework by extending relational parallel query optimization algorithms to allow the parallelization characteristics of user-defined evaluators to guide the process of query parallelization in an extensible query processing environment. We demonstrated the utility of our system by performing experiments mining cyclonic activity, blocking events, and upward wave-energy propagation features from several observational and model simulation datasets.

7.
We consider the problem of identifying a base-k string given an oracle which returns information about the number of correct components in a query; specifically, the Hamming distance between the query and the solution, modulo r = max{2, 6 - k}. Classically this problem requires Ω(n log_r k) queries. For k ∈ {2, 3, 4}, we construct quantum algorithms requiring only a single quantum query. For k > 4, we show that O(k) quantum queries suffice. In both cases the quantum algorithms are optimal. PACS: 03.67.Lx
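To make the classical side of this gap concrete, here is a sketch (not the paper's algorithm) for the binary case k = 2, where r = max{2, 6-k} = 4: a classical strategy recovers the hidden string with n + 1 oracle queries, whereas the paper's point is that a single quantum query suffices.

```python
# Classical recovery of a hidden binary string from a "Hamming distance
# mod r" oracle, for k = 2 and r = 4. Uses n + 1 queries.

def make_oracle(secret, r=4):
    def oracle(query):
        # Hamming distance between query and secret, modulo r.
        return sum(a != b for a, b in zip(query, secret)) % r
    return oracle

def classical_recover(oracle, n, r=4):
    base = oracle([0] * n)          # |secret| mod r
    s = []
    for i in range(n):
        q = [0] * n
        q[i] = 1                    # flip a single position
        d = oracle(q)
        # The distance drops by 1 (mod r) exactly when secret[i] == 1.
        s.append(1 if (d - base) % r == r - 1 else 0)
    return s

secret = [1, 0, 1, 1, 0, 0, 1]
print(classical_recover(make_oracle(secret), len(secret)))
```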

8.
The hybrid hash join algorithm (HHJ) is an important join algorithm in query processing for database management systems. This paper proposes a cache-optimized hybrid hash join algorithm (OHHJ) that reduces random I/O through buffer optimization: by judiciously sizing the bucket buffers in the partitioning phase, the random I/O generated during partitioning is minimized. Through a quantitative analysis of the relationships among partition (bucket) size, bucket buffer size, available buffer size, relation size, and the random-access characteristics of hard disks, heuristics for the optimal allocation of bucket size and bucket buffer size are derived. Experimental results show that OHHJ effectively reduces the random I/O generated in the partitioning phase of the traditional HHJ algorithm and improves performance.
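The effect of bucket-buffer sizing on write behavior can be sketched as follows (a toy model, not the paper's implementation; lists stand in for on-disk bucket files): tuples accumulate in per-bucket buffers and are flushed in blocks, so larger buffers mean fewer, larger writes, i.e. less random I/O.

```python
# Partitioning phase of a hash join with per-bucket output buffers.
# "flushes" counts block writes, a proxy for random I/O operations.

from collections import defaultdict

def partition(relation, key, n_buckets, buffer_tuples):
    buckets = defaultdict(list)   # stands in for per-bucket disk files
    buffers = defaultdict(list)
    flushes = 0
    for row in relation:
        b = hash(key(row)) % n_buckets
        buffers[b].append(row)
        if len(buffers[b]) >= buffer_tuples:
            buckets[b].extend(buffers[b])   # one block write
            buffers[b].clear()
            flushes += 1
    for b, buf in buffers.items():          # flush the remainders
        if buf:
            buckets[b].extend(buf)
            flushes += 1
    return buckets, flushes

rows = [(i, f"val{i}") for i in range(1000)]
_, few_large = partition(rows, lambda r: r[0], n_buckets=8, buffer_tuples=50)
_, many_small = partition(rows, lambda r: r[0], n_buckets=8, buffer_tuples=1)
print(few_large, many_small)   # larger buffers -> far fewer flushes
```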

9.
Let a tuple of n objects obeying a query graph (QG) be called an n-tuple. The D_distance-value of this n-tuple is the value of a linear function of the distances between the n objects that make it up, according to the edges of the QG. This paper addresses the problem of finding the K n-tuples between n spatial datasets that have the smallest D_distance-values, the so-called K-multi-way distance join query (K-MWDJQ), where each dataset is indexed by an R-tree-based structure. This query can be viewed as an extension of the K-closest-pairs query (K-CPQ) [8] to n inputs. In addition, a recursive non-incremental branch-and-bound algorithm, following a depth-first search and processing all inputs synchronously without producing any intermediate result, is proposed. Enhanced pruning techniques are also applied to the nodes of the n R-trees in order to reduce the total response time and the number of distance computations of the query. Due to the exponential nature of the problem, we also propose a time-based approximate version of the recursive algorithm that combines approximation techniques to adjust the quality of the result and the global processing time. Finally, we give a detailed experimental study of the proposed algorithms using real spatial datasets, highlighting their performance and the quality of the approximate results.
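The problem statement can be made concrete with a naive baseline (not the paper's branch-and-bound algorithm): enumerate every n-tuple across the datasets, score it with a weighted sum of distances along the query-graph edges, and keep the K smallest D_distance-values.

```python
# Brute-force K-multi-way distance join over toy point sets.

import heapq, itertools, math

def d_distance(points, edges, weights):
    # Linear function of the pairwise distances along the query-graph edges.
    return sum(w * math.dist(points[i], points[j])
               for (i, j), w in zip(edges, weights))

def k_mwdj_bruteforce(datasets, edges, weights, k):
    scored = ((d_distance(combo, edges, weights), combo)
              for combo in itertools.product(*datasets))
    return heapq.nsmallest(k, scored)

# Three toy spatial datasets and a chain query graph 0-1, 1-2.
A = [(0, 0), (5, 5)]
B = [(0, 1), (9, 9)]
C = [(0, 2), (1, 1)]
best = k_mwdj_bruteforce([A, B, C], edges=[(0, 1), (1, 2)],
                         weights=[1.0, 1.0], k=2)
print(best[0])  # (smallest D_distance-value, its n-tuple)
```

The exponential blow-up of `itertools.product` is exactly why the paper resorts to synchronized R-tree traversal with pruning.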

10.
We consider the parallel time complexity of logic programs without function symbols, called logical query programs, or Datalog programs. We give a PRAM algorithm for computing the minimum model of a logical query program, and show that for programs with the polynomial fringe property, this algorithm runs in time that is logarithmic in the input size, assuming that concurrent writes are allowed if they are consistent. As a result, the linear and piecewise linear classes of logic programs are in NC. Then we examine several nonlinear classes in which the program has a single recursive rule that is an elementary chain. We show that certain nonlinear programs are related to GSM mappings of a balanced parentheses language, and that this relationship implies the polynomial fringe property; hence such programs are in NC. Finally, we describe an approach for demonstrating that certain logical query programs are log-space complete for P, and apply it to both elementary single rule programs and nonelementary programs. Supported by NSF Grant IST-84-12791, a grant of IBM Corporation, and ONR contract N00014-85-C-0731.

11.
Selective Sampling Using the Query by Committee Algorithm
Freund, Y., Seung, H. S., Shamir, E., & Tishby, N. (1997). Machine Learning, 28(2–3), 133–168.
We analyze the query by committee algorithm, a method for filtering informative queries from a random stream of inputs. We show that if the two-member committee algorithm achieves information gain with positive lower bound, then the prediction error decreases exponentially with the number of queries. We show that, in particular, this exponential decrease holds for query learning of perceptrons.
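A minimal sketch of the two-member committee setting described above (my own illustration, not the paper's analysis): two perceptrons with different random initializations vote on each stream item, and a label is requested only when they disagree.

```python
# Query-by-committee selective sampling with a two-perceptron committee.

import random

random.seed(0)
DIM = 5
target = [random.uniform(-1, 1) for _ in range(DIM)]  # hidden concept

def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1

def perceptron_update(w, x, y):
    if predict(w, x) != y:
        for i in range(DIM):
            w[i] += y * x[i]

committee = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(2)]
queries = 0
for _ in range(2000):                        # random stream of inputs
    x = [random.gauss(0, 1) for _ in range(DIM)]
    votes = [predict(w, x) for w in committee]
    if votes[0] != votes[1]:                 # informative: committee disagrees
        queries += 1
        y = predict(target, x)               # ask the oracle for the label
        for w in committee:
            perceptron_update(w, x, y)

print(queries)  # only a fraction of the 2000 stream items are queried
```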

12.
We present an architecture for query processing in the relational model extended with transaction time. The architecture integrates standard query optimization and computation techniques with new differential computation techniques. Differential computation computes a query incrementally or decrementally from the cached and indexed results of previous computations. The use of differential computation techniques is essential in order to provide efficient processing of queries that access very large temporal relations. Alternative query plans are integrated into a state transition network, where the state space includes backlogs of base relations, cached results from previous computations, a cache index, and intermediate results; the transitions include standard relational algebra operators, operators for constructing differential files, operators for differential computation, and combined operators. A rule set is presented to prune away parts of state transition networks that are not promising, and dynamic programming techniques are used to identify the optimal plans from the remaining state transition networks. An extended logical access path serves as a structuring index on the cached results and contains, in addition, vital statistics for the query optimization process (including statistics about base relations, backlogs, and queries, whether previously computed and cached, previously computed, or just previously estimated).
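The core of differential computation can be sketched with a toy equi-join (an illustration under my own naming, not the paper's operators): a cached result is updated from the differential files (here, insertions) of its inputs instead of recomputing from scratch.

```python
# Incremental maintenance of a join result from differential files.

def join(R, S):
    # Toy equi-join on the first attribute.
    return [(r, s) for r in R for s in S if r[0] == s[0]]

def incremental_join(cached, R, S, dR, dS):
    # delta(R join S) = (dR join S) + (R join dS) + (dR join dS)
    return cached + join(dR, S) + join(R, dS) + join(dR, dS)

R = [(1, "a"), (2, "b")]
S = [(1, "x")]
cached = join(R, S)                 # result cached from a previous query
dR, dS = [(3, "c")], [(2, "y"), (3, "z")]

fresh = join(R + dR, S + dS)        # recompute from scratch
incr = incremental_join(cached, R, S, dR, dS)
print(sorted(incr) == sorted(fresh))
```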

13.
Part-whole relation acquisition is an important component of knowledge acquisition, and the Web has gradually become one of its key resources. Search engines are an effective means of acquiring part-whole relation knowledge from the Web; we call the set of retrieved results containing part-whole relations a part-whole corpus. Since current mainstream search engines do not yet support semantic search, how to construct effective queries that yield a corpus rich in part-whole relations, from which the relations can then be further extracted, becomes an important problem. This paper proposes a new query-construction method aimed at acquiring a part-whole corpus from the Web. The method constructs queries based on context words and then uses existing search engines to retrieve the part-whole corpus. The corpus it acquires is compared with those acquired by manual query construction and by corpus-based query construction in two respects: first, the number of sentences in the corpus containing part-whole relations; second, the difficulty of further extracting part-whole relations from the corpus. Experimental results show that the proposed method greatly outperforms the other two.
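The flavor of such query construction can be sketched as follows (a hypothetical illustration; the patterns and function names are my own, not the paper's method): lexical patterns that tend to co-occur with part-whole relations are instantiated with a whole-concept and narrowed with context words.

```python
# Building search-engine queries from part-whole patterns plus context words.

PATTERNS = [
    '"parts of a {whole}"',
    '"the {whole} consists of"',
    '"{whole} is composed of"',
]

def build_queries(whole, context_words=()):
    queries = []
    for pat in PATTERNS:
        q = pat.format(whole=whole)
        # Context words steer the results toward the intended sense of the
        # whole-concept (e.g. "bank" the institution vs. the riverbank).
        if context_words:
            q += " " + " ".join(context_words)
        queries.append(q)
    return queries

for q in build_queries("bicycle", context_words=("wheel", "frame")):
    print(q)
```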

14.
Efficient and effective Querying by Image Content
In the QBIC (Query By Image Content) project we are studying methods to query large on-line image databases using the images' content as the basis of the queries. Examples of the content we use include color, texture, shape, position, and dominant edges of image objects and regions. Potential applications include medicine ("Give me other images that contain a tumor with a texture like this one"), photo-journalism ("Give me images that have blue at the top and red at the bottom"), and many others in art, fashion, cataloging, retailing, and industry. We describe a set of novel features and similarity measures allowing query by image content, together with the QBIC system we implemented. We demonstrate the effectiveness of our system with normalized precision and recall experiments on test databases containing over 1000 images and 1000 objects populated from commercially available photo clip art images, and from images of airplane silhouettes. We also present new methods for efficient processing of QBIC queries that consist of filtering and indexing steps. We specifically address two problems: (a) non-Euclidean distance measures; and (b) the high dimensionality of feature vectors. For the first problem, we introduce a new theorem that makes efficient filtering possible by bounding the non-Euclidean, full cross-term quadratic distance expression with a simple Euclidean distance. For the second, we illustrate how orthogonal transforms, such as Karhunen-Loève, can help reduce the dimensionality of the search space. Our methods are general and allow some false hits but no false dismissals. The resulting QBIC system offers effective retrieval using image content, and for large image databases significant speedup over straightforward indexing alternatives. The system is implemented in X/Motif and C running on an RS/6000. On sabbatical from Univ. of Maryland, College Park. His work was partially supported by SRC, and by the National Science Foundation under grant IRI-8958546 (PYI).
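The bounding idea behind the filtering step can be sketched in two dimensions (a simplified variant of the paper's theorem, not its exact statement): the quadratic distance (x-y)^T A (x-y) is bounded below by lambda_min(A) times the squared Euclidean distance, so a cheap Euclidean test can discard candidates without false dismissals.

```python
# Euclidean lower-bound filtering for a full cross-term quadratic distance.

import math

def quad_dist(x, y, A):
    d = [x[0] - y[0], x[1] - y[1]]
    return A[0][0]*d[0]*d[0] + 2*A[0][1]*d[0]*d[1] + A[1][1]*d[1]*d[1]

def lambda_min_2x2(A):
    # Smallest eigenvalue of a symmetric 2x2 matrix.
    tr, det = A[0][0] + A[1][1], A[0][0]*A[1][1] - A[0][1]*A[1][0]
    return (tr - math.sqrt(tr*tr - 4*det)) / 2

def filter_candidates(query, candidates, A, eps):
    lmin = lambda_min_2x2(A)
    survivors = []
    for c in candidates:
        sq_euclid = (query[0]-c[0])**2 + (query[1]-c[1])**2
        if lmin * sq_euclid <= eps:            # cheap test first
            if quad_dist(query, c, A) <= eps:  # exact test on survivors only
                survivors.append(c)
    return survivors

A = [[2.0, 0.5], [0.5, 1.0]]   # positive-definite similarity matrix
hits = filter_candidates((0, 0), [(0.1, 0.1), (3, 3), (0.5, -0.5)], A, eps=1.0)
print(hits)
```

Since lmin * sq_euclid never exceeds the true quadratic distance, any candidate discarded by the cheap test would also fail the exact test, which is the "false hits but no false dismissals" guarantee.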

15.
We investigate three-dimensional visibility problems for scenes that consist of n non-intersecting spheres. The viewing point moves on a flight path that is part of a circle at infinity, given by a plane P and a range of angles {φ(t) | t ∈ [0,1]} ⊆ [0,2π]. At time t, the lines of sight are parallel to the ray in P which starts at the origin of P and represents the angle φ(t) (orthographic views of the scene). We give an algorithm that computes the visibility graph at the start of the flight, all time parameters at which the topology of the scene changes, and the corresponding topology changes. The algorithm has running time O((n + k + p) log n), where n is the number of spheres in the scene; p is the number of transparent topology changes (the number of different scene topologies visible along the flight path, assuming that all spheres are transparent); and k denotes the number of vertices (conflicts) which are in the (transparent) visibility graph at the start and do not disappear during the flight. The second author was supported by the ESPRIT II Basic Research Actions Program, under Contract No. 3075 (project ALCOM).

16.
Flexible distributed query processing capabilities are an important prerequisite for building scalable Internet applications, such as electronic Business-to-Business (B2B) market places. Architecting an electronic market place in a conventional data warehouse-like approach, by integrating all the data from all participating enterprises in one centralized repository, incurs severe problems: stale data, data security threats, administration overhead, inflexibility during query processing, etc. In this paper we present a new framework for dynamic distributed query processing based on so-called HyperQueries, which are essentially query evaluation sub-plans sitting behind hyperlinks. Our approach facilitates the pre-materialization of static data at the market place whereas the dynamic data remains at the data sources. In contrast to traditional data integration systems, our approach executes essential (dynamic) parts of the data-integrating views at the data sources. The other, more static parts of the data are integrated a priori at the central portal, e.g., the market place. The portal serves as an intermediary between clients and data providers, which execute their sub-queries referenced via hyperlinks. The hyperlinks are embedded as attribute values within data objects of the intermediary's database. Retrieving such a virtual object will execute the referenced HyperQuery in order to materialize the missing data. We illustrate the flexibility of this distributed query processing architecture in the context of B2B electronic market places with an example derived from the car manufacturing industry. Based on these HyperQueries, we propose a reference architecture for building scalable and dynamic electronic market places. All administrative tasks in such a distributed B2B market place are modeled as Web services and are initiated decentrally by the participants. Thus, sensitive data remains under the full control of the data providers.
We describe optimization and implementation issues to obtain an efficient and highly flexible data integration platform for electronic market places. All proposed techniques have been fully implemented in our QueryFlow prototype system, which served as the platform for our performance evaluation.

17.
The paper considers an N × n matrix (N ≥ n) over the field GF(2) that consists of random values with a distribution depending on a small parameter ε. An expansion in powers of ε is found for the probability that the matrix rank is equal to n. Exact values of the first three coefficients are indicated.

18.
Basic problems in the use of applied mathematical statistics for the modeling of complex systems are considered; the possibility of establishing the uniqueness of a mathematical model of optimal complexity by the group method of data handling (GMDH) is demonstrated. The basic shortcoming of contemporary mathematical statistics is that the models used are too simple, because until now only one criterion, mean-squared error, has been used in regression analysis. To define a mathematical model of optimal complexity, GMDH uses not one but two criteria, and these two criteria assure a unique solution. The resulting equations are so complex that only the multilayered structure of GMDH allows us to write them down. The method works not only when K ≤ N but also when K > N (K is the number of coefficients of the regression equation, N is the number of interpolation points). Increasing the area of optimization raises the accuracy of the model. The second criterion should be heuristic; mean-squared error defined on a test sequence is used. The division of data into training and test sequences is the basic object of so-called goal-directed regularization. A second shortcoming of contemporary applied mathematical statistics is the absence of "freedom of decision", in the terminology of D. Gabor. The GMDH selection-type algorithm realizes both the self-organization and freedom-of-decision criteria. GMDH is a nonparametric procedure and does not require many of the concepts of mathematical statistics.

19.
With every finite-state word or tree automaton, we associate a binary relation on words or trees. We then consider the rectangular decompositions of this relation, i.e., the various ways to express it as a finite union of Cartesian products of sets of words or trees, respectively. We show that the determinization and the minimization of these automata correspond to simple geometrical reorganizations of the rectangular decompositions of the associated relations. This work was supported by the Programme de Recherches Coordonnées: Mathématiques et Informatique. It was initiated during a stay in Bordeaux by D. Niwinski in 1988.

20.
Reliable and probably useful learning, proposed by Rivest and Sloan, is a variant of probably approximately correct learning. In this model the hypothesis must never misclassify an instance but is allowed to answer "I don't know" with a low probability. We derive upper and lower bounds for the sample complexity of reliable and probably useful learning in terms of the combinatorial characteristics of the concept class to be learned. This is done by reducing reliable and probably useful learning to learning with one-sided error. The bounds also hold for a slightly weaker model that allows the learner to output, with a low probability, a hypothesis that makes misclassifications. We see that in these models learning with one oracle is more difficult than learning with two oracles. Our results imply that monotone Boolean conjunctions or disjunctions cannot be learned reliably and probably usefully from a polynomial number of examples. Rectangles in R^n for n ≥ 2 cannot be learned from any finite number of examples. A preliminary version of this paper appeared under the title "Reliable and useful learning" in Proceedings of the 2nd Annual Workshop on Computational Learning Theory, Morgan Kaufmann, San Mateo, CA, 1989, pp. 365–380. This work was supported by the Academy of Finland.
