1.
Bodorik P. Riordon J.S. Pyra J.S. 《Knowledge and Data Engineering, IEEE Transactions on》1992,4(3):253-265
Most algorithms for determining query processing strategies in distributed databases are static in nature; that is, the strategy is completely determined on the basis of a priori estimates of the size of intermediate results, and it remains unchanged throughout its execution. The static approach may be far from optimal because it denies the opportunity to reschedule operations if size estimates are found to be inaccurate. Adaptive query execution may be used to alleviate this problem. A low overhead delay method is proposed to decide when to correct a strategy. Sampling is used to estimate the size of relations, and alternative heuristic strategies prepared in a background mode are used to decide when to correct. Evaluation using a model of a distributed database indicates that the heuristic strategies are near optimal. Moreover, it also suggests that it is usually correct to abort creation of an intermediate relation which is much larger than predicted.
2.
The collective processing of multiple queries in a database system has recently received renewed attention due to its capability of improving the overall performance of a database system and its applicability to the design of knowledge-based expert systems and extensible database systems. A new multiple query processing strategy is presented which utilizes semantic knowledge on data integrity and information on predicate conditions of the access paths (plans) of queries. The processing of multiple queries is accomplished by the utilization of subset relationships between intermediate results of query executions, which are inferred employing both semantic and logical information. Given a set of fixed order access plans, the A* algorithm is used to find the set of reformulated access plans which is optimal for a given collection of semantic knowledge.
3.
This paper addresses the distributed stream processing of window-based multi-way join queries considering the semijoin as
a key join operator. In distributed stream processing, data streams arriving at remote sites need to be shipped to the processing
site for query execution. This typically introduces high communication overhead. Our observation is that semijoin, effective
in reducing communication overhead in distributed database query processing, can also be effective in distributed stream query
processing. The challenge, however, lies in the streaming nature of the tuples, as it requires continuous and incremental
processing of an unbounded sequence of tuples instead of one-time processing of a set of stored tuples. This paper describes
our comprehensive work done to address the challenge. Specifically, we first propose a distributed stream join processing
model that handles the issue of network delays introduced from the shipment of data streams, and allows for efficient batch
processing. Then, based on the model, we propose join algorithms in a multi-way join case: first, one-way join algorithms
for different combinations of join placement and join method and, then, multi-way join algorithms assuming linear join ordering.
Regarding the join method, two distributed join methods are introduced: (1) simple join, in which full tuples are forwarded to the query processing site and (2) semijoin-based join, in which partial tuples are forwarded. A semijoin-based join can be executed with different possible semijoin strategies which incur different
communication overheads. We present a complete set of join algorithms considering all possible semijoin strategies, and propose
an optimization algorithm. The join algorithms are executed continuously in an incremental manner as tuples arrive, and never
ship tuples redundantly. The optimization algorithm constructs an efficient multi-way join plan by using a greedy heuristic
which adds to the plan one stream with the minimum join execution cost in each step. Through extensive experiments, we conduct
comparative studies of the performance among the proposed one-way join algorithms and the efficiency of the generated plan
between the optimization algorithm based on the greedy heuristic and the exhaustive search, respectively.
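
The difference between the two join methods named in this abstract can be illustrated with a minimal sketch. It covers only the classic one-shot case over stored tuples, not the windowed, incremental stream algorithms of the paper. The function names and the cost measure (number of attribute values shipped) are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def hash_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Join two tuple lists on `key` using an in-memory hash index."""
    index: Dict[object, List[Row]] = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def simple_join(remote: List[Row], local: List[Row], key: str) -> Tuple[List[Row], int]:
    """Simple join: ship every remote tuple in full to the processing site."""
    shipped = sum(len(row) for row in remote)  # hypothetical cost: values sent
    return hash_join(local, remote, key), shipped


def semijoin_join(remote: List[Row], local: List[Row], key: str) -> Tuple[List[Row], int]:
    """Semijoin-based join: ship join keys first, then only matching tuples."""
    keys = {row[key] for row in local}                       # sent to the remote site
    reduced = [row for row in remote if row[key] in keys]    # filtered at the remote site
    shipped = len(keys) + sum(len(row) for row in reduced)   # keys out, matches back
    return hash_join(local, reduced, key), shipped
```

Both functions return the same join result. The semijoin variant ships fewer values when few remote tuples match, and can ship more when most of them do, which is why a choice among strategies matters.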
4.
5.
James Cheng Yiping Ke Ada Wai-Chee Fu Jeffrey Xu Yu 《The VLDB Journal The International Journal on Very Large Data Bases》2011,20(4):521-539
This paper studies the problem of processing supergraph queries, that is, given a database containing a set of graphs, find all the graphs in the database of which the query graph is a
supergraph. Existing work usually constructs an index and performs a filtering-and-verification process, which still requires many subgraph isomorphism tests. There are also significant overheads in both index construction
and maintenance. In this paper, we design a graph querying system that achieves both fast indexing and efficient query processing.
The index is constructed by a simple but fast method of extracting the commonality among the graphs, which does not involve
any costly operation such as graph mining. Our query processing has two key techniques, direct inclusion and filtering. Direct inclusion allows partial query answers to be included directly without candidate verification. Our filtering technique
further reduces the candidate set by operating on a much smaller projected database. Experimental results show that our method
is significantly more efficient than the existing works in both indexing and query processing, and our index has a low maintenance
cost.
6.
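To solve the multi-way spatial distance join query problem, a non-incremental recursive algorithm based on R-trees is proposed. The algorithm uses a depth-first recursive search strategy to traverse synchronously the R-trees of n spatial datasets, and on termination it returns the K n-tuples with the shortest distances all at once. The algorithm is further optimized with a distance-based plane-sweep technique, which effectively reduces the number of disk accesses and the CPU response time. Finally, experiments verify the effectiveness of the algorithm.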
7.
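In mobile computing environments, selecting a suitable join query processing mode on the basis of accurate operation cost estimates can reduce both the amount of data transmitted and the energy consumed by mobile devices. This paper examines a new asymmetric characteristic of mobile device energy consumption in such environments and proposes an operation cost estimation method. It estimates the cost and compares the performance of join query processing modes in terms of data transmission volume and energy consumption, and proposes four practical guidelines for selecting a join query processing mode. Experimental results demonstrate the correctness of the estimation method and the guidelines, which apply more widely than existing comparable estimation models and conclusions.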
8.
Dutta Supriyo Adhikari Bibhas Banerjee Subhashish 《Quantum Information Processing》2016,15(5):2193-2212
Building upon our previous work on graphical representation of a quantum state by signless Laplacian matrix, we pose the following question. If a local unitary...
9.
Csaba J. Egyhazy Konstantinos P. Triantis Bharat Bhasker 《Distributed and Parallel Databases》1996,4(1):49-79
This paper presents a query processing algorithm, formulated and developed in support of the prototype architecture of the Distributed Access View Integrated Database (DAVID) which is a heterogeneous distributed database management system. The objective of the proposed query processing algorithm is to produce an inexpensive strategy for a given query. The inexpensive query strategy is obtained primarily by computing the most profitable semi-joins and by determining the best sequence of join operations per processing site. The latter is obtained by applying a zero-one integer linear program that uses a non-parametric statistical estimation technique to compute the sizes of the temporary clusters. A cluster is a subset of the cartesian product of a list of atomic and non-atomic domains and is the structure that can represent in a uniform way data stored in relational, hierarchical and network databases. Following some background information on the development of the DAVID prototype, this paper introduces the schema architecture. The schema architecture describes the mechanism by which the component heterogeneous database schemata are mapped into the uniform global schema. This is followed by the formulation of the query processing algorithm, its implementation and an illustration of its use in the context of NASA's Astrophysics Data System. Recommended by: Y. Breitbart
10.
The Space Efficient Embedded Cluster (SEEC) is a new system that offers a practical solution to space-restricted distributed processing. Utilising Linux-compatible embedded network controllers and standard Beowulf libraries presents developers with an easy-to-use, modular, distributed architecture. Networking characteristics are examined to quantify MPI operational overheads. Weight and volume are analysed against a reference PC system for the example application RC5. A similar analysis is presented for DGEMM when utilising a possible, modular, per-node FPGA enhancement.
11.
The authors discuss various performance issues in distributed query processing. They validate and evaluate the performance of the local reduction (LR), the fragment and replicate strategy (FRS), and the partition and replicate strategy (PRS) optimization algorithms. The experimental results reveal that the choices made by these algorithms concerning which local operations should be performed, which relation should remain fragmented, or which relation should be partitioned are valid. It is shown using experimental results that various parameters, such as the number of processing sites, partitioning speed relative to join speed, and sizes of the join relations, affect the performance of PRS significantly. It is also shown that the response times of query execution are affected significantly by the degree of site autonomy, interference among processes, the interface with the local database management systems (DBMSs), and communications facilities. Pipeline strategies for processing queries in an environment where relations are fragmented are studied.
12.
Yongluan Zhou Beng Chin Ooi Kian-Lee Tan Wee Hyong Tok 《Data & Knowledge Engineering》2005,53(3):1-309
Traditionally, distributed query optimization techniques generate static query plans at compile time. However, the optimality of these plans depends on many parameters (such as the selectivities of operations, the transmission speeds and workloads of servers) that are not only difficult to estimate but also often unpredictable and fluctuating at runtime. As the query processor cannot dynamically adjust the plans at runtime, the system performance is often less than satisfactory. In this paper, we introduce a new highly adaptive distributed query processing architecture. Our architecture can quickly detect fluctuations in selectivities of operations, as well as transmission speeds and workloads of servers, and accordingly change the operation order of a distributed query plan during execution. We have implemented a prototype based on the Telegraph system [Telegraph project. Available from >]. Our experimental study shows that our mechanism can adapt itself to changes in the environment and hence approach an optimal plan during execution.
13.
Mei BAI Junchang XIN Guoren WANG Roger ZIMMERMANN Xite WANG 《Frontiers of Computer Science》2016,10(2):330-352
The skyline-join operator, as an important variant of skylines, plays an important role in multi-criteria decision making problems. However, as the data scale increases, previous methods of skyline-join queries cannot be applied to new applications. Therefore, in this paper, we make the first attempt to propose a scalable method to process skyline-join queries in distributed databases. First, a tailored distributed framework is presented to facilitate the computation of skyline-join queries. Second, the distributed skyline-join query algorithm (DSJQ) is designed to process skyline-join queries. DSJQ contains two phases. In the first phase, two filtering strategies are used to filter out unpromising tuples from the original tables. The remaining tuples are transmitted to the corresponding data nodes according to a partition function, which guarantees that tuples with the same join value are transferred to the same node. In the second phase, we design a scheduling plan based on rotations to calculate the final skyline-join result. The scheduling plan ensures that calculations are equally assigned to all the data nodes, and the calculations on each data node can be processed in parallel without creating a bottleneck node. Finally, the effectiveness of DSJQ is evaluated through a series of experiments.
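
The guarantee that tuples with the same join value reach the same node can be illustrated with a generic hash partitioner. This is a minimal sketch, not the DSJQ partition function, which the abstract does not specify. The names `partition` and `route` and the use of CRC32 are assumptions.

```python
import zlib
from collections import defaultdict
from typing import Dict, List


def partition(join_value: object, num_nodes: int) -> int:
    """Map a join value to a node id; equal values always map to the same node.

    CRC32 over the value's repr is used instead of the built-in hash(), whose
    result for strings varies between Python processes.
    """
    return zlib.crc32(repr(join_value).encode("utf-8")) % num_nodes


def route(tuples: List[dict], key: str, num_nodes: int) -> Dict[int, List[dict]]:
    """Group tuples by the node that the partition function assigns them to."""
    nodes: Dict[int, List[dict]] = defaultdict(list)
    for t in tuples:
        nodes[partition(t[key], num_nodes)].append(t)
    return nodes
```

Routing both input tables through the same `partition` call means each node can compute its share of the join locally, since no matching pair is split across nodes.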
14.
15.
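In distributed database systems, the distribution and redundancy of data add many new issues and much complexity to distributed query processing. Based on an analysis of existing distributed database query processing techniques and on practical application needs, a new query processing method is proposed. The method stores the results of frequently issued queries locally to reduce the amount of data transmitted at query time, thereby shortening response time. Experiments show that the method is effective.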
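
The idea of keeping frequent query results at the local site can be sketched as a small least-recently-used cache placed in front of a remote fetch. This is an illustration under assumptions, not the method of the paper: the class name, the `fetch_remote` callback, and the LRU eviction policy are all hypothetical, and cache invalidation on updates is not handled.

```python
from collections import OrderedDict
from typing import Callable, List


class LocalResultCache:
    """Serve repeated queries from local storage instead of the network."""

    def __init__(self, fetch_remote: Callable[[str], List], capacity: int = 128):
        self._fetch = fetch_remote          # hypothetical remote-execution callback
        self._capacity = capacity
        self._cache: "OrderedDict[str, List]" = OrderedDict()
        self.remote_calls = 0               # counts queries that crossed the network

    def query(self, sql: str) -> List:
        if sql in self._cache:
            self._cache.move_to_end(sql)    # mark as most recently used
            return self._cache[sql]
        self.remote_calls += 1
        result = self._fetch(sql)
        self._cache[sql] = result
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return result
```

A repeated query costs one remote call in total, so the saving grows with how often the same queries recur.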
16.
This paper is concerned with data provisioning services (information search, retrieval, storage, etc.) dealing with a large and heterogeneous information repository. Increasingly, this class of services is being hosted and delivered through Cloud infrastructures. Although such systems are becoming popular, existing resource management methods (e.g. load-balancing techniques) do not consider workload patterns, nor do they perform well when subjected to non-uniformly distributed datasets. If these problems can be solved, this class of services can be made to operate in a more scalable, efficient, and reliable manner. The main contribution of this paper is an approach that combines proprietary cloud-based load balancing techniques and density-based partitioning for efficient range query processing across relational database-as-a-service in cloud computing environments. The study is conducted over a real-world data provisioning service that manages a large historical news database from Thomson Reuters. The proposed approach has been implemented and tested as a multi-tier web application suite consisting of load-balancing, application, and database layers. We have validated our approach by conducting a set of rigorous performance evaluation experiments using the Amazon EC2 infrastructure. The results show that augmenting a cloud-based load-balancing service (e.g. Amazon Elastic Load Balancer) with workload characterization intelligence (density and distribution of data; composition of queries) offers significant benefits with regard to the overall system’s performance (i.e. query latency and database service throughput).
17.
This work introduces decentralized query processing techniques based on MIDAS, a novel distributed multidimensional index. In particular, MIDAS implements a distributed k-d tree, where leaves correspond to peers, and internal nodes dictate message routing. MIDAS requires that peers maintain little network information, and features mechanisms that support fault tolerance and load balancing. The proposed algorithms process point and range queries over the multidimensional indexed space in only O(log n) hops in expectation, where n is the network size. For nearest neighbor queries, two processing alternatives are discussed. The first, termed eager processing, has low latency (expected value of O(log n) hops) but may involve a large number of peers. The second, termed iterative processing, has higher latency (expected value of O(log² n) hops) but involves far fewer peers. A detailed experimental evaluation demonstrates that our query processing techniques outperform existing methods for settings involving real spatial data as well as in the case of high dimensional synthetic data.
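
The idea of a k-d tree whose leaves are peers and whose internal nodes direct routing can be sketched as follows. This is a centralized simplification that always starts at the root of a balanced tree. It is not the MIDAS protocol, in which any peer may start a query using only partial network information. The `Node` type, the peer names, and the midpoint splits are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Node:
    peer: Optional[str] = None       # set on leaves only
    dim: int = 0                     # split dimension (internal nodes)
    split: float = 0.0               # split value (internal nodes)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(depth: int, max_depth: int, lo: List[float], hi: List[float],
          prefix: str = "") -> Node:
    """Build a balanced k-d tree over the box [lo, hi] with 2**max_depth peers."""
    if depth == max_depth:
        return Node(peer="peer-" + (prefix or "root"))
    dim = depth % len(lo)                      # cycle through the dimensions
    split = (lo[dim] + hi[dim]) / 2            # midpoint split (assumed)
    left_hi = list(hi)
    left_hi[dim] = split
    right_lo = list(lo)
    right_lo[dim] = split
    return Node(dim=dim, split=split,
                left=build(depth + 1, max_depth, lo, left_hi, prefix + "0"),
                right=build(depth + 1, max_depth, right_lo, hi, prefix + "1"))


def route(root: Node, point: Sequence[float]) -> Tuple[str, int]:
    """Follow split decisions to the peer owning `point`; return (peer, hops)."""
    node, hops = root, 0
    while node.peer is None:
        node = node.left if point[node.dim] < node.split else node.right
        hops += 1
    return node.peer, hops
```

In a balanced tree with n leaves, the loop in `route` takes log2(n) steps, which gives an intuition for the logarithmic hop counts quoted in the abstract.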
18.
In this paper, we propose an intelligent distributed query processing method considering the characteristics of a distributed ontology environment. We suggest more general models of the distributed ontology query and the semantic mapping among distributed ontologies compared with previous work. Our approach rewrites a distributed ontology query into multiple distributed ontology queries using the semantic mapping, and we can obtain the integrated answer through the execution of these queries. Furthermore, we propose a distributed ontology query processing algorithm with several query optimization techniques: pruning rules to remove unnecessary queries, a cost model considering site load balancing and caching, and a heuristic strategy for scheduling plans to be executed at a local site. Finally, experimental results show that our optimization techniques are effective in reducing the response time.
19.
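To achieve interoperability among distributed spatial databases, distributed queries need to be optimized. Distributed query processing here means that a data manipulation statement may access data on several nodes rather than only on the node that issued the query. An architecture for a query optimizer is proposed, and the optimization of such queries is discussed in detail, with a focus on complex spatial queries that contain spatial selections and joins. A case-study program for a typical spatial database is built, and the analysis shows that the query optimizer with filtering and refinement has a clear efficiency advantage in both time and space, yielding results of reference value.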
20.
Relaxation and approximation techniques have been proposed as approaches for improving the quality of query results, in terms
of completeness and accuracy, in environments where the user may not be able to specify the query in a complete and exact
way, since data are quite heterogeneous or she may not know all the characteristics of data at hand. This problem, mainly
addressed for relational and XML data, is nowadays quite relevant also for geo-spatial data, due to their increasing usage
in highly critical decisional processes. Among geo-spatial queries, those based on spatial and more precisely topological
relations are currently used in an increasing number of applications. As far as we know, no approach has been proposed so
far for relaxing queries based on topological predicates when they return an empty or insufficient answer, in order to improve
result quality and user satisfaction. In this paper, we consider this problem and we present a general relaxation strategy
for, possibly multi-domain, topological selection and join queries. Two specific semantics are also provided: the first applies
the minimum amount of relaxation in order to get an acceptable answer; the second relaxes the given query by a certain fixed
amount, depending on the considered topological predicate. Index-based processing algorithms, for efficiently executing relaxed
queries based on the proposed semantics, are also presented and a specific topological similarity function, to be used for
relaxation purposes, is proposed. Experimental results show that the overhead given by query relaxation is acceptable.