
1.

Deciding to correct distributed query processing

Bodorik P. Riordon J.S. Pyra J.S. 《Knowledge and Data Engineering, IEEE Transactions on》1992,4(3):253-265

Most algorithms for determining query processing strategies in distributed databases are static in nature; that is, the strategy is completely determined on the basis of a priori estimates of the size of intermediate results, and it remains unchanged throughout its execution. The static approach may be far from optimal because it denies the opportunity to reschedule operations if size estimates are found to be inaccurate. Adaptive query execution may be used to alleviate this problem. A low overhead delay method is proposed to decide when to correct a strategy. Sampling is used to estimate the size of relations, and alternative heuristic strategies prepared in a background mode are used to decide when to correct. Evaluation using a model of a distributed database indicates that the heuristic strategies are near optimal. Moreover, it also suggests that it is usually correct to abort creation of an intermediate relation which is much larger than predicted.

2.

A knowledge-based approach to multiple query processing

J. T. Park T. J. Teorey S. Lafortune 《Data & Knowledge Engineering》1989,3(4):261-284

The collective processing of multiple queries in a database system has recently received renewed attention due to its capability of improving the overall performance of a database system and its applicability to the design of knowledge-based expert systems and extensible database systems. A new multiple query processing strategy is presented which utilizes semantic knowledge on data integrity and information on predicate conditions of the access paths (plans) of queries. The processing of multiple queries is accomplished by the utilization of subset relationships between intermediate results of query executions, which are inferred employing both semantic and logical information. Given a set of fixed order access plans, the A* algorithm is used to find the set of reformulated access plans which is optimal for a given collection of semantic knowledge.

3.

Distributed stream join query processing with semijoins

Tri Minh Tran Byung Suk Lee 《Distributed and Parallel Databases》2010,27(3):211-254

This paper addresses the distributed stream processing of window-based multi-way join queries considering the semijoin as a key join operator. In distributed stream processing, data streams arriving at remote sites need to be shipped to the processing site for query execution. This typically introduces high communication overhead. Our observation is that semijoin, effective in reducing communication overhead in distributed database query processing, can be also effective in distributed stream query processing. The challenge, however, lies in the streaming nature of the tuples, as it requires continuous and incremental processing of an unbounded sequence of tuples instead of one-time processing of a set of stored tuples. This paper describes our comprehensive work done to address the challenge. Specifically, we first propose a distributed stream join processing model that handles the issue of network delays introduced from the shipment of data streams, and allows for efficient batch processing. Then, based on the model, we propose join algorithms in a multi-way join case: first, one-way join algorithms for different combinations of join placement and join method and, then, multi-way join algorithms assuming linear join ordering. Regarding the join method, two distributed join methods are introduced: (1) simple join, in which full tuples are forwarded to the query processing site and (2) semijoin-based join, in which partial tuples are forwarded. A semijoin-based join can be executed with different possible semijoin strategies which incur different communication overheads. We present a complete set of join algorithms considering all possible semijoin strategies, and propose an optimization algorithm. The join algorithms are executed continuously in an incremental manner as tuples arrive, and never ship tuples redundantly. 
The optimization algorithm constructs an efficient multi-way join plan by using a greedy heuristic which adds to the plan one stream with the minimum join execution cost in each step. Through extensive experiments, we conduct comparative studies of the performance among the proposed one-way join algorithms and the efficiency of the generated plan between the optimization algorithm based on the greedy heuristic and the exhaustive search, respectively.
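The greedy plan construction described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the stream names, the `PAIR_COST` table, and the `toy_cost` function are hypothetical stand-ins for a real join-execution cost model, and the sketch assumes a linear join order that starts from a given stream.

```python
from typing import Callable, List

def greedy_join_order(streams: List[str], start: str,
                      join_cost: Callable[[List[str], str], float]) -> List[str]:
    """Build a linear join order by repeatedly adding the cheapest remaining stream."""
    plan = [start]
    remaining = [s for s in streams if s != start]
    while remaining:
        # Pick the stream whose join with the current plan is estimated cheapest.
        best = min(remaining, key=lambda s: join_cost(plan, s))
        plan.append(best)
        remaining.remove(best)
    return plan

# Hypothetical pairwise join costs (assumption, for illustration only).
PAIR_COST = {
    frozenset({"S1", "S2"}): 5.0,
    frozenset({"S1", "S3"}): 2.0,
    frozenset({"S1", "S4"}): 9.0,
    frozenset({"S2", "S3"}): 4.0,
    frozenset({"S2", "S4"}): 1.0,
    frozenset({"S3", "S4"}): 7.0,
}

def toy_cost(plan: List[str], candidate: str) -> float:
    # Toy model: cost of joining the candidate with the most recently added stream.
    return PAIR_COST[frozenset({plan[-1], candidate})]

order = greedy_join_order(["S1", "S2", "S3", "S4"], "S1", toy_cost)
print(order)
```

A greedy choice evaluates only the remaining streams at each step, so it needs far fewer cost evaluations than an exhaustive search over all orderings, at the price of possibly missing the cheapest overall plan.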

4.

Fast graph query processing with a low-cost index

James Cheng Yiping Ke Ada Wai-Chee Fu Jeffrey Xu Yu 《The VLDB Journal The International Journal on Very Large Data Bases》2011,20(4):521-539

This paper studies the problem of processing supergraph queries, that is, given a database containing a set of graphs, find all the graphs in the database of which the query graph is a supergraph. Existing works usually construct an index and perform a filtering-and-verification process, which still requires many subgraph isomorphism tests. There are also significant overheads in both index construction and maintenance. In this paper, we design a graph querying system that achieves both fast indexing and efficient query processing. The index is constructed by a simple but fast method of extracting the commonality among the graphs, which does not involve any costly operation such as graph mining. Our query processing has two key techniques, direct inclusion and filtering. Direct inclusion allows partial query answers to be included directly without candidate verification. Our filtering technique further reduces the candidate set by operating on a much smaller projected database. Experimental results show that our method is significantly more efficient than the existing works in both indexing and query processing, and our index has a low maintenance cost.

5.

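A multi-way spatial distance join query processing method

梁银 张虹 《计算机应用》2008,28(1):155-158

To solve the multi-way spatial distance join query problem, a non-incremental recursive algorithm based on R-trees is proposed. The algorithm uses a depth-first recursive search strategy to traverse synchronously the R-trees of n spatial datasets and, on termination, returns the K n-tuples with the shortest distances. The algorithm is further optimized with a distance-based plane-sweep technique, which effectively reduces the number of disk accesses and the CPU response time. Finally, experiments verify the effectiveness of the algorithm.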

6.

A query processing algorithm for a system of heterogeneous distributed databases

Csaba J. Egyhazy Konstantinos P. Triantis Bharat Bhasker 《Distributed and Parallel Databases》1996,4(1):49-79

This paper presents a query processing algorithm, formulated and developed in support of the prototype architecture of the Distributed Access View Integrated Database (DAVID), which is a heterogeneous distributed database management system. The objective of the proposed query processing algorithm is to produce an inexpensive strategy for a given query. The inexpensive query strategy is obtained primarily by computing the most profitable semi-joins and by determining the best sequence of join operations per processing site. The latter is obtained by applying a zero-one integer linear program that uses a non-parametric statistical estimation technique to compute the sizes of the temporary clusters. A cluster is a subset of the cartesian product of a list of atomic and non-atomic domains and is the structure that can represent in a uniform way data stored in relational, hierarchical and network databases. Following some background information on the development of the DAVID prototype, this paper introduces the schema architecture. The schema architecture describes the mechanism by which the component heterogeneous database schemata are mapped into the uniform global schema. This is followed by the formulation of the query processing algorithm, its implementation and an illustration of its use in the context of NASA's Astrophysics Data System. Recommended by: Y. Breitbart
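To make the role of semi-joins concrete, here is a minimal Python sketch of a semi-join used as a reducer before a join. It is a generic illustration, not the DAVID algorithm, which operates on clusters and selects semi-joins by profitability. The relations `R` and `S`, the column names, and the helper functions are all hypothetical.

```python
def semijoin_reduce(r_rows, s_rows, key):
    """Return the rows of R that have a match in S on `key` (R semi-join S)."""
    # Only this projection of S would need to be shipped to R's site.
    s_keys = {row[key] for row in s_rows}
    return [row for row in r_rows if row[key] in s_keys]

def hash_join(r_rows, s_rows, key):
    """Plain equi-join of two lists of dict rows on `key`."""
    index = {}
    for s in s_rows:
        index.setdefault(s[key], []).append(s)
    return [{**r, **s} for r in r_rows for s in index.get(r[key], [])]

# Hypothetical relations stored at two different sites.
R = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"},
     {"id": 3, "name": "c"}, {"id": 4, "name": "d"}]
S = [{"id": 2, "dept": "x"}, {"id": 4, "dept": "y"}]

reduced_R = semijoin_reduce(R, S, "id")   # only 2 of the 4 rows survive
result = hash_join(reduced_R, S, "id")    # same answer as joining the full R
print(len(R), len(reduced_R), result)
```

Shipping the reduced relation instead of the full one is what makes a semi-join profitable when the key projection is small relative to the rows it eliminates.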

7.

Performance issues in distributed query processing

Liu C. Yu C. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(8):889-905

The authors discuss various performance issues in distributed query processing. They validate and evaluate the performance of the local reduction (LR), the fragment and replicate strategy (FRS), and the partition and replicate strategy (PRS) optimization algorithms. The experimental results reveal that the choices made by these algorithms concerning which local operations should be performed, which relation should remain fragmented, or which relation should be partitioned are valid. It is shown using experimental results that various parameters, such as the number of processing sites, partitioning speed relative to join speed, and sizes of the join relations, affect the performance of PRS significantly. It is also shown that the response times of query execution are affected significantly by the degree of site autonomy, interferences among processes, interface with the local database management systems (DBMSs), and communications facilities. Pipeline strategies for processing queries in an environment where relations are fragmented are studied.

8.

A graph theoretical approach to states and unitary operations

Dutta Supriyo Adhikari Bibhas Banerjee Subhashish 《Quantum Information Processing》2016,15(5):2193-2212

Quantum Information Processing - Building upon our previous work, on graphical representation of a quantum state by signless Laplacian matrix, we pose the following question. If a local unitary...

9.

An adaptable distributed query processing architecture

Yongluan Zhou Beng Chin Ooi Kian-Lee Tan Wee Hyong Tok 《Data & Knowledge Engineering》2005,53(3):1-309

Traditionally, distributed query optimization techniques generate static query plans at compile time. However, the optimality of these plans depends on many parameters (such as the selectivities of operations, the transmission speeds and workloads of servers) that are not only difficult to estimate but also often unpredictable and fluctuating at runtime. As the query processor cannot dynamically adjust the plans at runtime, the system performance is often less than satisfactory. In this paper, we introduce a new highly adaptive distributed query processing architecture. Our architecture can quickly detect fluctuations in selectivities of operations, as well as transmission speeds and workloads of servers, and accordingly change the operation order of a distributed query plan during execution. We have implemented a prototype based on the Telegraph system [Telegraph project]. Our experimental study shows that our mechanism can adapt itself to changes in the environment and hence approach an optimal plan during execution.

10.

A network-centric approach to space-restricted distributed processing

M.R. E. F.H. 《Microprocessors and Microsystems》2009,33(5-6):356-364

The Space Efficient Embedded Cluster (SEEC) is a new system that offers a practical solution to space-restricted distributed processing. Utilising Linux compatible, embedded network controllers and standard Beowulf libraries presents developers with an easy to use, modular, distributed architecture. Networking characteristics are examined to quantify MPI operational overheads. Analysis in terms of weight and volume is undertaken when compared to a reference PC system for the example application RC5. A similar analysis is presented for DGEMM when utilising a possible, modular, per node FPGA enhancement.

11.

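Single-join query processing modes for mobile computing environments

王敏 朱玉全 张春芬 《计算机应用》2007,27(11):2756-2759

In a mobile computing environment, choosing a suitable join query processing mode on the basis of accurate operation cost estimates can reduce the amount of data transmitted and the energy consumed by mobile devices. The paper examines a new asymmetric characteristic of mobile-device energy consumption in this environment, proposes an operation cost estimation method, and estimates and compares the cost and performance of join query processing modes in terms of both data transmission volume and energy consumption. Four practical rules are proposed to guide the choice of join query processing mode. Experimental results support the correctness of the estimation method and the rules, which apply more broadly than existing comparable estimation models and conclusions.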

12.

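A high-performance XPath streaming-data query method based on graph reduction

《微型机与应用》2017,(15):16-21

As a standard for data exchange and data sharing on the network, XML is increasingly used to represent the streaming data of application systems. However, given characteristics of stream processing such as limited space overhead, how to implement such queries efficiently is a problem worth exploring. Unlike traditional methods based on automata or hierarchical stacks, this paper proposes a graph-reduction-based XML query automaton (GRAT), which uses a graph structure to represent the relationships among subquery tasks targeting different XML stream elements and implements XPath queries through reduction of the graph. Experimental results show that the GRAT-based query algorithm can complete complex XML queries efficiently and that stream-processing throughput reaches a high level.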

13.

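A new approach to query processing in distributed databases

张文东 石小艳 李明壮 夏伟伟 《计算机工程与设计》2007,28(19):4600-4602

In a distributed database system, the distribution and redundancy of data add new concerns and complexity to distributed query processing. Based on an analysis of existing distributed database query processing techniques and on practical application needs, a new query processing method is proposed. It stores the results of frequently issued queries locally to reduce the amount of data transmitted at query time, thereby shortening response time. Experiments show that the method is effective.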

14.

A workload-driven approach to database query processing in the cloud

Adnene Guabtni Rajiv Ranjan Fethi A. Rabhi 《The Journal of supercomputing》2013,63(3):722-736

This paper is concerned with data provisioning services (information search, retrieval, storage, etc.) dealing with a large and heterogeneous information repository. Increasingly, this class of services is being hosted and delivered through Cloud infrastructures. Although such systems are becoming popular, existing resource management methods (e.g. load-balancing techniques) do not consider workload patterns nor do they perform well when subjected to non-uniformly distributed datasets. If these problems can be solved, this class of services can be made to operate in a more scalable, efficient, and reliable manner. The main contribution of this paper is an approach that combines proprietary cloud-based load balancing techniques and density-based partitioning for efficient range query processing across relational database-as-a-service in cloud computing environments. The study is conducted over a real-world data provisioning service that manages a large historical news database from Thomson Reuters. The proposed approach has been implemented and tested as a multi-tier web application suite consisting of load-balancing, application, and database layers. We have validated our approach by conducting a set of rigorous performance evaluation experiments using the Amazon EC2 infrastructure. The results prove that augmenting a cloud-based load-balancing service (e.g. Amazon Elastic Load Balancer) with workload characterization intelligence (density and distribution of data; composition of queries) offers significant benefits with regard to the overall system’s performance (i.e. query latency and database service throughput).

15.

Index-based query processing on distributed multidimensional data

George Tsatsanifos Dimitris Sacharidis Timos Sellis 《GeoInformatica》2013,17(3):489-519

This work introduces decentralized query processing techniques based on MIDAS, a novel distributed multidimensional index. In particular, MIDAS implements a distributed k-d tree, where leaves correspond to peers, and internal nodes dictate message routing. MIDAS requires that peers maintain little network information, and features mechanisms that support fault tolerance and load balancing. The proposed algorithms process point and range queries over the multidimensional indexed space in only O(log n) hops in expectation, where n is the network size. For nearest neighbor queries, two processing alternatives are discussed. The first, termed eager processing, has low latency (expected value of O(log n) hops) but may involve a large number of peers. The second, termed iterative processing, has higher latency (expected value of O(log² n) hops) but involves far fewer peers. A detailed experimental evaluation demonstrates that our query processing techniques outperform existing methods for settings involving real spatial data as well as in the case of high dimensional synthetic data.

16.

An intelligent query processing for distributed ontologies

Jihyun Lee Jun-Ki Min 《Journal of Systems and Software》2010,83(1):85-95

In this paper, we propose an intelligent distributed query processing method considering the characteristics of a distributed ontology environment. We suggest more general models of the distributed ontology query and the semantic mapping among distributed ontologies compared with the previous works. Our approach rewrites a distributed ontology query into multiple distributed ontology queries using the semantic mapping, and we can obtain the integrated answer through the execution of these queries. Furthermore, we propose a distributed ontology query processing algorithm with several query optimization techniques: pruning rules to remove unnecessary queries, a cost model considering site load balancing and caching, and a heuristic strategy for scheduling plans to be executed at a local site. Finally, experimental results show that our optimization techniques are effective in reducing the response time.

17.

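A distributed spatial query optimization method based on a query optimizer

林键 刘仁义 刘南 张丰 《计算机工程与应用》2012,48(22):161-165

To achieve interoperability among distributed spatial databases, distributed queries must be optimized; such query processing means that any data processing statement accesses data at multiple nodes rather than only at the node that issued the query. An architecture for a query optimizer is proposed and the optimization of such queries is discussed in detail, with emphasis on complex spatial queries that contain spatial selections and joins. A case program for a typical spatial database was built, and analysis shows that the query optimizer with filtering and refinement has a clear efficiency advantage in both time and space, yielding results of reference value.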

18.

Topological operators: a relaxed query processing approach

Alberto Belussi Barbara Catania Paola Podestà 《GeoInformatica》2012,16(1):67-110

Relaxation and approximation techniques have been proposed as approaches for improving the quality of query results, in terms of completeness and accuracy, in environments where the user may not be able to specify the query in a complete and exact way, since data are quite heterogeneous or she may not know all the characteristics of data at hand. This problem, mainly addressed for relational and XML data, is nowadays quite relevant also for geo-spatial data, due to their increasing usage in highly critical decisional processes. Among geo-spatial queries, those based on spatial and more precisely topological relations are currently used in an increasing number of applications. As far as we know, no approach has been proposed so far for relaxing queries based on topological predicates when they return an empty or insufficient answer, in order to improve result quality and user satisfaction. In this paper, we consider this problem and we present a general relaxation strategy for, possibly multi-domain, topological selection and join queries. Two specific semantics are also provided: the first applies the minimum amount of relaxation in order to get an acceptable answer; the second relaxes the given query by a certain fixed amount, depending on the considered topological predicate. Index-based processing algorithms, for efficiently executing relaxed queries based on the proposed semantics, are also presented and a specific topological similarity function, to be used for relaxation purposes, is proposed. Experimental results show that the overhead given by query relaxation is acceptable.

19.

Index design and query processing for graph conductance search

Soumen Chakrabarti Amit Pathak Manish Gupta 《The VLDB Journal The International Journal on Very Large Data Bases》2011,20(3):445-470

Graph conductance queries, also known as personalized PageRank and related to random walks with restarts, were originally proposed to assign a hyperlink-based prestige score to Web pages. More general forms of such queries are also very useful for ranking in entity-relation (ER) graphs used to represent relational, XML and hypertext data. Evaluation of PageRank usually involves a global eigen computation. If the graph is even moderately large, interactive response times may not be possible. Recently, the need for interactive PageRank evaluation has increased. The graph may be fully known only when the query is submitted. Browsing actions of the user may change some inputs to the PageRank computation dynamically. In this paper, we describe a system that analyzes query workloads and the ER graph, invests in limited offline indexing, and exploits those indices to achieve essentially constant-time query processing, even as the graph size scales. Our techniques (data and query statistics collection, index selection and materialization, and query-time index exploitation) have parallels in the extensive relational query optimization literature, but are applied to supporting novel graph data repositories. We report on experiments with five temporal snapshots of the CiteSeer ER graph having 74–702 thousand entity nodes, 0.17–1.16 million word nodes, 0.29–3.26 million edges between entities, and 3.29–32.8 million edges between words and entities. We also used two million actual queries from CiteSeer’s logs. Queries run 3–4 orders of magnitude faster than whole-graph PageRank, the gap growing with graph size. Index size is smaller than a text index. Ranking accuracy is 94–98% with reference to whole-graph PageRank.
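For orientation, the whole-graph computation that this abstract treats as the slow baseline can be sketched as a random walk with restart solved by power iteration. The Python sketch below is a textbook formulation, not the indexed system the paper describes. The toy graph, the restart probability `alpha`, the iteration count, and the choice to send the mass of dangling nodes back to the seed are all assumptions.

```python
def personalized_pagerank(graph, seed, alpha=0.15, iters=100):
    """Power iteration for a random walk that restarts at `seed` with probability alpha.

    `graph` maps every node to its list of out-neighbours; every neighbour
    must also appear as a key.
    """
    nodes = list(graph)
    score = {n: 0.0 for n in nodes}
    score[seed] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            out = graph[n]
            if out:
                # Spread the non-restart mass evenly over the out-edges.
                share = (1 - alpha) * score[n] / len(out)
                for m in out:
                    nxt[m] += share
            else:
                # Dangling node: return its mass to the seed (assumption).
                nxt[seed] += (1 - alpha) * score[n]
        # Scores sum to 1, so the restart contributes exactly alpha to the seed.
        nxt[seed] += alpha
        score = nxt
    return score

# Hypothetical four-node graph; "d" cannot be reached from the seed "a".
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["a"]}
scores = personalized_pagerank(toy_graph, "a")
print(scores)
```

Every iteration touches every edge, which is why interactive response times become hard on large graphs and why precomputed indices are attractive.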

20.

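An efficient distributed rank-aware skyline query processing algorithm

王刚 邓波 曾玮琳 《计算机工程与应用》2008,44(26):162-165

A novel rank-aware skyline query algorithm for distributed environments is presented; such a query finds the objects that are not "dominated" by any other object and that have high aggregate values. Existing algorithms consume a large amount of network bandwidth when the number of nodes m is large. A new distributed rank-aware skyline query processing algorithm, Distributed Rank-aware Skylining (DRS), is proposed. DRS needs only four rounds of interaction on any dataset and reduces communication cost by pruning unnecessary objects. The efficiency of DRS is verified on synthetic data. Experiments show that DRS outperforms existing algorithms when the number of nodes m is greater than 4.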
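The dominance relation behind a rank-aware skyline query can be shown with a small centralized Python sketch. It is not the DRS algorithm: it ignores distribution, interaction rounds, and pruning. It assumes that larger attribute values are better and that the aggregate is a plain sum; the sample points are hypothetical.

```python
def dominates(p, q):
    """p dominates q if p is at least as good everywhere and strictly better somewhere."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

def skyline(points):
    """Objects that no other object dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def top_k_skyline(points, k, aggregate=sum):
    """Rank the skyline by aggregate value and keep the k best."""
    return sorted(skyline(points), key=aggregate, reverse=True)[:k]

# Hypothetical two-attribute objects.
points = [(6, 1), (4, 4), (1, 5), (3, 3), (2, 2)]
print(skyline(points))           # (3, 3) and (2, 2) are dominated by (4, 4)
print(top_k_skyline(points, 2))
```

This pairwise comparison is quadratic in the number of objects and needs all data in one place, which is the cost a distributed algorithm tries to avoid by pruning objects before they are shipped.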