Found 20 similar documents; search time: 15 ms
1.
New applications of information systems need to integrate a large number of heterogeneous databases over computer networks. Answering a query in these applications usually involves selecting relevant information sources and generating a query plan to combine the data automatically. As significant progress has been made in source selection and plan generation, the critical issue has been shifting to query optimization. This paper presents a semantic query optimization (SQO) approach to optimizing query plans of heterogeneous multidatabase systems. This approach provides global optimization for query plans as well as local optimization for subqueries that retrieve data from individual database sources. An important feature of our local optimization algorithm is that we prove necessary and sufficient conditions to eliminate an unnecessary join in a conjunctive query of arbitrary join topology. This feature allows our optimizer to utilize more expressive relational rules to provide a wider range of possible optimizations than previous work in SQO. The local optimization algorithm also features a new data structure called AND-OR implication graphs to facilitate the search for optimal queries. These features allow the global optimization to effectively use semantic knowledge to reduce the data transmission cost. We have implemented this approach in the PESTO (Plan Enhancement by SemanTic Optimization) query plan optimizer as a part of the SIMS information mediator. Experimental results demonstrate that PESTO can provide significant savings in query execution cost over query plan execution without optimization.
2.
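In distributed database systems, the distribution and redundancy of data add considerable new content and complexity to distributed query processing. After analyzing existing query processing techniques for distributed databases, a new query processing method is proposed based on practical application needs. The method stores the results of frequently used queries locally to reduce the amount of data transmitted at query time, thereby shortening response time. Experiments show that the method is effective.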
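The idea of keeping frequent query results at the local site can be sketched as follows. This is only an illustration, not the method of the cited paper: the class name `LocalResultCache`, the `fetch_remote` callable, and the least-recently-used eviction policy are all assumptions introduced here.

```python
from collections import OrderedDict


class LocalResultCache:
    """Hypothetical local cache of query results with LRU eviction."""

    def __init__(self, fetch_remote, capacity=2):
        self.fetch_remote = fetch_remote  # assumed callable that ships a query to a remote site
        self.capacity = capacity
        self.results = OrderedDict()      # query text -> cached rows
        self.remote_calls = 0             # number of queries that needed data transmission

    def query(self, sql):
        if sql in self.results:
            # Cache hit: answer locally, no data crosses the network.
            self.results.move_to_end(sql)
            return self.results[sql]
        # Cache miss: pay the transmission cost once, then keep the result.
        self.remote_calls += 1
        rows = self.fetch_remote(sql)
        self.results[sql] = rows
        if len(self.results) > self.capacity:
            self.results.popitem(last=False)  # evict the least recently used entry
        return rows
```

A repeated query is served from `results` and leaves `remote_calls` unchanged, which is where the saving in transmitted data would come from.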
3.
Semantic caching and query processing (total citations: 2; self-citations: 0; citations by others: 2)
Qun Ren Dunham M.H. Kumar V. 《Knowledge and Data Engineering, IEEE Transactions on》2003,15(1):192-210
Semantic caching is very attractive for use in distributed systems due to the reduced network traffic and the improved response time. It is particularly efficient for a mobile computing environment, where the bandwidth of wireless links is a major performance bottleneck. Previous work either does not provide a formal semantic caching model, or lacks efficient query processing strategies. This paper extends the existing research in three ways: formal definitions associated with semantic caching are presented, query processing strategies are investigated and, finally, the performance of the semantic cache model is examined through a detailed simulation study.
4.
In this paper, we propose an intelligent distributed query processing method that considers the characteristics of a distributed ontology environment. We suggest more general models of the distributed ontology query and of the semantic mapping among distributed ontologies than those in previous work. Our approach rewrites a distributed ontology query into multiple distributed ontology queries using the semantic mapping, and we obtain the integrated answer by executing these queries. Furthermore, we propose a distributed ontology query processing algorithm with several query optimization techniques: pruning rules to remove unnecessary queries, a cost model considering site load balancing and caching, and a heuristic strategy for scheduling plans to be executed at a local site. Finally, experimental results show that our optimization techniques are effective in reducing the response time.
5.
Beomseok Nam Minho Shin Henrique Andrade Alan Sussman 《Journal of Parallel and Distributed Computing》2010
In distributed query processing systems, load balancing plays an important role in maximizing system throughput. When queries can leverage cached intermediate results, improving the cache hit ratio becomes as important as load balancing in query scheduling, especially when dealing with computationally expensive queries. The scheduling policies must be designed to take into consideration the dynamic contents of the distributed caching infrastructure. In this paper, we propose and discuss several distributed query scheduling policies that directly consider the available cache contents by employing distributed multidimensional indexing structures and an exponential moving average approach to predicting cache contents. These approaches are shown to produce better query plans and faster query response times than traditional scheduling policies that do not predict dynamic contents in distributed caches. We experimentally demonstrate the utility of the scheduling policies using MQO, which is a distributed, Grid-enabled, multiple query processing middleware system we developed to optimize query processing for data analysis and visualization applications.
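A minimal sketch of how an exponential moving average (EMA) might summarize what each server has recently cached. The abstract only states that an EMA is used to predict cache contents; the rule of sending a query to the server whose EMA is nearest, and the names `update_ema` and `schedule`, are assumptions made here for illustration.

```python
def update_ema(ema, point, alpha=0.5):
    """Blend the newest query point into a server's running average."""
    return tuple(alpha * p + (1 - alpha) * e for p, e in zip(point, ema))


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def schedule(query_point, server_emas, alpha=0.5):
    """Assumed policy: pick the server whose EMA is closest to the query.

    A nearby EMA suggests that the server recently handled similar queries,
    so its cache is more likely to hold reusable intermediate results.
    """
    best = min(server_emas, key=lambda s: squared_distance(query_point, server_emas[s]))
    server_emas[best] = update_ema(server_emas[best], query_point, alpha)
    return best
```

A larger `alpha` makes the prediction follow recent queries more closely, while a smaller one retains more history. This sketch ignores load balancing, which the abstract treats as equally important.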
6.
Semantic query optimization, or knowledge-based query optimization, has received increasing interest in recent years. The authors provide an effective and systematic approach to optimizing queries by appropriately choosing semantically equivalent transformations. Basically, there are two different types of transformations: transformations by eliminating unnecessary joins, and transformations by adding/eliminating redundant beneficial/nonbeneficial selection operations (restrictions). A necessary and sufficient condition to eliminate a single unnecessary join is provided. We prove that it is NP-complete to eliminate as many unnecessary joins as possible for various types of acyclic queries with the exception of the closure chain queries whose query graphs are chains and all equi-join attributes are distinct. An algorithm is provided to minimize the number of joins in tree queries. This algorithm has an important property that, when applied to a closure chain query, it will yield an optimal solution with the time complexity O(n*m), where n is the number of relations referenced in the chain query, and m is the time complexity of a restriction closure computation.
7.
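Privacy protection is a very important research problem in the current data mining field; its goal is to obtain accurate models and analysis results without precise access to the real original data. To improve both the degree of protection of private data and the accuracy of mining results, a privacy-preserving mining method based on the RSA algorithm is proposed. The concept of the public-key encryption algorithm RSA is introduced, and the commutativity of RSA and the uniqueness of its encryption results are proved. Then, using the RSA algorithm and introducing a computing center and a mixing center, the original data are transformed and hidden, achieving privacy-preserving data mining. Finally, the security, fairness, effectiveness, and complexity of the algorithm are analyzed.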
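The two RSA properties named above can be checked on a toy instance. This is a sketch under stated assumptions, not the protocol of the cited paper: both parties are assumed to share one modulus `n`, and the tiny primes and exponents below are chosen only so that the whole message space can be enumerated.

```python
# Toy parameters (assumed for illustration; far too small for real use).
p, q = 61, 53
n = p * q                  # shared modulus, 3233
phi = (p - 1) * (q - 1)    # 3120
e1, e2 = 17, 7             # two parties' public exponents, both coprime to phi


def encrypt(m, e):
    return pow(m, e, n)


def decrypt(c, e):
    d = pow(e, -1, phi)    # modular inverse of e (Python 3.8+)
    return pow(c, d, n)


def is_commutative():
    """Encrypting with e1 then e2 equals encrypting with e2 then e1."""
    return all(encrypt(encrypt(m, e1), e2) == encrypt(encrypt(m, e2), e1)
               for m in range(n))


def is_unique(e):
    """Distinct plaintexts map to distinct ciphertexts (encryption is a bijection)."""
    return len({encrypt(m, e) for m in range(n)}) == n
```

Commutativity follows because both orders compute m^(e1·e2) mod n. It lets two parties compare doubly encrypted values for equality without either one seeing the other's plaintext.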
8.
In this paper, we discuss a system that semantically interprets a formal database access language and generates natural language from this interpretation. In the past, the major means of communication between a user and a database was a formal language. One such language is the SQL query language. Even though it was designed to be user-friendly, SQL presents the same difficulties for users as other formal languages do, namely a fairly rigid syntax, the necessity of variable binding, the lack of pronouns, and, in the case of erroneous queries, error messages that do not provide much insight. To alleviate some of the formal language problems, yet utilize the power of the formal language, we set out to build a natural language ‘umbrella’ for the SQL user. Our goal was not to build a natural language query system, but rather to use semantic knowledge and natural language for paraphrasing the formal language (SQL) and producing error messages as a feedback mechanism. In this way we build a genuine help facility, which not only aids the user in dealing with SQL, but also traps erroneous queries.
9.
Jorge-Arnulfo Quiané-Ruiz Philippe Lamarre Patrick Valduriez 《The VLDB Journal The International Journal on Very Large Data Bases》2009,18(3):649-674
In large-scale distributed information systems, where participants are autonomous and have special interests for some queries, query allocation is a challenge. Much work in this context has focused on distributing queries among providers in a way that maximizes overall performance (typically throughput and response time). However, preserving the participants’ interests is also important. In this paper, we make the following contributions. First, we provide a model to define the participants’ perception of the system regarding their interests and propose measures to evaluate the quality of query allocation methods. Then, we propose a framework for query allocation called Satisfaction-based Query Load Balancing (SQLB, for short), which dynamically trades consumers’ interests for providers’ interests based on their satisfaction. Finally, we compare SQLB, through experimentation, with two important baseline query allocation methods, namely Capacity based and Mariposa-like. The results demonstrate that SQLB yields high efficiency while satisfying the participants’ interests and significantly outperforms the baseline methods.
Work partially funded by ARA “Massive Data” of the French ministry of research (Respire project) and the European Strep Grid4All project.
10.
Yu C.T. Guh K.-C. Brill D. Chen A.L.P. 《IEEE transactions on pattern analysis and machine intelligence》1989,15(6):780-793
A partition-and-replicate strategy for processing distributed queries referencing no fragmented relation is sketched. An algorithm is given to determine which relation and which copy of the relation is to be partitioned into fragments, how the relation is to be partitioned, and where the fragments are to be sent for processing. Simulation results show that the partition strategy is useful for processing queries in fast local network environments. The results also show that the number of partitions does not need to be large. The use of semijoins in the partition strategy is discussed. A necessary and sufficient condition for a semijoin to yield an improvement is provided.
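To make the semijoin idea concrete, the sketch below reduces a remote relation S by the join keys of R before shipping it. The cost model is a deliberately crude assumption (one unit per key value or tuple sent) and is not the condition derived in the cited paper; the function names are hypothetical.

```python
def semijoin(s_rows, r_keys, key):
    """Keep only the rows of S whose join attribute also occurs in R."""
    return [row for row in s_rows if row[key] in r_keys]


def transfer_costs(r_rows, s_rows, key):
    """Compare shipping all of S with a semijoin-based plan.

    Assumed cost units: one per key value sent to S's site and one per
    S tuple sent back. Real cost models also weigh tuple widths.
    """
    r_keys = {row[key] for row in r_rows}
    reduced = semijoin(s_rows, r_keys, key)
    ship_all = len(s_rows)
    ship_with_semijoin = len(r_keys) + len(reduced)
    return ship_all, ship_with_semijoin, reduced
```

Under this toy model the semijoin pays off only when the key set plus the reduced relation is smaller than S itself, which illustrates why a condition for improvement is needed at all.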
11.
The authors discuss various performance issues in distributed query processing. They validate and evaluate the performance of the local reduction (LR), the fragment and replicate strategy (FRS), and the partition and replicate strategy (PRS) optimization algorithms. The experimental results reveal that the choices made by these algorithms concerning which local operations should be performed, which relation should remain fragmented, or which relation should be partitioned are valid. It is shown using experimental results that various parameters, such as the number of processing sites, partitioning speed relative to join speed, and sizes of the join relations, affect the performance of PRS significantly. It is also shown that the response times of query execution are affected significantly by the degree of site autonomy, interference among processes, the interface with the local database management systems (DBMSs), and communications facilities. Pipeline strategies for processing queries in an environment where relations are fragmented are studied.
12.
Bodorik P. Riordon J.S. Pyra J.S. 《Knowledge and Data Engineering, IEEE Transactions on》1992,4(3):253-265
Most algorithms for determining query processing strategies in distributed databases are static in nature; that is, the strategy is completely determined on the basis of a priori estimates of the size of intermediate results, and it remains unchanged throughout its execution. The static approach may be far from optimal because it denies the opportunity to reschedule operations if size estimates are found to be inaccurate. Adaptive query execution may be used to alleviate this problem. A low overhead delay method is proposed to decide when to correct a strategy. Sampling is used to estimate the size of relations, and alternative heuristic strategies prepared in a background mode are used to decide when to correct. Evaluation using a model of a distributed database indicates that the heuristic strategies are near optimal. Moreover, it also suggests that it is usually correct to abort creation of an intermediate relation which is much larger than predicted.
13.
Yongluan Zhou Beng Chin Ooi Kian-Lee Tan Wee Hyong Tok 《Data & Knowledge Engineering》2005,53(3):1-309
Traditionally, distributed query optimization techniques generate static query plans at compile time. However, the optimality of these plans depends on many parameters (such as the selectivities of operations, the transmission speeds and workloads of servers) that are not only difficult to estimate but also often unpredictable and fluctuating at runtime. As the query processor cannot dynamically adjust the plans at runtime, the system performance is often less than satisfactory. In this paper, we introduce a new highly adaptive distributed query processing architecture. Our architecture can quickly detect fluctuations in selectivities of operations, as well as transmission speeds and workloads of servers, and accordingly change the operation order of a distributed query plan during execution. We have implemented a prototype based on the Telegraph system [Telegraph project. Available from >]. Our experimental study shows that our mechanism can adapt itself to changes in the environment and hence approach an optimal plan during execution.
14.
Perrizo W. Lin J.Y.Y. Hoffman W. 《Knowledge and Data Engineering, IEEE Transactions on》1989,1(2):215-225
Distributed query-processing algorithms for broadcast local-area networks are described which provide execution strategies and estimates of response time. Four semijoin-specific techniques, five transmission-specific techniques, and three size estimation update functions are incorporated into a baseline algorithm. These variants of the baseline algorithm are simulated and their response times compared using randomly generated data. The technique found to be most beneficial, on the average, for general queries involving multiple joining attributes is the composite semijoining technique. When used in combination with composite semijoining, relation transmission and bit-matrix transmission further reduce response time. Bit-matrix transmission is a data compression technique in which single attributes and a bit matrix of the value pairings are sent in place of a composite attribute. The authors examine these techniques in detail and compare their expected response times.
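The description of bit-matrix transmission suggests an encoding along the following lines. This is a reading of the one-sentence description in the abstract, not the authors' implementation; `encode` and `decode` are hypothetical names, and a two-attribute composite is assumed.

```python
def encode(pairs):
    """Replace a list of (a, b) composite values with two value lists and a bit matrix."""
    a_vals = sorted({a for a, _ in pairs})
    b_vals = sorted({b for _, b in pairs})
    present = set(pairs)
    # matrix[i][j] is 1 exactly when the pairing (a_vals[i], b_vals[j]) occurs.
    matrix = [[1 if (a, b) in present else 0 for b in b_vals] for a in a_vals]
    return a_vals, b_vals, matrix


def decode(a_vals, b_vals, matrix):
    """Rebuild the set of composite values at the receiving site."""
    return {(a, b)
            for i, a in enumerate(a_vals)
            for j, b in enumerate(b_vals)
            if matrix[i][j]}
```

Each distinct single-attribute value is sent once and each pairing costs one bit, so the encoding helps most when values repeat across many pairings.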
15.
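Wang Ying 《Computer Engineering and Design》2007,28(4):770-772
Based on the characteristics of spatial data sources, a method is given for representing the capability information of spatial data sources, including exported schemas, query capabilities, and transformation capabilities. On this basis, the query computation engine integrates the capabilities of multiple distributed spatial data sources for a user query and constructs the corresponding query transformation steps by building a schema graph and a function graph. The user therefore need only issue a single query, and the system accesses multiple spatial data sources fully automatically and returns the final query result. The system can serve as an important module for spatial information integration and is highly extensible.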
16.
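A middleware layer for distributed data mining (total citations: 3; self-citations: 0; citations by others: 3)
A complete solution is presented for simplifying the development and maintenance of distributed data mining systems on cluster systems. The non-algorithmic parts of a data mining system are studied in depth, and three key optimization techniques are given: distributed data storage, a data buffering mechanism, and a load balancing strategy. These techniques have been implemented in a practical application.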
17.
Chihping Wang Ming-Syan Chen 《Knowledge and Data Engineering, IEEE Transactions on》1996,8(4):650-662
While a significant amount of research effort has been reported on developing algorithms, based on joins and semijoins, to tackle distributed query processing, there is relatively little progress made toward exploring the complexity of the problems studied. As a result, proving NP-hardness of or devising polynomial-time algorithms for certain distributed query optimization problems has been elaborated upon by many researchers. However, due to its inherent difficulty, the complexity of the majority of problems on distributed query optimization remains unknown. In this paper we generally characterize the distributed query optimization problems and provide a framework to explore their complexity. As will be shown, most distributed query optimization problems can be transformed into an optimization problem comprising a set of binary decisions, termed the Sum Product Optimization (SPO) problem. We first prove SPO is NP-hard in light of the NP-completeness of a well-known problem, Knapsack (KNAP). Then, using this result as a basis, we prove that five classes of distributed query optimization problems, which cover the majority of distributed query optimization problems previously studied in the literature, are NP-hard by polynomially reducing SPO to each of them. The detail for each problem transformation is derived. We not only prove the conjecture that many prior studies relied upon, but also provide a framework for future related studies.
18.
Antony Browne 《Neural Processing Letters》1996,3(2):73-79
The concept of distribution is often encountered in neural network architectures without any formal quantification. A method of quantifying the amount of distribution present in the hidden layer representations of a feed-forward network with binary inputs is described.
19.
A. Mukherjee P. Watson 《Future Generation Computer Systems》2012,28(1):171-183
Grid computing enables users to perform computationally expensive applications on distributed resources acquired dynamically. Users are allowed to combine structured data and analysis components from distributed sites into new applications. Distributed query processing offers an established way of structuring such computations, and well-known tools like OGSA-DAI and OGSA-DQP provide, respectively, a common interface to heterogeneous databases and a way of exploiting distributed resources. Such significant benefits are, however, often undermined by high communication costs due to the need to move data between distributed resources. This paper describes an approach that addresses this by dynamically deploying query processing engines, analysis services and databases within virtual machines, on an internet scale, so as to reduce communication costs. Results of internet-scale experiments are presented to demonstrate the performance benefits. Further, the use of dynamic deployment features based on requirements allows the creation of an ad-hoc runtime engine and thus opens up the possibility of creating a virtual marketplace for software and hardware resources.
20.
This work introduces decentralized query processing techniques based on MIDAS, a novel distributed multidimensional index. In particular, MIDAS implements a distributed k-d tree, where leaves correspond to peers, and internal nodes dictate message routing. MIDAS requires that peers maintain little network information, and features mechanisms that support fault tolerance and load balancing. The proposed algorithms process point and range queries over the multidimensional indexed space in only O(log n) hops in expectation, where n is the network size. For nearest neighbor queries, two processing alternatives are discussed. The first, termed eager processing, has low latency (expected value of O(log n) hops) but may involve a large number of peers. The second, termed iterative processing, has higher latency (expected value of O(log² n) hops) but involves far fewer peers. A detailed experimental evaluation demonstrates that our query processing techniques outperform existing methods for settings involving real spatial data as well as in the case of high dimensional synthetic data.
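The logarithmic hop count for point queries can be illustrated with a small k-d tree whose leaves stand for peers. This is a centralized simplification under stated assumptions, not the MIDAS protocol: real peers hold only partial routing links, whereas this sketch descends a fully materialized, balanced tree. `Node`, `build`, and `route` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    peer: Optional[str] = None       # set on leaves only; a leaf stands for one peer
    dim: int = 0                     # splitting dimension of an internal node
    split: float = 0.0               # splitting coordinate along that dimension
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(lo, hi, depth, levels, name="p"):
    """Recursively halve the box [lo, hi), cycling through the dimensions."""
    if depth == levels:
        return Node(peer=name)
    dim = depth % len(lo)
    mid = (lo[dim] + hi[dim]) / 2
    left_hi = hi[:dim] + (mid,) + hi[dim + 1:]
    right_lo = lo[:dim] + (mid,) + lo[dim + 1:]
    return Node(dim=dim, split=mid,
                left=build(lo, left_hi, depth + 1, levels, name + "0"),
                right=build(right_lo, hi, depth + 1, levels, name + "1"))


def route(root, point):
    """Follow the splits down to the peer responsible for the point, counting hops."""
    node, hops = root, 0
    while node.peer is None:
        node = node.left if point[node.dim] < node.split else node.right
        hops += 1
    return node.peer, hops
```

With `levels=4` over the unit square the tree has 16 leaf peers and every lookup takes 4 hops, matching log2 of the peer count for a balanced tree.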