1.

Multiple query scheduling for distributed semantic caches

Beomseok Nam Minho Shin Henrique Andrade Alan Sussman 《Journal of Parallel and Distributed Computing》2010

In distributed query processing systems, load balancing plays an important role in maximizing system throughput. When queries can leverage cached intermediate results, improving the cache hit ratio becomes as important as load balancing in query scheduling, especially for computationally expensive queries. Scheduling policies must therefore take into account the dynamic contents of the distributed caching infrastructure. In this paper, we propose and discuss several distributed query scheduling policies that directly consider the available cache contents by employing distributed multidimensional indexing structures and an exponential moving average approach to predicting cache contents. These approaches are shown to produce better query plans and faster query response times than traditional scheduling policies that do not predict dynamic contents in distributed caches. We experimentally demonstrate the utility of the scheduling policies using MQO, a distributed, Grid-enabled, multiple query processing middleware system we developed to optimize query processing for data analysis and visualization applications.
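The exponential-moving-average idea can be illustrated with a small Python sketch. It is not the authors' MQO code: the class name `EmaScheduler`, the smoothing factor `alpha`, and the rule of seeding idle servers first are assumptions made for illustration, and the load-balancing side of the published policies is omitted. Each server is summarized by an EMA of the query centers routed to it, and a new query goes to the server whose EMA is closest, on the premise that its cache most likely holds overlapping results.

```python
import math


class EmaScheduler:
    """Hypothetical sketch: route each query to the server whose exponential
    moving average (EMA) of recently assigned query centers is nearest."""

    def __init__(self, num_servers, alpha=0.3):
        self.alpha = alpha                   # weight given to the newest query (assumed value)
        self.centers = [None] * num_servers  # None until a server receives its first query

    def schedule(self, point):
        # Seed idle servers first so every server acquires a starting EMA.
        idle = [i for i, c in enumerate(self.centers) if c is None]
        if idle:
            chosen = idle[0]
        else:
            # Pick the server whose predicted cache contents are closest to the query.
            chosen = min(range(len(self.centers)),
                         key=lambda i: math.dist(point, self.centers[i]))
        self._update(chosen, point)
        return chosen

    def _update(self, i, point):
        old = self.centers[i]
        if old is None:
            self.centers[i] = list(point)
        else:
            # EMA update: new = alpha * observation + (1 - alpha) * old
            self.centers[i] = [self.alpha * p + (1 - self.alpha) * c
                               for p, c in zip(point, old)]
```

With `alpha=0.5`, the query sequence `(0, 0), (10, 10), (1, 1), (9, 9)` is routed to servers `0, 1, 0, 1`, because each later query lands near the EMA of the server that already served a similar query.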

2.

A dichotomous multi-level grid skyline algorithm for distributed environments

Ding Riqiang 《Computer Engineering and Applications》2013,49(18):116-119

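Skyline computation plays a very important role in data mining, multi-criteria decision making, and database visualization, and has received wide attention in recent years. Most previous research on skyline queries has focused on centralized datasets, i.e., centralized skyline queries, and has produced many results. In practice, however, the relevant data are scattered across several different servers, so computing a skyline query in a distributed environment requires collecting large amounts of data from each server. Existing distributed skyline methods have two main problems: first, skyline query processing is slow; second, a great deal of unnecessary overlapping data is transmitted between servers over the network. This paper proposes a dichotomous multi-level grid method (DMLG) that efficiently processes skyline queries in a distributed environment. The method uses a grid approach combined with bisection to minimize unnecessary transmission of overlapping data. Experiments on different datasets show that it outperforms existing methods.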
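A skyline is the set of points not dominated by any other point. The DMLG grid and bisection steps are not described in enough detail to reproduce, so the Python sketch below shows only the baseline that such methods try to improve on: each server computes its local skyline and ships it to a coordinator, which takes the skyline of the union. This is correct because a global skyline point is never dominated inside its own partition, so it always survives the local step. Smaller values are assumed to be better in every dimension, and the function names are hypothetical.

```python
def dominates(a, b):
    """a dominates b if it is no worse in every dimension and strictly better in one
    (smaller is assumed to be better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Block-nested-loop style skyline: keep a window of non-dominated points."""
    result = []
    for p in points:
        if any(dominates(q, p) for q in result):
            continue  # p is dominated by a point already in the window
        # p survives: evict any window points it dominates, then add it.
        result = [q for q in result if not dominates(p, q)]
        result.append(p)
    return result


def distributed_skyline(partitions):
    """Each server computes a local skyline; the coordinator merges them."""
    local = [skyline(part) for part in partitions]   # computed at each server
    shipped = [p for sky in local for p in sky]      # data sent over the network
    return skyline(shipped)
```

In the test partitions `[(1, 5), (3, 3), (4, 4)]` and `[(2, 2), (5, 1), (6, 6)]`, only four of the six points are shipped; grid-based pruning such as DMLG aims to cut transfers like these further.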

3.

An intelligent query processing for distributed ontologies

Jihyun Lee Jun-Ki Min 《Journal of Systems and Software》2010,83(1):85-95

In this paper, we propose an intelligent distributed query processing method that considers the characteristics of a distributed ontology environment. We suggest more general models of the distributed ontology query and of the semantic mapping among distributed ontologies than previous work. Our approach rewrites a distributed ontology query into multiple distributed ontology queries using the semantic mapping, and the integrated answer is obtained by executing these queries. Furthermore, we propose a distributed ontology query processing algorithm with several query optimization techniques: pruning rules to remove unnecessary queries, a cost model that considers site load balancing and caching, and a heuristic strategy for scheduling the plans to be executed at a local site. Finally, experimental results show that our optimization techniques are effective in reducing the response time.
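The rewrite-then-prune step can be pictured with a deliberately simplified Python sketch. It assumes that a semantic mapping is just a per-site dictionary from global terms to local terms, and that a site lacking a mapping for any query term is pruned; the function names and the pruning rule are hypothetical stand-ins for the paper's richer models, and the cost model and scheduling heuristic are not shown.

```python
def rewrite_query(query_terms, site_mappings):
    """Rewrite a query over global terms into per-site queries.

    site_mappings: {site: {global_term: local_term}} (assumed representation).
    Sites that cannot map every term are pruned, since (under this simplified
    rule) they cannot contribute a complete answer.
    """
    plans = {}
    for site, mapping in site_mappings.items():
        if all(term in mapping for term in query_terms):
            plans[site] = [mapping[term] for term in query_terms]
    return plans


def integrate(answers_by_site):
    """Union the answers returned by each rewritten query, removing duplicates."""
    merged = set()
    for answers in answers_by_site.values():
        merged.update(answers)
    return merged
```

For example, with a query over `Person` and `worksAt`, a site that maps only `Person` is dropped before any query is sent to it, which is the kind of unnecessary query the abstract's pruning rules are meant to remove.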

4.

A super-peer-based continuous query strategy

Yu Min Li Zhanhuai Zhang Longbo 《Computer Engineering and Applications》2006,42(1):9-12

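A continuous query is a query that runs over a long period of time and monitors the semantics of the underlying data streams in order to trigger user-defined actions. It turns a passive network structure into an active one and is especially useful in distributed network environments where large amounts of data are frequently updated remotely. Continuous queries have become a P2P application area that attracts considerable attention. Existing P2P continuous query systems have certain shortcomings, so the authors propose a super-peer-based continuous query strategy that clusters similar queries to reduce duplicated operations, together with a load-balancing algorithm that fine-tunes the query clusters to improve the load balance of the continuous query network. The strategy effectively avoids flooding the entire network and scales well; it does not restrict system dynamism, is not prone to bottlenecks, and improves the load balance of the continuous query network while disturbing the query clusters as little as possible.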

5.

Efficient Distributed Skyline Queries for Mobile Applications (Total citations: 3; self-citations: 0; citations by others: 3)

Ying-Yuan Xiao 《Journal of Computer Science and Technology》2010,25(3):523-536

In this paper, we consider skyline queries in a mobile and distributed environment, where data objects are distributed across sites (database servers) interconnected through a high-speed wired network, and queries are issued by mobile units (laptops, cell phones, etc.) that access the data objects on the database servers over wireless channels. The inherent properties of the mobile computing environment, such as mobility, limited wireless bandwidth, and frequent disconnection, make skyline queries more complicated. We show how to perform distributed skyline queries efficiently in a mobile environment and propose a skyline query processing approach called efficient distributed skyline based on mobile computing (EDS-MC). In EDS-MC, a distributed skyline query is decomposed into five processing phases, each carefully designed to reduce network communication, network delay, and query response time. We conduct extensive experiments in a simulated mobile database system, and the results demonstrate the superiority of EDS-MC over other skyline query processing techniques for mobile computing.

6.

Active semantic caching to optimize multidimensional data analysis in parallel and distributed environments

《Parallel Computing》2007,33(7-8):497-520

In this paper, we present a multi-query optimization framework based on the concept of active semantic caching. The framework permits the identification and transparent reuse of data and computation in the presence of multiple queries (or query batches) that specify user-defined operators and aggregations originating from scientific data-analysis applications. We show how query scheduling techniques, coupled with intelligent cache replacement policies, can further improve the performance of query processing by leveraging the active semantic caching operators. We also propose a methodology for functionally decomposing complex queries in terms of primitives so that multiple reuse sites are exposed to the query optimizer, increasing the amount of reuse. The optimization framework and the database system implemented with it are designed to be efficient irrespective of the underlying parallel and/or distributed machine configuration. We present experimental results highlighting the performance improvements obtained by our methods using real scientific data-analysis applications on multiple parallel and distributed processing configurations (e.g., a single symmetric multiprocessor (SMP) machine, a cluster of SMP nodes, and a Grid computing configuration).

7.

A Tiered System for Serving Differentiated Content

Huamin Chen Arun Iyengar 《World Wide Web》2003,6(4):331-352

Contemporary Web sites typically consist of front-end Web servers, application servers, and back-end information systems such as database servers. There has been limited research on how to provide overload control and service differentiation for the back-end systems. In this paper we propose an architecture called tiered service (TS) for these purposes. In TS, several heterogeneous back-end systems serve the Web applications. The Web applications communicate with a routing intermediary that intelligently routes queries to the appropriate back-end servers based on various policies such as client profiles and server load. In our system the back ends may store data of different quality; lower-quality data typically requires less overhead to serve. The main contributions of this paper include (i) a tiered content replication scheme that replicates tiered qualities of data on heterogeneous back ends with different capacities to satisfy clients with diverse requirements for latency and data quality, and (ii) an application-transparent query routing architecture that automatically routes queries to the appropriate back ends. The architecture was implemented in our test bed and its performance was benchmarked. The experimental results demonstrate that TS offers a significant performance improvement.
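A routing intermediary of this kind can be sketched in a few lines of Python. The `Backend` fields and the policy below (among back ends that store at least the requested data quality and still have spare capacity, pick the least utilized one) are hypothetical choices made for illustration; the abstract does not give TS's actual routing policies.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    quality: int    # tier of data quality stored (higher = better), assumed scale
    load: int       # queries currently in progress
    capacity: int   # queries it can serve concurrently


def route(required_quality, backends):
    """Hypothetical policy: least-utilized back end that meets the client's quality tier."""
    eligible = [b for b in backends
                if b.quality >= required_quality and b.load < b.capacity]
    if not eligible:
        return None  # overloaded: the caller may reject or degrade the request
    return min(eligible, key=lambda b: b.load / b.capacity).name
```

Because lower tiers are cheaper to serve, a client that accepts lower-quality data can be steered to a lightly loaded back end, leaving the high-quality tier for clients that need it.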

8.

A new fuzzy-decision based load balancing system for distributed object computing

《Journal of Parallel and Distributed Computing》2004,64(2):238-253

Distributed object computing systems are widely envisioned as the preferred distributed software development paradigm because of their higher modularity and their ability to handle machine and operating system heterogeneity. Indeed, thanks to tremendous advances in processor and networking technologies, complex operations such as object serialization and data marshaling have become very efficient, and distributed object systems are therefore being built for many different applications. However, as the system scales up (e.g., with more server and client objects and more machines), a judicious load balancing system is required to distribute the workload (e.g., queries and message/object passing) efficiently among the servers. Unfortunately, existing distributed object middleware systems lack such a load balancing facility. In this paper, we present the design and implementation of a new dynamic fuzzy-decision-based load balancing system incorporated in a distributed object computing environment. Our approach uses a fuzzy logic controller that informs a client object which service is most appropriate to use, so that load balance among servers is achieved. We chose Jini to build our experimental middleware platform, on which our approach and other related techniques are implemented and compared. Extensive experiments investigating the effectiveness of our fuzzy-decision-based algorithm show that it consistently outperforms the other approaches.

9.

Index-based query processing on distributed multidimensional data

George Tsatsanifos Dimitris Sacharidis Timos Sellis 《GeoInformatica》2013,17(3):489-519

This work introduces decentralized query processing techniques based on MIDAS, a novel distributed multidimensional index. In particular, MIDAS implements a distributed k-d tree, where leaves correspond to peers and internal nodes dictate message routing. MIDAS requires peers to maintain little network information and features mechanisms that support fault tolerance and load balancing. The proposed algorithms process point and range queries over the multidimensional indexed space in only O(log n) hops in expectation, where n is the network size. For nearest neighbor queries, two processing alternatives are discussed. The first, termed eager processing, has low latency (an expected O(log n) hops) but may involve a large number of peers. The second, termed iterative processing, has higher latency (an expected O(log² n) hops) but involves far fewer peers. A detailed experimental evaluation demonstrates that our query processing techniques outperform existing methods on real spatial data as well as on high-dimensional synthetic data.
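The role of the k-d tree in routing can be shown with a centralized Python sketch: internal nodes hold a split dimension and value, leaves name the peer that owns a region, a point query follows one root-to-leaf path, and a range query visits every leaf whose region intersects the box. In MIDAS each peer knows only part of the tree and forwards messages itself; holding the whole tree in one `Node` structure, and all names below, are simplifying assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    peer: Optional[str] = None      # set only on leaves: the peer owning this region
    dim: int = 0                    # split dimension (internal nodes)
    value: float = 0.0              # left subtree holds x[dim] < value
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def route_point(node, point):
    """Follow split decisions to the peer responsible for `point`; count hops."""
    hops = 0
    while node.peer is None:
        node = node.left if point[node.dim] < node.value else node.right
        hops += 1
    return node.peer, hops


def route_range(node, lo, hi):
    """Return every peer whose region intersects the box [lo, hi] (inclusive)."""
    if node.peer is not None:
        return [node.peer]
    peers = []
    if lo[node.dim] < node.value:    # box reaches the left half-space
        peers += route_range(node.left, lo, hi)
    if hi[node.dim] >= node.value:   # box reaches the right half-space
        peers += route_range(node.right, lo, hi)
    return peers
```

The hop count equals the depth of the reached leaf, which is why a balanced tree over n peers gives the logarithmic routing cost quoted in the abstract.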

10.

Distributed processing of continuous sliding-window k-NN queries for data stream filtering

Krešimir Pripužić Ivana Podnar Žarko Karl Aberer 《World Wide Web》2011,14(5-6):465-494

A sliding-window k-NN query (k-NN/w query) continuously monitors incoming data stream objects within a sliding window to identify the k objects closest to a query. It enables effective filtering of data objects streaming in at high rates from potentially distributed sources, and offers a means to control the rate of object insertions into result streams. Therefore, k-NN/w processing systems may be regarded as a promising solution to the information overload problem in applications that require real-time processing of structured data, such as the Sensor Web. Existing k-NN/w processing systems are mainly centralized and cannot cope with multiple data streams whose sources are scattered over the Internet. In this paper, we propose a solution for distributed continuous k-NN/w processing of structured data from distributed streams. We define a k-NN/w processing model for this setting and design a distributed k-NN/w processing system on top of the Content-Addressable Network (CAN) overlay. An extensive evaluation using both real and synthetic data sets demonstrates the feasibility of the proposed solution: it balances the load among the peers while keeping the messaging overhead within the P2P network reasonable. Moreover, our results show that the solution scales with an increasing number of queries and peers.
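The semantics of a k-NN/w query can be pinned down with a centralized Python sketch, i.e., the single-node behavior that the distributed CAN-based system generalizes. It assumes a count-based window holding the last `w` objects (a time-based window is equally plausible) and Euclidean distance; the class name is hypothetical, and the result is simply recomputed on every insertion rather than maintained incrementally.

```python
import heapq
import math
from collections import deque


class SlidingWindowKnn:
    """Centralized sketch of a k-NN/w query over a count-based window (assumed)."""

    def __init__(self, query, k, w):
        self.query, self.k = query, k
        self.window = deque(maxlen=w)  # appending beyond w evicts the oldest object

    def insert(self, obj):
        """Add a new stream object and return the current k nearest objects."""
        self.window.append(obj)
        return self.result()

    def result(self):
        return heapq.nsmallest(self.k, self.window,
                               key=lambda o: math.dist(o, self.query))
```

With query `(0, 0)`, `k=2`, and `w=3`, inserting `(5, 5), (1, 0), (0, 2), (9, 9), (3, 3)` ends with the result `[(0, 2), (3, 3)]`: `(1, 0)` has slid out of the window even though it is closer than both.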

11.

Research on a materialized query caching mechanism for grid databases

Zhang Yansong Zhang Yu Xue Yongsheng 《Application Research of Computers》2006,23(7):242-245

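Proposes an XML Database-based materialized query caching mechanism for grid databases that speeds up user queries and balances the grid load. Defines quality-of-service and data-quality criteria for grid databases and proposes MQS, a materialized query selection algorithm, to provide users with better data services.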

12.

Query result caching for multiple event-driven continuous queries

Yousuke Watanabe Hiroyuki Kitagawa 《Information Systems》2010,35(1):94-110

With the increasing demand for advanced use of streaming data, efficient execution of continuous queries is an important research issue. This paper focuses on event-driven continuous queries that are activated by external events such as data arrival and the progression of time. Existing approaches to multiple continuous query optimization decide the optimal query plan by extracting common subexpressions from the given queries. Event-driven queries containing common subexpressions may produce many common intermediate results when they are activated within a short interval, but may produce only disjoint data when activated at completely different times. This paper proposes an efficient data stream processing scheme for multiple event-driven continuous queries. In the proposed approach, we introduce query result caching as a flexible way to share common operators among queries activated by unpredictable events. When a query is activated, an intermediate result generated for it is stored in the cache area if it is expected to be reused by other queries. When other queries containing the same operator are activated, they reuse the cached result if the cache holds reusable data. The efficiency of the proposed scheme is validated through extensive experimental evaluation.

13.

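Distributed aggregation algorithms for two kinds of queries over uncertain data (Total citations: 1; self-citations: 1; citations by others: 0)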

Zhou Xun Li Jianzhong Shi Shengfei 《Journal of Computer Research and Development》2010,47(5)

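Query techniques for uncertain data play an increasingly important role in military, financial, telecommunications, and other fields. Uncertain data are widespread in distributed systems such as sensor networks, distributed Web servers, and P2P systems. Collecting all the data from these systems for centralized querying would incur huge communication overhead, time delay, and storage cost. Moreover, because of the characteristics of uncertain data, most centralized uncertain query algorithms are not applicable in distributed environments. This paper defines maximum and Top-k aggregate queries over uncertain data and proposes a filtering-based distributed aggregation algorithm for each. Based on three proposed filtering strategies, the algorithms use the distribution intervals and probabilities of the data to compute upper bounds on probabilities for filtering, discarding as much data as possible that cannot affect the query result. At the same time, the algorithms merge, store, and transmit, at relatively low cost, the non-discardable data needed to compute the final query result. Experimental results show that under various system and data conditions, the filtering algorithms always obtain correct query results and significantly reduce the system's data communication overhead.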
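The filtering idea for the maximum query can be illustrated with a two-round Python sketch. It is a simplified, interval-only filter, not one of the paper's three strategies: each object is assumed to exist with certainty and to have a value somewhere in `[lo, hi]`, and existence probabilities are ignored. In round one each site sends its largest lower bound; in round two, sites ship only objects whose upper bound reaches the global threshold, since any other object is certainly smaller than some object and can never be the maximum.

```python
def local_threshold(objects):
    """Largest lower bound at one site; objects are (id, lo, hi) tuples (assumed format)."""
    return max(lo for _, lo, _ in objects)


def filter_candidates(objects, threshold):
    """Keep objects that might still be the maximum (upper bound >= threshold)."""
    return [o for o in objects if o[2] >= threshold]


def distributed_max_candidates(sites):
    # Round 1: each non-empty site sends a single number to the coordinator.
    threshold = max((local_threshold(objs) for objs in sites.values() if objs),
                    default=float("-inf"))
    # Round 2: each site ships only its surviving candidates.
    return {site: filter_candidates(objs, threshold) for site, objs in sites.items()}
```

In the test data, only two of four objects are transmitted, which mirrors the abstract's goal of discarding data that cannot affect the result before it crosses the network.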

14.

DICE: An Effective Query Result Cache for Distributed Storage Systems

Jun-Ki Min Mi-Young Lee 《Journal of Computer Science and Technology》2010,25(5):933-944

Due to the proliferation of the Internet and intranets, distributed storage systems have received a lot of attention. These systems span a large number of machines and store huge amounts of data for many users. In distributed storage systems, a row can be accessed directly using its row key. We concentrate on the problem of efficiently processing queries whose predicate is on a column other than the row key. In this paper, we present a cache management technique called DICE, which caches the results of range queries to answer subsequent range queries. To speed up the search for cached query results, we use modified Interval Skip Lists. In addition, we devise a novel cache replacement policy, since DICE maintains intervals rather than individual data items. Because our cache replacement policy considers the properties of intervals, our proposed technique is more efficient than traditional buffer replacement algorithms. Our experimental results demonstrate the efficiency of the proposed technique.
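The core lookup of an interval-keyed result cache can be sketched in Python. DICE uses modified Interval Skip Lists; here a sorted list of non-overlapping cached intervals searched with `bisect` stands in for that structure, which is an assumption made to keep the example short. The class name is hypothetical, and no replacement policy is implemented.

```python
import bisect


class RangeResultCache:
    """Hypothetical sketch: cache range-query results keyed by [start, end]."""

    def __init__(self):
        self.starts = []   # sorted start keys of cached intervals
        self.entries = []  # parallel list of (start, end, rows sorted by key)

    def put(self, start, end, rows):
        # Assumes cached intervals never overlap (no merging or eviction here).
        i = bisect.bisect_left(self.starts, start)
        self.starts.insert(i, start)
        self.entries.insert(i, (start, end, sorted(rows)))

    def get(self, start, end):
        """Answer [start, end] from a cached interval containing it, else None."""
        # With disjoint intervals, only the one with the largest start <= start can contain it.
        i = bisect.bisect_right(self.starts, start) - 1
        if i >= 0:
            s, e, rows = self.entries[i]
            if s <= start and end <= e:
                return [r for r in rows if start <= r[0] <= end]
        return None  # cache miss: the query must go to the storage system
```

A query such as `[14, 19]` is answered entirely from a cached `[10, 20]` result, while `[18, 25]` misses because it extends past the cached interval; handling such partial overlaps and choosing which intervals to evict is where DICE's interval-aware replacement policy comes in.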

15.

Site and query scheduling policies in multicomputer database systems

Frieder O. Baru C.K. 《Knowledge and Data Engineering, IEEE Transactions on》1994,6(4):609-619

We study run-time issues, such as site allocation and query scheduling policies, in executing read-only queries in a hierarchical, distributed memory, multicomputer system. The particular architecture considered is based on the hypercube interconnection. The data are stored in a base cube, which is controlled by a control cube and host node hierarchy. Input query trees are transformed into operation sequence trees, and the operation sequences become the units of scheduling. These sequences are scheduled dynamically at run-time. Algorithms for dynamic site allocation are provided. Several query scheduling policies that support interquery concurrency are also studied. Average query completion times and initiation delays are obtained for the various policies using simulations.

16.

Distributed evaluation of network directory queries (Total citations: 1; self-citations: 0; citations by others: 1)

Amer-Yahia S. Divesh Srivastava Suciu D. 《Knowledge and Data Engineering, IEEE Transactions on》2004,16(4):474-486

We describe novel efficient techniques for the distributed evaluation of hierarchical aggregate selection queries over LDAP directory data, distributed across multiple autonomous directory servers. Such queries are useful for emerging applications like the directory enabled networks initiative. Our techniques follow the LDAP approach of distributed query evaluation by referrals, where each relevant server computes answers locally, and the LDAP client coordinates between directory servers. We make a conceptual separation between the identification of relevant servers and the distributed computation of answers. We focus on the challenging task of generating an efficient plan for evaluating hierarchical aggregate selection queries, which involves correlating directory entries across multiple servers. The key features of our plan are: 1) the network traffic consists of query answers, and auxiliary messages that depend only on the number of servers and the size of the query (not on the data size), 2) the coordination effort at the client is independent of the data size, and 3) potentially expensive server-to-server communication and coordination is avoided. We complement our analysis with experiments that show the robustness and scalability of our techniques for highly distributed directory query processing.

17.

k-Nearest Neighbor Query Processing Algorithms for a Query Region in Road Networks

Hyeong-Il Kim Jae-Woo Chang 《Journal of Computer Science and Technology》2013,28(4):585-596

The recent development of wireless communication technologies and the popularity of smartphones are making location-based services (LBS) popular. However, issuing queries to LBS servers with users' exact locations may threaten users' privacy. Therefore, there has been much research on generating a cloaked query region to protect user privacy. Consequently, an efficient query processing algorithm for a query region is required. In this paper, we propose k-nearest neighbor (k-NN) query processing algorithms for a query region in road networks. To efficiently retrieve the k-NN points of interest (POIs), we make use of the Island index. We also propose a method that generates an adaptive Island index to improve query processing performance and storage usage. Finally, our performance analysis shows that our k-NN query processing algorithms outperform the existing k-Range Nearest Neighbor (kRNN) algorithm in terms of network expansion cost and query processing time.

18.

A privacy-preserving method for outsourced database services

Yu Yonghong Bai Wenyang 《Journal of Computer Applications》2010,30(10):2672-2676

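Current privacy-preserving outsourced database service techniques based on database encryption require frequent encryption and decryption of the entire database and cannot effectively balance data processing performance against data privacy. To address this shortcoming, this paper proposes a new privacy-preserving method based on distributed outsourced database services. The method introduces automatic detection of quasi-identifier attribute sets and probabilistic anonymity techniques for privacy protection. It achieves horizontal and vertical decomposition of the data by encrypting or anonymizing some sensitive attributes and by decomposing quasi-identifier attribute sets, and it gives distributed query processing schemes for the different decomposition methods. Theoretical analysis and experimental results show that the method enables outsourcing to untrusted database servers and achieves a good balance between query performance and privacy protection.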

19.

Skyframe: a framework for skyline query processing in peer-to-peer systems

Shiyuan Wang Quang Hieu Vu Beng Chin Ooi Anthony K. H. Tung Lizhen Xu 《The VLDB Journal The International Journal on Very Large Data Bases》2009,18(1):345-362

This paper looks at the processing of skyline queries on peer-to-peer (P2P) networks. We propose Skyframe, a framework for efficient skyline query processing in P2P systems, which addresses the challenges of quick response time, low network communication cost and query load balancing among peers. Skyframe consists of two querying methods: one is optimized for network communication while the other focuses on query response time. These methods are different in the way in which the query search space is defined. In particular, the first method uses a high dominating point that has a large dominating region to prune the search space to achieve a low cost in network communication. On the other hand, the second method relaxes the search space in order to allow parallel query processing to speed up query response. Skyframe achieves query load balancing by both query load conscious data space splitting/merging during the join/departure of nodes and dynamic load migration. We further show how to apply Skyframe to both the P2P systems supporting multi-dimensional indexing and the P2P systems supporting single-dimensional indexing. Finally, we have conducted extensive experiments on both real and synthetic data sets over two existing P2P systems: CAN (Ratnasamy in A scalable content-addressable network. In: Proceedings of SIGCOMM Conference, pp. 161–172, 2001) and BATON (Jagadish et al. in A balanced tree structure for peer-to-peer networks. In: Proceedings of VLDB Conference, pp. 661–672, 2005) to evaluate the effectiveness and scalability of Skyframe.

20.

Web search results caching service for structured P2P networks

《Future Generation Computer Systems》2014

This paper proposes a two-level P2P caching strategy for Web search queries. The design suits a fully distributed service platform based on managed peer boxes (set-top boxes or DSL/cable modems) located at the edge of the network, where both the boxes and the access bandwidth to them are controlled and managed by an ISP. Our solution significantly reduces the user query traffic that leaves the ISP's network to fetch results from the respective Web search engine. Web users tend to react strongly to worldwide events, which causes highly dynamic query traffic patterns and leads to load imbalance across peers. Our solution includes a strategy to quickly ease imbalance on peers and spread the communication flow among participating peers. Each peer maintains a local result cache that stores answers to queries originating at the peer itself and to queries for which the peer is responsible, obtained by contacting the Web search engine on demand. When query traffic is predominantly routed to a few responsible peers, our strategy replicates the role of “being responsible for” to neighboring peers so that they can absorb query traffic. This is a fairly slow, adaptive process that we call mid-term load balancing. To achieve a short-term fair distribution of queries, we introduce a location cache in each peer that keeps pointers to peers that have recently requested the same queries. This lets those peers share their query answers with newly requesting peers. The process is fast because these popular queries are usually cached in the first DHT hop of a requesting peer, which tends to quickly redistribute load among more and more peers.
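The interplay of the result cache and the location cache can be sketched in Python. Several simplifications are assumptions: hashing a query to a peer list stands in for the DHT lookup, each location cache keeps only the most recent requester per query, and the mid-term replication of responsibility is not modeled. `Peer`, `responsible`, `lookup`, and the `search_engine` callback are hypothetical names.

```python
import hashlib


class Peer:
    def __init__(self, name):
        self.name = name
        self.results = {}    # local result cache: query -> answer
        self.locations = {}  # location cache: query -> peer that recently fetched it


def responsible(peers, query):
    """Stand-in for a DHT lookup: hash the query onto one peer."""
    h = int(hashlib.sha1(query.encode()).hexdigest(), 16)
    return peers[h % len(peers)]


def lookup(requester, peers, query, search_engine):
    """Return (answer, source), contacting the search engine only on a full miss."""
    if query in requester.results:
        return requester.results[query], "local"
    owner = responsible(peers, query)
    holder = owner.locations.get(query)
    if holder is not None and query in holder.results:
        # Short-term balancing: redirect to a peer that already holds the answer.
        answer, source = holder.results[query], "peer " + holder.name
    elif query in owner.results:
        answer, source = owner.results[query], "owner " + owner.name
    else:
        answer, source = search_engine(query), "engine"
        owner.results[query] = answer  # the responsible peer caches engine results
    owner.locations[query] = requester  # point the next requester at this peer
    requester.results[query] = answer
    return answer, source
```

Under a burst of identical queries, each new requester is pointed at the previous one, so the answer is served by a chain of peers instead of repeatedly by the responsible peer or the search engine.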