首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 62 毫秒
1.
LIGHT: A Query-Efficient Yet Low-Maintenance Indexing Scheme over DHTs   总被引:1,自引:0,他引:1  
DHT is a widely used building block for scalable P2P systems. However, as uniform hashing employed in DHTs destroys data locality, it is not a trivial task to support complex queries (e.g., range queries and k-nearest-neighbor queries) in DHT-based P2P systems. In order to support efficient processing of such complex queries, a popular solution is to build indexes on top of the DHT. Unfortunately, existing over-DHT indexing schemes suffer from either query inefficiency or high maintenance cost. In this paper, we propose LIGhtweight Hash Tree (LIGHT)—a query-efficient yet low-maintenance indexing scheme. LIGHT employs a novel naming mechanism and a tree summarization strategy for graceful distribution of its index structure. We show through analysis that it can support various complex queries with near-optimal performance. Extensive experimental results also demonstrate that, compared with state of the art over-DHT indexing schemes, LIGHT saves 50-75 percent of index maintenance cost and substantially improves query performance in terms of both response time and bandwidth consumption. In addition, LIGHT is designed over generic DHTs and hence can be easily implemented and deployed in any DHT-based P2P system.  相似文献   

2.
Distributed Hash Tables (DHTs) are scalable, self‐organizing, and adaptive to underlying topology changes, thus being a promising infrastructure for hosting large‐scale distributed applications. The ever‐wider use of DHT infrastructures has found more and more applications that require support for range queries. Recently, a number of DHT‐based range query schemes have been proposed. However, most of them suffer from high query delay or imbalanced load distribution. To address these problems, in this paper we first present an efficient indexing structure called Balanced Kautz (BK) tree that uniformly maps the m‐dimensional data space onto DHT nodes, and then propose a BK tree‐based range query scheme called ERQ that processes range queries in a parallel fashion and guarantees to return the results in a bounded delay. In a DHT with N nodes, ERQ can answer any range of query in less than rmlog N(2loglog N + 1) hops in a load‐balanced manner, irrespective of the queried range, the whole space size, or the number of queried attributes. The effectiveness of our proposals is demonstrated through experiments. Copyright © 2010 John Wiley & Sons, Ltd.  相似文献   

3.
本文在常数度量的Cactus系统基础上设计了一种Smart-Broadcast算法,它在大规模节点的情况下同时具有高效搜索和低消息负载的特点。本文描述了Smart-Broadcast算法,并对其进行了性能模拟与分析。实验证明,Smart-Broadcast算法在消息开销和路由开销两个方面具有较好的折衷效率。  相似文献   

4.
With the growth of information technology and computer networks, there is a vital need for optimal design of distributed databases with the aim of performance improvement in terms of minimizing the round-trip response time and query transmission and processing costs. To address this issue, new fragmentation, data allocation, and replication techniques are required. In this paper, we propose enhanced vertical fragmentation, allocation, and replication schemes to improve the performance of distributed database systems. The proposed fragmentation scheme clusters highly-bonded attributes (i.e., normally accessed together) into a single fragment in order to minimize the query processing cost. The allocation scheme is proposed to find an optimized allocation to minimize the round-trip response time. The replication scheme partially replicates the fragments to increase the local execution of queries in a way that minimizes the cost of transmitting replicas to the sites. Experimental results show that, on average, the proposed schemes reduce the round-trip response time of queries by 23% and query processing cost by 15%, as compared to the related work.  相似文献   

5.
In wireless sensor networks, various schemes have been proposed to efficiently store and process sensed data. Among them, the data-centric storage (DCS) scheme is one of the most well-known. The DCS scheme distributes data regions and stores the data in the sensor that is responsible for the region. The DCS based scheme was proposed to reduce the communication cost for transmitting data and to efficiently process exact queries and range queries. Recently, a KDDCS scheme was proposed to overcome storage hot-spots by dynamically readjusting the distributed data regions to sensors based on the K-D tree. However, the existing DCS based schemes including KDDCS suffer from query hot-spots that are formed when query regions are not uniformly distributed. As a result, it reduces the lifetime of the sensor network.In this paper, we propose a new DCS based scheme, called Time-Parameterized Data-Centric Storage (TPDCS), that avoids the problems of storage hot-spots and query hot-spots. To decentralize the skewed data and queries, the data regions are assigned by a time dimension as well as data dimensions in our proposed scheme. Therefore, TPDCS extends the lifetime of sensor networks. It is shown through various experiments that our scheme outperforms the existing schemes.  相似文献   

6.
罗绪成  刘峤 《计算机应用》2007,27(8):1831-1834
根据非结构化P2P系统中资源分布的特点,提出一种基于复本网络的非结构化P2P系统,即RNP2P。通过查询反馈、主动探测和反向探测三种方式协调复本节点之间的相互感知,构建数据结构存储每种资源的其他复本节点信息,针对每种资源均构成一个复本网络。基于这种复本管理机制,RNP2P平均能够以命中3~5个复本的消息开销获得100%的命中率,其他查询方法均可以和RNP2P有效结合。模拟结果表明RNP2P的查询性能远远高于其他查询方案。当采用k-随机游走进行查询,RNP2P的消息开销为普通非结构化P2P中k-随机游走查询的5%,并且远远低于泛洪查询,RNP2P的查询时延也相应降低。  相似文献   

7.
Unstructured peer-to-peer infrastructure has been widely employed to support large-scale distributed applications. Many of these applications, such as location-based services and multimedia content distribution, require the support of range selection queries. Under the widely-adopted query shipping protocols, the cost of query processing is affected by the number of result copies or replicas in the system. Since range queries can return results that include poorly-replicated data items, the cost of these queries is usually dominated by the retrieval cost of these data items. In this work, we propose a popularity-aware prefetch-based approach that can effectively facilitate the caching of poorly-replicated data items that are potentially requested in subsequent range queries, resulting in substantial cost savings. We prove that the performance of retrieving poorly-replicated data items is guaranteed to improve under an increasing query load. Extensive experiments show that the overall range query processing cost decreases significantly under various query load settings.  相似文献   

8.
《Computer Networks》2007,51(8):1861-1881
The success of a P2P file-sharing network highly depends on the scalability and versatility of its search mechanism. Two particularly desirable search features are scope (ability to find infrequent items) and support for partial-match queries (queries that contain typos or include a subset of keywords). While centralized-index architectures (such as Napster) can support both these features, existing decentralized architectures seem to support at most one: prevailing unstructured P2P protocols (such as Gnutella and FastTrack) deploy a “blind” search mechanism where the set of peers probed is unrelated to the query; thus they support partial-match queries but have limited scope. On the other extreme, the recently-proposed distributed hash tables (DHTs) such as CAN and CHORD, couple index location with the item’s hash value, and thus have good scope but can not effectively support partial-match queries. Another hurdle to DHTs deployment is their tight control of the overlay structure and the information (part of the index) each peer maintains, which makes them more sensitive to failures and frequent joins and disconnects.We develop a new class of decentralized P2P architectures. Our design is based on unstructured architectures such as Gnutella and FastTrack, and retains many of their appealing properties including support for partial match queries, and relative resilience to peer failures. Yet, we obtain orders of magnitude improvement in the efficiency of locating rare items. Our approach exploits associations inherent in human selections to steer the search process to peers that are more likely to have an answer to the query. We demonstrate the potential of associative search using models, analysis, and simulations.  相似文献   

9.
This paper proposes a class of queueing schemes named general packet induced queueing schemes (GPIQS) in ADSL routers to reduce the queueing delays of non-P2P packets. The objective of the proposed queueing schemes is to send out the general packets first as well as P2P packets are able to be sent in a bounded queueing delay. The proposed queueing schemes use the general packet to induce the transmission of P2P packets which are from the same client and arrived at the ADSL router before the general packet. The outbound order of the packets transmitted from a specific client is not altered in the proposed schemes. Two queueing schemes named general packet induced queueing scheme with single P2P queue (GPIQS-SQ) and general packet induced queueing scheme with multiple P2P queues (GPIQS-MQ) are proposed. The two proposed queueing schemes differ in the number of P2P queues. In order to prevent the unlimited waiting time of P2P packets, we introduced a variable called the largest number of preempting packets to send out the P2P packets in a bounded time. Simulation results show that the proposed queueing schemes may send out the packets from ADSL router efficiently and the average queueing delay is smaller than the common used first-come first-served algorithm. Specifically, the GPIQS-MQ performs better than the GPIQS-SQ method in terms of average queueing delay of non-P2P packets. We also found that the increased average queueing delay of P2P packets is small. Finally, the values of the largest number of preempting packets are discussed.  相似文献   

10.
Our study introduces a novel distributed query plan refinement phase in an enhanced architecture of distributed query processing engine(DQPE) . Query plan refinement generates potentially efficient distributed query plan by reusable aggregate query shipping(RAQS) approach. The approach improves response time at the cost of pre-processing time. If the overheads could not be compensated by query results reusage,RAQS is no more favorable. Therefore a global cost estimation model is employed to get proper operators:RR Agg,R Agg,or R Scan. For the purpose of reusing results of queries with aggregate function in distributed query processing,a multi-level hybrid view caching(HVC) scheme is introduced. The scheme retains the advantages of partial match and aggregate query results caching. By our solution,evaluations with distributed TPC-H queries show significant improvement on average response time.  相似文献   

11.
We propose a new declustering scheme for allocating uniform multidimensional data among parallel disks. The scheme, aimed at reducing disk access time for range queries, is based on Golden Ratio Sequences for two dimensions and Kronecker Sequences for higher dimensions. Using exhaustive simulation, we show that, in two dimensions, the worst-case (additive) deviation of the scheme from the optimal response time for any range query is one when the number of disks (M) is at most 22; its worst-case deviation is two when M /spl les/ 94; and its worst-case deviation is four when M /spl les/ 550. In two dimensions, we prove that whenever M is a Fibonacci number, the average performance of the scheme is within 14 percent of the (generally, unachievable) strictly optimal scheme and its worst-case response time is within a multiplicative factor three of the optimal response time for any query, and within a factor 1.5 of the optimal for large queries. We also present comprehensive simulation results, on two-dimensional as well as on higher-dimensional data, that compare and demonstrate the advantages of our scheme over some recently proposed schemes in the literature.  相似文献   

12.
In many applications, XML documents need to be modelled as graphs. The query processing of graph-structured XML documents brings new challenges. In this paper, we design a method based on labelling scheme for structural queries processing on graph-structured XML documents. We give each node some labels, the reachability labelling scheme. By extending an interval-based reachability labelling scheme for DAG by Rakesh et al., we design labelling schemes to support the judgements of reachability relationships for general graphs. Based on the labelling schemes, we design graph structural join algorithms to answer the structural queries with only ancestor-descendant relationship efficiently. For the processing of subgraph query, we design a subgraph join algorithm. With efficient data structure, the subgraph join algorithm can process subgraph queries with various structures efficiently. Experimental results show that our algorithms have good performance and scalability. Support by the Key Program of the National Natural Science Foundation of China under Grant No.60533110; the National Grand Fundamental Research 973 Program of China under Grant No. 2006CB303000; the National Natural Science Foundation of China under Grant No. 60773068 and No. 60773063.  相似文献   

13.
Boolean query mapping across heterogeneous information sources   总被引:5,自引:0,他引:5  
Searching over heterogeneous information sources is difficult because of the nonuniform query languages. Our approach is to allow a user to compose Boolean queries in one rich front end language. For each user query and target source, we transform the user query into a subsuming query that can be supported by the source but that may return extra documents. The results are then processed by a filter query to yield the correct final result. We introduce the architecture and associated algorithms for generating the supported subsuming queries and filters. We show that generated subsuming queries return a minimal number of documents; we also discuss how minimal cost filters can be obtained. We have implemented prototype versions of these algorithms and demonstrated them on heterogeneous Boolean systems  相似文献   

14.
This work proposes the E-Top system for the efficient processing of top-k queries in mobile ad hoc peer to peer (M-P2P) networks using economic incentive schemes. In E-Top, brokers facilitate top-k query processing in lieu of a commission. E-Top issues economic rewards to the mobile peers, which send relevant data items (i.e., those that contribute to the top-k query result), and penalizes peers otherwise, thereby optimizing the communication traffic. Peers use the payoffs (rewards/penalties) as a means of feedback to re-evaluate the scores of their items for re-ranking purposes. The main contributions of E-Top are three-fold. First, it proposes two economic incentive schemes, namely ETK and ETK+, in which peers act individually towards top-k query processing. Second, it extends ETK and ETK+ to propose a peer group-based economic incentive scheme ETG. Third, our performance evaluation shows that our schemes are indeed effective in improving the performance of top-k queries in terms of query response times and accuracy at reasonable communication traffic cost.  相似文献   

15.
Distributed hash tables (DHTs) excel at exact-match lookups, but they do not directly support complex queries such as semantic search that is based on content. In this paper, we propose a novel approach to efficient semantic search on DHT overlays. The basic idea is to place indexes of semantically close files into same peer nodes with high probability by exploiting information retrieval algorithms and locality sensitive hashing. A query for retrieving semantically close files is answered with high recall by consulting only a small number (e.g., 10–20) of nodes that stores the indexes of the files semantically close to the query. Our approach adds only index information to peer nodes, imposing only a small storage overhead. Via detailed simulations, we show that our approach achieves high recall for queries at very low cost, i.e., the number of nodes visited for a query is about 10–20, independent of the overlay size.  相似文献   

16.
In Software as a Service (SaaS) environments, designing and realizing multitenant schema-mapping that supports a shared database with custom extensions is a non-trivial task. Universal Table is one promising schema-mapping technique that is commonly used. However, there has been little research devoted to the design and realization of a query rewriting scheme for Universal Table. In this paper, we present a collection of general query rewriting schemes for Universal Table that can transparently transform tenant-specific logical queries into corresponding physical queries. Based on the design, we have developed a Java-based schema-mapping and query rewriting middleware for Universal Table and a sample online shopping SaaS application to verify its feasibility. Additionally, analytical results that can be used to predict the overhead of our schemes are also reported. Finally, we conduct a series of experiments and find that the results not only agree well with our analytical predictions but also show that our schemes are scalable to the number of tenants and the number of concurrent database connections.  相似文献   

17.
Metric databases are databases where a metric distance function is defined for pairs of database objects. In such databases, similarity queries in the form of range queries or k-nearest-neighbor queries are the most important query types. In traditional query processing, single queries are issued independently by different users. In many data mining applications, however, the database is typically explored by iteratively asking similarity queries for answers of previous similarity queries. We introduce a generic scheme for such data mining algorithms and we investigate two orthogonal approaches, reducing I/O cost as well as CPU cost, to speed-up the processing of multiple similarity queries. The proposed techniques apply to any type of similarity query and to an implementation based on an index or using a sequential scan. Parallelization yields an additional impressive speed-up. An extensive performance evaluation confirms the efficiency of our approach  相似文献   

18.
In this paper we describe a distributed system designed to efficiently store, query and update multidimensional data organized into concept hierarchies and dispersed over a network. Our system employs an adaptive scheme that automatically adjusts the level of indexing according to the granularity of the incoming queries, without assuming any prior knowledge of the workload. Efficient roll-up and drill-down operations take place in order to maximize the performance by minimizing query flooding. Updates are performed on-line, with minimal communication overhead, depending on the level of consistency needed. Extensive experimental evaluation shows that, on top of the advantages that a distributed storage offers, our method answers the vast majority of incoming queries, both point and aggregate ones, without flooding the network and without causing significant storage or load imbalance. Our scheme proves to be especially efficient in cases of skewed workloads, even when these change dynamically with time. At the same time, it manages to preserve the hierarchical nature of data. To the best of our knowledge, this is the first attempt towards the support of concept hierarchies in DHTs.  相似文献   

19.
Toward an accurate analysis of range queries on spatial data   总被引:2,自引:0,他引:2  
Analysis of range queries on spatial (multidimensional) data is both important and challenging. Most previous analysis attempts have made certain simplifying assumptions about the data sets and/or queries to keep the analysis tractable. As a result, they may not be universally applicable. This paper proposes a set of five analysis techniques to estimate the selectivity and number of index nodes accessed in serving a range query. The underlying philosophy behind these techniques is to maintain an auxiliary data structure, called a density file, whose creation is a one-time cost, which can be quickly consulted when the query is given. The schemes differ in what information is kept in the density file, how it is maintained, and how this information is looked up. It is shown that one of the proposed schemes, called cumulative density (CD), gives very accurate results (usually less than 5 percent error) using a diverse suite of point and rectangular data sets, that are uniform or skewed, and a wide range of query window parameters. The estimation takes a constant amount of time, which is typically lower than 1 percent of the time that it would take to execute the query, regardless of data set or query window parameters.  相似文献   

20.
Declustering is a common technique used to reduce query response times. Data is declustered over multiple disks and query retrieval can be parallelized. Most of the research on declustering is targeted at spatial range queries and investigates schemes with low additive error. Recently, declustering using replication has been proposed to reduce the additive overhead. Replication significantly reduces retrieval cost of arbitrary queries. In this paper, we propose a disk allocation and retrieval mechanism for arbitrary queries based on design theory. Using the proposed c-copy replicated declustering scheme, buckets can be retrieved using at most k disk accesses. Retrieval algorithm is very efficient and is asymptotically optimal with complexity for a query Q. In addition to the deterministic worst-case bound and efficient retrieval, proposed algorithm handles nonuniform data, high dimensions, supports incremental declustering and has good fault-tolerance property. Experimental results show the feasibility of the algorithm. Recommended by: Sunil Prabhakar  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号