Peer-to-peer (P2P) databases are becoming prevalent on the Internet for distributing and sharing documents, applications, and other digital media. Answering large-scale ad hoc analysis queries on these databases, for example, aggregation queries, poses unique challenges. Exact solutions can be time consuming and difficult to implement, given the distributed and dynamic nature of P2P databases. In this paper, we present novel sampling-based techniques for approximately answering ad hoc aggregation queries in such databases. Efficiently computing a high-quality random sample of the database in a P2P environment is complicated by several factors: the data is distributed (usually in uneven quantities) across many peers; within each peer, the data is often highly correlated; and even collecting a random sample of the peers is difficult. To counter these problems, we have developed an adaptive two-phase sampling approach based on random walks over the P2P graph, as well as block-level sampling techniques. We present extensive experimental evaluations that demonstrate the feasibility of the proposed solution.
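The peer-sampling step can be illustrated with a minimal Python sketch. It assumes the P2P overlay is available as an adjacency list and uses a Metropolis-Hastings random walk, a standard way to make a walk's stationary distribution uniform over peers. It is not the paper's adaptive two-phase algorithm and omits block-level sampling within peers; the names `mh_random_walk` and `estimate_mean` and the fixed `steps` burn-in are illustrative assumptions.

```python
import random
from typing import Dict, List


def mh_random_walk(graph: Dict[str, List[str]], start: str, steps: int,
                   rng: random.Random) -> str:
    """Metropolis-Hastings walk whose stationary distribution is uniform over
    peers, correcting the bias of a plain walk toward high-degree peers."""
    current = start
    for _ in range(steps):
        candidate = rng.choice(graph[current])
        # Accept the move with probability min(1, deg(current) / deg(candidate)).
        if rng.random() < len(graph[current]) / len(graph[candidate]):
            current = candidate
    return current


def estimate_mean(graph: Dict[str, List[str]], local_values: Dict[str, float],
                  start: str, samples: int, steps: int, seed: int = 0) -> float:
    """Estimate the average per-peer value from independent walks, each run
    for `steps` hops (an assumed burn-in) before its endpoint is sampled."""
    rng = random.Random(seed)
    picked = [mh_random_walk(graph, start, steps, rng) for _ in range(samples)]
    return sum(local_values[p] for p in picked) / len(picked)
```

On a star overlay, a plain random walk lands on the hub about half the time; the acceptance test lowers this to the uniform 1/5.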

Efficient Top-k Queries in Pure Peer-to-Peer Environments
HE Yingjie, WANG Shan, DU Xiaoyong. Journal of Software (软件学报), 2005, 16(4): 540-552
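Most current peer-to-peer (P2P) systems support only search by file identifier; users cannot search by file content. Top-k queries are widely used in search engines and have been highly successful there. However, because a P2P system is dynamic and decentralized, processing top-k queries in a pure P2P environment is challenging. This paper proposes a histogram-based hierarchical top-k query algorithm. First, it implements distributed top-k queries hierarchically, spreading the merging and sorting of results across the nodes of the P2P network and making full use of network resources. Second, it builds a histogram for each node from the results that node returns, uses the histogram to estimate the node's possible maximum score, and selects nodes on that basis, which improves query efficiency. Experiments show that the top-k queries improve query effectiveness, while the histograms improve query efficiency.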
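The node-selection idea in the second step can be sketched as follows, assuming each peer's estimated maximum score (for example, read off its histogram) is a true upper bound; if an estimate is too low, the early stop may miss results. `topk_with_upper_bounds` and the `fetch` callback are hypothetical, and the hierarchical merging across the network is not modeled.

```python
import heapq
from typing import Callable, Dict, List, Tuple

Result = Tuple[float, str]  # (score, document id)


def topk_with_upper_bounds(upper_bounds: Dict[str, float],
                           fetch: Callable[[str], List[Result]],
                           k: int) -> Tuple[List[Result], List[str]]:
    """Visit peers in descending order of estimated maximum score and stop once
    the current k-th best score is at least the next peer's upper bound."""
    best: List[Result] = []  # min-heap holding the current top-k results
    visited: List[str] = []
    for peer in sorted(upper_bounds, key=lambda p: upper_bounds[p], reverse=True):
        if len(best) == k and best[0][0] >= upper_bounds[peer]:
            break  # no remaining peer's bound beats the current k-th score
        visited.append(peer)
        for score, doc in fetch(peer):
            if len(best) < k:
                heapq.heappush(best, (score, doc))
            elif score > best[0][0]:
                heapq.heapreplace(best, (score, doc))
    return sorted(best, reverse=True), visited
```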

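Most current peer-to-peer (P2P) systems support only search by file identifier; users cannot search by file content. Top-k queries are widely used in search engines and have been highly successful there. However, because a P2P system is dynamic and decentralized, processing top-k queries in a P2P environment is challenging. This paper proposes a hierarchical top-k query algorithm for centralized P2P systems based on a central document. First, it implements distributed top-k queries hierarchically, spreading the merging and sorting of results across the nodes of the P2P network and making full use of network resources. Second, the results returned by nodes are recorded in the central document, from which each node's maximum score is determined; nodes are then selected on that basis, which improves query efficiency.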

In recent years there has been significant interest in peer-to-peer (P2P) environments in the data management community. However, almost all work so far has focused on exact query processing in current P2P data systems, and the autonomy of peers has not been sufficiently considered. In addition, system cost is very high because shared data is published per document rather than per document set. In this paper, abstract indices (AbIx) are presented to support content-based approximate queries in centralized, distributed, and structured P2P data systems. They make it possible to search as few peers as possible while returning as many results satisfying users' queries as possible, while preserving a high degree of peer autonomy. Abstract indices also have low system cost, can improve query processing speed, and support very frequent updates and set-based information publishing. To verify the effectiveness of abstract indices, a simulator with 10,000 peers and over 3 million documents was built, and several metrics are proposed. The experimental results show that abstract indices work well in various P2P data systems.

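Structured P2P networks based on distributed hash tables (DHTs) offer good scalability, robustness, and self-organization, but support only exact-match queries. This paper proposes a range query method for structured P2P networks based on a distributed range tree (DRT-RQ). The method distributes a distributed range tree, used as a multidimensional index, over an existing structured DHT overlay, and uses the data lookup interface provided by the DHT system to implement range queries over data objects efficiently. Experimental results show that DRT-RQ has lower query latency than range queries based on prefix hash trees (PHT-RQ).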
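One common way to map a range tree onto a DHT is to decompose a query range into aligned dyadic intervals, each corresponding to one tree node whose label is hashed to a DHT key. The sketch below shows that decomposition for a one-dimensional integer domain of size 2**bits. It illustrates the general technique, not the exact DRT-RQ construction, and the key format in `dht_key` is a hypothetical choice.

```python
import hashlib
from typing import List, Tuple


def canonical_nodes(lo: int, hi: int, bits: int) -> List[Tuple[int, int]]:
    """Cover the inclusive range [lo, hi] within [0, 2**bits) with a minimal set
    of aligned dyadic intervals. Each is returned as (level, index) and covers
    [index * 2**level, (index + 1) * 2**level - 1]."""
    nodes = []
    while lo <= hi:
        level = 0
        # Grow the block while it stays aligned at lo and fits inside [lo, hi].
        while (level < bits and lo % (1 << (level + 1)) == 0
               and lo + (1 << (level + 1)) - 1 <= hi):
            level += 1
        nodes.append((level, lo >> level))
        lo += 1 << level
    return nodes


def dht_key(level: int, index: int) -> str:
    # Hypothetical naming scheme: hash the tree-node label onto the DHT key space.
    return hashlib.sha1(f"node:{level}:{index}".encode()).hexdigest()
```

A range query then issues one DHT lookup per returned node instead of one per value in the range.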

Improving Query Response Delivery Quality in Peer-to-Peer Systems
Unstructured peer-to-peer (P2P) systems are the prevalent model in today's P2P systems. In such systems, a response is sent back along the same path that carried the incoming query message. To guarantee the anonymity of the requestor, no requestor information is included in the response message, and each node on the query's incoming path knows only the direct neighbor that forwarded the query to it. This mechanism causes response loss whenever any node or connection on the path fails, which is common in P2P systems because of their dynamic nature. In this paper, we address the response loss problem and show that peer oscillation can cause up to 35 percent response loss in an unstructured P2P system. We also present three techniques to alleviate this problem: the redundant response delivery (RRD) scheme as a proactive approach, the adaptive response delivery (ARD) scheme as a reactive approach, and the extended adaptive response delivery scheme, which enables ARD to function in unstructured P2P systems with limited or no flooding-based search. We have evaluated our techniques in a large-scale network simulation. With limited traffic overhead, all three techniques reduce the response loss rate by more than 65 percent, and all are fully distributed. We have designed our techniques to be simple to develop and implement in existing P2P systems.

XU Linhao, QIAN Weining, ZHOU Aoying. Journal of Software (软件学报), 2007, 18(6): 1443-1455
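An important problem in peer-to-peer data management is how to efficiently support similarity search over multidimensional data spaces. Existing unstructured peer-to-peer data-sharing systems support only simple query processing, namely match queries. By combining approximation techniques with routing indices, this paper designs a simple and effective index structure, EVARI (extended vector approximation routing index). With EVARI, each node can not only process range queries over its locally shared data set but also forward queries to the neighbors most likely to return results. To build EVARI, each node summarizes its locally shared content using space partitioning and exchanges the summaries with its neighbors. In addition, each node can reconfigure its set of neighbors so that related nodes are close to one another, which optimizes the allocation of system resources and improves performance. Simulation experiments demonstrate the method's good performance.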

An increasing number of large-scale applications exploit peer-to-peer network architectures to provide highly scalable and flexible services. Among these applications, data management in peer-to-peer systems is a domain of particular interest. In this paper, we investigate the multidimensional skyline computation problem on a structured peer-to-peer network. To achieve low communication cost and quick response time, we use the iMinMax(θ) method to transform high-dimensional data into one-dimensional values and distribute the data over a structured peer-to-peer network called BATON. We then propose a progressive algorithm with an adaptive filter technique for efficient skyline computation in this environment. We further discuss optimization techniques for the algorithm and summarize its key principles as a query routing protocol, with detailed analysis. Finally, we conduct an extensive experimental evaluation to demonstrate the efficiency of our approach.
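The two building blocks named above can be sketched in Python: the iMinMax(θ) mapping, which keys a point in [0, 1]^d by the dimension index plus the value of either its minimum or its maximum coordinate, and the dominance test that defines a skyline. This sketch assumes normalized coordinates and that smaller values are preferred; the progressive algorithm, adaptive filter, and BATON routing are not reproduced.

```python
from typing import List, Sequence, Tuple


def iminmax(point: Sequence[float], theta: float) -> float:
    """iMinMax(theta) key for a point in [0, 1]^d: use the min coordinate if
    x_min + theta < 1 - x_max, otherwise the max coordinate."""
    x_min, x_max = min(point), max(point)
    d_min, d_max = point.index(x_min), point.index(x_max)
    if x_min + theta < 1 - x_max:
        return d_min + x_min
    return d_max + x_max


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if it is no worse in every dimension and better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    # Quadratic reference version: keep points that no other point dominates.
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

Raising θ biases the mapping toward the max coordinate, which is how iMinMax(θ) can be tuned to the data distribution.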

Simple Efficient Load-Balancing Algorithms for Peer-to-Peer Systems
Load balancing is a critical issue for the efficient operation of peer-to-peer (P2P) networks. We give two new load-balancing protocols whose provable performance guarantees are within a constant factor of optimal. Our protocols refine the consistent hashing data structure that underlies the Chord (and Koorde) P2P network. Both preserve Chord's logarithmic query time and near-optimal data migration cost. Consistent hashing is an instance of the distributed hash table (DHT) paradigm for assigning items to nodes in a P2P system: items and nodes are mapped to a common address space, and each node has to store all items residing nearby in the address space. Our first protocol balances the distribution of the key address space among nodes, which yields a load-balanced system when the DHT maps items "randomly" into the address space. To our knowledge, this yields the first P2P scheme simultaneously achieving O(log n) degree, O(log n) lookup cost, and constant-factor load balance (previous schemes settled for any two of the three). Our second protocol aims to directly balance the distribution of items among the nodes. This is useful when the distribution of items in the address space cannot be randomized. We give a simple protocol that balances load by moving nodes to arbitrary locations "where they are needed." As an application, we use the second protocol to give an optimal implementation of a distributed data structure for range searches on ordered data.
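The underlying consistent hashing structure (not the paper's load-balancing protocols) can be sketched as a sorted ring with a successor lookup, as in Chord, where an item belongs to the first node at or after its position. Hashing names with SHA-1 reduced to 32 bits and giving each node a single ring position are simplifying assumptions.

```python
import bisect
import hashlib
from typing import List


def ring_position(name: str, bits: int = 32) -> int:
    """Map a node or item name to a point on a ring of size 2**bits."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class ConsistentHashRing:
    def __init__(self, nodes: List[str]):
        self._points = sorted((ring_position(n), n) for n in nodes)
        self._keys = [p for p, _ in self._points]

    def owner(self, item: str) -> str:
        """Successor rule: the first node clockwise from the item's position,
        wrapping around to the smallest position past the end of the ring."""
        i = bisect.bisect_left(self._keys, ring_position(item))
        return self._points[i % len(self._points)][1]
```

Adding a node only moves items onto the new node, which is what keeps data migration cheap; the paper's protocols then adjust node positions to even out the load.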

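A predictive continuous spatio-temporal range query continuously returns, throughout a user-specified time interval, the moving objects that will be in the query region during a given future time interval. This paper proposes a method for processing predictive continuous spatio-temporal range queries and designs two index structures that support continuous query processing. The moving-object index records the continually updated positions of moving objects and supports the initial evaluation of queries. The continuous-query index records every continuous query whose results may be affected by changes in moving objects' positions and supports continuous query processing. Experiments show that the proposed method effectively improves the efficiency of processing large numbers of continuous queries.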
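A predictive range query has to decide whether an object will be inside the query region during the future time window. Assuming a linear motion model (position = reported position + velocity * elapsed time, which the abstract does not state), that check reduces to intersecting per-axis time intervals, as in the hypothetical `in_region_during` helper below. Times are relative to the object's last reported position, and the paper's two index structures are not modeled.

```python
from typing import Optional, Sequence, Tuple

Interval = Tuple[float, float]


def axis_interval(p0: float, v: float, lo: float, hi: float) -> Optional[Interval]:
    """Times t at which p0 + v * t lies in [lo, hi]; None if never."""
    if v == 0:
        return (float("-inf"), float("inf")) if lo <= p0 <= hi else None
    a, b = (lo - p0) / v, (hi - p0) / v
    return (min(a, b), max(a, b))


def in_region_during(pos: Sequence[float], vel: Sequence[float],
                     rect_lo: Sequence[float], rect_hi: Sequence[float],
                     t_start: float, t_end: float) -> Optional[Interval]:
    """Sub-interval of [t_start, t_end] during which a linearly moving object is
    inside the axis-aligned box [rect_lo, rect_hi], or None if it never is."""
    start, end = t_start, t_end
    for p0, v, lo, hi in zip(pos, vel, rect_lo, rect_hi):
        iv = axis_interval(p0, v, lo, hi)
        if iv is None:
            return None
        start, end = max(start, iv[0]), min(end, iv[1])
        if start > end:
            return None
    return (start, end)
```

In a continuous setting, this test would be re-run only for queries whose results an object's position update can affect, which is the role the abstract assigns to the continuous-query index.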

An Efficient Data Maintenance Scheme for Peer-to-Peer Storage Systems
YANG Zhi, ZHU Jun, DAI Yafei. Journal of Software (软件学报), 2009, 20(1): 80-95
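This paper proposes a complete data maintenance scheme built on the dynamic characteristics of P2P environments. On the one hand, the scheme accounts for differences in how dynamic individual nodes are and sets redundancy according to each node's dynamics, so that the target data availability can be guaranteed with less redundancy overhead. On the other hand, the scheme shows how to use a discriminator to distinguish permanent failures from transient ones, reducing the extra repair overhead caused by unnecessary data repair. Experiments on the real P2P system Maze show that the scheme saves about 80% of data maintenance bandwidth compared with current mainstream schemes.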
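Availability-aware redundancy can be illustrated under the simplifying assumption that nodes fail independently: data stored on nodes with online probabilities p_i is available with probability 1 - prod(1 - p_i). The sketch computes the replica count for identical nodes and a greedy choice for heterogeneous ones. It is not the paper's scheme, omits the permanent/transient failure discriminator, and assumes 0 < p < 1 and 0 < target < 1.

```python
import math
from typing import List, Sequence


def replicas_needed(p: float, target: float) -> int:
    """Smallest r with 1 - (1 - p)**r >= target, for identical independent nodes."""
    return math.ceil(math.log(1 - target) / math.log(1 - p))


def choose_replicas(avail: Sequence[float], target: float) -> List[int]:
    """Greedily add the most available nodes until the data is online with
    probability >= target. Returns [] if all nodes together cannot reach it."""
    order = sorted(range(len(avail)), key=lambda i: avail[i], reverse=True)
    chosen: List[int] = []
    p_all_down = 1.0  # probability that every chosen node is offline
    for i in order:
        if 1 - p_all_down >= target:
            break
        chosen.append(i)
        p_all_down *= 1 - avail[i]
    return chosen if 1 - p_all_down >= target else []
```

With highly available nodes, far fewer replicas reach the same target than with flaky ones, which is why differentiating node dynamics reduces redundancy overhead.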

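As grids move from scientific computing to enterprise applications, databases are required to provide a variety of services that enable stronger and richer resource sharing and applications. Databases on a grid can be accessed only through grid services, and their data can be read and written only through grid service interfaces. How to retrieve data efficiently and directly from databases distributed across sites in a grid environment is therefore a pressing problem. This paper first proposes an architecture for data retrieval in grid environments and then, for top-k queries over numerical data in this architecture, presents the GrangM algorithm, which effectively solves the problem of merging query results from different data sources. A simulated implementation shows that the algorithm merges the results retrieved by multiple grid nodes quickly and efficiently, reduces the size of intermediate join results, and lowers the communication cost of sending query requests.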

The goal of knowledge compilation is to enable fast queries. Prior approaches aimed for small compiled knowledge bases (i.e., polynomial in the size of the initial knowledge bases). Typically, query-response time is linear in the size of the compiled knowledge base, so the efficiency of querying it depends on its size. In this paper, a target for knowledge compilation called the ri-trie is introduced; it has the property that even if the compiled knowledge bases are large, they nevertheless admit fast queries. Specifically, a query can be processed in time linear in the size of the query, regardless of the size of the compiled knowledge base.

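A top-k mutual skyline query returns the top k objects of a mutual skyline query. Such queries are an important, intuitive tool for data analysts looking for meaningful objects to support decision making, yet they have not received sufficient attention from the research community. This paper introduces several novel algorithms: Topk-TBBS, Topk-dMBBS, and Topk-wMBBS. The main ideas are information reuse and efficient pruning strategies. In particular, Topk-wMBBS achieves the best performance because it fully reuses node information gathered during the search and uses a best-first (BF) search strategy; the paper also proves that it is I/O-optimal. Finally, extensive experiments were conducted on 2 real data sets and 4 synthetic data sets with different distributions. The results show that the proposed algorithms are effective and highly efficient as k, data set size, and cache size vary, with Topk-wMBBS requiring the fewest I/O accesses.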

Efficient Query Processing in Uncertain Graph Databases
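In recent years, large amounts of data produced in many domains can naturally be modeled as graphs, such as protein interaction networks and social networks. Imprecise measurement and the nature of the data itself make uncertainty pervasive in many graph data sets. This paper studies efficient query processing in uncertain graph databases. It first gives a data model for representing graph uncertainty. Because a user's query graph usually produces a large number of matches, efficiently obtaining the k matches with the highest probability is often of greater practical value; the paper therefore formalizes the probabilistic top-k subgraph matching query problem. To solve it, an effective index structure is designed based on neighborhood subgraphs annotated with probability information. In addition, an efficient index-based query processing method is proposed. Its core is a search-tree-based matching algorithm that uses a probabilistic pruning technique to improve performance. Experimental results show that the proposed method has good efficiency and scalability.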

An Efficient Window Query Algorithm for P2P Environments
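With the development of multimedia and P2P networks, attribute-based window queries over high-dimensional data have become an important research topic. This paper proposes an algorithm that efficiently answers window queries over high-dimensional data in super-peer P2P networks. On each individual network node, data is mapped to a one-dimensional space by a dimensionality-reduction algorithm; on super-peers, a statistics table of the data and a network query tree are built. For each query, the algorithm traverses the network according to the rules of the query tree and uses the statistics to prune queries to individual nodes, avoiding network flooding. Experiments with several data sets were used to evaluate query efficiency, and the results show that the algorithm is highly efficient.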

Efficient Multi-attribute Range Queries on Structured P2P Networks
HAI Mo. Computer Engineering (计算机工程), 2010, 36(6): 58-60
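In multi-attribute range queries on structured P2P networks, the number of hops and messages required by the query algorithm depends on the number of nodes and on the size of the queried range, and changes to attribute values generate a large number of messages. To address these problems, this paper proposes a multi-attribute range query mechanism for structured P2P networks based on dynamic node grouping (PDG). Simulation results show that in PDG the number of hops and messages needed to resolve each query is independent of the queried range size and the number of nodes, attribute-value updates generate fewer messages, and the maintenance overhead of node grouping is low.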

A consistency maintenance mechanism is necessary for emerging peer-to-peer applications because of their frequent data updates. Centralized approaches suffer from a single point of failure, while previous decentralized approaches incur too many duplicate update messages because of their locality-ignorant structures. To address this issue, we propose a scalable and efficient consistency maintenance scheme for heterogeneous P2P systems. Our scheme takes heterogeneity into account and organizes the replica nodes of a key into a locality-aware hierarchical structure, in which the upper layer is DHT-based and consists of powerful and stable replica nodes, while each replica node in the lower layer attaches to a physically close upper-layer node. A d-ary update message propagation tree (UMPT) is built dynamically over the upper layer to propagate updated contents. As a result, the tree structure does not need to be maintained all the time, saving a lot of cost. Through theoretical analyses and comprehensive simulations, we examine the efficiency and scalability of this design. The results show that, compared with previous designs, especially locality-ignorant ones, our approach reduces cost by about 25-67 percent.
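A d-ary propagation tree can be laid out implicitly by position, which is one way to build it on demand instead of maintaining tree state between updates. This sketch assumes the participating upper-layer replica nodes are already known as an ordered list with the update source first; in the paper the tree is built over the DHT-based upper layer, which is not modeled here, and the node names are hypothetical.

```python
from typing import Dict, List


def build_dary_tree(nodes: List[str], d: int) -> Dict[str, List[str]]:
    """Arrange nodes as a complete d-ary tree rooted at nodes[0]; the children
    of the node at position i are the nodes at positions d*i+1 .. d*i+d."""
    tree: Dict[str, List[str]] = {}
    for i, n in enumerate(nodes):
        tree[n] = nodes[d * i + 1: d * i + d + 1]  # slicing clips at the end
    return tree


def propagation_rounds(tree: Dict[str, List[str]], root: str) -> int:
    """Number of forwarding rounds until every node has the update."""
    rounds, frontier = 0, [root]
    while True:
        nxt = [c for n in frontier for c in tree[n]]
        if not nxt:
            return rounds
        rounds += 1
        frontier = nxt
```

With N nodes, an update reaches every node in roughly log_d N rounds, and each node forwards at most d messages.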

ZHANG Wei, LI Jianzhong, LIU Yu. Journal of Software (软件学报), 2007, 18(2): 279-290
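This paper proposes a method for processing predictive spatio-temporal range queries based on a probabilistic model. The method processes queries in a filter-refinement manner. First, all candidate moving objects that may satisfy the query are selected from the database; next, the probability that each candidate satisfies the query is computed as defined by the probabilistic model; finally, candidates are filtered against the minimum probability threshold specified in the query, and the results are returned. The probabilistic model treats a moving object's possible future location as a random variable and gives methods for computing the probability that an object satisfies the query under two different motion patterns. The paper also proposes a trajectory analysis algorithm that obtains probability density functions (PDFs) by sampling a large number of historical trajectories, and designs a PDF index, STP-Index (spatio-temporal PDF-index), which effectively improves the efficiency of trajectory analysis and probability computation. Experimental results show that the method effectively supports predictive spatio-temporal range queries, improves the correctness of query results, and is especially suited to predictive spatio-temporal range queries with small spatial regions and long time ranges.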
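The filter-refinement steps can be sketched under an assumed probability model: each candidate's position at the query time is Gaussian and independent per axis, so the probability of lying in an axis-aligned query box is a product of normal CDF differences. The paper instead derives PDFs from sampled historical trajectories and indexes them with STP-Index; the function names and the 3-sigma filter here are illustrative assumptions. The filter discards an object only when its probability is below about 0.0014, so it is safe for thresholds above that.

```python
import math
from typing import Dict, List, Sequence, Tuple

Gaussian = Tuple[Sequence[float], Sequence[float]]  # (per-axis mean, per-axis sigma)


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def prob_in_rect(mean: Sequence[float], sigma: Sequence[float],
                 lo: Sequence[float], hi: Sequence[float]) -> float:
    """P(position in box) assuming independent Gaussian axes."""
    p = 1.0
    for m, s, a, b in zip(mean, sigma, lo, hi):
        p *= normal_cdf(b, m, s) - normal_cdf(a, m, s)
    return p


def probabilistic_range_query(objects: Dict[str, Gaussian], lo: Sequence[float],
                              hi: Sequence[float], threshold: float) -> List[Tuple[str, float]]:
    result = []
    for oid, (mean, sigma) in objects.items():
        # Filter: skip objects whose 3-sigma box misses the query box on some axis.
        if any(m + 3 * s < a or m - 3 * s > b
               for m, s, a, b in zip(mean, sigma, lo, hi)):
            continue
        # Refine: compute the probability and apply the query's threshold.
        p = prob_in_rect(mean, sigma, lo, hi)
        if p >= threshold:
            result.append((oid, p))
    return result
```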

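A full-temporal range query method can simultaneously support range queries over the past, present, and predicted future information of moving objects, and is an important aspect of moving-object data management. In the moving-object database field, many techniques have been proposed to support historical queries or future prediction, but full-temporal range query methods have received little study. This paper proposes a full-temporal query method for moving objects that supports exact range queries as well as trajectory queries over historical information. To improve query efficiency, it proposes the index structure PPF-index. In PPF-index, first, the proposed TB_TPR-tree indexes the present and predicted future information of moving objects as that information arrives; second, historical trajectory information is segmented and indexed with a 3D R-tree; finally, a full-temporal range query algorithm based on PPF-index is proposed, in which queries with different time ranges access different index structures. Experimental results show that PPF-index supports full-temporal queries efficiently and achieves high update efficiency.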
