Similar literature
20 similar documents found
1.
The past few years have seen tremendous advances in distributed storage infrastructure. Unstructured and structured overlay networks have been successfully used in a variety of applications, ranging from file-sharing to scientific data repositories. While unstructured networks benefit from low maintenance overhead, the associated search costs are high. On the other hand, structured networks have higher maintenance overheads, but facilitate bounded-time search of installed keywords. When dealing with typical data sets, though, it is infeasible to install every possible search term as a keyword into the structured overlay.

2.
In this paper, we present a query-driven indexing/retrieval strategy for efficient full-text retrieval from large document collections distributed within a structured P2P network. Our indexing strategy is based on two important properties: (1) the generated distributed index stores posting lists for carefully chosen indexing term combinations that are frequently present in user queries, and (2) the posting lists containing too many document references are truncated to a bounded number of their top-ranked elements. These two properties guarantee acceptable latency and bandwidth requirements, essentially because the number of indexing term combinations remains scalable and the posting lists transmitted during retrieval never exceed a constant size. A novel index update mechanism efficiently handles the addition of new documents to the document collection. Thus, the generated distributed index corresponds to a constantly evolving query-driven indexing structure that efficiently follows the current information needs of the users and changes in the document collection. We show that the size of the index and the generated indexing/retrieval traffic remain manageable even for Web-size document collections, at the price of a marginal loss in precision for rare queries. Our theoretical analysis and experimental results provide convincing evidence of the feasibility of the query-driven indexing strategy for large-scale P2P text retrieval.
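The two index properties described in this abstract can be illustrated with a small single-machine sketch. This is not the authors' implementation: the function name `build_index`, the static per-document `scores`, the `query_log` format, and the bound `MAX_POSTINGS` are all assumptions made for illustration, and the distribution of keys over a P2P overlay is left out.

```python
from collections import defaultdict

MAX_POSTINGS = 2  # hypothetical bound on posting-list length


def build_index(docs, scores, query_log, max_postings=MAX_POSTINGS):
    """Index only term combinations seen in queries; truncate long lists.

    docs:      {doc_id: iterable of terms}
    scores:    {doc_id: ranking score}  (assumed static, for illustration)
    query_log: iterable of term tuples taken from past user queries
    """
    # Property (1): keys are the term combinations users actually queried.
    wanted = {frozenset(query) for query in query_log}
    index = defaultdict(list)
    for doc_id, terms in docs.items():
        term_set = set(terms)
        for key in wanted:
            if key <= term_set:  # document contains every term of the key
                index[key].append(doc_id)
    # Property (2): keep only the top-ranked entries of each posting list.
    return {
        key: sorted(ids, key=lambda d: (-scores[d], d))[:max_postings]
        for key, ids in index.items()
    }
```

Because every stored list has at most `max_postings` entries, the amount of data returned for any indexed key is bounded regardless of collection size, which is the intuition behind the bandwidth claim in the abstract.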

3.
Using a distributed quadtree index in peer-to-peer networks   Cited by: 6 (self-citations: 0, citations by others: 6)
Peer-to-peer (P2P) networks have become a powerful means for online data exchange. Currently, users primarily use these networks to perform exact-match queries and retrieve complete files. However, future, more data-intensive applications, such as P2P auction networks, P2P job-search networks, and P2P multiplayer games, will require the capability to respond to more complex queries, such as range queries involving numerous data types, including those that have a spatial component. In this paper, a distributed quadtree index that adapts the MX-CIF quadtree is described that enables more powerful access to data in P2P networks. This index has been implemented for various prototype P2P applications, and results of experiments are presented. Our index is easy to use, scalable, and exhibits good load-balancing properties. Similar indices can be constructed for various multidimensional data types with both spatial and non-spatial components.
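As a rough illustration of the idea, an MX-CIF quadtree associates each rectangle with the smallest quadtree block that fully contains it. The sketch below computes that block for a rectangle in the unit square and then maps the block to a peer by hashing. The hashing step, the function names `mxcif_block` and `peer_for_block`, and the `max_depth` limit are assumptions for illustration only; the abstract does not specify how blocks are assigned to peers.

```python
import hashlib


def mxcif_block(rect, max_depth=8):
    """Return the quadrant path of the smallest block containing rect.

    rect is (xmin, ymin, xmax, ymax) inside the unit square. Each path
    entry is a quadrant index: 0 = SW, 1 = SE, 2 = NW, 3 = NE.
    An empty path means the rectangle is stored at the root block.
    """
    xmin, ymin, xmax, ymax = rect
    x0, y0, size = 0.0, 0.0, 1.0
    path = []
    for _ in range(max_depth):
        half = size / 2.0
        cx, cy = x0 + half, y0 + half
        # Stop as soon as the rectangle straddles either split line.
        if xmax <= cx:
            right = 0
        elif xmin >= cx:
            right = 1
        else:
            break
        if ymax <= cy:
            top = 0
        elif ymin >= cy:
            top = 1
        else:
            break
        path.append(right + 2 * top)
        x0 += right * half
        y0 += top * half
        size = half
    return path


def peer_for_block(path, n_peers):
    """Map a block to a peer by hashing its path (illustrative assumption)."""
    key = "/".join(map(str, path)) or "root"
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_peers
```

A range query would visit the blocks along the path from the root to the query's own enclosing block, plus any descendants that overlap it; that traversal is omitted here.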

4.
We develop a new model of the interaction of rational peers in a Peer-to-Peer (P2P) network that has at its heart altruism, an intrinsic parameter reflecting peers’ inherent willingness to contribute. Two different approaches for modelling altruistic behavior and its attendant benefit are introduced. With either approach, we use Game Theoretic analysis to calculate Nash equilibria and predict peer behavior in terms of individual contribution. We consider the cases of P2P networks of peers that (i) have homogeneous altruism levels or (ii) have heterogeneous altruism levels, but with known probability distributions. We find that, under the effects of altruism, a substantial fraction of peers will contribute when altruism levels are within certain intervals, even though no incentive mechanism is used. Our results corroborate empirical evidence of large P2P networks surviving or even flourishing without or with barely functioning incentive mechanisms. We also enhance the model with a simple but powerful incentive scheme to limit free-riding and increase contribution to the network, and show that the particular incentive scheme on networks with altruistic peers achieves its goal.
Vasilis Vassalos. URL: http://wim.aueb.gr/vassalos

Dimitrios K. Vassilakis. 2005–today: PhD candidate in the Informatics Department of the Athens University of Economics and Business (AUEB). Research areas: Operations Research (OR), Game Theory, economic models and applications of Game Theory on the internet (anti-spam, P2P networks), applications of OR to electricity scheduling.

Vasilis Vassalos. 2003–today: Assistant Professor in the Informatics Department of the Athens University of Economics and Business (AUEB). 1999–2003: Assistant Professor in the Information Systems Group of the Information, Operations and Management Sciences (IOMS) Department in the Stern School of Business at New York University. Research areas: databases, Web-based information systems and middleware development, generation of user interfaces and Web services for semistructured data sources, integration of mobile data sources, XML query processing, digital libraries.

5.
There are two basic concerns in supporting multi-dimensional range queries in P2P overlay networks. The first is to preserve data locality in the process of data space partitioning, and the second is the maintenance of data locality among data ranges with an exponentially expanding and extending rate. The first problem has been well addressed by using recursive decomposition schemes, such as the Quad-tree, K-d tree, Z-order, and Hilbert curve. On the other hand, the second problem has been recently identified by our novel data structure: HD Tree. In this paper, we explore how data locality can be easily maintained, and how range queries can be efficiently supported in HD Tree. This is done by introducing two basic routing strategies: hierarchical routing and distributed routing. Although hierarchical routing can be applied to any two nodes in the P2P system, it generates high-volume traffic toward nodes near the root, and has very limited options to cope with node failure. On the other hand, distributed routing concerns source and destination pairs only at the same depth, but the traffic load is bound to some nodes at two neighboring depths, and multiple options can be found to redirect a routing request. Because HD Tree supports multiple routes between any two nodes in the P2P system, routing in HD Tree is very flexible; it can be designed for many purposes, such as fault tolerance or dynamic load balancing. The distributed routing oriented combined routing (DROCR) algorithm is one such routing strategy implemented so far. It is a hybrid algorithm combining advantages from both hierarchical routing and distributed routing. The experimental results show that the DROCR algorithm achieves considerable performance gain over the equivalent tree routing at the highest depth examined. For supporting multi-dimensional range queries, the experimental results indicate that the exponentially expanding and extending rate has been effectively controlled and minimized by the HD Tree overlay structure and DROCR routing.
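Of the recursive decomposition schemes named in this abstract, Z-order is the simplest to show in code. The sketch below is a generic Morton (Z-order) encoding for two-dimensional integer coordinates, not part of HD Tree or DROCR; the function names and the default `bits` width are assumptions for illustration.

```python
def morton_encode(x, y, bits=16):
    """Interleave the bits of x and y into a single Z-order key."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return code


def morton_decode(code, bits=16):
    """Recover (x, y) from a Z-order key produced by morton_encode."""
    x = y = 0
    for i in range(bits):
        x |= ((code >> (2 * i)) & 1) << i
        y |= ((code >> (2 * i + 1)) & 1) << i
    return x, y
```

Cells that share a quadtree block share a key prefix, so each aligned 2x2 block maps to four consecutive keys. That is the sense in which such schemes preserve locality during partitioning.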

6.
Communities of Practice (CoPs) are informal structures within organizations that bind people together through informal relationships and the sharing of expertise and experience. As such, they are effective tools for the creation and sharing of organizational knowledge, and an increasing number of organizations are adopting them as part of their knowledge management strategies. In this paper, we examine the knowledge sharing characteristics and roles of CoPs and develop a peer-to-peer knowledge sharing architecture that matches the behavioral characteristics of the members of the CoPs. We also propose a peer-to-peer knowledge sharing tool called KTella that enables a community's members to voluntarily share and retrieve knowledge more effectively.

7.
Today’s peer-to-peer networks are designed based on the assumption that the participating nodes are cooperative, which does not hold in reality. Incentive mechanisms that promote cooperation must be introduced. However, the existing incentive schemes (using either reputation or virtual currency) suffer from various attacks based on false reports. Even worse, a colluding group of malicious nodes in a peer-to-peer network can manipulate the history information of its own members, and the damaging power increases dramatically with the group size. Such malicious nodes/collusions are difficult to detect, especially in a large network without a centralized authority. In this paper, we propose a new distributed incentive scheme, in which the amount that a node can benefit from the network is proportional to its contribution, malicious nodes can only attack others at the cost of their own interests, and a colluding group cannot gain advantage by cooperation regardless of its size. Consequently, the damaging power of colluding groups is strictly limited. The proposed scheme includes three major components: a distributed authority infrastructure, a key sharing protocol, and a contract verification protocol.

8.
A hyperplane based indexing technique for high-dimensional data   Cited by: 1 (self-citations: 0, citations by others: 1)
In this paper, we propose a novel hyperplane-based indexing method to support efficient processing of similarity search queries in high-dimensional spaces. The main idea of the proposed index is to improve data partitioning efficiency in a high-dimensional space by using a hyperplane, which further partitions a subspace and can also take advantage of the twin node concept used in the key dimension based index. Compared with the key dimension concept, the hyperplane is more effective in data filtering. High space utilization is achieved by dynamically performing data reallocation between twin nodes. In addition, a post-processing step is used after index building to ensure effective filtration. Extensive experiments based on two types of real data sets are conducted, and the results illustrate a significantly improved filtering efficiency. Because it relies on hyperplanes, the proposed indexing method is suitable only for Euclidean spaces.

9.
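In peer-to-peer networks built on distributed hash tables, the structure of the routing table affects the efficiency of keyword lookup. The B+ tree is a tree-structured index that supports efficient search. To make the routing information of the many nodes in the network easier to manage, a B+ tree-based routing structure is proposed. By building an index over the nodes' routing information, it not only improves lookup efficiency, keeping the lookup length within the height of the tree, but also keeps the routing information maintained by each node as small as possible, reducing storage overhead.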

10.
11.
XBRL is a specification used to exchange financial/economic information. It is actively used by many international institutions and agencies. In the USA, Canada, Europe, China, etc., all financial entities and companies quoted on the stock market are required to report to the supervisory and regulatory authority using the XBRL specification. XBRL consists of a set of taxonomies defining different accounting regulations for a specific statement and the statement itself. Reports are generated from various sources and are validated at origin. XBRL displays business information which is multidimensional and whose logical destination for storage is a data warehouse. The proposal presented here focuses on the automation of the mapping between XBRL and the multidimensional data model (MDM) and includes a formalization of the validation rules in the MDM. The approach is designed in accordance with the Model Driven Architecture (MDA) paradigm, provides a new way to validate XBRL reports through an RDBMS, and offers a proof-of-concept. Additionally, the study aims to provide more clarity about XBRL, a highly complex language made by and for expert users, and to improve interoperability between applications. The proposal also analyses certain semantic questions associated with the XBRL formula specification and its performance.

12.
Database applications very often require a sophisticated class of storage structures in order to answer different types of queries efficiently. This often dictates that the file should be organized on multiple keys. Several storage structures have been proposed to satisfy these needs. Most of these are a generalization of the storage structures used for managing one-dimensional data. The k-d tree is one such example, and it is a natural generalization of the standard one-dimensional binary search tree. Recently, a new storage structure, called the BD tree, was proposed to manage multidimensional data. This structure has good dynamic characteristics. Several variations are possible on the basic k-d tree structure. This paper studies the performance implications of three variations. Further, it provides an empirical performance comparison of the k-d tree and BD tree in database applications.
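To make concrete how a k-d tree generalizes a binary search tree, the following minimal sketch cycles the comparison key through the dimensions as depth increases. It shows only the basic structure (insertion and orthogonal range search), not the three variations or the BD tree studied in the paper; the names `KDNode`, `kd_insert`, and `kd_range_search` are chosen here for illustration.

```python
class KDNode:
    """A k-d tree node holding one k-dimensional point."""

    def __init__(self, point):
        self.point = point
        self.left = None
        self.right = None


def kd_insert(node, point, depth=0):
    """Insert point, comparing on dimension (depth mod k) at each level."""
    if node is None:
        return KDNode(point)
    axis = depth % len(point)
    if point[axis] < node.point[axis]:
        node.left = kd_insert(node.left, point, depth + 1)
    else:
        node.right = kd_insert(node.right, point, depth + 1)
    return node


def kd_range_search(node, lo, hi, depth=0, out=None):
    """Collect points p with lo[i] <= p[i] <= hi[i] in every dimension."""
    if out is None:
        out = []
    if node is None:
        return out
    if all(l <= c <= h for l, c, h in zip(lo, node.point, hi)):
        out.append(node.point)
    axis = depth % len(lo)
    # The left subtree holds strictly smaller values on this axis.
    if lo[axis] < node.point[axis]:
        kd_range_search(node.left, lo, hi, depth + 1, out)
    # The right subtree holds values greater than or equal on this axis.
    if hi[axis] >= node.point[axis]:
        kd_range_search(node.right, lo, hi, depth + 1, out)
    return out
```

With a single dimension this reduces to an ordinary binary search tree, which is the generalization the abstract refers to.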

13.
The Journal of Supercomputing - With the increasing daily production of data in recent years, indexing, storing and retrieving huge amounts of data have become a common problem, especially for...

14.
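To address the excessive time cost of complete queries in big-data environments, a system called Probery is proposed that uses an approximate-completeness query method. Unlike traditional approximate queries, the approximation in Probery's method lies mainly in the probability that all matching data are retrieved, which makes it a new type of data query method. Probery first divides the data stored in the system into multiple data segments; then, according to a probabilistic placement model, it stores the data of each segment in a distributed file system; finally, for a given query condition, Probery performs a probabilistic query using a heuristic query method. Probery was validated by comparing its query performance with that of other mainstream non-relational data management systems: at a loss of 8% in query completeness, its query time was 51% lower than that of HBase, 23% lower than that of Cassandra, 12% lower than that of MongoDB, and 3% lower than that of Hive. The experimental results show that Probery can trade a moderate loss of query completeness for better query performance, and that it has good generality, adaptability, and scalability.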

15.
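How to search for resources efficiently is the most critical problem in P2P networks. Unstructured peer-to-peer networks generally use broadcasting as their basic search strategy, which generates heavy network traffic. To address this problem, a routing search algorithm is proposed that uses the experience accumulated by nodes to guide how they propagate queries. In this algorithm, each node records the topics it is interested in, the amount of information for each topic, and the target nodes that satisfy the topic, and builds a table of these correspondences. When a node receives a query, it uses this table to guide its forwarding choice so that query results are found more quickly. Simulation results show that the algorithm effectively reduces the network traffic caused by queries and improves the search success rate.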
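The correspondence table described in this abstract (topic, amount of information, target nodes) can be sketched as a small per-node data structure. The class name `ExperienceTable`, the ranking by accumulated information volume, the parameter `k`, and the fallback to forwarding to all neighbors are assumptions for illustration; the abstract does not give the actual selection rule.

```python
class ExperienceTable:
    """Per-node record of which target nodes answered which topics."""

    def __init__(self):
        # topic -> {node_id: accumulated information volume}
        self._table = {}

    def record(self, topic, node_id, volume):
        """Remember that node_id supplied `volume` units for `topic`."""
        nodes = self._table.setdefault(topic, {})
        nodes[node_id] = nodes.get(node_id, 0) + volume

    def next_hops(self, topic, neighbors, k=2):
        """Choose where to send a query for `topic`.

        With experience for the topic, return the k best-known targets.
        Without it, fall back to broadcasting to every neighbor.
        """
        known = self._table.get(topic)
        if not known:
            return list(neighbors)
        ranked = sorted(known, key=lambda n: (-known[n], n))
        return ranked[:k]
```

The traffic saving comes from the first branch: once a node has experience for a topic, it contacts at most `k` nodes instead of every neighbor.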

16.
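To address free-riding behavior and the homogenization of resources in P2P networks, an incentive mechanism based on a PKI system and a structured P2P network is proposed. The mechanism not only encourages nodes to offer resources for download but also lets resource publishers benefit, thereby effectively curbing free-riding and reducing resource homogenization.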

17.
A novel index structure based on the generalized suffix tree (PIGST) is proposed. Combined with post lists, PIGST can answer both structural and content queries. The distinct paths in an XML collection are mapped into strings. The construction algorithm of the PIGST for the path strings is presented; it modifies and improves a well-known suffix tree construction algorithm that requires only linear time and space. The query process needs only m character comparisons for direct containment queries, where m is the length of a query string. An efficient processing method for indirect containment queries that avoids the inefficient tree traversal operation is also presented. Experiments show that PIGST outperforms earlier approaches.

18.
Multidimensional data are exploited in many application areas such as scientific data analysis, business intelligence, and geographic information systems. One of the most frequent operations applied to such multidimensional data is the selection of a subspace of the given multidimensional space, which involves predicate evaluation on multiple dimensions. Existing main-memory data layouts optimized for evaluating predicates on the columnar data can be used to accelerate the subspace extraction by sequentially performing filter scans on each dimension one at a time. However, optimization opportunities emerge if we can consider all predicates together. In this paper, we propose DimensionSlice, a new main-memory data layout optimized for evaluating predicates on multiple dimensions. More specifically, the dimension values are sliced into portions and the portions with the same order of each dimension are arranged together. Multiple predicates are simultaneously evaluated with the sliced dimension values during the scan. In addition, by storing the different portions separately, unnecessary loads and computations of lower portions can be eliminated if the evaluation results are assured after examining the upper portions. For further acceleration of scans, the DimensionSlice layout is designed to easily leverage the SIMD capabilities that most mainstream processors are equipped with. Through experiments, we demonstrate the performance gains of the proposed method over the columnar main-memory layout that evaluates the partial predicates one dimension at a time. We also show that the proposed method outperforms the state-of-the-art multidimensional index structure when the selectivity is over a very low threshold.

19.
Motivating peers to contribute services is critical to the success of peer-to-peer (P2P) systems. Incentive protocols use reciprocity to enforce contributions. Indirect reciprocity schemes are more efficient than direct reciprocity schemes for large-scale P2P systems under a high churn rate. In this paper, we propose an indirect reciprocity scheme, called FairTrade, in which peers issue personal currencies to trade services in a P2P system. Personal currency enables indirect reciprocity without relying on any central banks or authorities. It gains extra robustness over global currency, as well as much improved trading flexibility and efficiency over direct reciprocity schemes. The acceptance degree of a personal currency depends on the issuer’s service capability and reliance. A peer credit limit is introduced to represent the amount of personal currency that will be accepted by other peers. Every peer, as a creditor, applies a Bayesian network model to set the peer credit limit for a trading partner peer as a creditee. The Bayesian network model learns the creditee’s capability and reliability and anticipates the associated profits and risks for credit setting. Using simulations on a file-sharing P2P system, we demonstrate that FairTrade achieves a 100% success rate of download requests without malicious peers, and maintains over a 90% success rate even with 50% malicious nodes. The system warms up quickly and does not assume any altruistic service or other kind of help. On average, the system traffic stabilizes before peers issue their second download requests. All of this performance is achieved with extremely low trading overhead, which takes up less than 1% of the total traffic.

20.
Motivated by the globalization trend and Internet speed competition, enterprises nowadays often divide into many departments or organizations, or even merge with other companies located in different regions, to improve their competitiveness and responsiveness. As a result, there are a number of data warehouse systems in a geographically distributed enterprise. To meet distributed decision-making requirements, the data in different data warehouses must be addressed to enable data exchange and integration. Therefore, an open, vendor-independent, and efficient data exchange standard to transfer data between data warehouses over the Internet is an important issue. However, current solutions for cross-warehouse data exchange employ only approaches based either on records or on transferring plain-text files, which are neither adequate nor efficient. In this research, issues of multidimensional data exchange are studied and an intelligent XML-based multidimensional data exchange model is developed. In addition, a generic-construct-based approach is proposed to enable many-to-many systematic mapping between distributed data warehouses, introducing a consistent and unique standard exchange format. Based on the transformation model we develop between the multidimensional data model and the XML data model, and enhanced by the multidimensional metadata management mechanism proposed in this research, a general-purpose intelligent XML-based multidimensional data exchange process over the web is facilitated efficiently and improved in quality. Moreover, we develop an intelligent XML-based prototype system to exchange multidimensional data, which shows that the proposed multidimensional data exchange model is feasible, and that the multidimensional data exchange process is more systematic and efficient using metadata.
