Similar literature
1.
Semantic-based searching in peer-to-peer (P2P) networks has drawn significant attention recently. A number of semantic searching schemes, such as GES proposed by Zhu Y et al., employ search models from Information Retrieval (IR). All these IR-based schemes use one vector to summarize the semantic contents of all documents on a single node. For example, GES derives a node vector based on the IR model VSM (Vector Space Model). A topology adaptation algorithm and a search protocol are then designed according to the similarity between the node vectors of different nodes. Although a single semantic vector is suitable when the distribution of documents on each node is uniform, it may not be efficient when the distribution is diverse. When there are many categories of documents on each node, the node vector representation may be inaccurate. We extend the idea of GES and present a new class-based semantic searching scheme (CSS) specifically designed for unstructured P2P networks with heterogeneous single-node document collections. It makes use of a state-of-the-art data clustering algorithm, online spherical k-means clustering (OSKM), to cluster all documents on a node into several classes. Each class can be viewed as a virtual node. Virtual nodes are connected through virtual links. As a result, the class vector replaces the node vector and drives the class-based topology adaptation and search process, which is what makes CSS efficient. Our simulation using the IR benchmark TREC collection demonstrates that CSS outperforms GES, with higher recall, higher precision, and lower search cost.
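
The contrast between a single node vector and per-class vectors can be illustrated with a small sketch. It is not the CSS or OSKM implementation: the classes are assigned by hand instead of being learned, and the 4-term vocabulary, the document vectors, and the helper names (`normalize`, `cosine`, `centroid`) are all assumptions made for illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))

def centroid(vectors):
    """Unit-length sum of the unit-length document vectors in a group."""
    units = [normalize(v) for v in vectors]
    dim = len(units[0])
    return normalize([sum(u[i] for u in units) for i in range(dim)])

# Hypothetical node holding two unrelated categories of documents,
# written as term-weight vectors over a toy 4-term vocabulary.
sports = [[3, 1, 0, 0], [2, 2, 0, 0]]
cooking = [[0, 0, 1, 3], [0, 0, 2, 2]]

node_vector = centroid(sports + cooking)                # one summary per node
class_vectors = [centroid(sports), centroid(cooking)]   # one summary per class

query = [1, 1, 0, 0]  # a query that only matches the first category
node_score = cosine(query, node_vector)
best_class_score = max(cosine(query, c) for c in class_vectors)
```

In this toy setting the best class vector matches the query more closely than the blended node vector does, which is the inaccuracy the abstract attributes to a single vector on nodes with diverse content.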

2.
Distributed hash tables (DHTs) excel at exact-match lookups, but they do not directly support complex queries such as content-based semantic search. In this paper, we propose a novel approach to efficient semantic search on DHT overlays. The basic idea is to place indexes of semantically close files onto the same peer nodes with high probability by exploiting information retrieval algorithms and locality sensitive hashing. A query for semantically close files is answered with high recall by consulting only a small number (e.g., 10–20) of nodes that store the indexes of the files semantically close to the query. Our approach adds only index information to peer nodes, imposing only a small storage overhead. Via detailed simulations, we show that our approach achieves high recall for queries at very low cost, i.e., the number of nodes visited for a query is about 10–20, independent of the overlay size.
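
A minimal sketch of how locality sensitive hashing can send similar vectors to the same key. The abstract does not say which LSH family the authors use; random-hyperplane (sign) hashing is assumed here, and `make_hyperplanes`, `lsh_key`, and the parameter values are hypothetical names chosen for illustration.

```python
import random

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplanes (Gaussian normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def lsh_key(vector, hyperplanes):
    """Build an integer key with one bit per hyperplane: 1 if the vector
    lies on the non-negative side of that hyperplane, else 0."""
    key = 0
    for plane in hyperplanes:
        dot = sum(p * x for p, x in zip(plane, vector))
        key = (key << 1) | (1 if dot >= 0 else 0)
    return key

planes = make_hyperplanes(num_bits=8, dim=4)
doc = [0.9, 0.1, 0.0, 0.3]
longer_doc = [2 * x for x in doc]  # same direction, different length

doc_key = lsh_key(doc, planes)
longer_doc_key = lsh_key(longer_doc, planes)
```

Vectors that point in the same direction always get the same key, and vectors separated by a small angle agree on most bits with high probability. In a DHT setting the key would then be mapped to a node identifier; that mapping is left out because the abstract does not describe it.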

3.
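Research on P2P Search with Semantic Support   Total citations: 6, self-citations: 0, citations by others: 6
Traditional P2P systems search on a single keyword and do not support semantics, which limits their usefulness. Applying the vector space model (VSM) solves the problem of multi-keyword search in P2P systems; partitioning the identifier space gathers similar documents on neighboring nodes, which speeds up search; and introducing semantics lets the P2P system understand search requests, which improves retrieval performance, recall in particular. Simulation results show that multi-keyword search is achieved, search converges quickly, semantics is supported with improved retrieval performance, and the load is well balanced across nodes.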

4.
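This paper proposes NCSearch, a node clustering and information retrieval algorithm. NCSearch exploits the locality-preserving property of the Hilbert curve to group nodes with similar content into a number of clusters. The search algorithm quickly locates the cluster most relevant to the query and then floods the query within that cluster; the returned results are ranked by relevance. Simulation tests show that NCSearch is stable and efficient, and clearly improves search efficiency over traditional algorithms.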
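
The locality-preserving property of the Hilbert curve can be seen in a short sketch that maps a cell of an n-by-n grid to its position along the curve. This is the standard textbook conversion, not NCSearch itself; how NCSearch turns node content into coordinates is not described in the abstract, and the function name `hilbert_index` is an assumption.

```python
def hilbert_index(n, x, y):
    """Return the position of cell (x, y) along the Hilbert curve that fills
    an n-by-n grid, where n is a power of two."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the next level is in canonical position.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

# Order the cells of a 4x4 grid by their position along the curve.
order = sorted(((x, y) for x in range(4) for y in range(4)),
               key=lambda cell: hilbert_index(4, *cell))
```

Cells that are consecutive along the curve are always neighbors in the grid, so sorting items by their Hilbert index keeps nearby items close together in a one-dimensional ordering.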

5.
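Interest-driven P2P search methods fall short in mining node interests and in extending the contextual semantics of search interests. To address this, the Social-P2P algorithm is improved to give a P2P search method that takes both search behavior and node content into account. Concept lattice theory is introduced: a friend list is built from node content and user search behavior, a concept lattice is constructed using the friend list as the formal context, and an interest domain is established. Search messages are resolved within the concept lattice, which shortens search paths and reduces the number of search messages, while the partial order among concepts extends the contextual semantics of the query and improves search precision. Experiments verify that the method achieves better recall and precision than both Social-P2P search and flooding search.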

6.
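Combining the One-hop and Chord routing mechanisms and taking into account the differences in node capability found in real networks, this paper constructs a two-layer ring routing structure, designs the corresponding range-query location and message broadcast algorithms, and proposes a Web service discovery method for two-layer ring P2P networks based on ontology clustering. The method classifies service registration nodes and locates query requests quickly and accurately. Simulation results show that the method has high search efficiency and short response time, and significantly improves query performance.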

7.
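To improve the results of keyword search over relational databases, this paper considers the semantic relationships among multiple tables and among tuples, and proposes a semantic scoring function. The function not only covers current scoring ideas but also adds new indicators to measure the relevance between query results and query keywords. Based on this scoring function, two Top-K search algorithms that process data in units of blocks are proposed: BA (blocking algorithm) and EBA (early-stopping blocking algorithm). EBA extends BA with a filtering threshold so that the iteration can terminate as early as possible. Experimental results show that the semantic scoring function ensures high precision and recall of the search results, and that BA and EBA improve on the query performance of existing methods.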
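
The idea of stopping a block-wise Top-K scan early can be sketched as follows. This is a generic threshold technique, not the BA or EBA algorithm from the paper: the abstract does not say how the filtering threshold is computed, so the sketch assumes each block comes with a precomputed upper bound on the scores it contains. The name `top_k_blocks` and the sample data are hypothetical.

```python
import heapq

def top_k_blocks(blocks, k):
    """Scan blocks in descending order of their score upper bound and stop
    once no unread block can change the current top-k.

    blocks: list of (upper_bound, scores) pairs, where every score in a
            block is assumed to be <= that block's upper_bound.
    Returns (top-k scores in descending order, number of blocks read).
    """
    best = []        # min-heap holding the k highest scores seen so far
    blocks_read = 0
    for upper_bound, scores in sorted(blocks, key=lambda b: b[0], reverse=True):
        # Early stop: the k-th best score already matches or beats
        # anything the remaining blocks could contain.
        if len(best) == k and best[0] >= upper_bound:
            break
        blocks_read += 1
        for score in scores:
            if len(best) < k:
                heapq.heappush(best, score)
            elif score > best[0]:
                heapq.heapreplace(best, score)
    return sorted(best, reverse=True), blocks_read

sample_blocks = [
    (0.9, [0.9, 0.8, 0.7]),
    (0.6, [0.6, 0.5]),
    (0.3, [0.3, 0.1]),
]
top2, blocks_read_for_top2 = top_k_blocks(sample_blocks, k=2)
```

With the sample data, asking for the two best scores only requires reading the first block, because the second-best score found there is already higher than the upper bound of every remaining block.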

8.
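Most current search result clustering algorithms cluster the Web page snippets generated for a user query. Because snippets are short and vary in quality, good clustering is hard to guarantee. This paper proposes a search result clustering algorithm based on frequent word-sense sequences, which uses WordNet together with syntactic and semantic features to build clusters and labels for search results. Unlike traditional clustering algorithms based on the vector space model, it takes the sequential patterns of words in documents into account. The algorithm first preprocesses the text and generates compact documents to reduce the dimensionality of the text data, builds a generalized suffix tree, mines the maximal frequent itemsets, and then obtains the frequent word-sense sequences. Ordered frequent itemsets extracted from documents better reflect document topics; search results on the same topic are clustered together, and those most relevant to the user query are ranked first. Experiments show that the algorithm produces high-quality clusters relevant to the query along with semantics-based cluster labels, achieves higher clustering accuracy and runtime efficiency, and scales well.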

9.
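To enable DHT-based structured P2P networks to support semantic retrieval and improve recall, this paper proposes SOC (semantic ontology chord), a search method based on DHT and ontology. To overcome the limitation that search in structured P2P networks relies on exact keyword matching, the resource identifiers in the DHT are modified, ontology techniques are used for fuzzy search, and nodes with similar interests are placed logically close to each other, which improves the recall of resource retrieval in the P2P network. Simulations run on the Peer-Sim simulator show that, as the network grows, the method achieves higher recall than the Chord model.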

10.
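Wang Bin, Yang Xiaochun, Wang Guoren. Journal of Software (《软件学报》), 2008, 19(9): 2362-2375
To improve the results of keyword search over relational databases, this paper considers the semantic relationships among multiple tables and among tuples, and proposes a semantic scoring function. The function not only covers current scoring ideas but also adds new indicators to measure the relevance between query results and query keywords. Based on this scoring function, two Top-K search algorithms that process data in units of blocks are proposed: BA (blocking algorithm) and EBA (early-stopping blocking algorithm). EBA extends BA with a filtering threshold so that the iteration can terminate as early as possible. Experimental results show that the semantic scoring function ensures...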

11.
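SSON: A Semantic Overlay Network Structure Based on Structured P2P Network Routing   Total citations: 1, self-citations: 0, citations by others: 1
Building on the routing mechanism of structured P2P networks and using topic-based partitioning, this paper proposes SSON, a semantic overlay network based on structured P2P network routing. Through the identifier mapping mechanism of structured P2P networks, SSON organizes nodes into a hierarchical overlay network according to resource category; this overlay structure ensures that a search is confined to the local subset of nodes relevant to the query topic. The structure makes full use of the strengths of structured P2P networks. It solves the inefficiency of searching for topic groups in semantic overlay networks built on unstructured P2P networks, and it overcomes the limitation that structured P2P networks support only exact-match lookup. It thereby provides structured P2P networks with a reliable and efficient semantic query mechanism and greatly improves recall.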

12.
A desirable P2P file sharing system is expected to achieve the following design goals: scalability, routing efficiency, and complex query support. In this paper, we propose a powerful P2P file sharing system, PSON, which satisfies all three properties. PSON is essentially a semantic overlay network of logical nodes. Each logical node represents a cluster of peers that are close to each other. A powerful peer is selected in each cluster to support query routing on the overlay network, while the less powerful peers are responsible for maintaining the shared contents. To facilitate query routing, super peers are organized in the form of a balanced binary search tree. By exploiting the concept of semantics, PSON can support complex queries in a scalable and efficient way. We present the basic system design, including the semantic overlay construction, query routing, and system dynamics. A load balancing scheme is proposed to further enhance system performance. Simulation experiments show that PSON is scalable and efficient, and is able to support complex queries.

13.
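To address multi-dimensional range queries by users in large-scale cloud peer-to-peer network environments, an index architecture based on an m-ary balanced tree is introduced into the cloud P2P environment. On top of this architecture, hierarchical tree structures that support multi-dimensional data indexing in centralized environments, such as the R-tree and the QR-tree, are implemented. The multi-dimensional range query algorithm allows a query to start from any position in the tree, which avoids the system performance bottleneck caused by the root node. Calculation and experiments verify that, for a network of N nodes, the efficiency of a multi-dimensional range query is O(log_m N) (m > 2, where m denotes the fan-out). The query efficiency is therefore independent of the number of dimensions d and does not degrade as d grows. Finally, a cost model based on the fan-out m is built and the optimal value of m is computed.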
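
To give a feel for the O(log_m N) bound, the sketch below computes the smallest number of levels h for which an m-ary tree can address N nodes, that is, the smallest h with m^h >= N. It is only a rough proxy for path length and is not the cost model from the paper; the function name `tree_height` and the example sizes are assumptions.

```python
def tree_height(num_nodes, fanout):
    """Return the smallest h such that fanout ** h >= num_nodes."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    height, capacity = 0, 1
    while capacity < num_nodes:
        capacity *= fanout
        height += 1
    return height

# Levels needed for one million nodes at a few fan-out values.
levels_by_fanout = {m: tree_height(1_000_000, m) for m in (4, 16, 32)}
```

A larger fan-out shortens the tree but makes each node keep more links, which is the kind of trade-off a fan-out cost model has to balance. The number of dimensions d does not appear anywhere in the calculation.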

14.
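Civil engineering supervision video is an effective means of improving the quality of civil engineering supervision. Taking the retrieval of such video as its subject, this paper first establishes the semantics of civil engineering supervision video and partitions the video data semantically. Word vectors are then trained on a selection of related Chinese Wikipedia entries together with entries collected from the civil engineering supervision domain, and these word vectors are used to train on the annotated data entries, providing the supervision video R-tree with word-vector data that carries semantics. Next, node splitting based on spectral clustering is studied, and an R-tree node splitting algorithm based on spectral clustering and an R-tree node retrieval algorithm based on word vectors are proposed. Finally, an example from a real project shows that the defined semantics accurately represents the main content of supervision video, and experimental results show that the proposed optimizations effectively improve both the indexing speed and the retrieval recall for civil engineering supervision video.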

15.
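Research on a Topic-Driven P2P Distributed Information Search Mechanism   Total citations: 8, self-citations: 0, citations by others: 8
Peer-to-Peer (P2P) holds great promise for distributed file sharing, but current P2P systems still lack an effective information search mechanism. This paper proposes a topic-driven P2P information search mechanism: documents on the nodes are clustered to obtain global topics, and nodes holding similar topics are then organized together to form a topic overlay network. When information is searched for in the P2P network, the query is routed according to its relevance to the topics, which improves search efficiency. The paper describes in detail the index structure for topic-driven search, the topic clustering method, and the algorithms for constructing and maintaining the topic overlay network. Simulation results on Chord show that the topic-driven P2P information search mechanism reduces the average network bandwidth and the average search path length of a search, and raises the search success rate.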

16.
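The subgraph isomorphism problem is NP-complete (nondeterministic polynomial), and pivoted subgraph isomorphism is a special case of it. Many efficient subgraph isomorphism algorithms exist, but there is currently no GPU-based search algorithm for pivoted subgraph isomorphism, and adapting existing subgraph isomorphism algorithms to pivoted subgraph matching produces a large number of unnecessary intermediate results. To address this, a GPU-based pivoted subgraph isomorphism algorithm is proposed. First, through a new...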

17.
Search engine result pages (SERPs) for a specific query are constructed according to several mechanisms. One of them consists in ranking Web pages by their importance, regardless of their semantics. Indeed, relevance to a query is not enough to provide high quality results, and popularity is used to arbitrate between equally relevant Web pages. The best-known algorithm that ranks Web pages according to their popularity is PageRank. The term Webspam was coined to denote Web pages created with the sole purpose of fooling ranking algorithms such as PageRank. Indeed, the goal of Webspam is to promote a target page by increasing its rank. It is important for Web search engines to spot and discard Webspam in order to provide their users with an unbiased list of results. Webspam techniques evolve constantly to remain effective, but most of the time they still consist in creating a specific linking architecture around the target page to increase its rank. In this paper we propose to study the effects of node aggregation on Google's well-known ranking algorithm, PageRank, in the presence of Webspam. Our node aggregation methods construct clusters of nodes that are treated as a single node in the PageRank computation. Since the Web graph is far too big for classic clustering techniques, we present four lightweight aggregation techniques suited to its size. Experimental results on the WEBSPAM-UK2007 dataset show the interest of the approach, which is moreover confirmed by statistical evidence.
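
The effect of treating a cluster as a single node can be sketched on a toy graph. This is not one of the four aggregation techniques from the paper: the cluster is picked by hand, links inside the cluster are simply dropped, and the graph, the node names, and the helpers `pagerank` and `aggregate` are assumptions made for illustration.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank. graph maps every node to its list of
    out-links; a node without out-links spreads its rank evenly."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = graph[v] if graph[v] else nodes
            share = damping * rank[v] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank

def aggregate(graph, cluster, label):
    """Collapse the nodes in cluster into one node named label.
    Links that start and end at the same merged node are dropped."""
    def rename(v):
        return label if v in cluster else v
    merged = {}
    for v, targets in graph.items():
        src = rename(v)
        out = merged.setdefault(src, set())
        out.update(rename(t) for t in targets if rename(t) != src)
    return {v: sorted(out) for v, out in merged.items()}

# Hypothetical graph: honest pages a and b, plus a link farm in which
# spam pages s1..s3 exist only to exchange links with target page t.
web = {
    "a": ["b", "t"], "b": ["a"],
    "t": ["s1", "s2", "s3"],
    "s1": ["t"], "s2": ["t"], "s3": ["t"],
}
before = pagerank(web)
after = pagerank(aggregate(web, {"t", "s1", "s2", "s3"}, "farm"))
```

On this toy graph the target page outranks the honest page `a` before aggregation, while the merged `farm` node ranks below `a` afterwards, because the internal links that recycled rank inside the farm no longer count.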

18.
19.
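Based on semantic network theory and the GCNET topology, this paper proposes Semantic GCNET, a group-based semantic peer-to-peer network. It takes full advantage of the small-world property of the GCNET network to confine a search to the local subset of nodes relevant to the query topic. This solves the inefficiency of topic-group search in some other semantic peer-to-peer networks and overcomes the limitation that some of them support only exact-match lookup. Experimental results show that Semantic GCNET delivers efficient semantic query performance and high recall.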

20.
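In unstructured P2P search, there is no global management mechanism, so network nodes cannot obtain the topology of the whole network or the location of the target data. Query routing is therefore highly random, which leads to both poor query performance and high bandwidth consumption. To widen the data search range while keeping the volume of redundant network messages under control, this paper analyzes two existing classes of typical unstructured P2P routing algorithms and proposes a node-based MQR algorithm. It uses the state information of network nodes and the TTL state of query messages during the search to improve unstructured P2P search performance in two respects: data search range and network usage. Simulation results show that, compared with the traditional P2P routing algorithms APS and Random Walk, the algorithm performs better in search accuracy, network utilization, and recall.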
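
For reference, the Random Walk baseline mentioned above can be sketched as a single walker that forwards the query to a random neighbor until the data is found or the TTL runs out. This is a generic sketch of the baseline, not the MQR algorithm, whose use of node state is not detailed in the abstract; the function name, its parameters, and the toy topology are assumptions.

```python
import random

def random_walk_search(neighbors, holders, start, ttl, rng):
    """Forward one query by random walk.

    neighbors: dict mapping each node to the list of nodes it can forward to
    holders:   set of nodes that store the requested data
    ttl:       maximum number of forwarding hops
    Returns (node where the data was found or None, messages sent).
    """
    current, messages = start, 0
    while True:
        if current in holders:
            return current, messages
        if ttl == 0 or not neighbors[current]:
            return None, messages
        current = rng.choice(neighbors[current])  # pick the next hop at random
        ttl -= 1
        messages += 1

# Hypothetical chain topology, so the walk has only one possible path.
chain = {0: [1], 1: [2], 2: [3], 3: []}
found, sent = random_walk_search(chain, holders={3}, start=0, ttl=3,
                                 rng=random.Random(0))
```

The TTL caps the number of messages a query can generate, at the cost of missing data that lies further away than the TTL allows; that is the trade-off between search range and network usage that the abstract describes.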
