Similar Articles
1.
We develop a new algorithm for clustering search results. Unlike many other clustering systems recently proposed as a post-processing step for Web search engines, our system is not based on phrase analysis inside snippets; instead, it applies latent semantic indexing to the whole document content. A main contribution of the paper is a novel strategy, called dynamic SVD clustering, for discovering the optimal number of singular values to use for clustering. Moreover, the SVD computation step performs well in practice, which makes clustering feasible whenever term vectors are available. We show that the algorithm has very good classification performance and that it can be used effectively to cluster the results of a search engine, making them easier for users to browse. The algorithm has been integrated into the Noodles search engine, a tool for searching and clustering Web and desktop documents.
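
The abstract does not spell out how dynamic SVD clustering picks the number of singular values, so the sketch below substitutes a common LSI heuristic instead: keep the smallest k whose singular values retain a chosen fraction of the total squared spectral energy. The function name `choose_rank` and the 0.9 default are illustrative assumptions, not part of the paper.

```python
def choose_rank(singular_values, energy=0.9):
    """Return the smallest k whose top-k singular values keep at least
    `energy` of the total squared spectral energy.
    This is a generic LSI heuristic, not the paper's dynamic SVD clustering."""
    s = sorted(singular_values, reverse=True)
    total = sum(x * x for x in s)
    if total == 0:
        return 0  # an all-zero spectrum carries no information
    kept = 0.0
    for k, x in enumerate(s, start=1):
        kept += x * x
        if kept / total >= energy:
            return k
    return len(s)


print(choose_rank([10.0, 5.0, 1.0, 0.5], energy=0.9))  # → 2
```

The first two values hold 125 of 126.25 units of squared energy (about 99%), so k = 2 already clears the 90% target.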

2.
Clustering Web Search Results Using Link Analysis   Total citations: 3 (self-citations: 0; by others: 3)
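With the rapid growth of information on the Web, obtaining high-quality Web information effectively has become a hot research topic in many fields, such as databases and information retrieval. In information retrieval, Web search engines are the most commonly used tool, yet current search engines remain far from satisfactory. Using link analysis, this paper proposes a new method for clustering Web search results. Unlike clustering algorithms in information retrieval that rely on keywords or terms shared between texts, our method applies the citation and coupling analysis used for scientific literature, basing similarity on the common links that two Web pages share and match. We also extend the standard K-means algorithm so that it handles noisy pages better, and apply it to cluster Web result pages. To verify its effectiveness, we conducted preliminary experiments; the results show that clustering Web search results through link analysis achieves the expected effect.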
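
One plausible reading of the link-based similarity is a mix of bibliographic coupling (shared out-links) and co-citation (shared in-links). The sketch below assumes Jaccard normalization and an equal 0.5 weighting; `link_similarity` and its inputs are hypothetical. A similarity like this could then stand in for term-based distance inside K-means.

```python
def link_similarity(out_links, in_links, a, b, alpha=0.5):
    """Similarity of pages a and b computed only from shared links.

    out_links / in_links map a page URL to the set of URLs it links to /
    is linked from. Bibliographic coupling uses shared out-links and
    co-citation uses shared in-links; both are Jaccard-normalized and
    mixed with weight alpha (an illustrative choice)."""
    def jaccard(x, y):
        union = x | y
        return len(x & y) / len(union) if union else 0.0

    coupling = jaccard(out_links.get(a, set()), out_links.get(b, set()))
    cocitation = jaccard(in_links.get(a, set()), in_links.get(b, set()))
    return alpha * coupling + (1 - alpha) * cocitation


out_links = {"p1": {"x", "y", "z"}, "p2": {"y", "z", "w"}}
in_links = {"p1": {"h"}, "p2": {"h", "k"}}
print(link_similarity(out_links, in_links, "p1", "p2"))  # → 0.5
```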

3.
A Personalized Information Retrieval Method Based on Clustering   Total citations: 7 (self-citations: 2; by others: 5)
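Practice has shown that clustering is an effective way to improve how search results are presented. However, current clustering methods do not take user interests into account and return the same clustering results to every user for the same query. We therefore propose a personalized clustering retrieval method. It improves the k-means algorithm and uses it to cluster the results returned by a conventional search engine according to the user's interests, returning Web page clusters tailored to a specific user. Experiments show that the method can provide personalized service, improves clustering quality, and increases users' retrieval efficiency.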
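
The abstract does not say how user interest enters k-means. A minimal sketch, assuming a hypothetical interest profile that maps terms to scores in [0, 1], is to scale each document's term weights before running ordinary k-means, so that terms the user cares about dominate the distance. Both `personalize` and the seeding rule in `kmeans` (first k documents) are illustrative choices, not the paper's method.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between sparse term -> weight dicts."""
    return sum((u.get(t, 0.0) - v.get(t, 0.0)) ** 2 for t in set(u) | set(v))


def personalize(docs, interest, beta=1.0):
    """Scale each term weight by (1 + beta * interest[term]).
    `interest` is a hypothetical user profile: term -> score in [0, 1]."""
    return [{t: w * (1 + beta * interest.get(t, 0.0)) for t, w in d.items()}
            for d in docs]


def kmeans(vectors, k, iters=20):
    """Plain k-means on sparse dict vectors, seeded with the first k vectors."""
    centers = [dict(v) for v in vectors[:k]]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest center by squared distance.
        for i, v in enumerate(vectors):
            assign[i] = min(range(k), key=lambda c: sq_dist(v, centers[c]))
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:  # keep the old center if a cluster empties
                terms = set().union(*members)
                centers[c] = {t: sum(m.get(t, 0.0) for m in members) / len(members)
                              for t in terms}
    return assign
```

Usage would be `kmeans(personalize(docs, profile), k)`, so the same query can yield different clusters for users with different profiles.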

4.
Future automated question answering systems will typically involve the use of local knowledge available on the users' systems as well as knowledge retrieved from the Web. The determination of what information we should seek out on the Web must be directed by its potential value or relevance to our objective in the light of what knowledge is already available. Here we begin to provide a formal quantification of the concept of relevance and related ideas for systems that use fuzzy‐set‐based representations to provide the underlying semantics. We also introduce the idea of ease of extraction to quantify the ability of extracting relevant information from complex relationships. © 2007 Wiley Periodicals, Inc. Int J Int Syst 22: 385–396, 2007.

5.
6.
A Web Search Result Clustering Method Based on Tolerance Rough Sets   Total citations: 1 (self-citations: 0; by others: 1)
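Some Web clustering methods treat clusters as strictly mutually exclusive, which yields unsatisfactory results. A k-means clustering method based on tolerance rough sets addresses this problem. First, Web documents are represented with the vector space model, and a set of text feature terms is obtained by conventional means. Then, exploiting the co-occurrence of certain feature terms, a tolerance relation over feature terms is constructed to extend their descriptive power. Finally, the tolerance classes of feature terms are used to describe the similarity between documents, yielding a clustering of Web search results. A simple and intuitive T model for measuring clustering accuracy is also proposed. Experimental results show that the cluster labels produced by tolerance-relation clustering are highly descriptive and easy to understand, and that the method clearly outperforms the ordinary k-means algorithm.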
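
In the standard tolerance rough set model, the tolerance class of a term contains the term itself plus every term that co-occurs with it in at least θ documents, and a document's upper approximation is the set of terms whose tolerance class overlaps the document. The sketch below computes both; θ and the function names are illustrative, and the paper's exact construction may differ.

```python
from collections import Counter
from itertools import combinations


def tolerance_classes(docs, theta):
    """docs: list of term sets. Returns term -> tolerance class:
    the term itself plus every term co-occurring with it in >= theta docs."""
    co = Counter()
    for d in docs:
        for a, b in combinations(sorted(d), 2):
            co[(a, b)] += 1
    classes = {t: {t} for d in docs for t in d}
    for (a, b), n in co.items():
        if n >= theta:  # the relation is symmetric, so add both directions
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """Terms whose tolerance class overlaps the document's terms."""
    return {t for t, cls in classes.items() if cls & doc}


docs = [{"rough", "set"}, {"rough", "set", "cluster"}, {"web", "cluster"}]
classes = tolerance_classes(docs, theta=2)
print(sorted(upper_approximation({"rough"}, classes)))  # → ['rough', 'set']
```

The enlarged term set lets two documents match even when they share no literal term, only tolerance-related ones.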

7.
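The results a search engine returns for a given keyword query can be organized into semantic categories, which improves users' search efficiency. However, classification relies on predefined categories; because the categories may be incomplete or not updated in time, some information on the Internet may be missed. This paper proposes a method that combines classification and clustering to optimize search results: after classification, clustering is applied to the information that was not assigned to any category. The study shows that the method balances efficiency and completeness of information.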
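
The classify-then-cluster pipeline can be sketched as follows, assuming (hypothetically) that each predefined category is described by a keyword set and that a result goes to the category with the largest keyword overlap. Results that match nothing are set aside for a separate clustering pass rather than discarded.

```python
def classify_then_collect(results, categories):
    """results: list of (title, set_of_terms); categories: name -> keyword set.

    Assign each result to the category with the largest keyword overlap
    (ties go to the first category listed). Results matching no category
    are returned separately so a clustering step can organize them."""
    assigned = {name: [] for name in categories}
    leftover = []
    for title, terms in results:
        best, score = None, 0
        for name, keywords in categories.items():
            overlap = len(terms & keywords)
            if overlap > score:
                best, score = name, overlap
        if best is None:
            leftover.append((title, terms))
        else:
            assigned[best].append(title)
    return assigned, leftover
```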

8.
9.
Xia Bin, Xu Bin. 电脑开发与应用 (Computer Development & Applications), 2007, 20(5): 16-17, 20
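Current search engines return so many candidate results that users cannot precisely locate those relevant to their topic. To address this, this paper proposes a method for clustering search engine results based on hyperlink information. By mining the anchor text of hyperlinks together with page content, the method groups Web pages into distinct subcategories. While clustering on page content, it also makes full use of Web structure and hyperlink information, so it reflects the content characteristics of site documents better than traditional structure-mining methods and thus improves clustering accuracy.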
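
A simple way to let hyperlink information influence content-based clustering is to blend two cosine similarities: one over page text and one over the anchor text of links pointing to the page. The blend weight `w = 0.6` and the function names below are assumptions for illustration, not values from the paper.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def page_similarity(content_a, content_b, anchors_a, anchors_b, w=0.6):
    """Weighted mix of content similarity and anchor-text similarity.
    w is an illustrative weight, not a value from the paper."""
    content = cosine(Counter(content_a.split()), Counter(content_b.split()))
    anchors = cosine(Counter(anchors_a.split()), Counter(anchors_b.split()))
    return w * content + (1 - w) * anchors
```

Anchor text often names a page's topic more concisely than the page itself, which is the intuition behind giving it a separate term in the blend.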

10.
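Fuzzy clustering algorithms are sensitive to initial values and tend to become trapped in local optima. To address these problems, this paper proposes a fuzzy C-means (FCM) algorithm fused with an improved shuffled frog leaping algorithm (SFLA) for clustering Web search results. In the new algorithm, the SFLA optimization process replaces FCM's gradient-descent-based iteration. The improved SFLA optimizes the initial solution through chaotic search, generates new individuals through mutation, and uses a newly designed search strategy, which effectively improves the algorithm's optimization ability. Experimental results show that the algorithm improves the search ability and clustering accuracy of fuzzy clustering and has an advantage in global optimization.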
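
In an SFLA-FCM hybrid, each candidate solution (a frog) can encode a set of cluster centers, and its fitness can be the fuzzy C-means objective J_m evaluated with the standard membership formula u_ij = 1 / Σ_k (d_ij / d_kj)^(2/(m-1)). The sketch below computes that objective for fixed centers. The fuzzifier m = 2 is a common default, and the SFLA search itself (chaotic initialization, mutation, the new search strategy) is not reproduced.

```python
import math


def fcm_objective(points, centers, m=2.0):
    """Fuzzy C-means objective J_m = sum_i sum_j u_ij^m * d_ij^2 for fixed
    centers, using the standard FCM membership update. A population-based
    optimizer such as SFLA could minimize this as its fitness."""
    total = 0.0
    for x in points:
        d = [math.dist(x, c) for c in centers]
        if 0.0 in d:
            # The point sits on a center: membership is crisp there and
            # every term of the sum is zero.
            continue
        for dj in d:
            u = 1.0 / sum((dj / dk) ** (2.0 / (m - 1.0)) for dk in d)
            total += (u ** m) * dj ** 2
    return total
```

A point midway between two centers gets membership 0.5 in each, so with m = 2 it contributes 2 × 0.25 × d², which is why well-placed centers yield a much smaller J_m.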

11.
A Web Search Result Clustering Method Based on K-Center and Information Gain   Total citations: 1 (self-citations: 0; by others: 1)
Ding Zhenguo, Meng Xing. 计算机应用研究 (Application Research of Computers), 2008, 25(10): 3125-3127
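Based on the concepts of K-center and information gain, this paper applies an improved FPF (furthest-point-first) algorithm to Web search result clustering, proposes a cluster labeling method that makes the clustering results easier for users to understand, and presents a model for evaluating clustering quality. Comparisons with the Lingo and K-means algorithms show that the proposed algorithm balances clustering quality and speed well and is better suited to clustering Web search results.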
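
The furthest-point-first heuristic (Gonzalez's algorithm for k-center) is the base that the paper improves on. A minimal sketch of the unmodified version: start from an arbitrary point, repeatedly add the point farthest from all chosen centers, then assign each point to its nearest center. The information-gain labeling and the paper's improvements are not included.

```python
import math


def furthest_point_first(points, k):
    """Gonzalez's furthest-point-first heuristic for k-center.

    Returns (center indices, label per point), where a label is an index
    into the center list."""
    centers = [0]  # arbitrary first center
    nearest = [math.dist(p, points[0]) for p in points]
    while len(centers) < min(k, len(points)):
        far = max(range(len(points)), key=lambda i: nearest[i])
        centers.append(far)
        # Each point's distance to its closest chosen center so far.
        nearest = [min(nearest[i], math.dist(points[i], points[far]))
                   for i in range(len(points))]
    labels = [min(range(len(centers)), key=lambda c: math.dist(p, points[centers[c]]))
              for p in points]
    return centers, labels
```

Each round costs one pass over the points, so the whole procedure is O(nk) distance computations, which is a large part of why FPF is attractive for clustering search results on the fly.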

12.
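With the explosive growth of information, existing search engines fail to meet people's needs in many respects. Web document clustering can shrink the search space, speed up retrieval, and improve query precision. This paper proposes an ensemble Web document clustering algorithm that combines coarse clustering with SOM (Self-Organizing Maps) and fine clustering with an improved PSO (Particle Swarm Optimization). First, following the vector space model, each Web document is represented by feature terms and their weights. Next, the SOM algorithm coarsely clusters the document feature set, producing a set of output weights. These weights then initialize the improved PSO algorithm, which refines the clustering result to produce the final Web document clusters. Simulation results show that the algorithm effectively improves the precision and recall of document queries and has practical value.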
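
The coarse-to-fine hand-off can be sketched with the textbook PSO update, where each particle is a flattened vector of cluster centroids seeded near the SOM output weights. The inertia and acceleration coefficients below are common defaults, not the paper's settings, and the paper's PSO improvements are not reproduced.

```python
import random


def init_swarm_from_som(som_weights, n_particles, rng, noise=0.01):
    """Seed each particle near the SOM output weights (one centroid per SOM
    unit, flattened), with small Gaussian perturbations for diversity."""
    flat = [w for centroid in som_weights for w in centroid]
    return [[w + rng.gauss(0.0, noise) for w in flat] for _ in range(n_particles)]


def pso_step(x, v, pbest, gbest, rng, w=0.7, c1=1.5, c2=1.5):
    """One standard PSO velocity/position update:
    v <- w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x);  x <- x + v."""
    new_v = [w * vi
             + c1 * rng.random() * (pb - xi)
             + c2 * rng.random() * (gb - xi)
             for xi, vi, pb, gb in zip(x, v, pbest, gbest)]
    new_x = [xi + vi for xi, vi in zip(x, new_v)]
    return new_x, new_v
```

In a full run, each particle's fitness would be a clustering criterion (for example, total distance of documents to their nearest centroid), and `pbest` and `gbest` would be updated after every step.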

13.
The proliferation of the Internet and World Wide Web applications has created new opportunities as well as new challenges for institutions and individuals who are either receiving or delivering education. Electronic (e) learning is one of the most important developments in education. It reflects the shift from teaching to learning and puts the learner or user before the institution. The objectives and expected outcomes of e‐learning depend largely on the quality of the teaching processes and the effectiveness of online access. Hence, methods for assessing the effectiveness of e‐learning Web sites are a critical issue in both practice and research. However, Web site quality is a complex concept, and its measurement is expected to be multidimensional. Multicriteria decision‐making (MCDM) techniques are widely used to evaluate and rank alternatives in problems with multiple, usually conflicting criteria. For this reason, this article presents a quality evaluation model based on MCDM to measure the performance of e‐learning Web sites. In addition, fuzzy logic is used to handle the subjectivity and vagueness of the assessment process. The study evaluated 10 worldwide and 11 locally successful Web sites with the proposed method. By providing an aggregated measure based on Web site quality criteria, the method is expected to be useful to e‐learning service providers and system developers, as well as to researchers working on the Web. © 2007 Wiley Periodicals, Inc. Int J Int Syst 22: 567–586, 2007.
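
The abstract does not give the aggregation formula, so as an illustration here is a generic fuzzy simple additive weighting step: ratings and criterion weights are triangular fuzzy numbers (l, m, u), products are taken component-wise (a standard approximation for non-negative TFNs), and the summed score is defuzzified by its centroid. This is an assumed stand-in, not the article's model.

```python
def tfn_mul(a, b):
    """Approximate product of two non-negative triangular fuzzy numbers (l, m, u)."""
    return (a[0] * b[0], a[1] * b[1], a[2] * b[2])


def fuzzy_score(ratings, weights):
    """Fuzzy weighted sum of per-criterion ratings, defuzzified by the
    centroid (l + m + u) / 3. Ratings and weights are TFNs."""
    l = m = u = 0.0
    for r, w in zip(ratings, weights):
        pl, pm, pu = tfn_mul(r, w)
        l, m, u = l + pl, m + pm, u + pu
    return (l + m + u) / 3.0


# Two criteria rated on a 1-5 scale with fuzzy importance weights.
weights = [(0.2, 0.5, 0.8), (0.1, 0.3, 0.5)]
site_a = fuzzy_score([(3, 4, 5), (3, 4, 5)], weights)
site_b = fuzzy_score([(1, 2, 3), (1, 2, 3)], weights)
```

Ranking the Web sites then amounts to sorting them by their defuzzified scores.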

14.
Web information can currently be acquired by using search engines. However, our daily experience is not only that Web pages are often either redundant or missing but also that there is a mismatch between information needs and the Web's responses. To satisfy more complex requests, we need to extract part of the information and transform it into new interactive knowledge. This transformation may be performed either by hand or automatically. In this article we describe an experimental agent-based framework designed to help the user both manage retrieved information and personalize Web searching. The first process is supported by a query-formulation facility and a friendly structured representation of the search results. In addition, the system proactively supports Web searching by suggesting pages selected according to the behavior the user shows while navigating. A key role is played by an extension of a classical fuzzy-clustering algorithm that provides a prototype-based representation of the knowledge extracted from the Web. These prototypes drive both the proactive suggestion of new pages, mined through Web spidering, and the structured representation of the search results. © 2007 Wiley Periodicals, Inc. Int J Int Syst 22: 1101–1122, 2007.

15.
In this Information Age, most people are accustomed to gleaning information from the World Wide Web. To survive and prosper, a Web site has to keep its content fresh while providing varied and extensive information services to attract users. A Web recommendation system, which acts as a personalized information filter, encourages users to visit a Web site and browse it in greater depth. Most recommendation systems use large browsing logs to identify and predict users' surfing habits. This pattern discovery is time-consuming, and its result is static, so such systems do not satisfy end users' goal-oriented and dynamic demands. Hence there is a pressing need for adaptive recommendation systems. This article proposes a novel Web recommendation system framework, based on the Moving Average Rule, that responds to new navigation trends and dynamically adapts its recommendations, offering users suitable suggestions through hyperlinks. The framework provides Web site administrators with various methods to generate recommendations. It also responds to new Web trends, including Web pages that have been updated but not yet integrated into regular browsing patterns. Ultimately, this research gives Web sites the dynamic intelligence to tailor their content effectively to users' needs. © 2007 Wiley Periodicals, Inc. Int J Int Syst 22: 621–639, 2007.
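
The abstract names a Moving Average Rule without defining it. One hedged interpretation, borrowed from moving-average crossover rules, is to recommend pages whose short-window average of daily visits exceeds their long-window average. This surfaces newly popular or recently updated pages before they appear in long-term browsing patterns. Window sizes and function names below are hypothetical.

```python
def moving_average(series, window):
    """Simple moving average of the last `window` values (0.0 if empty)."""
    tail = series[-window:]
    return sum(tail) / len(tail) if tail else 0.0


def trending_pages(hits, short=3, long=7, top_n=3):
    """hits: page -> list of daily visit counts, oldest first.

    Recommend pages whose short-term average exceeds the long-term one,
    ranked by the size of the gap (largest upward trend first)."""
    gaps = {}
    for page, series in hits.items():
        gap = moving_average(series, short) - moving_average(series, long)
        if gap > 0:
            gaps[page] = gap
    return sorted(gaps, key=gaps.get, reverse=True)[:top_n]


hits = {
    "/new": [0, 0, 0, 0, 5, 10, 20],         # rising: recommended
    "/old": [10, 10, 10, 10, 10, 10, 10],    # flat: no gap
    "/fading": [20, 20, 20, 20, 5, 5, 5],    # falling: negative gap
}
print(trending_pages(hits))  # → ['/new']
```

Because only the last `long` days matter, the recommendations adapt as soon as traffic shifts, without re-mining the full browsing log.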

16.
Research on Search Engine Result Clustering Algorithms   Total citations: 6 (self-citations: 1; by others: 5)
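As the number of Web documents has surged, search engines have exposed many problems: users must sift through long lists of document snippets returned by the engine. Clustering search engine results lets users view the returned results at a higher, topical level. This paper proposes several key criteria for search engine result clustering and presents a new search engine result clustering algorithm based on the PAT-tree.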
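
A PAT-tree indexes every suffix of the text so that phrases shared by several snippets can be found efficiently. The naive sketch below finds the same kind of shared phrases (candidate base clusters) by enumerating word n-grams directly. It is a simplified stand-in for illustration, not the paper's PAT-tree algorithm, and `max_len` and `min_docs` are assumed parameters.

```python
from collections import defaultdict


def base_clusters(snippets, min_docs=2, max_len=3):
    """Map each word phrase (up to max_len words) to the set of snippet ids
    containing it, keeping only phrases shared by >= min_docs snippets.
    A PAT-tree would answer the same shared-phrase queries without
    enumerating every n-gram."""
    index = defaultdict(set)
    for doc_id, text in enumerate(snippets):
        words = text.lower().split()
        for i in range(len(words)):
            for n in range(1, max_len + 1):
                if i + n <= len(words):
                    index[" ".join(words[i:i + n])].add(doc_id)
    return {p: ids for p, ids in index.items() if len(ids) >= min_docs}


snippets = ["apple pie recipe", "easy apple pie", "apple laptop"]
print(base_clusters(snippets)["apple pie"])  # → {0, 1}
```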

17.
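As people's online activities grow richer, social behavior and relationships on the Internet increasingly resemble traditional real-world social networks and can faithfully reflect real relationships between people. A real-world social network can therefore be constructed from the Internet through search. Social network search techniques have become a current research focus, and determining, for each Web page, which individual a person name refers to (person name identity resolution) is a key technique in social network search. To extract accurate features from text while reducing vector dimensionality, this paper presents a feature-weighting method based on C-value and inverse document frequency (IDF), implements a similarity computation based on the cosine of the angle between vectors, and, after studying hierarchical and partitional text clustering algorithms, proposes an improved hierarchical clustering algorithm for person name identity resolution. Tests on search engine results for person name queries show that the improved hierarchical clustering algorithm effectively improves identity resolution performance.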
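
For reference, the standard C-value score of a multi-word candidate term a is log2|a| · f(a), reduced by the mean frequency of the longer candidates that contain a. The sketch below computes it and, as an assumption about how the two signals are combined, multiplies it by IDF. Candidates are assumed to be word tuples of length two or more, since log2(1) = 0.

```python
import math


def is_nested(a, b):
    """True if word tuple a occurs as a contiguous run inside longer tuple b."""
    n = len(a)
    return len(b) > n and any(b[i:i + n] == a for i in range(len(b) - n + 1))


def c_values(freq):
    """freq: candidate term (tuple of >= 2 words) -> corpus frequency.
    C-value(a) = log2|a| * (f(a) - mean frequency of candidates containing a)."""
    scores = {}
    for a, f_a in freq.items():
        containers = [b for b in freq if is_nested(a, b)]
        if containers:
            f_a -= sum(freq[b] for b in containers) / len(containers)
        scores[a] = math.log2(len(a)) * f_a
    return scores


def cvalue_idf(scores, doc_freq, n_docs):
    """Assumed combination: C-value times IDF = log(n_docs / df)."""
    return {t: s * math.log(n_docs / doc_freq[t]) for t, s in scores.items()}


freq = {("machine", "learning"): 10,
        ("deep", "machine", "learning"): 4,
        ("machine", "learning", "model"): 2}
print(c_values(freq)[("machine", "learning")])  # → 7.0
```

Subtracting the containers' mean frequency keeps a term from being credited for occurrences that really belong to a longer phrase.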

18.
Person name queries often bring up web pages that correspond to individuals sharing the same name. The Web People Search (WePS) task consists of organizing search results for ambiguous person name queries into meaningful clusters, with each cluster referring to one individual. This paper presents a fuzzy ant based clustering approach for this multi-document person name disambiguation problem. The main advantage of fuzzy ant based clustering, a technique inspired by the behavior of ants clustering dead nestmates into piles, is that it does not require the number of output clusters to be specified. This makes the algorithm well suited to the Web Person Disambiguation task, where we do not know in advance how many individuals a given person name refers to. We compare our results with state-of-the-art partitional and hierarchical clustering approaches (k-means and Agnes) and obtain favorable results. This is particularly interesting because the latter require manually setting a similarity threshold or estimating the number of clusters in advance, whereas the fuzzy ant based clustering algorithm does not.

19.
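To cluster Web service request data quickly and improve clustering accuracy, this paper proposes a Web data clustering algorithm based on incremental time series and task scheduling. The algorithm defines clustering of Web data over time series and adopts an incremental time-series clustering method that reduces the complexity of Web data through data compression, clustering the time-series data by similarity of service time. For the problem of optimal service task scheduling in Web server clusters, service tasks are allocated using server execution capability as the criterion. Simulation results show that, compared with a grid-based hierarchical clustering algorithm for high-dimensional data and a multi-objective fuzzy clustering algorithm based on incremental learning, the proposed algorithm achieves better clustering time, clustering accuracy, and service execution success rate.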
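
The abstract says tasks are allocated by server execution capability but gives no rule. A minimal sketch, assuming each task has a known amount of work and each server a relative capacity: place larger tasks first, each on the server whose finish time (assigned work divided by capacity) would be smallest. The greedy rule and the names are illustrative assumptions.

```python
def dispatch(tasks, capacities):
    """tasks: task id -> work amount; capacities: relative server speeds.

    Greedy capacity-aware dispatch: largest tasks first, each sent to the
    server with the smallest resulting finish time (load / capacity).
    Returns (task id -> server index, finish time per server)."""
    load = [0.0] * len(capacities)
    plan = {}
    for task_id, work in sorted(tasks.items(), key=lambda kv: kv[1], reverse=True):
        best = min(range(len(capacities)),
                   key=lambda s: (load[s] + work) / capacities[s])
        load[best] += work
        plan[task_id] = best
    return plan, [l / c for l, c in zip(load, capacities)]


plan, finish = dispatch({"a": 4, "b": 4, "c": 2}, [2.0, 1.0])
print(plan, finish)  # → {'a': 0, 'b': 0, 'c': 1} [4.0, 2.0]
```

Normalizing by capacity means a server twice as fast absorbs roughly twice the work before it stops being the preferred target.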

20.
Research on Ontology-Based Web Page Clustering   Total citations: 4 (self-citations: 1; by others: 3)
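This paper presents a prototype of an ontology-based Web page clustering system. By building a simple search engine and clustering its results, the system greatly reduces the time users need to find the information they want. Introducing a domain ontology into the clustering system also improves clustering efficiency and makes the clustering results more interpretable.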

