Similar literature
 Found 20 similar documents; search took 31 ms
1.
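Topic-oriented Web information collection must judge the topical relevance of the URL links it extracts. Working from the extracted context of topical links, the approach judges URL relevance per block type: topic-type semantic blocks use text of a certain length around the link, while directory-type and image-type semantic blocks use the hierarchical structure of the DOM tree. Using HowNet-based semantic similarity for link judgment, the paper presents NPR, an algorithm for judging URL topical relevance that combines content analysis and link-structure analysis, and reports that it yields more precise topical pages than the PageRank algorithm. The results are of practical value to information institutions in China that are building subject-specific network information resources in depth.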
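The fixed-length link context described for topic-type blocks can be illustrated with a minimal sketch. It is not the NPR algorithm from the abstract: the class `LinkContextParser`, the function `link_contexts`, and the `window` parameter are hypothetical names, the window is measured in characters, and the DOM-tree handling for directory-type and image-type blocks is not reproduced.

```python
from html.parser import HTMLParser


class LinkContextParser(HTMLParser):
    """Collects page text and the character offsets of each <a href> anchor."""

    def __init__(self):
        super().__init__()
        self.chunks = []     # text chunks in document order
        self.length = 0      # total characters of text seen so far
        self.links = []      # (href, anchor start offset, anchor end offset)
        self._open = None    # (href, start offset) of the anchor being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._open = (href, self.length)

    def handle_endtag(self, tag):
        if tag == "a" and self._open is not None:
            href, start = self._open
            self.links.append((href, start, self.length))
            self._open = None

    def handle_data(self, data):
        self.chunks.append(data)
        self.length += len(data)


def link_contexts(html, window=30):
    """Return {href: anchor text plus `window` characters on each side}.

    Assumption: a plain character window stands in for the paper's
    'text of a certain length'. A repeated href keeps its last context.
    """
    parser = LinkContextParser()
    parser.feed(html)
    parser.close()
    text = "".join(parser.chunks)
    return {
        href: text[max(0, start - window): end + window].strip()
        for href, start, end in parser.links
    }


page = (
    '<p>Research on focused crawlers uses '
    '<a href="/hits">the HITS algorithm</a> for link analysis.</p>'
    '<p>Unrelated footer text here.</p>'
)
contexts = link_contexts(page, window=20)
```

The resulting context string could then be scored against topic terms with any similarity measure; which measure to use is left open here.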

2.
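A survey of Web search algorithms   Total citations: 1, self-citations: 0, citations by others: 1
This survey introduces PageRank and HITS, the two most common algorithms, and reviews research progress on Web search algorithms based on link-structure analysis. It covers query-independent improved algorithms and algorithms based on the query topic, analyzes their strengths, weaknesses, and improvement strategies, and discusses the key technologies and applications of Web search algorithms. It closes with open problems and research prospects for Web search algorithms.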
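For readers unfamiliar with the query-independent ranking mentioned above, here is a minimal sketch of textbook PageRank by power iteration. It is not taken from the surveyed paper. The function name `pagerank`, the damping value 0.85, and the toy graph are assumptions, and spreading the rank of pages without out-links uniformly is one common convention among several.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict of scores that sum to 1.
    """
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by pages without out-links is spread uniformly (assumed convention).
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes an equal share of its rank to every page it links to.
        for p in pages:
            targets = links.get(p, [])
            for q in targets:
                new[q] += damping * rank[p] / len(targets)
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical four-page graph: "c" is linked from three pages, "d" from none.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(graph)
```

In this toy graph the page with the most in-links, `c`, ends up with the highest score, while `d`, which nothing links to, receives only the teleportation share.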

3.
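A Web link topic extraction algorithm based on query expansion   Total citations: 1, self-citations: 0, citations by others: 1
The HITS (Hypertext-Induced Topic Search) algorithm is widely used for Web link-structure analysis, but it is prone to topic drift. Analyzing the problem from the perspective of semantic relevance, the authors find that topic drift arises in HITS because pages are projected onto the wrong latent semantic basis. They propose a hyperlink topic extraction algorithm based on query expansion: user query logs are used to expand the query terms and to build a personalized root set and base set that match the user's needs, after which HITS computes the authority and hub values of the Web pages. Experimental results show that the algorithm substantially mitigates the topic drift caused by HITS and is better suited to Web queries.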
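The authority and hub values that HITS computes over the base set can be sketched as follows. This is the standard HITS iteration only; the query-expansion step and the construction of the root and base sets from query logs are not reproduced. The function name `hits`, the iteration count, and the toy graph are assumptions for illustration.

```python
import math


def hits(links, iterations=50):
    """Basic HITS iteration.

    links: dict mapping each page to the list of pages it links to.
    Returns (authority, hub) dicts, each normalized to unit Euclidean length.
    """
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority of a page: sum of the hub scores of pages linking to it.
        auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                auth[q] += hub[p]
        # Hub score of a page: sum of the authority scores of pages it links to.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        # Normalize both vectors so the scores stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return auth, hub


# Hypothetical base set: three hub-like pages pointing at two candidate authorities.
base_set = {"h1": ["x", "y"], "h2": ["x", "y"], "h3": ["x"]}
authority, hub_score = hits(base_set)
```

Because the scores depend only on link structure within the base set, a base set polluted by off-topic pages shifts the top authorities, which is the topic drift the abstract refers to.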

4.
Improving topic crawlers with link analysis   Total citations: 5, self-citations: 0, citations by others: 5
汪涛  樊孝忠 《计算机应用》2004,24(Z2):174-176
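Building on an analysis and summary of two topic crawler designs, this paper studies how link analysis can improve topic crawlers. Experiments comparing results before and after introducing link analysis demonstrate that the design is feasible and practical, laying a good foundation for targeted information collection.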

5.
A topic extraction and discovery algorithm based on similarity analysis   Total citations: 19, self-citations: 1, citations by others: 19
王晓宇  熊方  凌波  周傲英 《软件学报》2003,14(9):1578-1585
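This paper examines the topic extraction algorithm HITS from a different angle, proposing a similarity-based link analysis model for observing the topic extraction process. By giving a generalized definition of similarity, it proposes a topic extraction algorithm that improves extraction quality using link analysis alone. Topic discovery is also integrated into the algorithm's framework, which lets users find less popular topics. Experimental results show two advantages of the new algorithm: it improves topic extraction quality without content analysis, and it can further discover the different topics that appear in the query results.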

6.
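A survey of link-structure analysis of the World Wide Web and its applications   Total citations: 47, self-citations: 0, citations by others: 47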
王晓宇  周傲英 《软件学报》2003,14(10):1768-1780
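The World Wide Web has rapidly grown to contain about 8 billion pages and 56 billion hyperlinks. Moreover, global planning of how the Web is created is clearly impossible. Both facts pose challenges for Web research. On the other hand, pages connected by hyperlinks on the Internet offer a very rich information resource for everyday and business use, provided there are effective ways to understand the Web. Link-structure analysis plays an increasingly important role in many areas of Web research. This paper comprehensively introduces the latest research progress and applications of Web link analysis, surveying its use in Web information search, discovery of latent Web communities, and Web modeling.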

7.
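Inherit/Feedback: a new Web topic mining method   Total citations: 4, self-citations: 0, citations by others: 4
Classical link analysis methods such as PageRank and HITS focus more on a page's authority than on its topical relevance, so topic drift occurs quickly when they guide topic search. To address this, the Inherit/Feedback method for Web topic mining is proposed on the basis of a topic-association topology model. The basic idea is that, along a search path, a node inherits the topical relevance of its parent nodes and feeds its own topical relevance back to them. A topic search algorithm based on Inherit/Feedback (IFC) is also proposed. Experimental results show that the method guides topic search effectively and is suitable for deep search and mining of domain-specific websites.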

8.
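This paper analyzes Web link structure and topic information retrieval systems and applies link analysis to such systems. It outlines the use of link analysis in the search strategy and result ranking of topic information retrieval systems, along with methods and strategies for using link analysis to assess the relevance of topical pages. Link analysis is used to weight topical pages, and a link-analysis topic dictionary is built to improve the retrieval system and raise the efficiency of targeted information search and collection.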

9.
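This survey introduces PageRank and HITS, the two most common algorithms, and reviews research progress on Web search algorithms based on link-structure analysis. It covers query-independent improved algorithms and algorithms based on the query topic, analyzes their strengths, weaknesses, and improvement strategies, and discusses the key technologies and applications of Web search algorithms. It closes with open problems and research prospects for Web search algorithms.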

10.
朱明  王镇  周津 《计算机仿真》2005,22(9):109-112
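With the rapid growth of the Internet, how to search for information quickly, effectively, and accurately has become a pressing problem. To address the low execution efficiency and low precision of traditional topic-based search algorithms, this paper designs a hierarchical link search algorithm based on machine learning. The algorithm learns page link patterns and assigns the nodes awaiting expansion to layers. It retrieves the desired pages effectively, avoids traversing large numbers of irrelevant pages, and improves both the efficiency and the accuracy of retrieving topic-relevant pages. It achieved good results in a search experiment on product-topic pages of 100 companies, showing that the algorithm is efficient and practical.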

11.
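Hyperlink prediction uses the properties of an observed network to recover its missing links. Existing hyperlink prediction algorithms usually predict over the entire network, so their results miss link categories with few training samples and the predicted categories are incomplete. To solve this problem, the clustering-based hyperlink prediction algorithm C-CMM is proposed: the data set is first clustered, and a model is then built for each cluster to perform hyperlink prediction. The algorithm makes full use of the information in each cluster's observed samples and widens the range of categories covered by the predictions. Experiments on three real data sets show that C-CMM achieves higher prediction accuracy and efficiency than several state-of-the-art link prediction algorithms, and that its predictions cover more categories.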

12.
Inducing functions from examples is an important requirement in many learning systems. Blind search is the most general approach, but is vastly less efficient than specialized problem-solving methods. This paper presents a new strategy to accelerate search without sacrificing generality. Experiments with numeric functions show several orders of magnitude performance increase over the standard search technique. Two factors account for this improvement. First, the new strategy manipulates functions in groups instead of singly, so that many can be selected or discarded with only one comparison. Second, functional equivalence is handled automatically by the internal organization of search space.

13.
14.
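The rise of the Internet offers a new route for retrieving geographic information updates, with the advantages of timeliness and low cost. Starting from practical needs and targeting the shortcomings of existing crawler algorithms, this paper proposes a topic crawler method for geographic information updates based on link backtracking. First, support vector machine classification is used to quickly and effectively identify the link directions in a website most likely to contain topic-relevant content. The crawler then backtracks to those links and continues crawling, determining topical content through a knowledge base of geographic information change elements, which optimizes the crawl path and reduces inefficient crawling. Experimental results show that the method finds the link directions most likely to contain geographic information, greatly improves topic crawling efficiency, and can be extended to some degree to other topics.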

15.
The existing PageRank algorithm exploits the hyperlink structure of the web with a uniform transition probability distribution to measure the relative importance of web pages. This paper proposes a novel method, Proportionate Prestige Score (PPS), for prestige analysis. The proposed PPS method is based purely on the exact prestige of web pages and is applicable to the Initial Probability Distribution (IPD) matrix and the Transition Probability Distribution (TPD) matrix. It computes a single PageRank vector with a non-uniform transition probability distribution, using the link structure of the web pages offline. This non-uniform transition probability distribution overcomes the dangling page problem more efficiently than the existing PageRank algorithm. The paper provides a benchmark analysis of two ranking methods, PageRank and the proposed PPS. The methods are tested with real social network data from three different domains: Social Circle: Facebook, the Wikipedia vote network, and the Enron email network. The findings indicate that the proposed PPS method improves ranking quality compared with the existing PageRank algorithm.

16.
In this article, the effect of a local, content (as opposed to structure) oriented navigation tool is investigated, i.e. mouse-over hyperlink previews. A usability experiment is described in which three groups of participants were exposed to three different versions of a website: without hyperlink previews, with content oriented, semantic previews, and with task-oriented, pragmatic previews. Participants were asked to execute search and recall tasks, and to evaluate task and hypertext. The results showed a decisive overall advantage for previews in terms of efficiency, but no effects on effectiveness or appreciation. Although semantic and pragmatic previews did not differ significantly, a post hoc analysis showed a learning effect of pragmatic previews that was absent in the semantic preview condition. It was concluded that previews fit in with the step-by-step goal orientation of hypertext users. Once users are acquainted with them, pragmatic previews speed up decision making. Apart from the experimental part, the article surveys research into the usability of navigation tools, thereby focusing on the analysis of navigation tools. The bottom line of this review is that most navigation tools as they are used in the experiments provide users with different types of information, e.g. local vs. global, content vs. structure oriented. This complicates the unequivocal explanation of their effect and may explain, together with user and task differences, the variety and inconsistencies observed in the results.

17.
Web Science has favoured macroscopic approaches which have revealed much about the Web's structural patterns. We argue that contextualised knowledge about hyperlinks on the Web has not advanced at the same rate and that complementary intermediate and micro-scale investigations are essential for a better understanding of the motivations, functions and meanings of these links.

We present an investigation that attempted to overcome the shortcomings of current theoretical frameworks and methodological techniques. The focus of this article lies in the demonstration of the viability of studying the web at different scales of analysis without loss of coherence guided by the assumption of the Web as media.

Results of our quali-quantitative, multi-scale study of the international connectivity of websites registered in Brazil are presented. At the macro-scale, previous indications of high international connectivity are confirmed. Intermediate (meso) and micro-scale analyses focused on the connectivity between Brazilian and German websites and contradicted the conclusions about the meanings and functions of hyperlinks commonly associated with structural analysis. Links between Brazilian and German websites were shown to derive from a large number of formal and generic links, challenging the prevalent association that large quantities of incoming links are an indication of high relevance.

18.
This study investigates the impact of hyperlink affordance, psychological reactance, perceived loss of freedom, perceived business tie between sites, and trust in source site, on trust in target site. Hyperlink affordance represents the extent to which the Web encourages users' behavior. Perceived loss of freedom is based on psychological reactance, which refers to the extent to which users react to hyperlink affordance. In order to examine the research model, this study used 305 responses from Korean users to conduct three experiments: (1) evaluate trust transfer from the online source Web site to another online target site (Experiment 1), (2) evaluate trust transfer from an online site to an offline target site (Experiment 2), and (3) evaluate trust transfer from an offline site to an online target site (Experiment 3). Trust is transferred from source to target site in the test results of all three models. The hyperlink affordance affects trust transfer in the test results of Experiment 1. Perceived loss of freedom based on psychological reactance negatively affects trust transfer in the test results of Experiments 2 and 3, which decreases the effect of hyperlink affordances on trust transfer. The perceived business tie between sites affects trust transfer in the test results of Experiment 3. The study provides insights into the application of trust transfer in various settings of source and target site in online and offline business.

19.
Nonprofit, nongovernmental organization (NGO) hyperlink networks are connective public goods, or sets of interorganizational links that enable members and nonmembers to reach like-minded organizations in order to enhance the visibility of the network's goals. We extend collective action theory to account for both the level and structural signatures of contributions that generalist and specialist organizations make to these connective public goods. This study examines contributions that 48 English Speaking Islamic Resistance organizations make to an NGO hyperlink network. We found that generalist organizations, or organizations with heterogeneous goals, play several key roles in the connective public good. Generalist NGOs promoted the most legitimate face of the issue network, acting as brokers and authorities to other generalist NGOs, and initiators for both specialist and generalist NGOs.

20.
Jansen  B.J. 《Computer》2006,39(7):88-90
With paid search, the content provider, search engine, and user have mutually supporting goals. With paid or sponsored search, content providers pay Web search engines to display sponsored links in response to user queries alongside the algorithmic links, also known as organic or nonsponsored links.
