
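Entropy-based link analysis method in Web structure mining
Citation: WANG Yong, YANG Hua-qian, LI Jian-fu. Entropy-based link analysis method in Web structure mining[J]. Computer Engineering and Design, 2006, 27(9): 1622-1624,1688
Authors: WANG Yong, YANG Hua-qian, LI Jian-fu
Affiliation: Department of Computer and Modern Education Technology, Chongqing Education College, Chongqing 400067 (all three authors)
Abstract: In Web structure mining, the traditional HITS (Hyperlink-Induced Topic Search) algorithm is widely used to find the authority pages and hub pages among the pages returned by a search engine. Besides valuable page content, however, websites contain many links unrelated to that content, such as advertisements and navigation links. Because of these links, applying HITS can give some advertisement pages or irrelevant pages high authority and hub values. To address this problem, the concept of Shannon information entropy is introduced on top of the original HITS algorithm, and an entropy-based link analysis method for mining Web page structure is proposed. The core idea of the algorithm is to use information entropy to represent the knowledge implicit in link text.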

Keywords: topic distillation   link analysis   Web structure mining
Article number: 1000-7024(2006)09-1622-03
Received: 2005-04-01
Revised: 2005-04-01
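For readers unfamiliar with HITS, the following is a minimal Python sketch of the standard iteration (Kleinberg's alternating authority and hub updates with L2 normalization). It is not code from this paper. The function name `hits`, the dictionary-of-lists graph format and the fixed iteration count are illustrative assumptions.

```python
import math


def hits(links, iterations=50):
    """Standard HITS: alternate authority and hub updates on a link graph.

    links: dict mapping each page to the list of pages it links to
    (illustrative input format). Returns (authority, hub) dicts whose
    values are L2-normalized after every iteration.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority of p: sum of hub scores of the pages that link to p.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        # Hub of p: sum of the (updated) authority scores of the pages p links to.
        new_hub = {p: sum(new_auth[d] for d in links.get(p, [])) for p in pages}
        # Normalize so the scores converge instead of growing without bound.
        a_norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in new_auth.items()}
        hub = {p: v / h_norm for p, v in new_hub.items()}
    return auth, hub
```

Consider the toy graph `{"h1": ["doc", "ad"], "h2": ["doc", "ad"], "h3": ["ad"]}`. Here the page `ad` receives a higher authority score than `doc` simply because every hub links to it. This is the kind of distortion the abstract attributes to advertisement and navigation links.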

Entropy-based link analysis algorithm for web structure mining
WANG Yong,YANG Hua-qian,LI Jian-fu. Entropy-based link analysis algorithm for web structure mining[J]. Computer Engineering and Design, 2006, 27(9): 1622-1624,1688
Authors:WANG Yong  YANG Hua-qian  LI Jian-fu
Affiliation:Department of Computer and Modern Education Technology, Chongqing Education College, Chongqing 400067, China
Abstract:In Web structure mining, the Hyperlink-Induced Topic Search (HITS) algorithm has been widely employed to identify the authorities and hubs among pages returned by a search engine. However, besides useful information, most content sites contain irrelevant hyperlinks, such as advertisements and navigation panels. Because of these extra hyperlinks, HITS handles advertisement and irrelevant pages poorly and can assign them high authority or hub values. To solve this problem, Shannon information entropy is introduced into the HITS algorithm, yielding an entropy-based link analysis algorithm for mining Web informative structures. The key idea of this algorithm is to use Shannon information entropy to represent the knowledge hidden in link texts.
Keywords:topic distillation   entropy   link analysis   web structure mining
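The abstract does not give the paper's formulas. The following is therefore only a hypothetical Python sketch of how anchor-text entropy could down-weight links before running HITS. It relies on these assumptions, none of which come from the paper:

- A term's Shannon entropy H = -Σ p log p is computed over its frequency in the anchor text of each source page.
- That entropy is normalized by the logarithm of the number of source pages.
- A link's entropy is the mean entropy of its anchor terms.
- The link weight is 1 minus that value.

Under these assumptions, terms that appear evenly on every page (such as navigation or advertisement labels) get entropy near 1, so their links get weight near 0. The names `term_entropies` and `entropy_weighted_hits` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def term_entropies(links):
    """Normalized Shannon entropy of each anchor-text term over source pages.

    links: list of (src, dst, anchor_text) tuples. ASSUMPTION: a term's
    distribution is its frequency in the anchor text of each source page,
    and the entropy is divided by log(number of source pages) to lie in [0, 1].
    """
    counts = defaultdict(Counter)  # term -> Counter(source page -> frequency)
    sources = set()
    for src, _dst, text in links:
        sources.add(src)
        for term in text.lower().split():
            counts[term][src] += 1
    n = len(sources)
    entropies = {}
    for term, per_page in counts.items():
        total = sum(per_page.values())
        h = -sum((c / total) * math.log(c / total) for c in per_page.values())
        entropies[term] = h / math.log(n) if n > 1 else 0.0
    return entropies


def entropy_weighted_hits(links, iterations=50):
    """HITS in which each link is weighted by 1 - (mean entropy of its anchor terms).

    ASSUMPTION: this weighting scheme is illustrative, not the paper's formula.
    """
    ent = term_entropies(links)
    weights = {}
    for src, dst, text in links:
        terms = text.lower().split()
        link_h = sum(ent[t] for t in terms) / len(terms) if terms else 1.0
        # High-entropy (evenly spread) anchor text -> low link weight.
        weights[(src, dst)] = max(0.0, 1.0 - link_h)
    pages = {p for s, d in weights for p in (s, d)}
    auth = dict.fromkeys(pages, 1.0)
    hub = dict.fromkeys(pages, 1.0)
    for _ in range(iterations):
        # Weighted authority update: sum of w(q -> p) * hub(q).
        new_auth = dict.fromkeys(pages, 0.0)
        for (s, d), w in weights.items():
            new_auth[d] += w * hub[s]
        # Weighted hub update: sum of w(q -> p) * auth(p).
        new_hub = dict.fromkeys(pages, 0.0)
        for (s, d), w in weights.items():
            new_hub[s] += w * new_auth[d]
        a_norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in new_auth.items()}
        hub = {p: v / h_norm for p, v in new_hub.items()}
    return auth, hub
```

For example, take these links:

- h1 to doc ("entropy link analysis")
- h2 to doc ("link analysis survey")
- "buy now" from each of h1, h2 and h3 to ad

The terms buy and now then have normalized entropy 1, so the ad links get weight 0 (up to rounding) and doc ends up with the higher authority score. Unweighted HITS on the same graph ranks ad first.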
This article is indexed in CNKI, VIP, Wanfang Data and other databases.