
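A Genetic-Algorithm-Based Strategy for Topic-Specific Information Crawling
Citation: XU Huan-qing, WANG Yong-cheng, SUN Qiang. A Genetic-Algorithm-Based Strategy for Topic-Specific Information Crawling [J]. Journal of Chinese Information Processing, 2003, 17(1): 25-31.
Authors: XU Huan-qing  WANG Yong-cheng  SUN Qiang
Affiliation: Department of Computer Science, Shanghai Jiao Tong University
Funding: National Natural Science Foundation of China (60082003)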
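Abstract: Topic-specific retrieval restricts information retrieval to a particular subject domain and provides retrieval services for information within that domain. It is one of the directions in which the next generation of search engines is developing. The key technique in topic-specific retrieval is crawling for topic-relevant information. This paper proposes a genetic-algorithm-based strategy for topic-specific information crawling. It increases the chance that pages linked from pages with low content similarity are crawled, which widens the search scope for relevant pages. In addition, hints from hyperlink metadata are used to predict the topic relevance of linked pages, which speeds up crawling. Comparative crawling experiments show that the algorithm performs well.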
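The abstract does not specify which hyperlink metadata are used or how the predicted relevance is scored. As a minimal sketch, assuming the metadata are the anchor text, the link's `title` attribute, and the tokens of the target URL, and assuming a plain bag-of-words cosine similarity against a topic keyword list, the prediction step could look like the following. `predict_link_relevance` and its parameters are hypothetical names, not the authors' implementation, and the tokenizer only handles ASCII text; a crawler for Chinese pages would need word segmentation instead.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-alphanumeric characters (ASCII-only simplification).
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors; 0.0 if either is empty.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def predict_link_relevance(
    topic_keywords: list[str], anchor_text: str, url: str, title_attr: str = ""
) -> float:
    # Hypothetical predictor: treat anchor text, title attribute and URL tokens as the
    # link's "metadata" and compare them with the topic keyword profile, so the target
    # page can be scored before it is downloaded.
    topic = Counter(tokenize(" ".join(topic_keywords)))
    metadata = Counter(tokenize(f"{anchor_text} {title_attr} {url}"))
    return cosine(topic, metadata)


topic = ["genetic", "algorithm", "crawler"]
print(predict_link_relevance(topic, "Genetic algorithm tutorial", "http://example.com/ga.html"))
print(predict_link_relevance(topic, "Contact us", "http://example.com/contact"))
```

A link whose anchor text shares topic terms scores above zero, while a navigation link such as "Contact us" scores 0.0, so the crawler can defer it without fetching the page.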

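Keywords: computer application  Chinese information processing  topic-specific retrieval  topic-specific information crawling  genetic algorithm  Hub  authority
Article ID: 1003-0077(2003)01-0025-07
Revised: March 18, 2002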

Focused Crawling Based on Genetic Algorithm
XU Huan-qing,WANG Yong-cheng,SUN Qiang.Focused Crawling Based on Genetic Algorithm[J].Journal of Chinese Information Processing,2003,17(1):25-31.
Authors:XU Huan-qing  WANG Yong-cheng  SUN Qiang
Affiliation: Department of Computer Science, Shanghai Jiao Tong University
Abstract: The exponential growth of information available on the WWW makes it increasingly difficult for general-purpose crawlers to crawl and index the entire Web. Rather than collecting and indexing all accessible web documents to answer every possible ad hoc query, a focused crawler analyzes its crawl boundary to find the links most likely to be relevant to the crawl, and avoids irrelevant regions of the Web. This paper proposes a new focused crawling approach based on a genetic algorithm. The method uses the genetic algorithm to selectively seek out pages relevant to a predefined set of topics, increases the chance that pages linked from pages with low content relevance are crawled, and thereby broadens the crawler's search scope for relevant pages. In addition, hyperlink metadata is used to predict the topic relevance of the page a link points to, which speeds up crawling. Comparative crawling experiments indicate that the approach performs well.
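The abstract leaves the chromosome encoding, fitness function, and crossover/mutation operators unspecified. The sketch below borrows only one standard GA operator, roulette-wheel (fitness-proportionate) selection, and applies it to the crawl frontier to illustrate the general idea: unlike a greedy best-first crawler, which always expands the highest-scoring link, links discovered on low-relevance pages keep a small but nonzero chance of being crawled. The function name `roulette_select`, the `floor` parameter, and the use of a per-URL relevance score as fitness are assumptions for illustration, not the paper's method.

```python
import random


def roulette_select(
    frontier: dict[str, float], k: int, rng: random.Random, floor: float = 0.05
) -> list[str]:
    """Pick up to k distinct URLs from the frontier by roulette-wheel selection."""
    # Every candidate gets at least `floor` fitness, so links found on low-similarity
    # pages keep a nonzero chance of being crawled (a greedy crawler would never pick them).
    pool = {url: max(score, floor) for url, score in frontier.items()}
    chosen: list[str] = []
    for _ in range(min(k, len(pool))):
        total = sum(pool.values())
        spin = rng.uniform(0, total)  # spin the wheel over the cumulative fitness
        acc = 0.0
        for url, fit in pool.items():
            acc += fit
            if spin <= acc:
                break  # slice containing `spin` found; `url` is the winner
        chosen.append(url)
        del pool[url]  # sample without replacement within one batch
    return chosen


frontier = {"a": 0.9, "b": 0.5, "c": 0.0}
print(roulette_select(frontier, 2, random.Random(42)))
```

With `floor=0.05`, the URL scored 0.0 in this frontier is drawn first with probability 0.05 / 1.45 ≈ 3.4%, so it is occasionally explored rather than never; raising `floor` trades crawl precision for broader exploration.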
Keywords:computer application  Chinese information processing  topic-specific retrieval  focused crawling  GA  Hub  authority
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.