
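Performance Optimization Design for Web Topic Search
Cite this article:Tian Xuedong, Li Shucheng. Performance Optimization Design for Web Topic Search[J]. Computer Engineering and Applications, 2006, 42(4): 183-185, 188.
Authors:Tian Xuedong  Li Shucheng
Affiliation:College of Mathematics and Computer Science, Hebei University, Baoding, Hebei 071002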
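Abstract:Web topic search is an emerging direction in information retrieval that combines crawling techniques with filtering methods, and it is also a research hotspot in information processing. To address the problems that existing topic search systems have in judging the topic relevance of Web page text and in Spider search strategy, two performance optimization schemes are introduced. First, using information extraction techniques, a topic relevance judgment method based on a pattern set is proposed to improve the accuracy of topic judgment. Second, to address the shortcomings of PageRank in topic search, a page evaluation algorithm based on reinforcement learning is introduced, and a Web-environment-first search strategy is proposed. Finally, the performance of the two algorithms is evaluated on the basis of experimental results.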

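Keywords:information extraction technology  information extraction pattern  pattern matching  WEB environment  reinforcement learning
Article ID:1002-8331-(2006)03-0183-03
Received:2005-06
Revised:2005-06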

Performance Optimization of Web Topic Search
Tian Xuedong, Li Shucheng. Performance Optimization of Web Topic Search[J]. Computer Engineering and Applications, 2006, 42(4): 183-185, 188.
Authors:Tian Xuedong  Li Shucheng
Affiliation:College of Mathematics and Computer Science, Hebei University, Baoding, Hebei 071002
Abstract:Focused web crawling is an emerging direction in information retrieval that combines crawling with filtering methods, and it is also a research hotspot in information processing. To improve the performance of the Web topic search system, the paper introduces two performance optimization methods. The first, based on information extraction, is presented to improve the accuracy of the obtained documents. The second is a new Web topic search strategy based on Web environment precedence, which uses a function based on reinforcement learning to value Web pages and characterize the Web topic environment. This method works well in improving search efficiency for rare information. Finally, the performance of the two methods is evaluated by experiments.
Keywords:information extraction  extraction pattern  pattern matching  WEB Environment  reinforcement learning
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.