
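Web Key Resource Page Judgment Based on Improved Decision Tree Algorithm
Cite this article: LIU Yi-Qun, ZHANG Min, MA Shao-Ping. Web Key Resource Page Judgment Based on Improved Decision Tree Algorithm[J]. Journal of Software, 2005, 16(11): 1958-1966.
Authors: LIU Yi-Qun, ZHANG Min, MA Shao-Ping
Affiliation: State Key Laboratory of Intelligent Technology and Systems (Tsinghua University), Beijing 100084, China
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60223004, 60321002, 60303005; the National Grand Fundamental Research 973 Program of China under Grant No. 2004CB318108; the Key Project of Chinese Ministry of Education under Grant No. 104236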
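Abstract: Key resource pages are an important class of high-quality pages in the Web information environment and the main target of users' Web information retrieval. Decision tree learning is one of the most widely used inductive inference algorithms in machine learning and is suitable for judging key resource pages. However, because uniform sampling of Web data is difficult, the algorithm lacks sufficiently representative negative examples for training. To solve this problem, an improved decision tree algorithm is proposed that learns from statistical information about the training examples rather than from information about individual examples, and this algorithm is used to implement query-independent key resource page judgment. Under the standard evaluation conditions of the 2003 Text Retrieval Conference (TREC), large-scale Web information retrieval experiments based on the improved decision tree algorithm achieved a performance improvement of more than 40% over the basic algorithm. This not only provides an effective way to find Web key resource pages but also suggests a feasible way to improve the performance of decision tree algorithms.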

Keywords: Web information retrieval; key resource page; machine learning; decision tree
Article ID: 1000-9825/2005/16(11)1958
Received: July 26, 2004
Revised: June 2, 2005

Web Key Resource Page Judgment Based on Improved Decision Tree Algorithm
LIU Yi-Qun, ZHANG Min and MA Shao-Ping. Web Key Resource Page Judgment Based on Improved Decision Tree Algorithm[J]. Journal of Software, 2005, 16(11): 1958-1966.
Authors: LIU Yi-Qun, ZHANG Min and MA Shao-Ping
Abstract: Key resource pages are among the most important search targets for Web search users. Decision tree learning is one of the most widely used and practical methods for inductive inference in machine learning. Because uniform sampling of Web pages is difficult, there are not enough representative negative instances for training a key resource decision tree. To solve this problem, the original algorithm is partly modified to learn from global statistical information instead of individual instance information. With the same evaluation method as TREC (Text Retrieval Conference) 2003, large-scale retrieval experiments based on the improved decision tree algorithm achieve an improvement of more than 40% over those based on the original algorithm. This not only offers an effective way of selecting Web key resource pages, but also shows a possible way to improve decision tree learning performance.
Keywords:Web information retrieval  key resource page  machine learning  decision tree
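
The abstract does not give the modified algorithm itself, so the following is only an illustrative sketch of the general idea of scoring a split from aggregate statistics instead of individual negative instances. It is not the authors' method. The function names and the three inputs (`prior`, `pos_rate`, `global_rate`) are assumptions made for this example: the assumed share of key resource pages in the collection, the share of positive examples that pass a feature test, and the share of the whole collection that passes the same test.

```python
import math


def entropy(p):
    """Binary entropy (in bits) of a class whose probability is p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def gain_from_statistics(prior, pos_rate, global_rate):
    """Information gain of a binary feature test, using only aggregate rates.

    All three arguments are hypothetical inputs for this sketch:
      prior       -- assumed fraction of positive pages in the collection
      pos_rate    -- fraction of positive examples that pass the test
      global_rate -- fraction of the whole collection that passes the test
    No individual negative example is needed.
    """
    # Joint probability of "positive and passes" / "positive and fails".
    pos_in = prior * pos_rate
    pos_out = prior * (1.0 - pos_rate)
    # The positives in a branch cannot outnumber the branch itself.
    if pos_in > global_rate + 1e-12 or pos_out > (1.0 - global_rate) + 1e-12:
        raise ValueError("statistics are mutually inconsistent")

    h_before = entropy(prior)
    # Entropy inside each branch, weighted by the branch size.
    h_in = entropy(pos_in / global_rate) if global_rate > 0.0 else 0.0
    h_out = entropy(pos_out / (1.0 - global_rate)) if global_rate < 1.0 else 0.0
    h_after = global_rate * h_in + (1.0 - global_rate) * h_out
    return h_before - h_after


# A test that positives pass far more often than the collection as a whole
# is informative; one they pass at the same rate is not.
informative = gain_from_statistics(prior=0.1, pos_rate=0.9, global_rate=0.2)
uninformative = gain_from_statistics(prior=0.1, pos_rate=0.3, global_rate=0.3)
```

In this sketch the negative class is never sampled: its share of each branch is inferred by subtracting the positive share from the collection-wide rate, which is one way to sidestep the lack of representative negative examples described in the abstract.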
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.