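Web-page Summarization Methods for Web-page Classification
Cite this article: LU Ming-yu, SHEN Dou, GUO Chong-hui, LU Yu-chang. Web-page Summarization Methods for Web-page Classification[J]. Acta Electronica Sinica, 2006, 34(8): 1475-1480
Authors: LU Ming-yu  SHEN Dou  GUO Chong-hui  LU Yu-chang
Affiliations: Institute of Computer Science and Technology, Dalian Maritime University, Dalian, Liaoning 116026; Department of Computer Science and Technology, Tsinghua University, Beijing 100084; Department of Applied Mathematics, Dalian University of Technology, Dalian, Liaoning 116024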
Funding: National Natural Science Foundation of China (No.60473115)
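Abstract: Web-page classification is an important research topic in web mining, and it faces more difficulties than text classification. Removing the noisy information in web pages can improve classification accuracy, and summarization-based web-page classification exploits this idea. This paper analyzes and improves three traditional web-page summarization methods, and proposes the Content Body summarization method together with an ensemble summarization method based on the four summarization methods. On this basis, extensive summarization-based web-page classification experiments were carried out. The experimental results show that all of the summarization methods improve classification performance; the ensemble summarization method performs best, improving the F1 value of classification by 12.9%.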

Keywords: web-page classification  web-page summarization  Content Body  ensemble summarization method
Article ID: 0372-2112(2006)08-1475-06
Received: 2005-08-16
Revised: 2006-01-11

Web-page Summarization Methods for Web-page Classification
LU Ming-yu, SHEN Dou, GUO Chong-hui, LU Yu-chang. Web-page Summarization Methods for Web-page Classification[J]. Acta Electronica Sinica, 2006, 34(8): 1475-1480
Authors: LU Ming-yu  SHEN Dou  GUO Chong-hui  LU Yu-chang
Affiliation: 1. Institute of Computer Science and Technology, Dalian Maritime University, Dalian, Liaoning 116026, China; 2. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China; 3. Department of Applied Mathematics, Dalian University of Technology, Dalian, Liaoning 116024, China
Abstract: Web-page classification is an important research direction in web mining and is considerably more difficult than pure-text classification. Removing the noisy information embedded in web pages can improve classification accuracy, and the proposed summarization-based web-page classification method exploits this idea. In this paper, three traditional web-page summarization methods are analyzed and improved, and the Content Body summarization method and an ensemble summarization method based on four summarization methods are proposed. Extensive summarization-based classification experiments show that all the summarization methods improve the performance of web-page classification algorithms, and that the ensemble summarization method achieves a 12.9% improvement over pure-text based methods.
Keywords: web-page classification   web-page summarization   content body   ensemble summarization method
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.