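Research on an Ensemble Classification Algorithm for Hypertext
Cite this article:阮群锟,许灿,吕劲松. 超文本的集成分类算法研究[J]. 电脑与信息技术, 2010, 18(2): 49-52
Authors:阮群锟 (RUAN Qun-kun)  许灿 (XU Can)  吕劲松 (LV Jing-song)
Affiliation:College of Computer and Communication, Hunan University, Changsha, Hunan 410082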
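Abstract:With the development of Internet technology, the number of documents on the World Wide Web is growing exponentially. In such a vast repository of information, users struggle to find what they need, and how to process this mass of documents automatically and efficiently has become an important research topic. This paper builds separate classification models for the hypertext information extracted from the documents in the data set (titles, hyperlinks, and tags) and for the document content itself. A neural network then integrates the individual classification models to produce the final decision. The result is a meta-information-based ensemble classification algorithm for hypertext that makes better combined use of the multiple kinds of structural information hypertext carries. Experimental results show that, compared with methods that classify using a single kind of hypertext structural information, the meta-information-based ensemble classification algorithm achieves better classification performance.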

Keywords:text classification  hypertext classification  ensemble algorithm

Research on the Ensemble Classification Algorithm of Web Text
RUAN Qun-kun,XU Can,LV Jing-song. Research on the Ensemble Classification Algorithm of Web Text[J]. Computer and Information Technology, 2010, 18(2): 49-52
Authors:RUAN Qun-kun  XU Can  LV Jing-song
Affiliation:College of Computer and Communication, Hunan University, Changsha 410082, China
Abstract:With the development of Internet technology, the amount of information on the Internet is increasing exponentially. It is very difficult for users to find what they want in this mass of information, so an important line of research focuses on how to handle these large volumes of online documents. Text classification sorts the information extracted from the Internet into categories for convenient retrieval. This thesis mainly studies algorithms related to text classification and hypertext classification. In the research on hypertext classification, it studies the rules of hypertext information and analyzes the classification performance obtained with these rules. By extracting the hypertext rules, several classifiers are constructed, and a neural network is used to combine their results. On this basis, an ensemble hypertext meta-information classification algorithm (EHC) is proposed. This algorithm can integrate the structural information effectively. The experiments show that EHC achieves better performance than using only a single rule.
Keywords:text classification  hypertext classification  ensemble algorithm  
This article is indexed in CNKI, 维普 (VIP), 万方数据 (Wanfang Data), and other databases.
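
The pipeline described in the abstract (one classifier per hypertext view, then a neural network that combines their outputs) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the base classifiers, the network architecture, or the data set, so the Naive Bayes base models, the single-sigmoid-neuron combiner, the view names in `VIEWS`, and the toy documents are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical view names: title, hyperlink text, markup tags, body content.
VIEWS = ("title", "links", "tags", "body")


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow on extreme scores.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


class ViewNaiveBayes:
    """Binary multinomial Naive Bayes over the tokens of one view (assumed base model)."""

    def fit(self, texts, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.docs = Counter(labels)
        for text, y in zip(texts, labels):
            self.counts[y].update(text.lower().split())
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def prob_positive(self, text):
        # Log-odds of class 1 versus class 0 with add-one smoothing.
        log_odds = math.log((self.docs[1] + 1) / (self.docs[0] + 1))
        v = len(self.vocab) + 1
        n0 = sum(self.counts[0].values())
        n1 = sum(self.counts[1].values())
        for tok in text.lower().split():
            p1 = (self.counts[1][tok] + 1) / (n1 + v)
            p0 = (self.counts[0][tok] + 1) / (n0 + v)
            log_odds += math.log(p1 / p0)
        return sigmoid(log_odds)


class NeuronCombiner:
    """One sigmoid neuron trained by SGD: a stand-in for the paper's neural network."""

    def __init__(self, n_inputs, lr=0.5, epochs=500):
        self.w = [0.0] * n_inputs
        self.b = 0.0
        self.lr, self.epochs = lr, epochs

    def predict_proba(self, x):
        return sigmoid(self.b + sum(wi * xi for wi, xi in zip(self.w, x)))

    def fit(self, X, y):
        for _ in range(self.epochs):
            for x, t in zip(X, y):
                err = self.predict_proba(x) - t  # gradient of log loss w.r.t. z
                self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
                self.b -= self.lr * err
        return self


def train_ensemble(docs, labels):
    # Stage 1: one base classifier per view.
    base = {v: ViewNaiveBayes().fit([d[v] for d in docs], labels) for v in VIEWS}
    # Stage 2: the combiner learns from the base classifiers' outputs.
    # For brevity it reuses the training documents; held-out predictions
    # would be the safer choice on real data.
    meta_X = [[base[v].prob_positive(d[v]) for v in VIEWS] for d in docs]
    meta = NeuronCombiner(len(VIEWS)).fit(meta_X, labels)
    return base, meta


def classify(doc, base, meta):
    x = [base[v].prob_positive(doc[v]) for v in VIEWS]
    return int(meta.predict_proba(x) >= 0.5)


# Toy data, invented for illustration: label 1 = sports, label 0 = software.
docs = [
    {"title": "football match report", "links": "league scores teams",
     "tags": "h1 table img", "body": "the team won the match with two goals"},
    {"title": "cup final goals", "links": "teams players league",
     "tags": "h1 img img", "body": "players scored goals in the final match"},
    {"title": "python compiler release", "links": "download docs source",
     "tags": "h1 pre code", "body": "the new release improves the compiler and runtime"},
    {"title": "kernel source update", "links": "docs download patch",
     "tags": "h2 pre code", "body": "the patch fixes a bug in the kernel source code"},
]
labels = [1, 1, 0, 0]
base, meta = train_ensemble(docs, labels)

sports_doc = {"title": "league match goals", "links": "teams scores",
              "tags": "h1 img", "body": "the team scored goals"}
tech_doc = {"title": "compiler patch", "links": "docs source",
            "tags": "pre code", "body": "the release fixes a compiler bug"}
print(classify(sports_doc, base, meta), classify(tech_doc, base, meta))
```

Feeding the combiner per-view probabilities rather than raw features mirrors the abstract's description of integrating the individual models' outputs. A network with hidden layers, or different base classifiers, would slot into the same two-stage structure.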