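Research on Hierarchical Classification Methods for Chinese Text Based on the Vector Space Model
Cite this article: XIAO Xue, HE Zhong-shi. Research on hierarchical classification methods for Chinese text based on the vector space model[J]. Journal of Computer Applications, 2006, 26(5): 1125-1126.
Authors: 肖雪 (XIAO Xue), 何中市 (HE Zhong-shi)
Affiliation: College of Computer Science, Chongqing University
Abstract: When the number of categories in text classification is very large, hierarchical classification is an effective approach. Taking into account the structural characteristics of hierarchical classification, and the fact that different levels place different demands on feature selection and on the classification method, this paper proposes FDS, a new dual feature selection method based on the vector space model, together with HTC, a hierarchical classification algorithm. The dual feature selection method performs feature selection separately at every level, changing the number of features and the weight calculation method from level to level. The HTC algorithm combines the class-centroid vector method, which is more effective for coarse classification, with the SVM method, which is more effective for fine classification. Experiments show that the method achieves higher accuracy than both flat classification and conventional hierarchical classification methods.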
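The per-level feature selection described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's FDS: the abstract does not name the selection criterion or the weighting formulas, so chi-square scoring, the per-level feature counts in the hypothetical `LEVEL_CONFIG`, and the use of TF-IDF at the top level versus raw term frequency at the second level are all assumptions. Documents are assumed to be already segmented into word lists.

```python
import math
from collections import Counter

# Hypothetical per-level settings. The abstract only states that the number of
# features and the weighting method change from level to level; these values
# and the choice of TF-IDF vs. raw TF are illustrative assumptions.
LEVEL_CONFIG = {
    1: {"k": 3, "weighting": "tfidf"},  # coarse (top) level
    2: {"k": 2, "weighting": "tf"},     # fine (second) level
}


def chi_square_scores(docs, labels):
    """Score each term by its maximum chi-square statistic over all classes."""
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    vocab = set().union(*doc_sets)
    scores = {}
    for term in vocab:
        best = 0.0
        for cls in set(labels):
            a = b = c = 0  # a: term & class, b: term & other, c: no term & class
            for s, y in zip(doc_sets, labels):
                if term in s and y == cls:
                    a += 1
                elif term in s:
                    b += 1
                elif y == cls:
                    c += 1
            d = n - a - b - c  # neither the term nor the class
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[term] = best
    return scores


def select_features(docs, labels, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    scores = chi_square_scores(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def vectorize(doc, features, weighting, df, n_docs):
    """Map a tokenized document onto the selected features."""
    tf = Counter(doc)
    if weighting == "tf":
        return [float(tf[t]) for t in features]
    # Smoothed TF-IDF, one common variant (assumed, not taken from the paper).
    return [tf[t] * math.log((1 + n_docs) / (1 + df[t])) for t in features]


def fds(docs, labels_by_level):
    """Select features once per level, each level with its own settings."""
    df = Counter(t for d in docs for t in set(d))  # document frequencies
    result = {}
    for level, labels in labels_by_level.items():
        cfg = LEVEL_CONFIG[level]
        feats = select_features(docs, labels, cfg["k"])
        vecs = [vectorize(d, feats, cfg["weighting"], df, len(docs)) for d in docs]
        result[level] = (feats, vecs)
    return result
```

Calling `fds(docs, {1: top_labels, 2: sub_labels})` returns, for each level, the selected terms and the document vectors built with that level's weighting, so each level of the hierarchy sees its own feature space.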

Keywords: hierarchical classification; vector space model; dual feature selection; weight calculation
Article ID: 1001-9081(2006)05-1125-02
Received: 2005-11-03
Revised: 2005-11-03; 2006-01-13

Hierarchical categorization methods of Chinese text based on vector space model
XIAO Xue, HE Zhong-shi. Hierarchical categorization methods of Chinese text based on vector space model[J]. Journal of Computer Applications, 2006, 26(5): 1125-1126.
Authors: XIAO Xue, HE Zhong-shi
Affiliation: College of Computer Science, Chongqing University, Chongqing 400044, China
Abstract: When the number of text categories is large, hierarchical text categorization is an effective approach. Aiming at the structural characteristics of hierarchical text categorization, and considering that texts at different levels place different demands on both feature selection and the categorization method, a new Feature Dual-Selection (FDS) method and a Hierarchical Text Categorization (HTC) algorithm, both based on the vector space model, were proposed. FDS performs feature selection at each level and adjusts the number of features and the term weighting method accordingly; the HTC algorithm integrates the class-center (centroid) classification method with the Support Vector Machine (SVM), which prove more effective for broad classification and for subdivision respectively. Experimental results show that the proposed approach achieves higher accuracy than flat and generic hierarchical methods.
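The HTC idea of pairing a class-centroid method at the coarse level with SVM at the fine level can be illustrated with the minimal two-level sketch below. It is not the authors' implementation: the abstract gives no details, so cosine similarity to class centroids, one-vs-rest linear SVMs trained only on the documents of each top-level class, the Pegasos-style sub-gradient solver, and every name and hyperparameter (`lam`, `epochs`, `seed`) are assumptions. Input vectors could come from any vector space model weighting, such as the per-level vectors of an FDS-style step.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two dense vectors (0.0 if either is zero)."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def centroids(vectors, labels):
    """Class-centre vector: the mean of the training vectors of each class."""
    sums, counts = {}, {}
    for v, y in zip(vectors, labels):
        acc = sums.setdefault(y, [0.0] * len(v))
        for i, x in enumerate(v):
            acc[i] += x
        counts[y] = counts.get(y, 0) + 1
    return {y: [x / counts[y] for x in s] for y, s in sums.items()}


def train_linear_svm(xs, ys, lam=0.01, epochs=200, seed=0):
    """Linear SVM (labels +1/-1) trained with Pegasos-style sub-gradient steps.

    A constant 1.0 feature is appended so a (regularized) bias is learned.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]  # shrink (regularization)
            if margin < 1:  # hinge loss active: step toward the example
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def svm_score(w, x):
    """Decision value of a trained weight vector (bias feature appended)."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


class HTC:
    """Two-level sketch: class centroids for the coarse level, one-vs-rest
    linear SVMs (trained inside each coarse class) for the fine level."""

    def fit(self, vectors, top_labels, sub_labels):
        self.cents = centroids(vectors, top_labels)
        self.svms = {}
        for top in self.cents:
            idx = [i for i, y in enumerate(top_labels) if y == top]
            xs = [vectors[i] for i in idx]
            self.svms[top] = {
                s: train_linear_svm(xs, [1 if sub_labels[i] == s else -1 for i in idx])
                for s in sorted({sub_labels[i] for i in idx})
            }
        return self

    def predict(self, v):
        # Coarse step: the most similar class centroid.
        top = max(sorted(self.cents), key=lambda y: cosine(v, self.cents[y]))
        # Fine step: the sub-class whose SVM gives the highest score.
        models = self.svms[top]
        sub = max(sorted(models), key=lambda s: svm_score(models[s], v))
        return top, sub
```

Because each SVM in this sketch is trained only on the documents of one top-level class, every fine-level problem stays small; the abstract itself only reports that each method is the more effective one at its own level.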
Keywords: hierarchical categorization; vector space model; FDS (feature dual-selection); term weighting
This article is indexed in CNKI, VIP (维普), Wanfang Data and other databases.