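A Multi-level Text Classification Method Based on the Vector Space Model
Cite this article: LIU Shaohui, DONG Mingkai, ZHANG Haijun, LI Rong, SHI Zhongzhi. A Multi-level Text Classification Method Based on the Vector Space Model [J]. Journal of Chinese Information Processing, 2002, 16(3): 9-14, 26.
Authors: LIU Shaohui, DONG Mingkai, ZHANG Haijun, LI Rong, SHI Zhongzhi
Affiliation: Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences
Funding: Supported by the National Natural Science Foundation of China (60173017) and the Beijing Natural Science Foundation (4011003)
Abstract: This paper studies and improves the term-weighting method of the classical vector space model (VSM) and, on that basis, proposes a multi-level text classification method based on the VSM. The classes are organized into a tree according to given hierarchical relations, and all training documents in a class are merged into a single class document. When the class models are extracted, comparisons are made only among class documents under the same node at the same level. To classify a document automatically, the method starts at the root node and finds the matching top-level class, then recurses downward until it reaches the matching leaf subclass. Experiments and a deployed system show that the method achieves high precision and recall.

Keywords: text classification; vector space model; information gain; feature extraction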

An Approach of Multi-hierarchy Text Classification Based on Vector Space Model
LIU Shao-hui, DONG Ming-kai, ZHANG Hai-jun, LI Rong, SHI Zhong-zhi. An Approach of Multi-hierarchy Text Classification Based on Vector Space Model[J]. Journal of Chinese Information Processing, 2002, 16(3): 9-14, 26.
Authors: LIU Shao-hui, DONG Ming-kai, ZHANG Hai-jun, LI Rong, SHI Zhong-zhi
Affiliation: Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences
Abstract: This paper studies and improves the classical method for calculating term weights in the Vector Space Model. On this basis, it proposes a multi-hierarchy text classification approach based on the Vector Space Model. In this approach, all classes are organized as a tree according to given hierarchical relations, and all the training documents in a class are combined into a single class document. To construct the class models, comparisons are made only among the class documents attached to the same node of the same layer. To classify a document, a matching process is performed hierarchically from the root node toward the leaf nodes until the corresponding subclass is found. Experiments and real systems indicate that the approach achieves high classification precision and recall.
Keywords:Text Classification  Vector Space Model  Information Gain  Feature Selection
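
The top-down procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the improved term-weighting formula, so plain TF-IDF with a smoothed IDF stands in for it, and cosine similarity is assumed as the matching measure. The names Node, class_document, train, cosine and classify, as well as the toy class tree and token lists, are hypothetical and exist only for this example.

```python
import math
from collections import Counter


class Node:
    """One class in the hierarchy (hypothetical helper, not from the paper)."""

    def __init__(self, name, docs=None, children=None):
        self.name = name
        self.docs = docs or []          # training documents as token lists
        self.children = children or []  # direct subclasses
        self.model = None               # (idf, {child name: weighted vector})


def class_document(node):
    """Merge every training document in the node's subtree into one token list."""
    tokens = [t for doc in node.docs for t in doc]
    for child in node.children:
        tokens.extend(class_document(child))
    return tokens


def train(node):
    """Build a model at each internal node, comparing only its direct children."""
    if not node.children:
        return
    merged = {c.name: Counter(class_document(c)) for c in node.children}
    n = len(merged)
    # Document frequency counted over sibling class documents only.
    df = Counter(t for tf in merged.values() for t in tf)
    # Assumption: smoothed IDF; the paper's improved weighting is not reproduced.
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = {name: {t: c * idf[t] for t, c in tf.items()}
               for name, tf in merged.items()}
    node.model = (idf, vectors)
    for child in node.children:
        train(child)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(root, tokens):
    """Descend from the root, choosing the most similar child at each level."""
    path, node = [], root
    while node.children:
        idf, vectors = node.model
        query = {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}
        node = max(node.children, key=lambda c: cosine(query, vectors[c.name]))
        path.append(node.name)
    return path


# Toy two-level hierarchy with made-up training tokens.
root = Node("root", children=[
    Node("sports", children=[
        Node("football", docs=[["goal", "striker", "pitch", "match"],
                               ["penalty", "goal", "keeper"]]),
        Node("basketball", docs=[["dunk", "rebound", "court", "match"],
                                 ["three", "pointer", "court"]]),
    ]),
    Node("tech", children=[
        Node("hardware", docs=[["cpu", "memory", "chip"],
                               ["chip", "board", "cpu"]]),
        Node("software", docs=[["compiler", "bug", "code"],
                               ["code", "release", "bug"]]),
    ]),
])
train(root)
print(classify(root, ["goal", "keeper", "match"]))
print(classify(root, ["cpu", "chip", "code"]))
```

Because each internal node keeps its own IDF table, a term such as "match" that appears in both sports subclasses is down-weighted only when deciding between those siblings, which mirrors the abstract's point that class documents are compared only under the same node at the same level.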