
Text Classification Method Based on Class Feature Word Quadric Weight

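Cite this article:	Wan Le, Liu Wanchun. Text Classification Method Based on Class Feature Word Quadric Weight[J]. Universal Technologies & Products (军民两用技术与产品), 2006(3): 38-39.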

Authors:	Wan Le, Liu Wanchun

Affiliation:	Department of Computer Science and Engineering, Beijing Institute of Technology, Beijing 100081

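Abstract:	This paper proposes an automatic text classification method for settings with a small training set. In conventional automatic training, an initial class feature vector is built for each class from the training set; because these vectors are built from a small training set, they carry insufficient class feature information. Starting from the initial class feature vectors, a number of first-level and second-level class core feature words are marked. When text/class similarity is computed, the weights of the core feature words are themselves weighted by core feature word weight factors obtained through automatic training, which increases the amount of class feature information in the class feature vectors. Experimental results show that the coincidence rate of automatic classification with this method reaches at least 94.12%, a large improvement over the 52.94% obtained without the weight weighting.
Keywords:	text classification; feature extraction; core feature word; weight factor
Article ID:	1009-8119(2006)03-0038-02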
Text Classification Method Based on Class Feature Word Quadric Weight

Wan Le, Liu Wanchun. Text Classification Method Based on Class Feature Word Quadric Weight[J]. Universal Technologies & Products, 2006(3): 38-39.

Authors:	Wan Le, Liu Wanchun

Abstract:	A text classification method for small training sets is presented. The initial feature vectors extracted by traditional automatic training lack sufficient feature information. To enrich the feature information in the feature vectors, this paper proposes a retraining method. In this method, a number of first-degree and second-degree class core feature words are selected from the initial feature vectors. A second round of automatic training then yields a weight factor for each degree. These factors are applied to the weights of the core feature words when text/class similarity is measured. In the experiments, the minimum coincidence rate of automatic classification with this method is 94.12%, a large improvement over the 52.94% of the traditional method.

Keywords:	Text classification; Feature extraction; Core feature word; Weight factor
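
The weighting step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract gives no formulas, so the use of cosine similarity, the sparse-dictionary vector layout, the function names, the toy vocabulary, and the factor values (4.0 for first-degree and 2.0 for second-degree core words) are all assumptions made here for illustration. In the paper the factors come from a second round of automatic training; here they are simply fixed by hand.

```python
import math

def boost_class_vector(class_vec, core_degree, factors):
    """Scale the weight of each core feature word by the factor of its degree.

    class_vec:   {word: weight} initial class feature vector
    core_degree: {word: 1 or 2} hand-marked core feature words (assumed layout)
    factors:     {degree: factor} weight factors (hypothetical values here)
    Words that are not core words keep their original weight.
    """
    return {w: wt * factors.get(core_degree.get(w), 1.0)
            for w, wt in class_vec.items()}

def cosine(a, b):
    """Cosine similarity of two sparse vectors (assumed similarity measure)."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(doc_vec, class_vecs, core_degrees, factors):
    """Return the best-matching class and all text/class similarity scores."""
    scores = {
        c: cosine(doc_vec, boost_class_vector(v, core_degrees.get(c, {}), factors))
        for c, v in class_vecs.items()
    }
    return max(scores, key=scores.get), scores

# Toy data: initial vectors built from a (pretend) small training set.
class_vecs = {
    "sports":  {"team": 1.0, "match": 1.0},
    "finance": {"stock": 0.3, "team": 0.3, "bank": 1.0},
}
# Hypothetical first-degree (1) and second-degree (2) core feature words.
core_degrees = {
    "sports":  {"match": 1, "team": 2},
    "finance": {"stock": 1, "bank": 2},
}
factors = {1: 4.0, 2: 2.0}  # hypothetical; the paper obtains these by training

doc = {"stock": 1.0, "team": 1.0}
plain_label, _ = classify(doc, class_vecs, {}, {})
boosted_label, _ = classify(doc, class_vecs, core_degrees, factors)
print(plain_label, boosted_label)
```

In this toy case the unweighted vectors assign the document to "sports" because "stock" is underweighted in the small-sample "finance" vector, while boosting the core words moves it to "finance". The example only shows the mechanism; it says nothing about the 94.12% and 52.94% figures reported above.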
This article is indexed in CNKI, Wanfang Data, and other databases.
