
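Text Classification Method Based on Weighting the Weights of Class Feature Words
Cite this article: 万乐, 刘万春. 类别特征词权重加权文本分类方法[J]. 军民两用技术与产品, 2006(3): 38-39.
Authors: Wan Le (万乐), Liu Wanchun (刘万春)
Affiliation: Department of Computer Science and Engineering, Beijing Institute of Technology, Beijing 100081, China
Abstract (translated from the Chinese): This paper proposes an automatic text classification method for settings with a small training set. In the conventional automatic training process, a preliminary class feature vector is built for each class from the training set. Because these preliminary vectors are built from a small training set, they carry insufficient class feature information. Starting from the preliminary class feature vectors, a certain number of first-level and second-level class core feature words are marked. When text/class similarity is computed, the weights of the core feature words are multiplied by core-feature-word weight factors obtained from the automatic training process, which increases the amount of class feature information in the class feature vectors. Experimental results show that the automatic classification coincidence rate of this method reaches 94.12% or higher, a large improvement over the 52.94% obtained without this weighting.

Keywords: text classification; feature extraction; core feature word; weight factor
Article ID: 1009-8119(2006)03-0038-02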

Text Classification Method Based on Class Feature Word Quadric Weight
Wan Le,Liu Wanchun.Text Classification Method Based on Class Feature Word Quadric Weight[J].Universal Technologies & Products,2006(3):38-39.
Authors:Wan Le  Liu Wanchun
Abstract: A text classification method for small training sets is presented. The initial feature vectors extracted by traditional automatic training lack sufficient feature information. To enrich the feature information in the feature vectors, this paper proposes a retraining method. In this method, some first-degree and second-degree class core feature words are picked out from the initial feature vectors. A second round of automatic training then yields a weight factor for each degree. These factors are used to scale the weights of the core feature words when measuring text/class similarity. In the experiments, the minimum coincidence rate of automatic classification with our method is 94.12%, a large improvement over the 52.94% of the traditional method.
Keywords:Text classification  Feature extraction  Core feature word  Weight factor
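
The weighting step described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives neither the similarity formula nor the factor values, so the use of cosine similarity, the function names, the toy vectors, and the factors 3.0 and 1.5 are all hypothetical. In the paper the factors are obtained by automatic training; here they are simply passed in.

```python
import math


def weighted_class_vector(class_vec, core_levels, level_factors):
    """Scale core feature word weights by the factor for their level.

    class_vec:     {word: weight} preliminary class feature vector
    core_levels:   {word: 1 or 2} level of each marked core feature word
    level_factors: {level: factor}; words without a level keep their weight
    """
    return {
        word: weight * level_factors.get(core_levels.get(word), 1.0)
        for word, weight in class_vec.items()
    }


def cosine_similarity(text_vec, class_vec):
    """Cosine similarity between two sparse {word: weight} vectors (assumed metric)."""
    shared = set(text_vec) & set(class_vec)
    dot = sum(text_vec[w] * class_vec[w] for w in shared)
    norm_t = math.sqrt(sum(v * v for v in text_vec.values()))
    norm_c = math.sqrt(sum(v * v for v in class_vec.values()))
    if norm_t == 0.0 or norm_c == 0.0:
        return 0.0
    return dot / (norm_t * norm_c)


def classify(text_vec, class_vecs, core_levels_by_class, level_factors):
    """Return the best-scoring class label and all text/class similarity scores."""
    scores = {}
    for label, vec in class_vecs.items():
        levels = core_levels_by_class.get(label, {})
        boosted = weighted_class_vector(vec, levels, level_factors)
        scores[label] = cosine_similarity(text_vec, boosted)
    return max(scores, key=scores.get), scores


# Hypothetical toy data, standing in for vectors built from a small training set.
class_vecs = {
    "sports": {"team": 0.6, "match": 0.2},
    "finance": {"stock": 0.3, "bank": 0.5, "loan": 0.5},
}
core_levels_by_class = {
    "sports": {"match": 1},
    "finance": {"stock": 1, "bank": 2},
}
text_vec = {"stock": 1.0, "team": 0.5}

# Without weighting (no factors) versus with assumed level factors.
baseline_label, _ = classify(text_vec, class_vecs, core_levels_by_class, {})
weighted_label, _ = classify(
    text_vec, class_vecs, core_levels_by_class, {1: 3.0, 2: 1.5}
)
print(baseline_label, weighted_label)
```

In this toy case the unweighted vectors assign the text to "sports", while boosting the core feature words moves it to "finance". Non-core words keep their original weight because a missing level falls back to a factor of 1.0.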
This article is indexed in CNKI, Wanfang Data, and other databases.