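A Feature Selection Method Combining Optimized Document Frequency with PA*
Cite this article: ZHU Hao-dong (朱颢东), ZHONG Yong (钟勇). A feature selection method combining optimized document frequency with PA[J]. Application Research of Computers (计算机应用研究), 2010, 27(1): 36-38.
Authors: ZHU Hao-dong (朱颢东), ZHONG Yong (钟勇)
Affiliations: Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu 610041; Graduate School, Chinese Academy of Sciences, Beijing 100039
Funding: Science and Technology Program of Sichuan Province (2008GZ0003)
Abstract: The high dimensionality of the feature space restricts the choice of classification algorithms, affects classifier design and accuracy, and reduces the classifier's generalization ability, which leads to overfitting; feature selection is therefore needed to avoid the curse of dimensionality. The paper first briefly analyzes several classic feature selection methods and summarizes their shortcomings. It then presents an optimized document frequency method and uses it to filter out some terms, reducing the sparsity of the text matrix. Finally, it applies pattern aggregation (PA) theory to build the vector space model of the text set, strengthening the role of terms from the standpoint of their contribution to classification and eliminating the redundant patterns contained in the original term matrix. This effectively reduces the dimensionality of the vector space and improves both the accuracy and the speed of text classification. Experimental results show that this combined feature selection method performs well.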

Keywords: feature selection  text classification  word frequency  document frequency  pattern aggregation
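The abstract does not describe how the document frequency (DF) method is optimized, so the sketch below shows only the classic DF filter that such a method starts from: a term's DF is the number of documents that contain it, and terms whose DF falls below a lower bound or above an upper bound are dropped before the term matrix is built. The function names and the thresholds `min_df` and `max_df_ratio` are hypothetical illustration choices, not values from the paper, and the documents are assumed to be already word-segmented.

```python
from collections import Counter


def document_frequency(docs):
    """Return DF(t): the number of documents in which term t appears at least once."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # a term counts once per document
    return df


def filter_by_df(docs, min_df=2, max_df_ratio=0.9):
    """Plain DF thresholding (an assumed baseline, not the paper's optimized variant).

    Keeps term t when DF(t) >= min_df and DF(t) / N <= max_df_ratio,
    where N is the number of documents. Both thresholds are hypothetical.
    """
    n_docs = len(docs)
    df = document_frequency(docs)
    vocab = sorted(t for t, c in df.items()
                   if c >= min_df and c / n_docs <= max_df_ratio)
    kept = set(vocab)
    # Drop filtered-out terms from every document, preserving token order.
    filtered = [[t for t in doc if t in kept] for doc in docs]
    return vocab, filtered


# Toy corpus of pre-segmented documents (illustrative data only).
docs = [
    ["the", "price", "market", "stock"],
    ["the", "stock", "market", "goal"],
    ["the", "goal", "match", "team"],
    ["the", "team", "goal", "market"],
]
vocab, filtered = filter_by_df(docs)
print(vocab)     # ['goal', 'market', 'stock', 'team']
print(filtered)
```

Here "the" is removed because it occurs in every document and so cannot discriminate between classes, while "price" and "match" are removed because each occurs in only one document. Raising `min_df` shrinks the vocabulary further but also discards rare terms that may still be informative, a commonly cited weakness of plain DF selection.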

Feature selection method combined optimized document frequency with PA
ZHU Hao-dong, ZHONG Yong. Feature selection method combined optimized document frequency with PA[J]. Application Research of Computers, 2010, 27(1): 36-38.
Authors: ZHU Hao-dong, ZHONG Yong
Affiliations: (1. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu 610041, China; 2. Graduate School, Chinese Academy of Sciences, Beijing 100039, China)
Abstract: The high dimensionality of the feature space restricts the choice of classification algorithms, makes classifiers hard to design, lowers their generalization ability and leads to overfitting, so feature selection is necessary to avoid the curse of dimensionality. This paper first briefly analyzed several classic feature selection methods and summarized their deficiencies. It then presented an optimized document frequency method and used it to filter out some terms, reducing the sparsity of the text matrix. Finally, it built the vector space model of the text set by means of pattern aggregation (PA) theory, which strengthened the role of words from the viewpoint of their contribution to categorization, reduced the vector dimension by eliminating redundant features, and raised the speed and accuracy of text categorization. The experimental results show that the combined method is promising.
Keywords:feature selection  text categorization  word frequency  document frequency  PA
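PA is only named in the abstract. The sketch below follows one simplified reading of pattern aggregation, stated here as an assumption: each term is described by its class distribution P(c|t), estimated as the share of the documents containing t that belong to class c, and terms with the same (rounded) distribution contribute identically to classification, so they are merged into one aggregated feature whose value is the sum of their counts. The exact-match grouping, the rounding parameter `decimals`, and the function names are hypothetical; the paper's PA formulation may measure contribution or merge patterns differently.

```python
from collections import Counter, defaultdict


def class_profiles(docs, labels, classes, decimals=2):
    """Map each term t to a rounded tuple (P(c|t) for c in classes).

    P(c|t) is estimated from document counts: the fraction of documents
    containing t that are labeled c. `decimals` is a hypothetical tolerance.
    """
    df_by_class = defaultdict(Counter)  # term -> {class: docs of that class containing term}
    for doc, label in zip(docs, labels):
        for term in set(doc):
            df_by_class[term][label] += 1
    profiles = {}
    for term, counts in df_by_class.items():
        total = sum(counts.values())
        profiles[term] = tuple(round(counts[c] / total, decimals) for c in classes)
    return profiles


def aggregate_patterns(docs, labels, classes, decimals=2):
    """Merge terms with identical class profiles into one feature (simplified PA)."""
    profiles = class_profiles(docs, labels, classes, decimals)
    groups = defaultdict(list)
    for term in sorted(profiles):
        groups[profiles[term]].append(term)
    patterns = sorted(groups.values())  # each pattern is a sorted list of member terms
    index = {term: i for i, members in enumerate(patterns) for term in members}
    # A document's value for a pattern is the summed frequency of its member terms.
    vectors = []
    for doc in docs:
        vec = [0] * len(patterns)
        for term in doc:
            if term in index:
                vec[index[term]] += 1
        vectors.append(vec)
    return patterns, vectors


# Toy labeled corpus (illustrative data only).
docs = [
    ["market", "stock", "price"],
    ["stock", "price", "market", "goal"],
    ["goal", "team", "match"],
    ["team", "match", "goal", "market"],
]
labels = ["finance", "finance", "sports", "sports"]
patterns, vectors = aggregate_patterns(docs, labels, ["finance", "sports"])
print(patterns)  # [['goal'], ['market'], ['match', 'team'], ['price', 'stock']]
print(vectors)   # [[0, 1, 0, 2], [1, 1, 0, 2], [1, 0, 2, 0], [1, 1, 2, 0]]
```

Six terms collapse into four features because "price" and "stock" occur only in finance documents and "match" and "team" only in sports documents. On real data exact matches are rare, so a tolerance or clustering step over the profiles would usually replace the simple rounding used here.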
This article is indexed in CNKI, Wanfang Data and other databases.