Feature selection method combined on optimized document frequency with LSA
Citation: ZHU Hao-dong, ZHONG Yong. Feature selection method combined on optimized document frequency with LSA[J]. Computer Engineering and Applications, 2009, 45(34): 121-123.
Authors: ZHU Hao-dong, ZHONG Yong
Affiliations: 1. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu 610041, China; 2. Graduate University of Chinese Academy of Sciences, Beijing 100039, China
Funding: Sichuan Province Science and Technology Plan Project; Sichuan Provincial Department of Science and Technology Key Science and Technology Project
Abstract: To improve the efficiency and accuracy of text categorization algorithms, a feature selection algorithm must be used to reduce the dimensionality of the feature space. However, many common feature selection algorithms select features by weight alone and do not consider the hidden relationships among features, so the selected feature subset contains some redundancy and is not well representative. This paper first presents a document frequency method based on a minimum word frequency and uses it to filter out some terms, reducing the sparsity of the text matrix. It then uses latent semantic analysis (LSA) to analyze the semantics among words and eliminate the influence of synonyms and polysemous words, which improves the speed and accuracy of text categorization. Experimental results show that this combined feature selection method performs well.

Keywords: text categorization; word frequency; document frequency; latent semantic analysis (LSA)
Received: 2008-12-09
Revised: 2009-02-27
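
The two stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact definition of the optimized document frequency, so the version below assumes that a document counts toward a term's document frequency only when the term occurs there at least `min_tf` times. The function names, the thresholds `min_tf` and `min_df`, the toy corpus, and the use of NumPy's SVD for the LSA step are all assumptions made for illustration.

```python
from collections import Counter

import numpy as np


def df_with_min_tf(docs, min_tf=2):
    """Per term, count the documents in which it occurs at least min_tf times.

    Assumed reading of "document frequency based on minimum word frequency".
    """
    df = Counter()
    for tokens in docs:
        tf = Counter(tokens)
        df.update(term for term, n in tf.items() if n >= min_tf)
    return df


def select_terms(docs, min_tf=2, min_df=2):
    """Keep terms whose thresholded document frequency reaches min_df."""
    df = df_with_min_tf(docs, min_tf)
    return sorted(term for term, n in df.items() if n >= min_df)


def term_doc_matrix(docs, vocab):
    """Build a term-by-document count matrix restricted to the kept terms."""
    index = {term: i for i, term in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, tokens in enumerate(docs):
        for term in tokens:
            if term in index:
                X[index[term], j] += 1
    return X


def lsa_doc_vectors(X, k):
    """Rank-k LSA: return one k-dimensional latent vector per document."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (s[:k, None] * Vt[:k]).T


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical toy corpus: "car" and "auto" play the role of synonyms.
docs = [
    "car car engine engine wheel".split(),
    "car car auto auto engine engine".split(),
    "auto auto engine engine road".split(),
    "stock stock market market bank".split(),
    "stock stock market market price price".split(),
]

vocab = select_terms(docs, min_tf=2, min_df=2)
X = term_doc_matrix(docs, vocab)
doc_vecs = lsa_doc_vectors(X, k=2)

print(vocab)  # ['auto', 'car', 'engine', 'market', 'stock']
print("raw cosine, doc 0 vs doc 2:", cosine(X[:, 0], X[:, 2]))
print("LSA cosine, doc 0 vs doc 2:", cosine(doc_vecs[0], doc_vecs[2]))
```

In this toy corpus the first filter drops the rare terms (`wheel`, `road`, `bank`, `price`), which shrinks the matrix before the SVD. Documents 0 and 2 share only `engine` among the kept terms, yet they should end up much closer in the two-dimensional latent space than in the raw count space, which is the kind of synonym effect the abstract attributes to LSA.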
This article is indexed in Wanfang Data and other databases.