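Comparative Study of Feature Extraction Methods in Large-Scale Text Categorization
Cite this article: He Haibin, Si Jianhui. Comparative Study of Feature Extraction Methods in Large-Scale Text Categorization [J]. Digital Community & Smart Home, 2009(21).
Authors: He Haibin  Si Jianhui
Affiliation: College of Mathematics and Computer Science, Hebei University
Abstract: In text categorization the feature vector space is high-dimensional and sparse, so dimensionality reduction is a key step in classification. To address the shortcomings of traditional feature extraction methods, this paper proposes using the iteration-based CCIPCA and ICA feature extraction methods for large-scale text categorization. The experimental results show that dimensionality reduction improves classification performance. Among CCIPCA, ICA, and the combination of ICA with IG, ICA-based dimensionality reduction gives the best classification results.

Keywords: large-scale text categorization  feature extraction  candid covariance-free incremental principal component analysis  independent component analysis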

Comparative Study on Feature Extraction in Large-scale Text Categorization
HE Hai-bin, SI Jian-hui. Comparative Study on Feature Extraction in Large-scale Text Categorization [J]. Digital Community & Smart Home, 2009(21).
Authors:HE Hai-bin  SI Jian-hui
Affiliation: College of Mathematics and Computer Science, Hebei University, Baoding 071002, China
Abstract: In text categorization the feature space is high-dimensional and sparse, so dimensionality reduction is a key problem for large-scale text categorization. Classical feature extraction methods are inadequate for this problem. This paper reports a comparative experiment on large-scale text categorization using CCIPCA and ICA. The results show that ICA achieves the best performance among CCIPCA, ICA, and ICA-IG on the same data set.
Keywords:large-scale text categorization  feature extraction  CCIPCA  ICA  
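
The abstract names CCIPCA as an iterative method but gives no implementation details. As a rough illustration only, the following is a minimal sketch of the commonly published CCIPCA update rule, which refines each principal-direction estimate one sample at a time without ever forming a covariance matrix. It is not the authors' implementation. It assumes the samples are already zero-mean and that the amnesic parameter is 0; the names `ccipca` and `_dot` are hypothetical.

```python
import math


def _dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def ccipca(samples, k):
    """Estimate up to k principal directions from zero-mean samples, one at a time.

    Returns unnormalized vectors v_1..v_k; the length of v_i approximates
    the i-th eigenvalue of the sample covariance.
    Assumption: amnesic parameter fixed to 0 (plain running average).
    """
    components = []
    for n, sample in enumerate(samples, start=1):
        u = list(sample)  # residual of the current sample
        for i in range(min(k, n)):
            if i == len(components):
                # First sample available for this component: initialize with the residual.
                components.append(u[:])
            else:
                v = components[i]
                v_norm = math.sqrt(_dot(v, v))
                if v_norm == 0.0:
                    components[i] = u[:]  # degenerate estimate: re-initialize
                else:
                    # v <- ((n-1)/n) v + (1/n) u (u . v/||v||)
                    proj = _dot(u, v) / v_norm
                    components[i] = [((n - 1) / n) * a + (proj / n) * b
                                     for a, b in zip(v, u)]
            v = components[i]
            sq = _dot(v, v)
            if sq == 0.0:
                break
            # Deflate: remove the part of u along v before estimating the next component.
            coeff = _dot(u, v) / sq
            u = [a - coeff * b for a, b in zip(u, v)]
    return components
```

In a text-categorization setting the samples would be centered document vectors (for example term weights), and each document would then be projected onto the normalized components to obtain the reduced representation. The ICA step mentioned in the abstract is not covered by this sketch.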
This article is indexed in CNKI and other databases.