
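Dimensionality Reduction in Statistical Pattern Recognition and Low Loss Dimensionality Reduction
Cite this article: SONG Feng-Xi, GAO Xiu-Mei, LIU Shu-Hai, YANG Jing-Yu. Dimensionality Reduction in Statistical Pattern Recognition and Low Loss Dimensionality Reduction [J]. Chinese Journal of Computers, 2005, 28(11): 1915-1922.
Authors: SONG Feng-Xi  GAO Xiu-Mei  LIU Shu-Hai  YANG Jing-Yu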
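Affiliations: 1. Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen 518000; Department 2, Artillery Academy, Hefei 230031
2. Department of Computer, Huaiyin Teachers College, Huaiyin 223001
3. Department 2, Artillery Academy, Hefei 230031
4. Department of Computer, Nanjing University of Science and Technology, Nanjing 210094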
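Abstract: This paper gives a fairly comprehensive review of the mainstream dimensionality reduction methods commonly used in statistical pattern recognition, including feature selection and feature extraction, and describes their respective characteristics and ranges of application. On this basis, it proposes low loss dimensionality reduction, a new feature selection method based on the optimal classifier, the Bayesian classifier, that can be used for automatic text categorization and other large-sample pattern classification tasks. Simulation experiments on the standard dataset Reuters-21578 show that, compared with three mainstream text feature selection methods (mutual information, the χ^2 statistic, and document frequency), low loss dimensionality reduction reduces dimensionality about as effectively as mutual information and the χ^2 statistic, and more effectively than document frequency.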

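Keywords: dimensionality reduction  feature selection  feature extraction  low loss dimensionality reduction  text categorization
Received: 2002-10-22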
Revised: 2005-09-25

Dimensionality Reduction in Statistical Pattern Recognition and Low Loss Dimensionality Reduction
SONG Feng-Xi, GAO Xiu-Mei, LIU Shu-Hai, YANG Jing-Yu. Dimensionality Reduction in Statistical Pattern Recognition and Low Loss Dimensionality Reduction [J]. Chinese Journal of Computers, 2005, 28(11): 1915-1922.
Authors: SONG Feng-Xi  GAO Xiu-Mei  LIU Shu-Hai  YANG Jing-Yu
Affiliation: 1. Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen 518000; 2. New Star Research Institute of Applied Technology in Hefei City, Hefei 230031; 3. Department of Computer, Huaiyin Teachers College, Huaiyin 223001; 4. Department of Computer, Nanjing University of Science and Technology, Nanjing 210094
Abstract: First, the authors review prevailing feature selection methods, such as Exhaustive Search, Genetic Algorithm, Sequential Forward Floating Selection, and Best Individual Features, and feature extraction approaches, such as Principal Component Analysis, Fisher Discriminant Analysis, and Projection Pursuit, for feature space dimensionality reduction in statistical pattern recognition. Second, they discuss the characteristics and applicable domains of these techniques. Third, they propose a novel feature selection method based on the so-called optimal classifier, the Bayesian classifier. The new method, low loss dimensionality reduction (LLDR), is applied to automatic text categorization and compared with prevailing feature selection methods for that task: Mutual Information (MI), the Chi-square Statistic (CHI), and Document Frequency (DF). Experimental results on the well-known Reuters-21578 dataset show that the dimensionality reduction ability of LLDR is comparable to that of MI and CHI and higher than that of DF. Considering that LLDR is more computationally efficient than MI and CHI, LLDR is a promising feature selection method for automatic text categorization.
Keywords:dimensionality reduction  feature selection  feature extraction  low loss dimensionality reduction  text categorization
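
For readers unfamiliar with the three baseline criteria named in the abstract, the following is a minimal sketch of how DF, MI, and CHI scores for one term and one category are commonly computed from a 2x2 term/category contingency table. It is not taken from the paper and does not implement LLDR, whose definition is not given in this record. The function name `term_scores`, the set-of-tokens document representation, and the toy corpus are assumptions made for illustration; the formulas are the pointwise mutual information and χ² forms widely used in the text categorization literature.

```python
import math


def term_scores(docs, labels, term, category):
    """Score one term against one category (illustrative helper, not from the paper).

    docs   : list of sets of tokens, one set per document
    labels : list of category names, aligned with docs
    """
    n = len(docs)
    # 2x2 contingency counts:
    # a: in category, contains term      b: not in category, contains term
    # c: in category, lacks term         d: not in category, lacks term
    a = sum(1 for doc, y in zip(docs, labels) if term in doc and y == category)
    b = sum(1 for doc, y in zip(docs, labels) if term in doc and y != category)
    c = sum(1 for doc, y in zip(docs, labels) if term not in doc and y == category)
    d = n - a - b - c

    # Document frequency: number of documents containing the term.
    df = a + b

    # Pointwise mutual information, estimated as log(a * n / ((a + c) * (a + b))).
    mi = math.log(a * n / ((a + c) * (a + b))) if a > 0 else float("-inf")

    # Chi-square statistic for the 2x2 table.
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    chi2 = n * (a * d - c * b) ** 2 / denom if denom > 0 else 0.0

    return {"df": df, "mi": mi, "chi2": chi2}


# Hypothetical toy corpus: two "grain" documents and two "crude" documents.
toy_docs = [{"wheat", "price"}, {"wheat", "export"}, {"oil", "price"}, {"oil", "barrel"}]
toy_labels = ["grain", "grain", "crude", "crude"]

for word in ("wheat", "price"):
    print(word, term_scores(toy_docs, toy_labels, word, "grain"))
```

In a feature selection setting, such scores would typically be computed for every vocabulary term, aggregated over categories (for example by taking the maximum), and the top-ranked terms kept. DF needs only one count per term, whereas MI and CHI require the full contingency table for each term and category.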
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.