
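Research on Feature Selection in Chinese Text Classification (original title: 中文文本分类中的特征选择研究)
Cite this article: 寇苏玲 (KOU Su-ling), 蔡庆生 (CAI Qing-sheng). 中文文本分类中的特征选择研究 [J]. 计算机仿真 (Computer Simulation), 2007, 24(3): 289-291.
Authors: 寇苏玲 (KOU Su-ling), 蔡庆生 (CAI Qing-sheng)
Affiliation: Department of Computer Science, University of Science and Technology of China, Hefei, Anhui 230027
Abstract: Many feature selection algorithms have been used for automatic text classification. Professor Yiming Yang studied feature selection for English text classification in depth and concluded that the IG and CHI methods perform relatively well. Because that conclusion may not carry over to Chinese text, this paper studies feature selection methods for Chinese text classification, testing several feature selection algorithms on a Chinese corpus of 500 news articles. The results show that, among the algorithms tested, the χ² statistic method does not require manual adjustment of the feature threshold when the training set changes, and that it achieves relatively high classification accuracy.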

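Keywords: feature selection; feature extraction; text classification; Chinese text classification; feature selection algorithm; selection research; Chinese; Text; Classification; classification accuracy; threshold; adjustment; training set; estimation method; results; testing; Chinese corpus; news; selection method; effect; English; automatic
Article ID: 1006-9348(2007)03-0289-03
Revision date: 2005-11-30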

Research on Feature-Selection in Chinese Text Classification
KOU Su-ling, CAI Qing-sheng. Research on Feature-Selection in Chinese Text Classification [J]. Computer Simulation, 2007, 24(3): 289-291.
Authors: KOU Su-ling, CAI Qing-sheng
Affiliation: Department of Computer Science, USTC, Hefei, Anhui 230027, China
Abstract: Several feature-selection algorithms are used in text classification. Professor Yiming Yang studied feature selection in English text classification in depth and concluded that IG and CHI perform better than the other methods. Because Chinese is more complicated than English, this paper investigates feature selection in Chinese text classification. Experiments on 500 news articles indicate that, of all the algorithms tested, the χ² statistic (CHI) has the highest precision, and that it requires no manual threshold adjustment when the size of the training set changes.
Keywords: Feature selection; Feature extraction; Text classification
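
The abstract does not reproduce the scoring formula, so the following is only a minimal sketch of how a χ² (CHI) term score is commonly computed from a 2x2 term/category contingency table, χ²(t, c) = N(AD - CB)² / ((A+C)(B+D)(A+B)(C+D)). The function names `chi_square` and `rank_terms`, and the assumption that documents are already segmented into word lists, are illustrative and not taken from the paper.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square score of a term for one category.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (e.g. term in every document): no evidence either way.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def rank_terms(docs, labels, category):
    """Rank terms by chi-square score for `category`, highest first.

    docs: list of token lists (assumed to be word-segmented already)
    labels: list of category labels, one per document
    """
    n_pos = sum(1 for y in labels if y == category)
    n_neg = len(labels) - n_pos
    pos_df, neg_df = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        # Count document frequency, so each term counts once per document.
        (pos_df if y == category else neg_df).update(set(tokens))
    scores = {}
    for term in set(pos_df) | set(neg_df):
        a, b = pos_df[term], neg_df[term]
        scores[term] = chi_square(a, b, n_pos - a, n_neg - b)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A feature set could then be formed by keeping the top-ranked terms per category; how the paper actually sets its cutoff is not stated in the abstract.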
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.