
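A text feature selection method for Bayesian classifiers
Cite this article: CHEN Jing-nian, HUANG Hou-kuan, TIAN Feng-zhan, QU You-li. A text feature selection method for Bayesian classifiers [J]. Computer Engineering and Applications (计算机工程与应用), 2008, 44(13): 24-26.
Authors: CHEN Jing-nian (陈景年)  HUANG Hou-kuan (黄厚宽)  TIAN Feng-zhan (田凤占)  QU You-li (瞿有利)
Affiliations: 1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044 2. Department of Information and Computing Science, Shandong University of Finance, Jinan 250014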
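Abstract: Feature selection is an important text preprocessing technique in text classification that can effectively improve a classifier's accuracy and efficiency. The key to feature selection in text classification is finding an effective feature evaluation metric. In general, the same metric performs differently with different classifiers, so a good metric should take the characteristics of the classifier into account. Because the naive Bayes classifier is simple, efficient, and very sensitive to feature selection, studying feature selection methods designed for it is of considerable importance. Accordingly, this paper proposes CDM, an effective feature evaluation metric for multi-class text classification with Bayesian classifiers. Experiments were carried out with Bayesian classifiers on two multi-class text datasets. The results show that the proposed CDM metric selects features more effectively than the other feature evaluation metrics.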

Keywords: text classification  feature selection  text preprocessing  naive Bayes
Article ID: 1002-8331(2008)13-0024-03
Received: 2007-12-12
Revised: 2007-12-12

Method of feature selection for text categorization with Bayesian classifiers
CHEN Jing-nian, HUANG Hou-kuan, TIAN Feng-zhan, QU You-li. Method of feature selection for text categorization with Bayesian classifiers [J]. Computer Engineering and Applications, 2008, 44(13): 24-26.
Authors:CHEN Jing-nian  HUANG Hou-kuan  TIAN Feng-zhan  QU You-li
Affiliation:1.School of Computer and Information Technology,Beijing Jiaotong University,Beijing 100044,China 2.Department of Information and Computing Science,Shandong University of Finance,Ji’nan 250014,China
Abstract: Feature selection is an important preprocessing technique in text classification. It can improve the efficiency and accuracy of a text classifier. The key to feature selection in text classification is finding an effective feature evaluation metric. In general, the effect of a feature evaluation metric can differ greatly across classifiers, so a good metric should take classifier characteristics into account. Because the Naïve Bayesian classifier is simple, efficient, and highly sensitive to feature selection, research on feature selection designed specifically for it is important. This paper presents a feature evaluation metric for the Naïve Bayesian classifier applied to multi-class text datasets: Class Discriminating Measure (CDM). Text classification experiments with Naïve Bayesian classifiers were carried out on two multi-class text collections. The results indicate that CDM selects features clearly better than other feature selection approaches.
Keywords: text classification  feature selection  text preprocessing  Naïve Bayes
This article is indexed in CNKI, Wanfang Data (万方数据), and other databases.