
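Research on the application of inter-word correlation in Bayesian text classification
Cite this article: ZHANG Shun-zhong, WANG Shu-mei, HUANG He-yan, CHEN Zhao-xiong. Research on the application of inter-word correlation in Bayesian text classification [J]. Computer Engineering and Applications (计算机工程与应用), 2009, 45(16): 159-161.
Authors: ZHANG Shun-zhong (章舜仲), WANG Shu-mei (王树梅), HUANG He-yan (黄河燕), CHEN Zhao-xiong (陈肇雄)
Affiliations: 1. Department of Computer Science, Nanjing University of Science and Technology, Nanjing 210094; 2. Department of Electronic Business, Nanjing University of Finance and Economics, Nanjing 210046; 3. Computer Language Information Engineering Research Center, Chinese Academy of Sciences, Beijing 100083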
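Abstract: To address the shortcomings of the attribute independence assumption in naive Bayes classification, this paper discusses the concepts of correlation and multivariate correlation and defines the degree of correlation between words. Building on an analysis of inter-word correlation in the TAN classifier, it proposes a formula for estimating the correlation degree of a document's feature words, together with an algorithm that applies this estimate in an improved naive Bayes classification model. Experiments on the Reuters-21578 text collection show that the improved algorithm is simple to implement and effectively improves Bayesian classification performance.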

Keywords: text classification; naive Bayes; event correlation; correlation degree; tree-augmented naive Bayes (TAN) classifier
Received: 2008-4-1
Revised: 2008-6-11

Research on application of word correlation in Naive Bayes text classification
ZHANG Shun-zhong,WANG Shu-mei,HUANG He-yan,CHEN Zhao-xiong.Research on application of word correlation in Naive Bayes text classification[J].Computer Engineering and Applications,2009,45(16):159-161.
Authors:ZHANG Shun-zhong  WANG Shu-mei  HUANG He-yan  CHEN Zhao-xiong
Affiliation: 1. Department of Computer Science, Nanjing University of Science and Technology, Nanjing 210094, China; 2. Department of Electronic Business, Nanjing University of Finance and Economics, Nanjing 210046, China; 3. Computer Language Information Engineering Research Center, Chinese Academy of Sciences, Beijing 100083, China
Abstract: To address the deficiency of the attribute independence assumption in Naive Bayes, the concepts of correlation and of correlation among multiple variables are discussed, and a definition of the correlation degree between terms is presented. Based on an analysis of the correlation between terms in the TAN classifier, the authors propose a formula for estimating the correlation degree between document feature words and an algorithm that applies it to improve the Naive Bayes classifier. The experiments on Reuters-21578 collection sho...
Keywords:text classification  Naive Bayes  event correlation  correlation degree  Tree Augmented Naive Bayes(TAN) classifier
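
Note: For orientation, the two models that the abstract contrasts can be written in their standard textbook forms. The notation is assumed here for illustration and is not taken from the paper: $c$ is a class, $w_1,\dots,w_n$ are the feature words of a document $d$, and $\pi(i)$ is the index of the single word parent that TAN may assign to $w_i$. The paper's own correlation-degree formula does not appear on this page, so it is not shown.

```latex
% Naive Bayes: feature words are conditionally independent given the class
P(c \mid d) \propto P(c) \prod_{i=1}^{n} P(w_i \mid c)

% TAN: each word may additionally depend on at most one other word w_{\pi(i)};
% for a word with no word parent the factor reduces to P(w_i \mid c)
P(c \mid d) \propto P(c) \prod_{i=1}^{n} P\bigl(w_i \mid c,\, w_{\pi(i)}\bigr)

% Conditional mutual information, the edge weight in standard TAN structure learning
I(W_i; W_j \mid C) = \sum_{w_i, w_j, c} P(w_i, w_j, c)\,
  \log \frac{P(w_i, w_j \mid c)}{P(w_i \mid c)\, P(w_j \mid c)}
```

The third quantity is zero exactly when $W_i$ and $W_j$ are conditionally independent given the class, which is the case the naive Bayes assumption treats as always true.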
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.