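Application of an Improved Maximum Entropy Weighting Algorithm in Text Classification
Cite this article: LI Xue-xiang. Application of an Improved Maximum Entropy Weighting Algorithm in Text Classification[J]. Computer Science, 2012, 39(6): 210-212.
Author: LI Xue-xiang
Affiliation: School of Software Technology, Zhengzhou University, Zhengzhou 450002
Funding: National High Technology Research and Development Program of China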
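Abstract: Traditional algorithms suffer from ambiguous feature words, overlapping classification results, and low efficiency. To address these problems, an improved maximum entropy text classification method is proposed. The maximum entropy model can combine the various relevant or irrelevant pieces of probabilistic knowledge that have been observed, and it achieves good results on many problems. The proposed method combines the advantages of mean clustering and the maximum entropy algorithm: it first takes Shannon entropy as the objective function of the maximum entropy model, which simplifies the form of the classifier, and then applies a mean clustering algorithm to classify the optimal features. Experiments show that the new algorithm obtains the classified feature set in a shorter time, which greatly reduces running time and improves efficiency.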

Keywords: text classification; maximum entropy algorithm; mean clustering; feature selection

Research of Text Categorization Based on Improved Maximum Entropy Algorithm
LI Xue-xiang.Research of Text Categorization Based on Improved Maximum Entropy Algorithm[J].Computer Science,2012,39(6):210-212.
Authors:LI Xue-xiang
Affiliation: LI Xue-xiang (School of Software Technology, Zhengzhou University, Zhengzhou 450002, China)
Abstract: This paper discusses the problem of accuracy in text categorization. In traditional text classification algorithms, different feature words are given the same effect on the classification result, which lowers classification accuracy and increases the time complexity of the algorithm. Because the maximum entropy model can integrate the various relevant or irrelevant pieces of probabilistic knowledge that have been observed, it achieves good results on many problems. To solve the problems above, this paper proposes an improved maximum entropy text classification method that combines the advantages of the c-mean algorithm and the maximum entropy algorithm. The algorithm first takes Shannon entropy as the objective function of the maximum entropy model, which simplifies the form of the classifier, and then uses the c-mean algorithm to classify the optimal features. Simulation results show that, compared with traditional text classification, the proposed method quickly obtains the optimal classification feature subsets and greatly improves text classification accuracy.
Keywords: Text classification  Maximum entropy algorithm  C-mean  Feature selection
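
The abstract gives only the outline of the method (Shannon entropy as the objective, then c-mean clustering over features), so the following Python sketch is an illustration under stated assumptions rather than the paper's algorithm. It assumes each term is scored by the Shannon entropy of its class distribution, and it substitutes a plain hard 2-means on those scalar scores for the c-mean step; it does not train a maximum entropy classifier. The function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(probs):
    """H(p) = -sum(p_i * log2(p_i)), skipping zero-probability terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def term_class_entropy(docs):
    """docs: list of (tokens, label). Returns {term: entropy of P(class | term)}."""
    counts = defaultdict(Counter)
    for tokens, label in docs:
        for term in set(tokens):  # document frequency, not raw term frequency
            counts[term][label] += 1
    scores = {}
    for term, by_class in counts.items():
        total = sum(by_class.values())
        scores[term] = shannon_entropy(c / total for c in by_class.values())
    return scores


def two_means_1d(values, iters=50):
    """Hard 2-means on scalars (assumed stand-in for c-mean); returns both centers."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo = sum(low) / len(low)  # never empty: min(values) always lands here
        new_hi = sum(high) / len(high) if high else hi
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return lo, hi


def select_features(docs):
    """Keep the terms that fall in the low-entropy (more class-specific) cluster."""
    scores = term_class_entropy(docs)
    lo, hi = two_means_1d(list(scores.values()))
    return sorted(t for t, h in scores.items() if abs(h - lo) <= abs(h - hi))


# Hypothetical toy corpus: "the" appears equally in both classes.
docs = [
    (["goal", "match", "the"], "sports"),
    (["match", "team", "the"], "sports"),
    (["stock", "market", "the"], "finance"),
    (["market", "bank", "the"], "finance"),
]
print(select_features(docs))
```

On this toy corpus, a term that appears in only one class has entropy 0, while "the" is split evenly across both classes and has entropy 1 bit, so the clustering step separates it from the class-specific terms.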
This article is indexed in CNKI, Wanfang Data, and other databases.