
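Optimization and visualization application of CTM model in text classification
Cite this article:MA Chang-lin, YANG Zheng-liang, XIE Luo-di. Optimization and visualization application of CTM model in text classification[J]. Computer Engineering & Science (计算机工程与科学), 2017, 39(3): 599-604
Authors:MA Chang-lin  YANG Zheng-liang  XIE Luo-di
Affiliation:1. School of Computer, Central China Normal University
Funding:National Natural Science Foundation of China (61003192)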
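Abstract:Automatically extracting relevant information from massive text collections has become a major technical challenge. Text classification, an important approach to this problem, has attracted wide attention, and text representation is a key factor in classification performance. We therefore use the correlated topic model (CTM) for text representation, which preserves the completeness of the information while capturing correlations between topics. Based on this model, we optimize both the number of topics and feature extraction: the optimal number of topics is determined by combining perplexity with the log-likelihood function, and a principal component analysis algorithm based on mutual information is introduced for optimal feature extraction, reducing data dimensionality and feature redundancy. The experiments are analyzed visually using the R language.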

Keywords:text classification  CTM model  feature extraction
Received:2016-09-20
Revised:2017-03-25
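
The abstract says the number of topics is chosen from perplexity and the log-likelihood function, but it does not give the exact rule. As a minimal sketch only, the snippet below assumes the common definition of perplexity, exp(-log-likelihood / number of tokens), and picks the candidate topic count with the lowest held-out perplexity. The function names and the log-likelihood values are hypothetical and purely illustrative; they are not results from the paper.

```python
import math
from typing import Dict


def perplexity(log_likelihood: float, n_tokens: int) -> float:
    """Perplexity = exp(-log-likelihood / number of tokens); lower is better."""
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    return math.exp(-log_likelihood / n_tokens)


def best_topic_count(held_out_loglik: Dict[int, float], n_tokens: int) -> int:
    """Return the candidate topic count with the lowest held-out perplexity."""
    return min(
        held_out_loglik,
        key=lambda k: perplexity(held_out_loglik[k], n_tokens),
    )


# Made-up held-out log-likelihoods per candidate topic count (illustration only).
toy_loglik = {10: -7200.0, 20: -6900.0, 30: -7050.0}
print(best_topic_count(toy_loglik, n_tokens=1000))  # prints 20
```

For a fixed held-out set, the lowest perplexity coincides with the highest log-likelihood, so the two criteria agree in this simplified setting; how the authors weigh them against each other is not stated in the abstract.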

Optimization and visualization application of CTM model in text classification
MA Chang-lin,YANG Zheng-liang,XIE Luo-di. Optimization and visualization application of CTM model in text classification[J]. Computer Engineering & Science, 2017, 39(3): 599-604
Authors:MA Chang-lin  YANG Zheng-liang  XIE Luo-di
Affiliation:(School of Computer,Central China Normal University,Wuhan 430079,China)
Abstract:Automatically extracting relevant information from massive text collections has become a major challenge. Text classification, an effective approach to this problem, has attracted much attention, and text representation is a critical factor in classification performance. We use the correlated topic model (CTM) for text representation, which captures the correlation between topics while preserving the integrity of the information. Based on this model, we optimize the number of topics and feature selection: the number of topics is determined using perplexity and the log-likelihood function, and a principal component analysis algorithm based on mutual information is adopted for feature selection, reducing data dimensionality and the redundancy of text features. The R language is used to visualize the experimental results.
Keywords:text classification  CTM model  feature selection  
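
The abstract names a principal component analysis algorithm based on mutual information but does not spell out its formulation. The sketch below illustrates only the mutual-information ingredient, under the assumption that each term is scored by the mutual information between its presence in a document and the document's class label; the PCA step is not shown. The function name, the toy labels, and the presence vectors are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(term_present: Sequence[bool],
                       labels: Sequence[Hashable]) -> float:
    """Mutual information (in nats) between term presence and class label."""
    if len(term_present) != len(labels) or len(labels) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(labels)
    joint = Counter(zip(term_present, labels))  # counts of (presence, class)
    term_counts = Counter(term_present)         # counts of presence / absence
    class_counts = Counter(labels)              # counts per class
    mi = 0.0
    for (t, c), count in joint.items():
        # p(t, c) / (p(t) * p(c)) simplifies to count * n / (count_t * count_c)
        ratio = count * n / (term_counts[t] * class_counts[c])
        mi += (count / n) * math.log(ratio)
    return mi


# Toy corpus of four documents in two classes (illustration only).
toy_labels = ["sports", "sports", "tech", "tech"]
print(mutual_information([True, True, False, False], toy_labels))  # ln 2, about 0.693
print(mutual_information([True, False, True, False], toy_labels))  # 0.0
```

A term that appears only in one class scores high, while a term spread evenly across classes scores zero, which is why such scores can be used to discard redundant or uninformative features before dimensionality reduction.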