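Design of a Water Conservancy Information Classification Scheme Based on a Topic Model
Cite this article: Zhuge Qingzi, Zhang Shenwen, Cai Zhaohui, Xu Hua, Zhou Qi. Design of a water conservancy information classification scheme based on a topic model [J]. 水利信息化, 2018, 0(6)
Authors: Zhuge Qingzi (诸葛庆子)  Zhang Shenwen (张审问)  Cai Zhaohui (蔡朝晖)  Xu Hua (徐华)  Zhou Qi (周琦)
Affiliation: School of Computer Science, Wuhan University
Abstract: Classifying water conservancy information is the most important task in standardizing the sharing of water conservancy scientific data, so the large volume of data in this field needs to be classified. To address the unstructured nature of water conservancy text data, the paper designs a topic-model-based classification scheme for water conservancy text information and proposes a new topic model that combines the strengths of the LDA topic model and the GloVe word vector model. The AdaBoost algorithm is used to improve the KNN classifier: each iteration adapts to the errors of the classifier, and the final result is an ensemble of classifiers. Experiments show that AdaBoost-boosted KNN classifies water conservancy text well and performs far better than the commonly used naive Bayes and decision tree classifiers. Compared with the original KNN classifier, micro accuracy rises by 1.1 percentage points and macro accuracy by 4.1 percentage points, which indicates that AdaBoost improves the effectiveness of the KNN classifier in water conservancy text classification.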

Keywords: topic model  water conservancy text information  text classification  scheme  LDA  GloVe
Received: 2018-09-29
Revised: 2018-11-20
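
The abstract says that AdaBoost adapts to the errors of the KNN classifier over several iterations and returns a set of classifiers, but it does not say how this is done. The sketch below is one possible reading, not the authors' implementation. It assumes a SAMME-style multi-class AdaBoost, and because plain KNN has no notion of sample weights, it assumes each round trains KNN on a weighted resample of the training set. The function names `knn_predict`, `fit_adaboost_knn`, and `predict_adaboost_knn`, the parameters, and the toy feature vectors are all hypothetical; in the paper the features would come from the topic model.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def fit_adaboost_knn(X, y, rounds=5, k=3, seed=0):
    """Assumed SAMME-style boosting; needs at least two classes in y."""
    rng = random.Random(seed)
    n = len(X)
    n_classes = len(set(y))
    weights = [1.0 / n] * n
    ensemble = []  # entries are (alpha, resampled_X, resampled_y)
    for _ in range(rounds):
        # KNN cannot use weights directly, so draw a weighted resample.
        idx = rng.choices(range(n), weights=weights, k=n)
        sX = [X[i] for i in idx]
        sy = [y[i] for i in idx]
        preds = [knn_predict(sX, sy, xi, k) for xi in X]
        err = sum(w for w, p, t in zip(weights, preds, y) if p != t)
        if err >= 1 - 1 / n_classes:
            continue  # no better than chance, skip this round
        err = max(err, 1e-10)  # avoid division by zero for a perfect round
        alpha = math.log((1 - err) / err) + math.log(n_classes - 1)
        ensemble.append((alpha, sX, sy))
        # Raise the weight of misclassified samples, then renormalize.
        weights = [
            w * math.exp(alpha) if p != t else w
            for w, p, t in zip(weights, preds, y)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    if not ensemble:
        ensemble.append((1.0, list(X), list(y)))  # fall back to plain KNN
    return ensemble


def predict_adaboost_knn(ensemble, x, k=3):
    """Weighted vote of all KNN members, each weighted by its alpha."""
    votes = Counter()
    for alpha, sX, sy in ensemble:
        votes[knn_predict(sX, sy, x, k)] += alpha
    return votes.most_common(1)[0][0]


# Toy two-class data standing in for document feature vectors.
X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2),
     (5.0, 5.1), (5.2, 4.9), (4.8, 5.3), (5.1, 5.0)]
y = ["a"] * 4 + ["b"] * 4
model = fit_adaboost_knn(X, y)
print(predict_adaboost_knn(model, (0.1, 0.1)))
```

Resampling is only one way to make KNN sensitive to the boosting weights; weighting the neighbor votes is another, and the abstract does not say which the authors used.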

Design of a water conservancy information classification scheme based on a topic model
Abstract: Classifying water conservancy information is the most important task in standardizing data sharing in water conservancy science, so the large volume of data in this field needs to be classified. To address the unstructured nature of water conservancy text data, a topic-model-based scheme for classifying water conservancy text information is designed, and a new topic model is proposed that combines the advantages of the LDA topic model and GloVe word vectors. The AdaBoost algorithm is used to improve the KNN classifier: the errors of the classifier are adaptively corrected in each iteration, and a set of classifiers is finally obtained. The experimental results show that AdaBoost-boosted KNN classifies water conservancy texts well and performs much better than the common naive Bayes and decision tree classifiers. Compared with the original KNN classifier, micro accuracy improves by 1.1 percentage points and macro accuracy by 4.1 percentage points, which indicates that the AdaBoost algorithm improves the effectiveness of the KNN classifier in water conservancy text classification.
Keywords: topic model   water conservancy text information   text classification   design   LDA   GloVe
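
The abstract reports micro and macro accuracy without defining them. The sketch below assumes one common reading for single-label text classification, namely micro-averaged and macro-averaged precision: the micro value pools all predictions, while the macro value averages the per-class precision so that small classes count as much as large ones. The function name `micro_macro_precision` and the example labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def micro_macro_precision(y_true, y_pred):
    """Return (micro, macro) precision for single-label predictions."""
    correct = Counter()    # correct predictions per predicted class
    predicted = Counter()  # total predictions per class
    for t, p in zip(y_true, y_pred):
        predicted[p] += 1
        if t == p:
            correct[p] += 1
    classes = sorted(set(y_true) | set(y_pred))
    # Micro: pool every prediction, then take one ratio.
    micro = sum(correct.values()) / sum(predicted.values())
    # Macro: one ratio per class, then an unweighted mean.
    per_class = [
        correct[c] / predicted[c] if predicted[c] else 0.0 for c in classes
    ]
    macro = sum(per_class) / len(classes)
    return micro, macro


# Hypothetical labels: one "drought" document is misfiled as "flood".
y_true = ["flood", "flood", "flood", "flood", "drought", "drought"]
y_pred = ["flood", "flood", "flood", "flood", "flood", "drought"]
micro, macro = micro_macro_precision(y_true, y_pred)
print(round(micro, 3), round(macro, 3))
```

Here the pooled value is 5/6, while the class-averaged value is (4/5 + 1/1) / 2 = 0.9. The two figures diverge whenever class sizes are unbalanced, which is why a method can move one of them more than the other.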
This article is indexed in CNKI and other databases.
