
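Clustering Short Text Classification Based on Fusion of BERT and GSDMM
Cite this article:LIU Hao, WANG Yu-Chen. Clustering Short Text Classification Based on Fusion of BERT and GSDMM[J]. Computer Systems & Applications, 2022, 31(2): 267-272
Authors:LIU Hao  WANG Yu-Chen
Affiliation:Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei 230041; International Institute of Finance, School of Management, University of Science and Technology of China, Hefei 230041
Funding:Youth Project of the Anhui Provincial Natural Science Foundation (1908085AG299)
Abstract:In text classification tasks, short texts have sparse features and irregular wording, which limits traditional natural language processing methods in short text classification. To address these characteristics, this paper proposes, based on BERT (bidirectional encoder representations from Transformers) and GSDMM (collapsed Gibbs sampl...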

Keywords:GSDMM  BERT  SVM  short text classification  clustering guidance  semantic vector
Received:2021-04-06
Revised:2021-04-30

Clustering Short Text Classification Based on Fusion of BERT and GSDMM
LIU Hao, WANG Yu-Chen. Clustering Short Text Classification Based on Fusion of BERT and GSDMM[J]. Computer Systems & Applications, 2022, 31(2): 267-272
Authors:LIU Hao  WANG Yu-Chen
Affiliation:Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei 230041, China; International Institute of Finance, School of Management, University of Science and Technology of China, Hefei 230041, China
Abstract:Short texts have sparse features and irregular wording, which limits traditional natural language processing methods in short text classification. To address these characteristics, this study proposes a classification algorithm that fuses bidirectional encoder representations from Transformers (BERT) with a collapsed Gibbs sampling algorithm for the Dirichlet multinomial mixture model (GSDMM) and adds clustering guidance, with the aim of improving the effectiveness and accuracy of short text classification. First, the fused BERT and GSDMM model converts each short text into an integrated semantic vector. The integrated vectors reflect both global semantic features and topic features, which addresses the sparsity of short text features and the lack of topic information. Then, a clustering guidance algorithm is introduced into the front end of classifier training; it expands the labeled data and improves the interpretability of the results. Finally, the expanded labeled data set is used to train the classifier, which performs the automatic classification of short texts. Using negative reviews from an e-commerce platform as the validation data set, the study verifies the effectiveness and advantages of the algorithm for short text classification in multiple groups of comparative experiments.
Keywords:GSDMM  bidirectional encoder representations from Transformers (BERT)  SVM  short text classification  clustering guidance  semantic vector
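
The abstract names the building blocks but not their details, so the following is only a minimal sketch of how GSDMM topic features might be computed and joined to a semantic vector. It assumes the standard collapsed Gibbs update for the Dirichlet multinomial mixture, assumes that "fusion" means concatenating two L2-normalized vectors, and uses placeholder numbers where BERT sentence vectors would go. The names `gsdmm`, `fuse`, `K`, `alpha`, `beta`, and `iters`, and all default values, are illustrative choices and are not taken from the paper.

```python
import math
import random
from collections import Counter


def gsdmm(docs, K=8, alpha=0.1, beta=0.1, iters=15, seed=0):
    """Collapsed Gibbs sampling for a Dirichlet multinomial mixture.

    docs is a list of token lists. Returns (z, topic_vectors), where z[d] is
    the sampled cluster of document d and topic_vectors[d] is its normalized
    distribution over the K clusters.
    """
    rng = random.Random(seed)
    V = len({w for doc in docs for w in doc})  # vocabulary size
    m = [0] * K                                # documents per cluster
    n = [0] * K                                # tokens per cluster
    nw = [Counter() for _ in range(K)]         # per-cluster word counts
    z = []

    def move(d, k, sign):
        # Add (sign=+1) or remove (sign=-1) document d from cluster k.
        m[k] += sign
        n[k] += sign * len(docs[d])
        for w in docs[d]:
            nw[k][w] += sign

    def cluster_probs(d):
        # p(z_d = k | everything else); document d must be removed first.
        counts = Counter(docs[d])
        logp = []
        for k in range(K):
            lp = math.log(m[k] + alpha)
            for w, c in counts.items():
                for j in range(c):
                    lp += math.log(nw[k][w] + beta + j)
            for i in range(len(docs[d])):
                lp -= math.log(n[k] + V * beta + i)
            logp.append(lp)
        top = max(logp)  # subtract the max for numerical stability
        weights = [math.exp(lp - top) for lp in logp]
        total = sum(weights)
        return [w / total for w in weights]

    for d in range(len(docs)):  # random initial assignment
        z.append(rng.randrange(K))
        move(d, z[d], +1)

    for _ in range(iters):      # Gibbs sweeps over all documents
        for d in range(len(docs)):
            move(d, z[d], -1)
            z[d] = rng.choices(range(K), weights=cluster_probs(d))[0]
            move(d, z[d], +1)

    topic_vectors = []
    for d in range(len(docs)):
        move(d, z[d], -1)
        topic_vectors.append(cluster_probs(d))
        move(d, z[d], +1)
    return z, topic_vectors


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def fuse(semantic_vec, topic_vec):
    # Assumption: fusion is concatenation of the two normalized vectors.
    return l2_normalize(semantic_vec) + l2_normalize(topic_vec)


docs = [
    "refund not received after return".split(),
    "refund delayed return rejected".split(),
    "package arrived broken box crushed".split(),
    "box crushed item broken".split(),
]
z, topics = gsdmm(docs, K=4)
# Placeholder values standing in for BERT sentence vectors (not real embeddings).
semantic = [[0.1 * (i + 1)] * 3 for i in range(len(docs))]
fused = [fuse(s, t) for s, t in zip(semantic, topics)]
```

In a fuller pipeline, the fused vectors would be the input to the clustering guidance step and then to the classifier (SVM is listed among the keywords); those steps are left out here because the abstract does not specify how they work.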
This article is indexed in VIP and other databases.