     

Text classification model combining word annotations
Cite this article: Xianfeng YANG, Jiahe ZHAO, Ziqiang LI. Text classification model combining word annotations[J]. Journal of Computer Applications, 2022, 42(5): 1317-1323. DOI: 10.11772/j.issn.1001-9081.2021030489
Authors: Xianfeng YANG, Jiahe ZHAO, Ziqiang LI
Affiliations: School of Computer Science, Southwest Petroleum University, Chengdu, Sichuan 610500, China
College of Movie and Media, Sichuan Normal University, Chengdu, Sichuan 610066, China
Foundation item: National Natural Science Foundation of China (61802321)
Abstract: Traditional text feature representation methods cannot fully solve the problem of polysemy. To address this, a text classification model combining word annotations was constructed. Firstly, with the help of an existing Chinese dictionary, the dictionary annotations of the words in a text were selected according to the word contexts, and Bidirectional Encoder Representations from Transformers (BERT) encoding was performed on them to generate annotation sentence vectors. Then, the annotation sentence vectors were fused with the word embedding vectors as the input layer to enrich the feature information of the input text. Finally, a Bidirectional Gated Recurrent Unit (BiGRU) was used to learn the features of the text, and an attention mechanism was introduced to highlight the key feature vectors. Experimental results of text classification on the public THUCNews dataset and the Sina Weibo sentiment classification dataset show that the text classification models combining BERT word annotations perform significantly better than those without word annotations; among all the compared models, the proposed BERT word annotation_BiGRU_Attention model achieves the highest precision and recall, and its F1-scores, which reflect the overall performance, reach 98.16% and 96.52% on the two datasets respectively.
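The model described above has three stages: encode each word's dictionary annotation with BERT, fuse the resulting annotation sentence vector with the word embedding at the input layer, and classify with a BiGRU whose outputs are weighted by an attention mechanism. The sketch below is one possible reading of that pipeline, not the authors' released implementation; it assumes PyTorch and the Hugging Face transformers library, and the names (AnnotationFusionClassifier, encode_annotations), the bert-base-chinese checkpoint, concatenation as the fusion operation, and the additive attention scorer are all illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import BertModel, BertTokenizer


def encode_annotations(annotations, bert_name="bert-base-chinese", device="cpu"):
    # Encode one dictionary-annotation sentence per word with BERT and return
    # a (seq_len, hidden_size) tensor of [CLS] vectors, one per annotation.
    tokenizer = BertTokenizer.from_pretrained(bert_name)
    bert = BertModel.from_pretrained(bert_name).to(device).eval()
    with torch.no_grad():
        batch = tokenizer(annotations, padding=True, truncation=True,
                          return_tensors="pt").to(device)
        return bert(**batch).last_hidden_state[:, 0]


class AnnotationFusionClassifier(nn.Module):
    # Word embedding + annotation vector -> BiGRU -> attention -> class logits.
    def __init__(self, vocab_size, num_classes, word_dim=128,
                 annot_dim=768, hidden_dim=256):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim, padding_idx=0)
        self.bigru = nn.GRU(word_dim + annot_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.att = nn.Linear(2 * hidden_dim, 1)   # scores each time step
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, word_ids, annot_vecs):
        # word_ids:   (batch, seq_len)        integer word ids
        # annot_vecs: (batch, seq_len, 768)   precomputed annotation sentence vectors
        fused = torch.cat([self.word_emb(word_ids), annot_vecs], dim=-1)
        states, _ = self.bigru(fused)                 # (batch, seq_len, 2*hidden)
        weights = F.softmax(self.att(states), dim=1)  # attention over time steps
        context = (weights * states).sum(dim=1)       # weighted sentence vector
        return self.fc(context)


if __name__ == "__main__":
    # Toy forward pass with random inputs, just to check tensor shapes.
    model = AnnotationFusionClassifier(vocab_size=5000, num_classes=10)
    word_ids = torch.randint(1, 5000, (2, 20))
    annot_vecs = torch.randn(2, 20, 768)
    print(model(word_ids, annot_vecs).shape)          # torch.Size([2, 10])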

Keywords: polysemy; word annotation; Bidirectional Encoder Representations from Transformers (BERT); Bidirectional Gated Recurrent Unit (BiGRU); attention mechanism; text classification
Received: 2021-03-31
Revised: 2021-07-08
