Method for Extracting Topic Words of News Based on Att-iBi-LSTM
Citation: CHAI Yue, ZHAO Tongzhou, JIANG Yiqi, GAO Peidong. Method for Extracting Topic Words of News Based on Att-iBi-LSTM Model[J]. Journal of Wuhan Institute of Technology, 2020, 42(5): 575-580. DOI: 10.19843/j.cnki.CN42-1779/TQ.202003021
Authors: CHAI Yue, ZHAO Tongzhou, JIANG Yiqi, GAO Peidong
Affiliation: School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430205, Hubei, China
Abstract: To address the low accuracy of topic word extraction with LSTM networks, which ignore the influence of the context following the central word, a topic word extraction method based on a bidirectional LSTM with an attention mechanism (Att-iBi-LSTM) is proposed. First, LSTM models the preceding and following context of the central word in two directions; then an attention mechanism is introduced into the bidirectional LSTM to assign higher weights to more influential features; finally, a softmax layer classifies each word in the document as a topic word or a non-topic word. A two-stage training method is also proposed: the model is pre-trained on an automatically labeled training set and then trained on a manually labeled data set. Topic word extraction experiments on sports, entertainment, and technology news show that the proposed Att-iBi-LSTM model improves the F1 score by 13.78%, 24.31%, and 3.32% over SVM, TextRank, and LSTM, respectively, and that two-stage training improves the F1 score of Att-iBi-LSTM by 1.56% over one-stage training.

Keywords: LSTM, attention mechanism, topic word extraction

Method for Extracting Topic Words of News Based on Att-iBi-LSTM Model
CHAI Yue, ZHAO Tongzhou, JIANG Yiqi, GAO Peidong. Method for Extracting Topic Words of News Based on Att-iBi-LSTM Model[J]. Journal of Wuhan Institute of Technology, 2020, 42(5): 575-580. DOI: 10.19843/j.cnki.CN42-1779/TQ.202003021
Authors:CHAI Yue  ZHAO Tongzhou  JIANG Yiqi  GAO Peidong
Affiliation:School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430205, China
Abstract: Aiming at the low accuracy of topic word extraction with long short-term memory (LSTM) networks, which lack part of the contextual information, we present a bidirectional LSTM model with an attention mechanism (Att-iBi-LSTM) for topic word extraction. First, the LSTM model encodes the context of the central word in both directions. Then, an attention mechanism is introduced to assign higher weights to the more significant features. Finally, a softmax layer classifies each word in the document as a topic word or a non-topic word. We also propose a two-stage training method: the model is pre-trained on an automatically labeled training set and then trained on a manually labeled data set. The topic word extraction task was performed on three types of news texts: sports, entertainment, and science news. Experimental results show that the Att-iBi-LSTM model improves the F1-measure by 13.78%, 24.31%, and 3.32%, respectively, compared with the support vector machine, TextRank, and LSTM models, and that the F1-measure of the two-stage-trained Att-iBi-LSTM model is 1.56% higher than that of one-stage training.
Keywords: LSTM, attention, topic word extraction
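The pipeline the abstract describes (bidirectional context states per token, an attention layer that reweights them, and a softmax classifier) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the toy hidden states, the dot-product attention with a `query` vector, and the linear classifier weights `W` are all hypothetical stand-ins for learned BiLSTM outputs and trained parameters.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_pool(hidden, query):
    # score each hidden state by its dot product with a (here hand-picked) query vector,
    # normalize the scores with softmax, and return the weighted sum of states
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden]
    weights = softmax(scores)
    dim = len(hidden[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
    return context, weights

# toy per-token states, standing in for concatenated forward/backward LSTM outputs
hidden = [[0.1, 0.3], [0.9, 0.2], [0.2, 0.8]]
query = [1.0, 0.5]
context, weights = attention_pool(hidden, query)

# toy linear layer (2 classes x feature dim) followed by softmax,
# mirroring the topic-word / non-topic-word decision of the final layer
W = [[0.6, -0.4], [-0.6, 0.4]]
logits = [sum(w_d * c for w_d, c in zip(row, context)) for row in W]
probs = softmax(logits)
print(weights, probs)
```

In the real model the attention weights and classifier parameters are learned jointly with the BiLSTM, and the classification is made per token rather than once per document; the sketch only shows how attention concentrates weight on the highest-scoring state before the softmax decision.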
This article is indexed in CNKI and other databases.
