Prediction Model of Chinese Punctuation Based on Self-Attention Mechanism
Cite this article: DUAN Dagao, LIANG Shaohu, ZHAO Zhendong, HAN Zhongming. Prediction Model of Chinese Punctuation Based on Self-Attention Mechanism[J]. Computer Engineering, 2020, 46(5): 291-297
Authors: DUAN Dagao  LIANG Shaohu  ZHAO Zhendong  HAN Zhongming
Affiliation: School of Computer and Information Engineering, Beijing Technology and Business University, Beijing 100048, China
Funding: National Natural Science Foundation of China; Humanities and Social Sciences Research Project of the Ministry of Education; Beijing Natural Science Foundation
Abstract: Chinese Punctuation Prediction (PP) is an important task in Natural Language Processing (NLP) that helps readers eliminate ambiguity and understand text more accurately. To address the inability of the traditional self-attention model to process sequence position information, this paper proposes a Chinese punctuation prediction model based on the self-attention mechanism. The model stacks multiple layers of Bi-directional Long Short-Term Memory (Bi-LSTM) network on top of the self-attention mechanism and combines part-of-speech and grammatical information for joint learning to predict punctuation. The self-attention mechanism captures the relationship between any two words regardless of their distance, while the part-of-speech and grammatical information improves the accuracy of the predicted punctuation. Experimental results on real news datasets show that the F1 value of the proposed model reaches 85.63%, significantly higher than that of the traditional CRF and LSTM prediction methods, so the model can predict Chinese punctuation accurately.

Keywords: Punctuation Prediction (PP)  self-attention mechanism  Bi-LSTM network  Deep Neural Network (DNN)  Natural Language Processing (NLP)
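
The abstract's point that plain self-attention relates any two words regardless of distance, yet carries no sequence position information, can be illustrated with a small sketch. This is not the authors' model: it assumes scaled dot-product attention with the queries, keys, and values all set to the raw input vectors (no learned projections), and the names `self_attention`, `tokens`, and `perm` are hypothetical, chosen only for this illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x (an assumption)."""
    d = len(x[0])
    out = []
    for q in x:
        # Every token is scored against every other token; distance plays no role.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


# Toy "word vectors" and a reordering of the same words.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
perm = [2, 0, 3, 1]
shuffled = [tokens[i] for i in perm]

out = self_attention(tokens)
out_shuffled = self_attention(shuffled)

# Shuffling the input only shuffles the output rows in the same way,
# so word order by itself leaves no trace in the result.
position_blind = all(
    math.isclose(a, b, abs_tol=1e-9)
    for i, p in enumerate(perm)
    for a, b in zip(out_shuffled[i], out[p])
)
print(position_blind)
```

Because each word's output vector is the same wherever the word appears, an order-aware component is needed on top of attention. The abstract's choice of stacked Bi-LSTM layers is one such component, since a recurrent network reads the sequence step by step.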
This article is indexed in VIP (维普), Wanfang Data (万方数据), and other databases.