Chinese Word Segmentation Based on a Bi-directional LSTM Neural Network Model
Cite this article: JIN Chen, LI Weihua, JI Chen, JIN Xuze, GUO Yanbu. Chinese word segmentation based on a bi-directional LSTM neural network model[J]. Journal of Chinese Information Processing, 2018, 32(2): 29-37
Authors: JIN Chen  LI Weihua  JI Chen  JIN Xuze  GUO Yanbu
Affiliations: 1. School of Information, Yunnan University, Kunming, Yunnan 650503, China;
2. School of Education, Henan Normal University, Xinxiang, Henan 453007, China
Funding: National Natural Science Foundation of China (11661081)
Abstract: Chinese word segmentation is the foundation of Chinese natural language processing, and segmentation quality directly affects subsequent NLP tasks. Mainstream segmentation methods are currently based on traditional machine learning models. In recent years, with the resurgence of artificial intelligence, the Long Short-Term Memory (LSTM) neural network model, which remedies the inability of plain recurrent neural networks to capture long-term dependencies, has been widely applied to various NLP tasks with good results. For Chinese word segmentation, this paper improves on the classical unidirectional LSTM model by adding a back-to-front LSTM layer, yielding a bidirectional LSTM model that compensates for the unidirectional LSTM's insufficient use of the following context. A contribution rate α is introduced to adjust the weight matrices of the forward and backward LSTM layers. Four experiments verify the correctness and superiority of the proposed model.

Keywords: Chinese word segmentation  natural language processing  bidirectional LSTM  contribution rate

Bi-directional Long Short-term Memory Neural Networks for Chinese Word Segmentation
JIN Chen,LI Weihua,JI Chen,JIN Xuze,GUO Yanbu. Bi-directional Long Short-term Memory Neural Networks for Chinese Word Segmentation[J]. Journal of Chinese Information Processing, 2018, 32(2): 29-37
Authors:JIN Chen  LI Weihua  JI Chen  JIN Xuze  GUO Yanbu
Affiliation: 1. School of Information, Yunnan University, Kunming, Yunnan 650503, China;
2. School of Education, Henan Normal University, Xinxiang, Henan 453007, China
Abstract: Chinese word segmentation (CWS) is a fundamental issue of Chinese natural language processing (NLP), which substantially affects subsequent NLP tasks. At present, mainstream solutions are based on classical machine learning models. Recently, the Long Short-Term Memory (LSTM) model was proposed to solve the long-term dependency problem of classical RNN models, and it has been well adapted to various NLP tasks. For the CWS task, we add a backward LSTM layer on top of the classical unidirectional LSTM to build a Bi-directional Long Short-Term Memory neural network model (Bi-LSTM). We also propose a contribution rate to balance the weight matrices of the forward and backward LSTM layers. We design four experiments to demonstrate that our model is reliable and preferable.
Keywords:CWS    NLP    Bi-LSTM    contribution rate  
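As a rough illustration of the idea described in the abstract (not the paper's actual implementation), the NumPy sketch below runs a forward and a backward LSTM over a character sequence and blends their hidden states with a contribution rate α before projecting to per-character tag scores. All names, dimensions, the random untrained weights, and the four-tag (B/M/E/S-style) output are illustrative assumptions.

```python
import numpy as np

def lstm_layer(x_seq, Wx, Wh, b, hid):
    """Run one LSTM pass over a sequence of vectors; return the hidden state per step."""
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    h, c = np.zeros(hid), np.zeros(hid)
    outs = []
    for x in x_seq:
        z = Wx @ x + Wh @ h + b                   # pre-activations for all four gates
        i, f, o, g = np.split(z, 4)
        i, f, o = sig(i), sig(f), sig(o)
        c = f * c + i * np.tanh(g)                # cell-state update
        h = o * np.tanh(c)
        outs.append(h)
    return np.array(outs)

def bilstm_tags(x_seq, params, alpha=0.5):
    """Mix forward and backward hidden states with contribution rate alpha,
    then project each character position to 4 tag scores (e.g. B/M/E/S)."""
    h_fwd = lstm_layer(x_seq, *params["fwd"])
    h_bwd = lstm_layer(x_seq[::-1], *params["bwd"])[::-1]   # re-align after reversal
    h = alpha * h_fwd + (1.0 - alpha) * h_bwd               # contribution-rate mix
    scores = h @ params["W_out"] + params["b_out"]
    return scores.argmax(axis=1)                            # one tag index per character

# Toy run with random weights (untrained, for illustration only).
rng = np.random.default_rng(0)
emb, hid, n_tags = 8, 16, 4
def layer_params():
    return (rng.normal(scale=0.1, size=(4 * hid, emb)),
            rng.normal(scale=0.1, size=(4 * hid, hid)),
            np.zeros(4 * hid), hid)
params = {"fwd": layer_params(), "bwd": layer_params(),
          "W_out": rng.normal(scale=0.1, size=(hid, n_tags)),
          "b_out": np.zeros(n_tags)}
sent = rng.normal(size=(5, emb))     # embeddings for a 5-character sentence
tags = bilstm_tags(sent, params, alpha=0.6)
print(tags)
```

With α = 1 the model degenerates to the forward-only unidirectional LSTM, which is the baseline the paper improves on; α = 0.5 weights both directions equally.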