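Wordhood memory networks with label attention for Chinese word segmentation
Cite this article: Han Shiyang, Ma Zhiyuan, Yang Fangyan, Li Xiang, Wang Wei. Wordhood memory networks with label attention for Chinese word segmentation[J]. Application Research of Computers, 2022, 39(6).
Authors: Han Shiyang, Ma Zhiyuan, Yang Fangyan, Li Xiang, Wang Wei
Affiliation: University of Shanghai for Science and Technology (all five authors)
Funding: Open Project of the State Key Laboratory for Novel Software Technology, Nanjing University (KFKT2021B39)
Abstract: Wordhood information is a text feature of great importance to Chinese word segmentation. WMSEG, one of the most recent Chinese word segmentation models, achieves state-of-the-art segmentation performance by incorporating wordhood information. However, models of this kind do not account for the dependencies between labels, which limits their segmentation performance, particularly on out-of-vocabulary words. To address this problem, an attention mechanism with label embeddings is introduced into the learning process, yielding a wordhood memory network with label attention that strengthens both the dependencies between labels and the correlations between labels and characters. Experimental results show that the model achieves segmentation performance no worse than that of WMSEG on four commonly used datasets while improving the recognition of out-of-vocabulary words.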

Keywords: wordhood information; Chinese word segmentation; label embedding; attention mechanism; out-of-vocabulary words
Received: 2021/11/15
Revised: 2022/5/19
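
The abstract reports improved recognition of out-of-vocabulary (OOV) words but does not say how this is measured. A common choice in word segmentation evaluation is OOV recall: the fraction of gold-standard words absent from the training vocabulary that the segmenter recovers with exactly the right boundaries. The sketch below illustrates that metric only; the function names `to_spans` and `oov_recall` and the toy sentence are hypothetical and do not come from the paper.

```python
def to_spans(words):
    """Convert a list of words into (start, end) character offsets."""
    spans, start = [], 0
    for word in words:
        spans.append((start, start + len(word)))
        start += len(word)
    return spans


def oov_recall(gold_words, pred_words, train_vocab):
    """Share of gold OOV words whose exact span appears in the prediction.

    Returns None when the gold segmentation contains no OOV word.
    """
    pred_spans = set(to_spans(pred_words))
    oov_spans = [
        span
        for word, span in zip(gold_words, to_spans(gold_words))
        if word not in train_vocab  # assumption: OOV = not seen in training
    ]
    if not oov_spans:
        return None
    hits = sum(1 for span in oov_spans if span in pred_spans)
    return hits / len(oov_spans)


# Toy data, invented for illustration.
train_vocab = {"我", "喜欢"}
gold = ["我", "喜欢", "奥利奥"]
bad_pred = ["我", "喜欢", "奥利", "奥"]
```

Here `oov_recall(gold, bad_pred, train_vocab)` counts the single OOV word as missed because the prediction splits it, whereas a prediction identical to `gold` would recover it.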

Wordhood memory networks with label attention for Chinese word segmentation
Han Shiyang, Ma Zhiyuan, Yang Fangyan, Li Xiang and Wang Wei. Wordhood memory networks with label attention for Chinese word segmentation[J]. Application Research of Computers, 2022, 39(6).
Authors: Han Shiyang, Ma Zhiyuan, Yang Fangyan, Li Xiang and Wang Wei
Affiliation: University of Shanghai for Science & Technology
Abstract: Wordhood information is an important contextual feature for Chinese word segmentation. WMSEG, one of the most recent segmentation models, achieves state-of-the-art performance by incorporating wordhood information. However, the model does not consider label dependencies, which leads to unsatisfactory segmentation performance, especially on out-of-vocabulary words. To address this issue, this paper introduces an attention mechanism with label embedding into the learning process and proposes a wordhood memory network with label attention that enhances both the label dependencies and the correlations between labels and characters. Experimental results show that the model performs on par with or better than WMSEG on four widely used datasets and improves the recognition of out-of-vocabulary words.
Keywords: wordhood information; Chinese word segmentation; label embedding; attention mechanism; out-of-vocabulary words
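
To make the idea of attention over label embeddings concrete, the following is a minimal sketch under stated assumptions rather than the authors' architecture, which the abstract does not specify. It assumes the common B/M/E/S tag set, represents each label by an embedding vector, and applies plain scaled dot-product attention so that every character vector is paired with a softmax-weighted mixture of label embeddings. The names `label_attention`, `softmax`, and `dot` and all numeric values are hypothetical.

```python
import math

# Assumption: the common four-tag scheme for word segmentation.
LABELS = ["B", "M", "E", "S"]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def label_attention(char_vecs, label_embs):
    """Scaled dot-product attention from characters (queries) to labels.

    char_vecs:  one d-dimensional vector per character
    label_embs: one d-dimensional embedding per label
    Returns (contexts, weights): per character, the attention-weighted
    label embedding and the attention distribution over labels.
    """
    dim = len(label_embs[0])
    contexts, weights = [], []
    for h in char_vecs:
        scores = [dot(h, e) / math.sqrt(dim) for e in label_embs]
        w = softmax(scores)
        ctx = [
            sum(w[k] * label_embs[k][j] for k in range(len(label_embs)))
            for j in range(dim)
        ]
        contexts.append(ctx)
        weights.append(w)
    return contexts, weights


# Toy inputs, invented for illustration (d = 2, two characters).
toy_chars = [[1.0, 0.0], [0.0, 1.0]]
toy_labels = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
toy_contexts, toy_weights = label_attention(toy_chars, toy_labels)
```

Each row of `toy_weights` is a probability distribution over `LABELS`. In a full model, such label-aware context vectors would typically be combined with the character representations before decoding, but that step is outside what the abstract describes.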