
Chinese Elementary Discourse Unit and Theme-Rheme Joint Detection Based on Multi-task Learning
Citation: GE Haizhu, KONG Fang. Chinese Elementary Discourse Unit and Theme-Rheme Joint Detection Based on Multi-task Learning[J]. Journal of Chinese Information Processing, 2020, 34(1): 71-79
Authors: GE Haizhu  KONG Fang
Affiliation: School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
Funding: National Natural Science Foundation of China (61876118, 61751206)
Abstract: Elementary discourse unit (EDU) recognition is the foundation for building discourse structure and is of great importance to discourse analysis. From the perspective of discourse cohesion, the theory of discourse topic structure holds that every EDU consists of two parts: the theme, which is the starting point of the information to be expressed, and the rheme, which conveys the new information. EDU recognition is therefore closely related to theme-rheme recognition. On this basis, this paper proposes a multi-task learning method for jointly recognizing Chinese EDUs and theme-rheme structure. The method uses a bidirectional long short-term memory network (BiLSTM) and a graph convolutional network (GCN) to represent the sequential and structured topological information of the basic units, then uses a multi-task learning framework to share parameters between the two tasks, exploiting the correlation between them to improve model performance. Experimental results show that, under multi-task learning, both EDU recognition and theme-rheme recognition outperform their respective single-task models, with F1 scores of 91.90% for EDU recognition and 85.65% for theme-rheme recognition.

Keywords: multi-task learning  elementary discourse unit  theme  rheme
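The shared-representation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the graph, the GCN variant, the label sets, or the training loss, so the chain-shaped graph, the row-normalized adjacency with self-loops, the toy weights, and the names `gcn_layer` and `task_head` are all assumptions made for illustration. Hand-written feature vectors stand in for BiLSTM outputs.

```python
# Illustrative sketch only: one graph-convolution step whose output is shared
# by two task heads. Graph, weights, and label sets are made-up assumptions.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalized_adjacency(edges, n):
    """Build a row-normalized adjacency matrix with self-loops from undirected edges."""
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0
    return [[v / sum(row) for v in row] for row in adj]

def gcn_layer(adj_norm, features, weight):
    """Compute ReLU(A_norm @ H @ W): each token averages over itself and its neighbours."""
    hidden = matmul(matmul(adj_norm, features), weight)
    return [[max(0.0, v) for v in row] for row in hidden]

def task_head(shared, weight):
    """Linear scoring head; returns the argmax label index for each token."""
    scores = matmul(shared, weight)
    return [row.index(max(row)) for row in scores]

# Hypothetical 4-token sentence with a chain-shaped graph over its tokens.
edges = [(0, 1), (1, 2), (2, 3)]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]  # stand-in for BiLSTM outputs
w_shared = [[1.0, -1.0], [0.5, 1.0]]  # shared GCN weights, used by both tasks
w_edu = [[1.0, 0.0], [0.0, 1.0]]      # EDU head: 2 hypothetical labels
w_theme = [[0.0, 1.0], [1.0, 0.0]]    # theme-rheme head: 2 hypothetical labels

adj_norm = normalized_adjacency(edges, n=4)
shared = gcn_layer(adj_norm, features, w_shared)  # one representation for both tasks
edu_labels = task_head(shared, w_edu)
theme_labels = task_head(shared, w_theme)
```

In a trained model of this kind, the losses of the two heads would typically be combined so that gradients from both tasks update the shared weights; the combination scheme is left out here because the abstract does not describe it.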
This article is indexed in VIP and other databases.