A pre-trained spoken dialog language model integrating role, structure, and semantics
Citation: Huang Jian, Li Feng. A pre-trained spoken dialog language model integrating role, structure, and semantics[J]. Application Research of Computers, 2022, 39(8).
Authors: Huang Jian, Li Feng
Affiliation: Innovation Lab, Shanghai Pudong Development Bank Co., Ltd. (both authors)
Abstract: Spoken language understanding is a key component of task-oriented dialog systems, and pre-trained language models have made important breakthroughs in it; however, most of these models are trained on large-scale written-text corpora. Given the clear differences between spoken and written language in structure, usage conditions, and modes of expression, this paper builds a large-scale, two-role, multi-turn spoken dialog corpus and proposes four self-supervised pre-training tasks that integrate role, structure, and semantics: whole-word masking, role prediction, intra-utterance reversal prediction, and inter-turn exchange prediction. Multi-task joint training yields SPD-BERT (SPoken Dialog-BERT), a pre-trained language model oriented to spoken dialog. Detailed experiments on three manually annotated datasets from an intelligent customer-service scenario in the finance domain (intent detection, entity recognition, and pinyin error correction) verify the effectiveness of the model.

Keywords: dialog systems    spoken language understanding    pre-trained language models    intent detection    entity recognition
Received: 2022-01-05
Revised: 2022-07-18

SPD-BERT: a role, structure and semantic based pre-trained spoken dialog language model
Huang Jian and Li Feng. SPD-BERT: a role, structure and semantic based pre-trained spoken dialog language model[J]. Application Research of Computers, 2022, 39(8).
Authors:Huang Jian and Li Feng
Affiliation: Innovation Lab, Shanghai Pudong Development Bank, Shanghai 200001, China
Abstract: Spoken language understanding (SLU) is an important component of task-oriented dialog systems. Recently, pre-trained language models have made breakthroughs in various SLU tasks. However, these models are trained on large-scale written language, which differs markedly from spoken language in structure, usage conditions, and expression patterns. This paper constructed a large-scale, multi-turn, bi-role spoken dialog corpus and proposed four self-supervised pre-training tasks: masked language modeling, role prediction, intra-query reverse prediction, and inter-query exchange prediction. A BERT-based spoken dialog language model (SPD-BERT) was pre-trained through multi-task learning. Finally, the model was tested on three typical tasks of intelligent customer service in the finance domain. The experimental results demonstrate the effectiveness of the proposed model.
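To make the four self-supervised objectives concrete, the sketch below shows one plausible way to derive training examples and labels from a two-role dialog. This is a hedged illustration only: the function name, the example format, and details such as masking probability and swapping adjacent turns are assumptions for exposition, not the authors' exact recipe (e.g. true whole-word masking would operate on word boundaries rather than single tokens).

```python
import random

MASK = "[MASK]"

def make_examples(dialog, p_mask=0.15, seed=0):
    """Derive self-supervised examples from a dialog.

    dialog: list of (role, tokens) turns, e.g. role in {"customer", "agent"}.
    Returns a list of {"task", "input", "target"} dicts, one label source
    per objective: token masking, role prediction, intra-utterance
    reversal, and inter-turn exchange.
    """
    rng = random.Random(seed)
    examples = []
    for role, tokens in dialog:
        # 1) Masked language model: hide some tokens, predict the originals.
        masked = [MASK if rng.random() < p_mask else t for t in tokens]
        examples.append({"task": "mlm", "input": masked, "target": tokens})
        # 2) Role prediction: predict which speaker produced the turn.
        examples.append({"task": "role", "input": tokens, "target": role})
        # 3) Intra-utterance reversal: maybe reverse the token order;
        #    the model predicts whether a reversal happened.
        flipped = rng.random() < 0.5
        examples.append({"task": "reverse",
                         "input": tokens[::-1] if flipped else tokens,
                         "target": flipped})
    # 4) Inter-turn exchange: maybe swap two adjacent turns;
    #    the model predicts whether an exchange happened.
    turns = [tokens for _, tokens in dialog]
    swapped = rng.random() < 0.5 and len(turns) >= 2
    if swapped:
        j = rng.randrange(len(turns) - 1)
        turns[j], turns[j + 1] = turns[j + 1], turns[j]
    examples.append({"task": "exchange", "input": turns, "target": swapped})
    return examples

dialog = [("customer", ["i", "lost", "my", "card"]),
          ("agent", ["please", "verify", "your", "identity"])]
exs = make_examples(dialog)
```

Because every label (original token, speaker, reversed-or-not, swapped-or-not) comes from the dialog itself, the four tasks can share one encoder and be optimized jointly, which is the multi-task setup the abstract describes.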
Keywords:dialog systems  spoken language understanding  pre-trained language model  intent detection  named entity recognition