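Mispronunciation Verification for Second-Language Speakers Based on Acoustic Phone Embeddings and Siamese Networks
Cite this article: WANG Zhenyu, XIE Yanlu, ZHANG Jinsong. Mispronunciation Verification for Second-Language Speakers Based on Acoustic Phone Embeddings and Siamese Networks[J]. Journal of Chinese Information Processing, 2019, 33(4): 127-134.
Authors: WANG Zhenyu  XIE Yanlu  ZHANG Jinsong
Affiliation: Beijing Advanced Innovation Center for Language Resources, Beijing Language and Culture University, Beijing 100083, China
Funding: National Social Science Fund of China (18BYY124); Beijing Advanced Innovation Center for Language Resources project (KYR17005); State Language Commission research project (ZDI135-51); Wutong Innovation Platform of Beijing Language and Culture University (Fundamental Research Funds for the Central Universities) (16PT05) (18YJ030004); Graduate Innovation Fund of Beijing Language and Culture University (17YCX139)
Abstract: With the continuing development of large-scale automatic speech recognition, computer-assisted pronunciation teaching built on it has advanced as well. As a complement to traditional teaching, it substantially offsets the shortage of traditional educational resources and the inability of traditional methods to give learners timely feedback. Verifying and evaluating the pronunciation errors of second-language (L2) learners is one of the more important research topics in computer-assisted pronunciation training. To address the lack of annotated L2 mispronunciation data in the mispronunciation verification task, this paper proposes a method based on acoustic phone embeddings and a Siamese network. Pairs of speech features carrying pairing information are used as the system input, and a neural network maps the speech features to a high-level representation, with the aim of separating different phones. Training uses a Siamese network: the distance between the two output phone embeddings is adjusted and optimized according to whether they come from the same phone class, and the optimization is carried out through a corresponding loss function. Results show that the Siamese network with a cosine max-margin distance loss (cosine hinge loss) achieves an accuracy of 89.93%, outperforming the other methods in the experiments. When applied to the mispronunciation verification task, the method achieves a diagnostic accuracy of 89.19% without training on annotated L2 mispronunciation data.

Keywords: mispronunciation verification  phone embedding  Siamese networks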

Non-native Mispronunciation Verification Using Acoustic Tonal Phone Embedding and Siamese Networks
WANG Zhenyu,XIE Yanlu,ZHANG Jinsong.Non-native Mispronunciation Verification Using Acoustic Tonal Phone Embedding and Siamese Networks[J].Journal of Chinese Information Processing,2019,33(4):127-134.
Authors:WANG Zhenyu  XIE Yanlu  ZHANG Jinsong
Affiliation:Beijing Advanced Innovation Center for Language Resources, Beijing Language and Culture University, Beijing 100083, China
Abstract:With the continuous development of automatic speech recognition, verifying and evaluating the pronunciation errors of second language (L2) learners has become one of the most important research topics in computer-assisted pronunciation training. To deal with the lack of labeled mispronunciation speech data, this paper proposes a method based on acoustic phone embedding and a Siamese network. A pair of acoustic phone segments with a pair-wise label is used as the system input, and a neural network maps the speech features to a high-level representation that separates different types of phones. The Siamese network is optimized according to whether the two output embeddings come from the same type of phone. Results show that the Siamese network trained with a cosine hinge loss achieves the best accuracy, 89.93%, and that diagnostic accuracy on the pronunciation error verification task is 89.19%.
Keywords:mispronunciation verification  phone embedding  Siamese networks
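
The abstract describes training on labeled pairs of phone embeddings with a cosine hinge loss but does not give its formula. The sketch below is therefore an illustration under assumptions, not the authors' implementation. It assumes the common pairwise form (the one PyTorch's CosineEmbeddingLoss uses): same-phone pairs are pulled toward cosine similarity 1, and different-phone pairs are penalized only when their similarity exceeds a margin. The function names, the margin value of 0.5, and the toy 2-dimensional embeddings are all hypothetical.

```python
import numpy as np


def cosine_similarity(a, b, eps=1e-8):
    """Row-wise cosine similarity between two (n_pairs, dim) arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    num = np.sum(a * b, axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    return num / np.maximum(den, eps)  # eps guards against zero vectors


def cosine_margin_loss(emb_a, emb_b, same_phone, margin=0.5):
    """Pairwise cosine loss with a margin (assumed form, not from the paper).

    same_phone[i] is True when pair i comes from the same phone class.
    Same-class pairs cost 1 - cos; different-class pairs cost
    max(0, cos - margin), so they are ignored once dissimilar enough.
    """
    cos = cosine_similarity(emb_a, emb_b)
    same = np.asarray(same_phone, dtype=bool)
    per_pair = np.where(same, 1.0 - cos, np.maximum(0.0, cos - margin))
    return float(np.mean(per_pair))


# Toy embeddings standing in for the two branch outputs of a Siamese network.
emb_a = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
emb_b = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
same_phone = np.array([True, False, False])

loss = cosine_margin_loss(emb_a, emb_b, same_phone, margin=0.5)
print(round(loss, 4))  # 0.1667: only the third pair (different phones, cos = 1) is penalized
```

In a real Siamese setup, both embeddings in a pair would come from the same network with shared weights, and this loss would be minimized by gradient descent. The NumPy version only evaluates the loss for fixed embeddings.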