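Multi-stream based Tandem feature method for mispronunciation detection
Cite this article: YUAN Hua, CAI Meng, ZHAO Junhong, ZHANG Weiqiang, LIU Jia. Multi-stream based Tandem feature method for mispronunciation detection[J]. Journal of Computer Applications (计算机应用), 2014, 34(6): 1694-1698.
Authors: YUAN Hua (袁桦), CAI Meng (蔡猛), ZHAO Junhong (赵军红), ZHANG Weiqiang (张卫强), LIU Jia (刘加)
Affiliations: 1. Department of Electronic Engineering, Tsinghua University, Beijing 100084; 2. Tsinghua National Laboratory for Information Science and Technology (Tsinghua University), Beijing 100084; 3. Institute of Electronics, Chinese Academy of Sciences, Beijing 100190; 4. University of Chinese Academy of Sciences, Beijing 100190; 5. State Key Laboratory of Transducer Technology (Chinese Academy of Sciences), Beijing 100190
Funding: supported by the National Natural Science Foundation of China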
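Abstract: Because labeled pronunciation data for mispronunciation detection are limited, this work proposes using other data within the Tandem system framework to make the features more discriminative. Taking English spoken by Chinese learners as the object of study, three relatively easy-to-obtain kinds of auxiliary data were selected: unlabeled pronunciation data, native Mandarin data and native English data. Experimental results show that all of these data types effectively improve system performance, with the unlabeled data performing best. The work also compares the effect on system performance of different context frame lengths, of the Multi-Layer Perceptron (MLP) and the Deep Neural Network (DNN) as typical shallow and deep neural networks, and of different Tandem feature structures. Finally, a multi-stream fusion strategy is used to improve performance further: the Tandem feature that merges the DNN-based unlabeled pronunciation data stream with the native English data stream achieves the best performance. Compared with the baseline system, recognition accuracy increases by 7.96% and the diagnostic accuracy for mispronunciation type increases by 14.71%.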

Keywords: mispronunciation detection; Tandem feature; pronunciation rule; Deep Neural Network (DNN); Multi-Layer Perceptron (MLP)
Received: 2013-12-16
Revised: 2014-01-21

Multi-stream based Tandem feature method for mispronunciation detection
YUAN Hua, CAI Meng, ZHAO Hongjun, ZHANG Weiqiang, LIU Jia. Multi-stream based Tandem feature method for mispronunciation detection[J]. Journal of Computer Applications, 2014, 34(6): 1694-1698.
Authors: YUAN Hua, CAI Meng, ZHAO Hongjun, ZHANG Weiqiang, LIU Jia
Affiliations: 1. Department of Electronic Engineering, Tsinghua University, Beijing 100084, China;
2. Tsinghua National Laboratory for Information Science and Technology (Tsinghua University), Beijing 100084, China;
3. University of Chinese Academy of Sciences, Beijing 100190, China;
4. Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China;
5. State Key Laboratory of Transducer Technology (Chinese Academy of Sciences), Beijing 100190, China
Abstract: Labeled pronunciation data for mispronunciation detection are scarce, so additional data were used within the Tandem system framework to make the features more discriminative. Taking English spoken by Chinese learners as the object of study, three relatively easy-to-obtain kinds of auxiliary data were selected: unlabeled data, native Mandarin data and native English data. Experiments show that all three types of data effectively improve system performance, with the unlabeled data performing best. The effects on system performance of different frame context lengths, of shallow versus deep neural networks, represented by the Multi-Layer Perceptron (MLP) and the Deep Neural Network (DNN) respectively, and of different Tandem feature structures were also compared. Finally, a strategy of merging multiple data streams was used to improve performance further; the best result was achieved by the Tandem feature that combines the DNN-based unlabeled data stream with the native English data stream. Compared with the baseline system, recognition accuracy increased by 7.96% and the diagnostic accuracy for mispronunciation type increased by 14.71%.
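Keywords: mispronunciation detection; Tandem feature; pronunciation rule; Deep Neural Network (DNN); Multi-Layer Perceptron (MLP)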
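
The abstract does not spell out how the Tandem features are assembled, so the following is only a minimal sketch of the general Tandem idea under stated assumptions: each frame's conventional acoustic features are extended with log-posteriors produced by one or more neural networks (one per data stream), and the network input is built by splicing neighbouring frames. The function names `splice_frames`, `log_posteriors` and `tandem_features`, the toy values, and the plain concatenation used for stream merging are all hypothetical; the networks themselves are omitted, and any dimensionality reduction the paper may apply is not shown.

```python
import math


def splice_frames(frames, context):
    """Stack each frame with `context` neighbours on both sides.

    Frames beyond the utterance boundary are replaced by the nearest
    boundary frame (an assumed padding rule, not taken from the paper).
    """
    n = len(frames)
    spliced = []
    for t in range(n):
        window = []
        for offset in range(-context, context + 1):
            idx = min(max(t + offset, 0), n - 1)
            window.extend(frames[idx])
        spliced.append(window)
    return spliced


def log_posteriors(posteriors, floor=1e-10):
    """Take the log of network posteriors, flooring to avoid log(0)."""
    return [[math.log(max(p, floor)) for p in frame] for frame in posteriors]


def tandem_features(acoustic, posterior_streams):
    """Append log-posteriors from every stream to the acoustic features."""
    streams = [log_posteriors(s) for s in posterior_streams]
    for s in streams:
        if len(s) != len(acoustic):
            raise ValueError("each stream needs one posterior vector per frame")
    merged = []
    for t, base in enumerate(acoustic):
        row = list(base)
        for s in streams:
            row.extend(s[t])
        merged.append(row)
    return merged


# Toy data: 3 frames, 2 acoustic dimensions, two hypothetical posterior streams.
acoustic = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
stream_a = [[0.7, 0.3], [0.5, 0.5], [0.2, 0.8]]
stream_b = [[0.9, 0.1], [0.6, 0.4], [0.1, 0.9]]

spliced = splice_frames(acoustic, context=1)              # network input
features = tandem_features(acoustic, [stream_a, stream_b])  # classifier input
```

In this sketch, a larger `context` widens the network input, and adding a stream only lengthens each feature vector, which is one simple way to read the multi-stream merging described in the abstract.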
This article is indexed in CNKI and other databases.